CMU Sphinx G2P

We found that Sequitur G2P fails on Amharic, likely because of Unicode handling bugs. Since Sequitur is no longer actively maintained, we looked into alternative G2P tools and settled on the CMU Sphinx G2P tool (g2p-seq2seq).

We have a speech lab installation here:

/proj/tts/tools/g2p-seq2seq/installation/bin/g2p-seq2seq

And we have trained models (so far, only for Amharic) here:

/proj/tts/tools/g2p-seq2seq/models/

To train a model for a new language, follow the instructions on the g2p-seq2seq GitHub page.
Make sure your lexicon is in the correct format, e.g.:

abandonando a b a n d o n a1 n d o

Put one word per line, followed by its pronunciation as a sequence of phones, with all tokens separated by spaces.
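Before training, it can help to check the lexicon against this format. The sketch below is a hypothetical helper (parse_lexicon is not part of g2p-seq2seq) that treats the first token on each line as the word and the remaining tokens as phones, and reports blank lines, words with no pronunciation, and tab characters (flagged only because the format above uses spaces). Given the Unicode problems we hit with Sequitur, it also flags lines that are not NFC-normalized; whether g2p-seq2seq itself cares about normalization is an assumption, but mixed normalization forms can make identical-looking words compare as different strings.

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple


def parse_lexicon(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[List[str]]], List[Tuple[int, str]]]:
    """Parse lexicon lines of the form 'word ph1 ph2 ...' (hypothetical helper).

    Returns (entries, problems):
      entries  maps each word to a list of its pronunciations (lists of phones)
      problems is a list of (line_number, message) pairs, 1-indexed
    """
    entries: Dict[str, List[List[str]]] = {}
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            problems.append((lineno, "blank line"))
            continue
        # The documented format separates tokens with spaces, so flag tabs.
        if "\t" in line:
            problems.append((lineno, "tab character; use spaces"))
        # Decomposed vs. precomposed characters look identical but differ as strings.
        if unicodedata.normalize("NFC", line) != line:
            problems.append((lineno, "not NFC-normalized"))
        tokens = line.split()
        if len(tokens) < 2:
            problems.append((lineno, "word has no pronunciation"))
            continue
        word, phones = tokens[0], tokens[1:]
        entries.setdefault(word, []).append(phones)
    return entries, problems
```

To check a lexicon file, open it as UTF-8 and pass the file object directly, e.g. `with open("lexicon.dict", encoding="utf-8") as f: entries, problems = parse_lexicon(f)` (the file name is a placeholder), then fix every reported line before training.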

To generate pronunciations for a list of OOV words with an existing model (e.g., the Amharic one), do the following: