CMU Sphinx G2P

We found that Sequitur G2P fails on Amharic, likely because of Unicode bugs. Since Sequitur is no longer actively maintained, we looked into alternative G2P tools and found the CMU Sphinx G2P tool (g2p-seq2seq).

We have a speech lab installation here:

/proj/tts/tools/g2p-seq2seq/installation/bin/g2p-seq2seq

And we have trained models (so far, only for Amharic) here:

/proj/tts/tools/g2p-seq2seq/models/

To train a model for a new language, please follow the instructions on the g2p-seq2seq GitHub page.
Make sure your lexicon is in the correct format, e.g.:

abandonando a b a n d o n a1 n d o

Each line holds one word followed by its phonetic pronunciation, with the word and each phoneme separated by spaces.
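
Before training, it can help to sanity-check the lexicon against the format described above. The following is a minimal sketch, not part of g2p-seq2seq: the function name check_lexicon is hypothetical, and it only checks that each line has a word followed by at least one space-separated phoneme. Treating empty lines as errors is an assumption; the tool itself may tolerate them.

```python
def check_lexicon(lines):
    """Return (line_number, message) pairs for lines that do not look like
    'word phone1 phone2 ...'. Hypothetical helper, not part of g2p-seq2seq."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # Assumption: empty lines are unwanted in a training lexicon.
            problems.append((lineno, "empty line"))
            continue
        fields = line.split()
        if len(fields) < 2:
            # A word with nothing after it has no pronunciation to learn from.
            problems.append((lineno, "word has no pronunciation"))
    return problems


sample = [
    "abandonando a b a n d o n a1 n d o\n",
    "oops\n",
]
print(check_lexicon(sample))
```

To check a real lexicon file, pass an open file handle instead of the sample list, opened with an explicit UTF-8 encoding (e.g. via io.open) so that non-ASCII scripts such as Amharic are read correctly.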

To run an existing model (e.g., for Amharic) to get pronunciations for a list of OOV words, do the following:

Please work on kucing, as the dependencies are installed there.
Make sure you have the following line in your .bashrc:
export PYTHONPATH=$PYTHONPATH:/proj/tts/tools/g2p-seq2seq/installation/lib/python2.7/site-packages
Remember to run source ~/.bashrc if you've just added it.
If you want to run the Montreal Forced Aligner, you will have to remove this entry from your PYTHONPATH first. To run g2p-seq2seq again afterwards, you will have to put it back.
Run this command:
/proj/tts/tools/g2p-seq2seq/installation/bin/g2p-seq2seq --decode your_OOV_wordlist --model /proj/tts/tools/g2p-seq2seq/models/babel_amharic/out
The output is your words followed by the predicted pronunciations.
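
One way to avoid editing .bashrc back and forth between g2p-seq2seq and the Montreal Forced Aligner is to set PYTHONPATH only for the g2p-seq2seq process. The sketch below wraps the command above in Python. The helper names (g2p_env, g2p_command, parse_output, decode) are hypothetical, and it assumes the predictions are printed to standard output as one word per line followed by space-separated phonemes. If the tool also prints log messages to standard output, those lines would need extra filtering.

```python
import os
import subprocess

# Paths taken from the instructions above.
G2P_BIN = "/proj/tts/tools/g2p-seq2seq/installation/bin/g2p-seq2seq"
G2P_SITE_PACKAGES = (
    "/proj/tts/tools/g2p-seq2seq/installation/lib/python2.7/site-packages"
)
AMHARIC_MODEL = "/proj/tts/tools/g2p-seq2seq/models/babel_amharic/out"


def g2p_env(base_env=None):
    """Copy the environment and append the g2p-seq2seq site-packages to
    PYTHONPATH in the copy only, leaving the calling shell untouched."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    if existing:
        env["PYTHONPATH"] = existing + os.pathsep + G2P_SITE_PACKAGES
    else:
        env["PYTHONPATH"] = G2P_SITE_PACKAGES
    return env


def g2p_command(wordlist, model=AMHARIC_MODEL):
    """Build the same command line as shown in the instructions."""
    return [G2P_BIN, "--decode", wordlist, "--model", model]


def parse_output(text):
    """Assumption: each useful line is 'word phone1 phone2 ...'.
    Lines with fewer than two fields are skipped."""
    prons = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            prons[fields[0]] = fields[1:]
    return prons


def decode(wordlist, model=AMHARIC_MODEL):
    """Run g2p-seq2seq on a word list and return {word: [phonemes]}."""
    out = subprocess.check_output(g2p_command(wordlist, model), env=g2p_env())
    return parse_output(out.decode("utf-8"))
```

On kucing, calling decode("your_OOV_wordlist") would then run the tool with the right PYTHONPATH without changing the environment of your shell.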