Please note that these instructions are meant for Columbia Speech Lab students only.
Run these scripts on kucing because all the dependencies are installed and working on there.
export ESTDIR=/proj/tts/tools/babel_scripts/build/speech_tools
export FESTVOXDIR=/proj/tts/tools/babel_scripts/build/festvox
export SPTKDIR=/proj/tts/tools/babel_scripts/build/SPTK
export BABELDIR=/proj/tts/data/babeldir
You will need to make sure that the language's Babel data is present in $BABELDIR. E.g., to add Amharic:
ln -s /proj/speech/corpora/babel/IARPA/IARPA-babel307b-v1.0b-build/BABEL_OP3_307 /proj/tts/data/babeldir/
You'll have to find the original directory under /proj/speech/corpora/babel by looking around, as each is named somewhat differently. The numerical language codes for each language can be found here.
Then, when you run each of the commands for voice training, replace BABEL_BP_105, which is the Turkish language directory name, with the directory for your new language, everywhere it appears, and also substituting the name of your voice directory for turkish.
These scripts are supposed to work on the Babel language packs as-is,
and for the most part they do, only we have run into issues for
languages that have both .wav and .sph-format audio data, since the
scripts expect .sph data only. (.sph files are telephone
conversations, .wav files are other types of recording conditions.) So before you start on a new language,
check whether there are .wav files mixed in with the audio data,
under
$BABELDIR/[yourlanguagecode]/conversational/training/audio
and if so, then create your own directories, one containing just the sph
files, and another containing just the corresponding .txt transcript
files for those .sph audio files, and use those directories instead
of $BABELDIR/[yourlanguagecode]/conversational/training/transcription
and $BABELDIR/[yourlanguagecode]/conversational/training/audio
respectively, in all of the commands that require them.
cd /proj/tts/tools/babel_scripts
mkdir turkish
cd turkish
/proj/tts/tools/babel_scripts/make_build setup_voice turkish \
$BABELDIR/BABEL_BP_105/conversational/reference_materials/lexicon.txt \
$BABELDIR/BABEL_BP_105/conversational/training/transcription \
$BABELDIR/BABEL_BP_105/conversational/training/audio
/proj/tts/tools/babel_scripts/make_build make_voice turkish \
$BABELDIR/BABEL_BP_105/conversational/reference_materials/lexicon.txt \
$BABELDIR/BABEL_BP_105/conversational/training/transcription \
$BABELDIR/BABEL_BP_105/conversational/training/audio
make_voice will take a long time so be sure to run it under screen.
The resulting voice may be used to synthesize new utterances as follows:
./bin/do_clustergen cg_test tts tts_test etc/txt.done.data.test
(You must provide your own txt.done.data.test. tts_test is the name of the directory under test/ where your output .wav files will go.)
Note that any words that are OOV (with respect to the lexicon used for training) in your test utterances just get skipped over when synthesizing, using these scripts as-is.
Then you can drop in your own data: put .wav files under wav/, and for transcripts, replace etc/txt.done.data.
Then you can run the make_voice command, again substituting in your own voice name. This used to also re-run setup_voice, which would clobber any new data you dropped in, but we have commented this out.
TODO: we haven't actually done this. We've only ever used the frontend and then dropped in the files to HTS to train a voice there. We should try this out and document it and any errors you may come across.