Merlin Instructions and Troubleshooting

Before You Get Started

Merlin Instructions:

  1. Check that the GPU on the machine you are using is currently free, by running python merlin/src/
  2. Navigate to egs/build_your_own_voice/s1
  3. Run ./ voice_name
  4. Navigate to the experiments/voice_name directory. You should see directories labeled duration_model and acoustic_model.
  5. Go to duration_model/data and create a text file listing the filename of every utterance you want to train on, one per line, with no file extensions. Name this file file_id_list.scp. (If you're training on a subset, you can probably copy a list of filenames that we've already made. If you're training on the whole corpus, you can use the command "ls [directory containing label files] | sed 's/.\{4\}$//' > file_id_list.scp", which strips the last four characters, such as ".lab", from each filename.)
  6. Using the script merlin/misc/scripts/frontend/utils/, normalize the label files and write them to a directory named label_phone_align inside the data directory. The script takes four command line arguments: the input directory of label files, the output directory, the label style (which will be phone_align), and the text file with the filenames.
  7. Copy the label_phone_align directory and the file_id_list.scp text file to acoustic_model/data.
  8. Run ./ [path_to_wav_dir] [path_to_feat_dir] in merlin/egs/build_your_own_voice/s1
  9. Go to experiments/[voice_name]/test_synthesis, add your own test files, and add their names to test_id_list.scp.
  10. Then, make a directory within test_synthesis named prompt-lab containing normalized label files for your test utterances. Because Merlin's normalization script requires timestamps and our test label files don't have them, first use the Python script /proj/tts/examples/, which takes the input directory of the label files, the output directory for the label files, and the text file with the list of filenames as command line arguments. Once you've output those label files, run the same normalization script on them that you used to set up your training data.
  11. Return to the s1 directory, open the file conf/global_settings.cfg, and edit the Train, Valid, and Test values to be the sizes of your training, validation, and test sets. (You can check the total size of your training corpus by running the wc command on the file_id_list.scp file you've created. I generally follow the demos and make the test and validation sets each 1/10 the size of the training set; that is, 5/6 of the corpus is training, 1/12 is validation, and 1/12 is test.) You will also need to edit QuestionFile to point to the question file associated with your language. Question files in Merlin are located in /misc/questions.
  12. Also in global_settings.cfg, set the label style to match your labels (phone_align if you followed step 6).
  13. Run 04-07 in merlin/egs/build_your_own_voice/s1.
  14. Synthesized wav files for the validation and test portions of your training corpus can be found in experiments/[your_voice]/acoustic_model/gen. Synthesized wav files for the test files you added in step 9 can be found in experiments/[your_voice]/test_synthesis/wav.
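
Step 5 can also be done in Python instead of with ls and sed. The following is a minimal sketch, not part of Merlin: the function names list_file_ids and write_file_id_list are made up for illustration, and it assumes that every regular file in the label directory is a label file. Unlike the sed command, which always cuts four characters, it strips whatever the final extension is.

```python
import os


def list_file_ids(label_dir):
    """Return the sorted base names (final extension removed) of files in label_dir."""
    ids = []
    for name in sorted(os.listdir(label_dir)):
        # Skip subdirectories; assume every regular file is a label file.
        if os.path.isfile(os.path.join(label_dir, name)):
            ids.append(os.path.splitext(name)[0])
    return ids


def write_file_id_list(label_dir, out_path="file_id_list.scp"):
    """Write one utterance id per line to out_path and return the list of ids."""
    ids = list_file_ids(label_dir)
    with open(out_path, "w") as f:
        for file_id in ids:
            f.write(file_id + "\n")
    return ids
```

For example, calling write_file_id_list with the path of the directory containing your label files writes file_id_list.scp in the current directory and returns the list of ids it wrote.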

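The Train, Valid, and Test sizes from step 11 can be worked out with a few lines of Python. This is only a sketch of the 5/6, 1/12, 1/12 rule described there: count_ids and split_sizes are hypothetical helper names, not Merlin functions, and rounding the validation and test sets down is an assumption made here.

```python
def count_ids(scp_path):
    """Count the non-empty lines (utterance ids) in a file such as file_id_list.scp."""
    with open(scp_path) as f:
        return sum(1 for line in f if line.strip())


def split_sizes(total):
    """Split a corpus of `total` utterances into (train, valid, test) sizes.

    Validation and test each get 1/12 of the corpus, rounded down (an
    assumption), and training gets the rest, so the three sizes always
    sum to `total`.
    """
    held_out = total // 12
    return total - 2 * held_out, held_out, held_out


print(split_sizes(1200))  # (1000, 100, 100)
```

Calling split_sizes(count_ids("file_id_list.scp")) gives the three numbers to enter in conf/global_settings.cfg.
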
Things to check if you get an error:

Miscellaneous Tips:

Adding In Phrasing:

If you have information about where the phrase breaks are in your file, you can do the following to train your voice to incorporate this phrasing.

Last updated 9/19/2018 by ecooper