Merlin Instructions and Troubleshooting

Merlin Instructions:

  1. Install Merlin from GitHub:
  2. Navigate to egs/build_your_own_voice/s1
  3. Run ./ voice_name
  4. Navigate to the experiments/voice_name directory. You should see directories labeled duration_model and acoustic_model.
  5. Inside the duration_model directory, create a directory named data.
  6. Inside the data directory you just created, create a text file listing the filename of every utterance you want to train on, one per line, with no file extensions. Name this file file_id_list_full.scp. (If you're training on a subset, you can probably get this file by copying one of the filename lists we've already made. If you're training on the whole corpus, you can use the command "ls [directory containing label files] | sed 's/.\{4\}$//' > file_id_list_full.scp", which strips the last four characters, i.e. a dot plus a three-letter extension, from each name.)
  7. Using the script merlin/misc/scripts/frontend/utils/, normalize the label files so that the normalized labels end up in a directory named label_phone_align inside the data directory. The script takes as command line arguments the input directory of label files, the output directory, the label style (which will be phone_align), and the text file with the filenames (file_id_list_full.scp).
  8. Now, navigate to the acoustic_model directory and create a directory named data in it. Inside the data directory, create the label_phone_align directory and the file_id_list_full.scp text file exactly as you did before.
  9. Open up the script merlin/misc/scripts/vocoder/world/ and edit merlin_dir to be your Merlin installation (generally "/homes/[your cs account]/merlin"). Edit wav_dir to be the directory containing the wav files of your training data. Edit out_dir to be the full path to the data directory in the acoustic_model directory. Then run the script.
  10. Back in the experiments/[voice_name] directory, make a third directory called test_synthesis. Within this directory, make a text file named test_id_list.scp containing the file names of all your test files.
  11. Then, make a directory within test_synthesis named prompt-lab containing normalized label files for your test utterances. Because Merlin's normalization script requires timestamps and our test label files don't have them, first use the Python script /proj/tts/examples/, which takes the input directory of the label files, the output directory for the label files, and the text file with the list of filenames as command line arguments. Once you've output those label files, run the same normalization script on them that you used to set up your training data.
  12. Return to the s1 directory, open the file conf/global_settings.cfg, and edit the Train, Valid, and Test values to be the sizes of your training, validation, and test sets. (You can check the total size of your training corpus by running wc -l on the file_id_list_full.scp file you've created. I generally follow the demos and make the test and validation sets each 1/10 the size of the training set; that is, 5/6 of the training corpus is training, 1/12 is validation, and 1/12 is test.) You will also need to edit QuestionFile to point to the question file associated with your language. Question files in Merlin are located in misc/questions within the Merlin installation.
  13. Now you can run the script. I generally run this line by line, especially because the part that trains the acoustic model will also synthesize the validation and test utterances. If those synthesized utterances sound really bad, it's not worth synthesizing our actual test set (or even bothering to set up the test_synthesis directory). If you are using the newer version of Merlin, you should instead run through each of the scripts labeled through
  14. Synthesized wav files can be found in experiments/[your_voice]/acoustic_model/gen for the validation and test portions of your training corpus, and in experiments/[your_voice]/test_synthesis/wav for your actual test set.
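
If you'd rather not rely on the ls/sed one-liner from step 6, the file list can also be built with a few lines of Python. This is only a rough sketch, not part of Merlin: the function name write_file_id_list is made up for illustration, and it assumes your label files end in ".lab" (change the suffix argument if yours differ). Unlike the sed command, it skips files with any other extension instead of blindly trimming four characters.

```python
import tempfile
from pathlib import Path


def write_file_id_list(label_dir, scp_path, suffix=".lab"):
    """Write one extension-free utterance ID per line; return the IDs.

    Hypothetical helper. The ".lab" default is an assumption about
    how the label files are named.
    """
    ids = sorted(
        p.stem
        for p in Path(label_dir).iterdir()
        if p.is_file() and p.suffix == suffix
    )
    # One ID per line, with a trailing newline after the last entry.
    Path(scp_path).write_text("".join(f"{utt_id}\n" for utt_id in ids))
    return ids


if __name__ == "__main__":
    # Harmless demo in a temporary directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("utt2.lab", "utt1.lab", "notes.txt"):
            Path(tmp, name).write_text("")
        ids = write_file_id_list(tmp, Path(tmp, "file_id_list_full.scp"))
        print(ids)  # ['utt1', 'utt2']
```

To use it for real, call write_file_id_list with your label directory and the path to file_id_list_full.scp inside the data directory; the same file can then be copied into the acoustic_model data directory for step 8.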
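
The split arithmetic from step 12 can be checked with a short script. Again, this is just a sketch under my own assumptions: count_ids and split_sizes are made-up helper names, and rounding the validation and test sizes down (so any leftover utterances go to training) is my choice, not something Merlin requires. The three numbers it returns are the ones to enter by hand as Train, Valid, and Test in conf/global_settings.cfg.

```python
def count_ids(scp_path):
    """Count the non-empty lines (utterance IDs) in a file list."""
    with open(scp_path) as f:
        return sum(1 for line in f if line.strip())


def split_sizes(total):
    """Return (train, valid, test) for a corpus of `total` utterances.

    Validation and test each get about 1/12 of the corpus (rounded
    down); the remainder, about 5/6, goes to training, so valid and
    test each come out to roughly 1/10 of the training set.
    """
    held_out = total // 12
    train = total - 2 * held_out
    return train, held_out, held_out


if __name__ == "__main__":
    # Example with a made-up corpus size of 1200 utterances.
    print(split_sizes(1200))  # (1000, 100, 100)
```

In practice you would pass count_ids("file_id_list_full.scp") to split_sizes instead of a hard-coded number. The three sizes always add up to the total, so no utterance is left unassigned.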

Things to check if you get an error:

Last updated 10/4/2017 by ecooper