About This Site

This site collects how-to instructions and other resources for building voices with Festival, HTS, and Merlin. The instructions are fairly specific to our research, so they do not cover everything one might want to do with these tools. This is not a manual, but rather a guide and a set of recipes for the tasks involved in our experiments. The site is mainly intended for Speech Lab students, but anyone is welcome to use it. Please address questions about the individual software tools to their respective help lists.


HTS: The Hidden Markov Model-based speech synthesis toolkit. Main website and searchable email help list. There is no official manual for HTS, but the HTK Book is generally recommended as a reference.

Festival: General speech synthesis framework. Main website and mailing lists.

Merlin: Neural network-based speech synthesis toolkit. GitHub page and issues page with Q&A.


Many of the corpora we are using are available through the LDC: CALLHOME, MACROPHONE, BABEL, and Turkish broadcast news.

The CMU ARCTIC databases are also available online.


Some publications related to this work:

Data Selection and Adaptation for Naturalness in HMM-based Speech Synthesis.
Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg. Interspeech, September 2016, San Francisco, California.
paper poster

Data Selection for Naturalness in HMM-based Speech Synthesis.
Erica Cooper, Yocheved Levitan, Julia Hirschberg. Speech Prosody, June 2016, Boston, Massachusetts.
paper poster

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data.
Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg. Interspeech, August 2017, Stockholm, Sweden.
paper poster