Speaker-Adaptive Training Using HTS
Using the HTS SAT demo as-is
If you are dropping in your own data, the
main difference is that for each type of data, each speaker gets
their own directory. When extracting acoustic features, set an
appropriate f0 range for each speaker: use a general male/female
range or, better, pick a range tailored to each speaker.
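One way to pick a per-speaker range is to derive it from f0 values you have already extracted. The sketch below is our own illustration, not part of the demo: it assumes you have a list of log-f0 frame values for a speaker (HTS-style, with unvoiced frames marked by a large negative number) and returns a padded min/max in Hz.

```python
import math

# HTS marks unvoiced frames in .lf0 data with a large negative value.
LF0_UNVOICED = -1.0e10

def f0_range(lf0_values, lo_pct=0.05, hi_pct=0.95, margin=1.2):
    """Estimate an (fmin, fmax) f0 search range in Hz from log-f0 frames.

    Drops unvoiced frames, takes robust low/high percentiles of the
    voiced f0 values, and widens them by `margin` so the extractor is
    not clipped exactly at the observed extremes.
    """
    voiced = sorted(math.exp(v) for v in lf0_values if v > LF0_UNVOICED / 2)
    if not voiced:
        raise ValueError("no voiced frames")
    lo = voiced[int(lo_pct * (len(voiced) - 1))]
    hi = voiced[int(hi_pct * (len(voiced) - 1))]
    return round(lo / margin), round(hi * margin)
```

The resulting pair is what you would then put into F0_RANGES in data/Makefile for that speaker.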
For synthesis, you have to have gen labels that match the
speaker you want to adapt to; see how this is done in
Other changes you have to make in data/Makefile:
- Change DATASET if you are using your own data.
- Change TRAINSPKR, ADAPTSPKR, and ALLSPKR to the appropriate
  speaker IDs.
- Change ADAPTHEAD to whatever is appropriate.
- Change F0_RANGES to the correct range for each speaker. We are
  currently using 110 280 for female speakers and 50 280 for male
  speakers, but it is better to customize the range for each speaker
  if possible.
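Put together, the variable block in data/Makefile might look like the sketch below. The speaker IDs and values are placeholders patterned on the CMU ARCTIC adaptation demo, and we are assuming F0_RANGES takes speaker/lower/upper triples; check your own Makefile before copying anything.

```make
# Illustrative values only -- speaker IDs and ranges are placeholders.
DATASET    = cmu_us_arctic
TRAINSPKR  = awb bdl clb
ADAPTSPKR  = slt
ALLSPKR    = $(TRAINSPKR) $(ADAPTSPKR)
ADAPTHEAD  = b05    # filename prefix of the adaptation utterances

# speaker-ID / lower-Hz / upper-Hz triples
F0_RANGES  = awb 50 280 bdl 50 280 clb 110 280 slt 110 280
```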
Changes you have to make in scripts/Config.pm:
About the training steps in scripts/Config.pm, from a
thread on the HTS mailing list:
- $spkrPat -- the %%% is the mask for the part
of the filename that represents the speaker ID.
- Steps 1-5: adaptation based on the SI (speaker-independent) model.
- Steps 6-9: speaker-adaptive training of the average voice model.
- Steps 10-13: adaptation based on the average voice model.
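To make the mask concrete, here is a small illustration of extracting the speaker ID from an utterance name, which is the job of the %%% slot in $spkrPat. The filename scheme and the three-letter ID pattern are hypothetical; adjust the pattern to however your files are actually named.

```python
import re

# Hypothetical naming scheme: the capture group plays the role of the
# %%% mask in Config.pm, i.e. the slot holding the speaker ID.
SPKR_RE = re.compile(r"^cmu_us_arctic_([a-z]{3})_")

def speaker_of(fname):
    """Return the speaker ID embedded in an utterance filename."""
    m = SPKR_RE.match(fname)
    if m is None:
        raise ValueError(f"no speaker ID in {fname!r}")
    return m.group(1)
```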
Synthesizing directly from a SAT-trained AVM without adapting to a speaker
Note that this is theoretically not something you should do, since the
AVM is in some undefined space until you adapt it to a particular
speaker. However, the implementation of AVMs in HTS produces
reasonable, average-sounding speech.
Speech lab students: see /proj/tts/examples/HTS-demo_AVM for
an example. Modified CONVM and ENGIN steps were
added to convert AVM MMFs to the HTS-engine format and synthesize from
them. We basically added hts_engine synthesis to the "synthesize from
SAT-trained AVM" step, after the SPTK synthesis that is already done
there, and removed everything referring to speaker transforms, since
we do not want to use one.
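For reference, once the AVM has been converted to the engine format, synthesizing from it is a plain hts_engine call. This is a sketch only: the paths and voice name are placeholders, and the exact flags depend on which hts_engine_API version you have installed.

```sh
# Placeholder paths: -m loads the converted voice, -ow writes the waveform,
# and the positional argument is the full-context label file to synthesize.
hts_engine -m voices/avm.htsvoice -ow gen/utt01.wav gen/labels/utt01.lab
```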