Model Interpolation Using HTS-Engine

If you have two trained voices, you can use hts_engine to interpolate between them and synthesize from the result. Here is an example interpolation command, assuming you already have voices trained under the directories voice1 and voice2:

hts_engine \
    -td voice1/voices/qst001/ver1/tree-dur.inf -td voice2/voices/qst001/ver1/tree-dur.inf \
    -tf voice1/voices/qst001/ver1/tree-lf0.inf -tf voice2/voices/qst001/ver1/tree-lf0.inf \
    -tm voice1/voices/qst001/ver1/tree-mgc.inf -tm voice2/voices/qst001/ver1/tree-mgc.inf \
    -tl voice1/voices/qst001/ver1/tree-lpf.inf -tl voice2/voices/qst001/ver1/tree-lpf.inf \
    -md voice1/voices/qst001/ver1/dur.pdf -md voice2/voices/qst001/ver1/dur.pdf \
    -mf voice1/voices/qst001/ver1/lf0.pdf -mf voice2/voices/qst001/ver1/lf0.pdf \
    -mm voice1/voices/qst001/ver1/mgc.pdf -mm voice2/voices/qst001/ver1/mgc.pdf \
    -ml voice1/voices/qst001/ver1/lpf.pdf -ml voice2/voices/qst001/ver1/lpf.pdf \
    -dm voice2/voices/qst001/ver1/mgc.win1 -dm voice2/voices/qst001/ver1/mgc.win2 -dm voice2/voices/qst001/ver1/mgc.win3 \
    -df voice2/voices/qst001/ver1/lf0.win1 -df voice2/voices/qst001/ver1/lf0.win2 -df voice2/voices/qst001/ver1/lf0.win3 \
    -dl voice2/voices/qst001/ver1/lpf.win1 \
    -s 48000 -p 240 -a 0.55 -g 0 -l -b 0.4 \
    -cm voice1/voices/qst001/ver1/gv-mgc.pdf -cm voice2/voices/qst001/ver1/gv-mgc.pdf \
    -cf voice1/voices/qst001/ver1/gv-lf0.pdf -cf voice2/voices/qst001/ver1/gv-lf0.pdf \
    -k voice2/voices/qst001/ver1/gv-switch.inf \
    -em voice2/voices/qst001/ver1/tree-gv-mgc.inf \
    -ef voice1/voices/qst001/ver1/tree-gv-lf0.inf -ef voice2/voices/qst001/ver1/tree-gv-lf0.inf \
    -b 0.0 \
    -ow nat_0001.wav nat_0001.lab -i 2 0.5 0.5

hts_engine

Speech lab students: the binary is located at
/proj/tts/hts-2.3/hts_engine_API-1.09/bin/hts_engine

Options

td, tf, tm, and tl are the decision tree files for state duration, log F0, spectrum, and low-pass filter, respectively. You need to specify each of these for both voice1 and voice2.

md, mf, mm, and ml are the pdf model files for state duration, log F0, spectrum, and low-pass filter, respectively. You need to specify each of these for both of the voices you are interpolating.

dm, df, and dl are the window files used to calculate the deltas of the spectrum, log F0, and low-pass filter, respectively. These should be the same for both voices, so there is no need to specify them twice. You do need to give win1, win2, and win3 for both dm and df, but only win1 for dl.
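
As a minimal sketch of the window-file rule above, the following shell function prints the dm, df, and dl options for a single voice directory. The function name window_args is hypothetical, and the file names (mgc.win1 and so on) are simply copied from the example command.

```shell
# Sketch only: print the window-file options for ONE voice directory,
# one word per line. Window files are assumed to be identical across
# voices, so a single directory is enough.
window_args() {
    dir=$1
    # Spectrum: static, delta, and delta-delta windows.
    for n in 1 2 3; do
        printf '%s\n' -dm "$dir/mgc.win$n"
    done
    # Log F0: static, delta, and delta-delta windows.
    for n in 1 2 3; do
        printf '%s\n' -df "$dir/lf0.win$n"
    done
    # Low-pass filter: only the first window is given.
    printf '%s\n' -dl "$dir/lpf.win1"
}
```

Calling window_args voice2/voices/qst001/ver1 prints seven option/file pairs. The output can be spliced into the command line as $(window_args voice2/voices/qst001/ver1), provided the paths contain no spaces or wildcard characters.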

cm and cf are the global variance (GV) files for spectrum and log F0. hts_engine will let you specify these for just one voice or for both, but since they may differ between voices, you should specify them for both.

k is the tree file for the GV switch. You can specify it for just one voice or for both.

em and ef are the decision tree files for the global variance of spectrum and log F0. You can specify these for just one voice or for both, but again they will differ between voices, so it is best to specify them for both.
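
Because most of the options above are repeated once per voice, the long command is easy to mistype. The sketch below generates the per-voice options in a loop. The function name per_voice_args is hypothetical, the voices/qst001/ver1 layout and file names are copied from the example command, and the sketch follows the recommendation to pass em for both voices (the example command passes it for voice2 only). It does not cover the options that are given only once (dm, df, dl, k).

```shell
# Sketch only: print the options that are repeated for every voice,
# one word per line, grouped by flag as in the example command
# (all -td entries first, then all -tf entries, and so on).
per_voice_args() {
    for spec in \
        "-td tree-dur.inf" "-tf tree-lf0.inf" "-tm tree-mgc.inf" "-tl tree-lpf.inf" \
        "-md dur.pdf" "-mf lf0.pdf" "-mm mgc.pdf" "-ml lpf.pdf" \
        "-cm gv-mgc.pdf" "-cf gv-lf0.pdf" \
        "-em tree-gv-mgc.inf" "-ef tree-gv-lf0.inf"
    do
        flag=${spec%% *}   # text before the first space, e.g. -td
        file=${spec#* }    # text after the first space, e.g. tree-dur.inf
        # Voices are emitted in the order given, which is assumed to be
        # the order the interpolation weights are matched against.
        for voice in "$@"; do
            printf '%s\n' "$flag" "$voice/voices/qst001/ver1/$file"
        done
    done
}
```

With this in place, $(per_voice_args voice1 voice2) can stand in for the repeated options on the hts_engine command line, as long as the voice directory names contain no spaces or wildcard characters.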

Synthesis

-ow nat_0001.wav -- the output synthesized file.
nat_0001.lab -- the fullcontext label file to synthesize from.
-i 2 0.5 0.5 -- interpolate two voices, each with a weight of 0.5.
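
To hear how the result changes as the balance shifts from one voice to the other, you can synthesize the same label file with a range of weights. The sketch below only generates the weight pairs. The function name weight_pairs is hypothetical, and it assumes that the two weights should sum to 1 and that they are matched to the voices in the order the model files were given.

```shell
# Sketch only: print "w1 w2" pairs that step from 0.00/1.00 to 1.00/0.00.
# Integer arithmetic in hundredths keeps this portable across shells.
weight_pairs() {
    steps=$1
    i=0
    while [ "$i" -le "$steps" ]; do
        a=$((i * 100 / steps))   # first weight, in hundredths
        b=$((100 - a))           # second weight, so the pair sums to 1.00
        printf '%d.%02d %d.%02d\n' \
            $((a / 100)) $((a % 100)) $((b / 100)) $((b % 100))
        i=$((i + 1))
    done
}

# Possible use (not run here); MODEL_OPTIONS stands for all the tree,
# model, window, and GV options from the example command:
#   weight_pairs 4 | while read -r w1 w2; do
#       hts_engine $MODEL_OPTIONS -ow "nat_0001_$w1.wav" nat_0001.lab -i 2 "$w1" "$w2"
#   done
```

For example, weight_pairs 4 yields five pairs, with 0.50 0.50 in the middle matching the weights used in the example command.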