MCD

MCD (mel-cepstral distortion) is an objective measure for evaluating synthetic voices. It measures the difference between two sequences of mel cepstra, here the natural and the synthesized versions of the same utterance. Differences in timing are typically allowed for, either by aligning the two sequences with DTW (dynamic time warping) or by synthesizing the test utterances with the "gold" durations from the original speech (as opposed to durations predicted by the model). The MCD script we use performs the DTW alignment itself.
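To make the definition concrete, here is a minimal pure-Python sketch of the usual MCD frame cost combined with a textbook DTW. It is an illustration only, not the code of the library described on this page: the function names are hypothetical, the frame cost uses the commonly quoted formula (10 / ln 10) * sqrt(2 * sum of squared differences) with coefficient 0 skipped, and the DTW step pattern is an assumption that may differ from the library's.

```python
import math

# 10 / ln(10) * sqrt(2): scales a Euclidean cepstral distance to decibels.
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_cost(nat_frame, synth_frame):
    """MCD in dB between two frames, skipping coefficient 0 (energy)."""
    sq = sum((a - b) ** 2 for a, b in zip(nat_frame[1:], synth_frame[1:]))
    return MCD_CONST * math.sqrt(sq)


def dtw_min_cost(nat, synth):
    """Minimum total frame cost over monotonic alignments of two sequences.

    Textbook DTW with steps (i-1, j), (i, j-1), (i-1, j-1); this step
    pattern is an assumption, not necessarily what the library uses.
    """
    n, m = len(nat), len(synth)
    inf = float("inf")
    # best[i][j] = cheapest total cost of aligning nat[:i+1] with synth[:j+1]
    best = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = frame_cost(nat[i], synth[j])
            if i == 0 and j == 0:
                best[i][j] = cost
                continue
            prev = inf
            if i > 0:
                prev = min(prev, best[i - 1][j])
            if j > 0:
                prev = min(prev, best[i][j - 1])
            if i > 0 and j > 0:
                prev = min(prev, best[i - 1][j - 1])
            best[i][j] = prev + cost
    return best[n - 1][m - 1]
```

Dividing the total by a frame count gives a per-frame figure; which count the library normalizes by is not stated here, so check its output before comparing numbers with this sketch.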

We are using the MCD library for Python by Matt Shannon. Thanks to Kai-Zhan for setting it up and making a parallelized version, which can be found here:

/proj/tts/examples/kl2792/bin/get_mcd_dtw

Please only run this on kucing, where all the dependencies are installed.

Usage:

./get_mcd_dtw --param_order 60 NATDIR SYNTHDIR UTTID1 UTTID2 ... UTTIDN

NATDIR is the directory containing the .mgc files for the original speech that you are comparing to.

SYNTHDIR is the directory containing the synthetic .mgc files. If you did not save these at synthesis time, you can re-extract them from the synthesized audio in the same way that you extract features from natural audio for a new corpus.

The UTTIDs are the IDs of the utterances for which you want to compute MCD, that is, the .mgc file names without the extension.
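Typing a long list of utterance IDs by hand is error-prone, so the following sketch collects the IDs that have a .mgc file in both directories and assembles the argument list in the order shown in the usage line. The helper names (common_utt_ids, build_command) are hypothetical and not part of the script; the only flag used is --param_order, taken from the usage line above.

```python
import os


def common_utt_ids(nat_dir, synth_dir, ext=".mgc"):
    """Utterance IDs (file names minus extension) present in both directories."""
    def ids(directory):
        return {f[:-len(ext)] for f in os.listdir(directory) if f.endswith(ext)}
    return sorted(ids(nat_dir) & ids(synth_dir))


def build_command(nat_dir, synth_dir, utt_ids, param_order=60,
                  script="./get_mcd_dtw"):
    """Argument list mirroring the usage line: script, flag, dirs, then IDs."""
    return [script, "--param_order", str(param_order),
            nat_dir, synth_dir] + list(utt_ids)
```

The resulting list can be passed to subprocess.run, or joined with spaces and pasted into a shell on the machine where the script is installed.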

The script returns a single tuple (mincostperframe, totframes) covering all of the utterances you give it, not one tuple per utterance. The first value is the MCD that we care about.
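If you run the script several times on different batches of utterances, the per-batch values can be pooled into one figure. The sketch below assumes, from the names alone, that mincostperframe is the total minimum cost divided by totframes; under that assumption the pooled MCD is the frame-weighted average. The function name combine_mcd is hypothetical.

```python
def combine_mcd(results):
    """Pool (mincostperframe, totframes) tuples from several runs.

    Assumes mincostperframe is a per-frame mean, so each run is weighted
    by its frame count.
    """
    results = list(results)
    total_frames = sum(frames for _, frames in results)
    if total_frames == 0:
        raise ValueError("no frames to average over")
    total_cost = sum(cost * frames for cost, frames in results)
    return total_cost / total_frames, total_frames
```

For example, combine_mcd([(5.0, 100), (7.0, 300)]) gives (6.5, 400): the larger batch counts three times as much as the smaller one, which a plain average of 5.0 and 7.0 would ignore.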