Pairwise Naturalness HITs

We have a pairwise preference HIT on Mechanical Turk for comparing the naturalness of two voices. Workers can rate as many or as few utterances as they want. We collect 5 ratings per pair and typically post 12 sentences per voice. The two voices are presented in A/B or B/A order at random, to control for order effects. The HIT is a forced choice: there is no "no preference" option. Once the HITs are completed, we test for significance with a z-test and a two-tailed p-value. The instructions below are for Speech Lab students.
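
For reference, the significance test can be sketched as follows. This is a minimal illustration, not the code in processResults.py: it assumes the null hypothesis that workers pick either voice with probability 0.5, and it uses the normal approximation to the binomial without a continuity correction. The function name preference_z_test and the example counts are hypothetical.

```python
import math
from typing import Tuple


def preference_z_test(wins_a: int, total: int) -> Tuple[float, float]:
    """Return (z, two-tailed p) for a forced-choice preference count.

    wins_a: number of judgments that preferred voice A (hypothetical name).
    total:  total number of judgments for this pair of voices.
    Null hypothesis: each voice is chosen with probability 0.5.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= wins_a <= total:
        raise ValueError("wins_a must be between 0 and total")

    expected = total / 2.0            # mean of Binomial(total, 0.5)
    std_dev = math.sqrt(total) / 2.0  # sqrt(total * 0.5 * 0.5)
    z = (wins_a - expected) / std_dev

    # Two-tailed p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# 12 sentences x 5 ratings = 60 judgments; the count of 40 is made up.
z, p = preference_z_test(wins_a=40, total=60)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```

With small numbers of judgments the normal approximation is rough; an exact binomial test would be the more conservative choice.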

Posting HITs

This work must be done on cheshire, under /var/www.

cp -r amt_EMPTY amt##
cd amt##/scripts

Where ## is the next number that hasn't been used yet. Then change the following variables in setup.py:

Then edit make_linked_dirs.sh if necessary: it assumes that your test utterances are in a directory structure like voicename/nat/hts_engine/, which may not always be the case.

Then run setup.py.

Then get the input CSV from docs/upload/[name].csv. Upload it to our "Naturalness HIT Pairwise Batch" task and post the HITs.

Evaluating Results

Download the results CSV file, put it under amt##/docs/download, and rename it to batch_results.csv.
Then, in amt##/scripts, run processResults.py.