Pairwise Naturalness HITs
We have a pairwise preference HIT on Mechanical Turk for comparing two
voices.  Workers can compare as many or as few utterances as they
want.  We collect 5 ratings per pair, and typically post 12 sentences per
voice.  Each pair is presented in random order (A/B or B/A) to control
for order effects.  The HIT is a forced choice: there is no "no
preference" option.  Once the HITs are completed, we test for
significance with a z-test and a two-tailed p-value.  The instructions
below are for Speech Lab students.
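For reference, the significance test mentioned above can be sketched as a one-sample z-test on the proportion of ratings that prefer one voice, against the null hypothesis of no preference (0.5).  This is a minimal sketch of one common form of the test, not necessarily what processResults.py computes; the function name and the 40/20 split are made up for illustration.

```python
import math

def two_tailed_z_test(wins_a, total, p0=0.5):
    """Normal-approximation test of H0: the preference rate for voice A is p0.

    Hypothetical helper for illustration; not taken from processResults.py.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = wins_a / total
    # Standard error of the proportion under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / total)
    z = (p_hat - p0) / se
    # Two-tailed p-value from the standard normal: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 12 sentences x 5 ratings = 60 judgments; the 40/20 split is invented.
z, p = two_tailed_z_test(40, 60)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the HIT is a forced choice, every rating counts toward one voice or the other, so total is simply the number of ratings collected for the pair.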
Posting HITs
This work must be done on cheshire, under /var/www.
cp -r amt_EMPTY amt##
cd amt##/scripts
Here ## is the next number that hasn't been used yet.  Then change the following variables in setup.py:
- run_id
- run_name
- path
- theVoices
Then alter make_linked_dirs.sh if necessary -- it assumes
that your test utterances are in a directory structure
like voicename/nat/hts_engine/, which may not always
be the case.
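Before deciding whether make_linked_dirs.sh needs changes, it can help to check which voices actually follow the assumed layout.  The sketch below only illustrates such a check; the function name, the root directory, and the voice names are hypothetical, and the only detail taken from the text above is the voicename/nat/hts_engine/ structure.

```python
import os

def missing_utterance_dirs(root, voices, subpath=os.path.join("nat", "hts_engine")):
    """Return the expected utterance directories that do not exist.

    Hypothetical helper; root and voices are supplied by the caller.
    """
    missing = []
    for voice in voices:
        expected = os.path.join(root, voice, subpath)
        if not os.path.isdir(expected):
            missing.append(expected)
    return missing

# Hypothetical root and voice names; substitute your own.
for path in missing_utterance_dirs(".", ["voiceA", "voiceB"]):
    print("missing:", path)
```

If anything is reported as missing, that voice's utterances live somewhere else and make_linked_dirs.sh needs to be adjusted for it.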
Then run setup.py.
Then get the input CSV from docs/upload/[name].csv.
Then upload it to our "Naturalness HIT Pairwise Batch" task and post the HITs.
Evaluating Results
Download the results CSV file, put it
under amt##/docs/download, and rename
it to batch_results.csv.
Then, in scripts, run processResults.py.
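As a rough illustration of the kind of tallying involved, the sketch below counts preferences per voice while undoing the random A/B ordering.  It is not the logic of processResults.py.  Mechanical Turk results files prefix input columns with "Input." and answer columns with "Answer.", but the specific column names and the "left"/"right" answer values used here are assumptions and will need to match the actual HIT template.

```python
import csv
import io
from collections import Counter

def tally_preferences(rows,
                      left_col="Input.voice_left",      # hypothetical column name
                      right_col="Input.voice_right",    # hypothetical column name
                      answer_col="Answer.preference"):  # hypothetical column name
    """Count how often each voice was preferred, whichever side it was shown on."""
    counts = Counter()
    for row in rows:
        choice = row[answer_col].strip().lower()
        if choice == "left":
            counts[row[left_col]] += 1
        elif choice == "right":
            counts[row[right_col]] += 1
        # Any other value (e.g. a blank answer) is skipped.
    return counts

# Tiny invented sample standing in for batch_results.csv.
sample = io.StringIO(
    "Input.voice_left,Input.voice_right,Answer.preference\n"
    "voiceA,voiceB,left\n"
    "voiceB,voiceA,left\n"
    "voiceA,voiceB,right\n"
)
counts = tally_preferences(csv.DictReader(sample))
print(dict(counts))
```

To run it on real results, open docs/download/batch_results.csv and pass csv.DictReader(f) in place of the sample.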