Subset Selection Scripts and Info

We ran several preliminary data selection experiments, which required finding and writing scripts that select subsets based on different features. These scripts can be reused on other data sets or serve as examples.

Using Praat for Basic Prosodic Features

We use a Praat script to extract standard acoustic features. You can read more about Praat here:

Use this script:

Run it as follows, using Praat:
praat /proj/speech/tools/speechlab/praat_scripts/extractStandardAcoustics2.praat /path/to/your/file.wav 0 0 75 500

The last two arguments, 75 and 500, are the default pitch range (floor and ceiling, in Hz).
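
To run the script over many files, the command above can be wrapped in a small helper. This is only a sketch: the function names praat_command and extract_acoustics are made up for illustration, and it assumes the Praat script writes its results to standard output, which has not been checked here.

```python
import subprocess

# Script path copied from the example command above.
PRAAT_SCRIPT = "/proj/speech/tools/speechlab/praat_scripts/extractStandardAcoustics2.praat"


def praat_command(wav_path, pitch_floor=75, pitch_ceiling=500):
    """Build the argument list for one Praat run (hypothetical helper)."""
    # The two "0" arguments are passed through unchanged from the example
    # command; their meaning is not documented on this page.
    return [
        "praat",
        PRAAT_SCRIPT,
        str(wav_path),
        "0",
        "0",
        str(pitch_floor),
        str(pitch_ceiling),
    ]


def extract_acoustics(wav_path, pitch_floor=75, pitch_ceiling=500):
    """Run Praat on one file and return whatever it printed.

    Assumption: the script reports its features on standard output.
    """
    result = subprocess.run(
        praat_command(wav_path, pitch_floor, pitch_ceiling),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Praat exits with an error
    )
    return result.stdout
```

For example, extract_acoustics("/path/to/your/file.wav") runs the same command as shown above with the default pitch range.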

Using Festival for Speaking Rate

Speaking rate is measured in syllables per second. The syllable information can be obtained from the Festival .utt files. Run the following to get the syllable count and the duration in seconds, excluding silences at the start and end of the file:

$ESTDIR/../festival/bin/festival -b /proj/tts/examples/syltimes2.scm '(define myutt (utt.load nil "/path/to/your/file.utt"))' '(define allsyls (utt.relation.items myutt "Syllable"))' '(display (syltimes allsyls))'

This prints a parenthesized pair: (numsyllables numseconds).
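
Turning that pair into a speaking rate is a single division. The sketch below assumes the Festival output has been captured as a string and contains two numbers in parentheses separated by whitespace, for example "(12 3.0)". The function name speaking_rate is hypothetical.

```python
import re

# Matches "(numsyllables numseconds)", e.g. "(12 3.0)".
_PAIR = re.compile(r"\(\s*([0-9.eE+-]+)\s+([0-9.eE+-]+)\s*\)")


def speaking_rate(festival_output):
    """Return syllables per second from a '(numsyllables numseconds)' string."""
    match = _PAIR.search(festival_output)
    if match is None:
        raise ValueError("no (numsyllables numseconds) pair found in: %r" % festival_output)
    num_syllables = float(match.group(1))
    num_seconds = float(match.group(2))
    if num_seconds <= 0:
        raise ValueError("duration must be positive, got %s" % num_seconds)
    return num_syllables / num_seconds
```

With this, an utterance reported as (12 3.0) has a speaking rate of 4 syllables per second.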

Alternatively, if you need the exact length of each syllable in the utterance, use syltimes3.scm instead of syltimes2.scm. It outputs the start and end times of each syllable.


Look in /proj/tts/data/english/macrophone/scripts/ for a number of scripts that select subsets based on different features.


Files of interest are under /proj/tts/data/english/callhome.

Under datasel/subsets/, the longest_* files are different-sized subsets of the longest utterances. For example, longest_15_min.txt contains 15 minutes' worth of the longest utterances in the corpus (female speakers only). Other similarly named files are subsets based on other features.

Under datasel/scripts are some of the scripts used to create those subsets.
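
A subset such as longest_15_min.txt could be built roughly as follows. This is a sketch of one plausible approach, not the actual script in datasel/scripts: it assumes utterance durations are already available as a mapping from utterance ID to seconds, and the function name select_longest is hypothetical. Filtering by speaker gender would have to happen before this step.

```python
def select_longest(durations, budget_seconds):
    """Pick the longest utterances until the time budget would be exceeded.

    durations: dict mapping utterance ID -> duration in seconds (assumed input).
    Returns (list of selected IDs, total selected seconds).
    """
    selected = []
    total = 0.0
    # Walk through utterances from longest to shortest.
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for utt_id, seconds in ranked:
        if total + seconds > budget_seconds:
            # Stop at the first utterance that does not fit, so the result
            # is strictly the top-N longest utterances.
            break
        selected.append(utt_id)
        total += seconds
    return selected, total
```

For a 15-minute subset, the call would be select_longest(durations, 15 * 60). Sorting in ascending order instead, or sorting by another feature such as speaking rate, gives the other kinds of subsets in the same way.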


/proj/tts/data/english/brn/datasel contains prepared subsets for a variety of features.

/proj/tts/data/english/brn/datasel/docs contains a few Python scripts related to making subsets.