Columbia Games Corpus » Orthographic Transcription Guidelines



1. Getting started

  1. Go to the File Locking page.
  2. Check that no task for the wav file you want to transcribe has "in progress" status, and that no person has been assigned to it.
    If these conditions are not met, do not work on this file.
  3. Click "EDIT" in the "Orthographic Transcription" task of the wav file you want to transcribe.
  4. On the next screen, change the status of the wav file to "in progress", and enter your name.
  5. The status labels for all of the "Orthographic Transcription" tasks are links. Clicking on one of the links brings you to a web interface to transcribe that file.
  6. The files are broken up into segments (by splitting on silences). For each segment you should, if all goes well, see an audio player in your browser and a text box in which to enter the transcription of that short segment.


2. Transcription guidelines

Punctuation/Symbols

Capital Letters

There are five instances in which capital letters are to be used:

  1. The personal pronoun “I”
  2. Anything that is spelled, such as "A T and T", "S M I T H", "A O L"; make sure to leave a space between each capital letter to indicate that the letter itself was said.
  3. Proper nouns and adjectives, for example:
  4. Letters in isolation, for example J T Jones, A M (as in the time of day)
  5. False starts, if the intended word is known (Aug- August; note that each is aligned as for a word)
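
As an illustration of rule 2 (spelled items are written as space-separated capital letters), here is a minimal Python sketch. It is not part of the transcription tool; the function names `spell_out` and `find_unspaced_capitals` are hypothetical, and the second function is only a rough check that flags candidates for a human to review.

```python
import re


def spell_out(letters: str) -> str:
    """Return the letters of a spelled item as space-separated capitals."""
    # Keep only alphabetic characters, so "a.o.l." and "aol" give the same result.
    return " ".join(ch.upper() for ch in letters if ch.isalpha())


def find_unspaced_capitals(transcription: str) -> list[str]:
    """Return tokens made of two or more adjacent capitals, e.g. "AOL".

    These are likely spelled items that are missing the required spaces.
    A single capital such as the pronoun "I" is not flagged.
    """
    return re.findall(r"\b[A-Z]{2,}\b", transcription)


if __name__ == "__main__":
    print(spell_out("smith"))
    print(find_unspaced_capitals("my A O L account and my AOL mail"))
```

A flagged token is only a hint: the transcriber still decides whether the speaker actually spelled the item letter by letter.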

Examples of times when letter(s) should not be capitalized:

  1. The first word of the transcription should not be capitalized, even if it is the start of a sentence, unless it falls under one of the rules for capitalization listed above.
  2. The first word of any sentence should not be capitalized, unless it falls under one of the rules for capitalization listed above.
  3. noon, midnight

Titles

Spell out all words: mister, doctor, junior, miss, misses, miz (for Ms.), monsignor, father.
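
A minimal Python sketch of this rule, assuming the transcriber has typed an abbreviated title by habit. The abbreviation keys (`mr`, `mrs`, `ms`, `dr`, `jr`) and the helper name `expand_titles` are assumptions made for illustration; the guidelines themselves only give the spelled-out forms and the pairing of miz with Ms.

```python
import re

# Spelled-out forms come from the guidelines; the abbreviations they are
# paired with here are assumptions (only "miz (for Ms.)" is stated there).
TITLE_WORDS = {
    "mr": "mister",
    "mrs": "misses",
    "ms": "miz",
    "dr": "doctor",
    "jr": "junior",
}

# Match a whole-word abbreviation, with or without a trailing period.
_TITLE_RE = re.compile(r"\b(mr|mrs|ms|dr|jr)\b\.?", re.IGNORECASE)


def expand_titles(text: str) -> str:
    """Replace abbreviated titles with their spelled-out, lower-case forms."""
    return _TITLE_RE.sub(lambda m: TITLE_WORDS[m.group(1).lower()], text)


if __name__ == "__main__":
    print(expand_titles("Mr. Smith saw Dr. Jones and Ms. Lee"))
```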

Shortened Words

When the speaker shortens a word, we sometimes add letters so that the full, correctly spelled word is transcribed, for example:

The following words, however, should be transcribed as indicated:

Disfluencies

  1. Filled pauses are non-speech vocalizations. In the orthographic tier they are transcribed as uh, um, er, ah, mm, eh and sometimes oh. They can also be combined, as in uhhm, uhhuh, mmhm.

  2. False starts, whether they are repaired or not, are indicated as such with a hyphen. These occur when the speaker stops in the middle of a word and either substitutes another word or continues with the same word. If the incomplete word is not known, indicate so with '?-'. Examples:
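
The following Python sketch is not part of the original guidelines. It shows how the disfluency markings described above could be recognized in a typed transcription. The function names and the label strings are hypothetical. Because "oh" is only sometimes a filled pause, any label it receives here is a guess that a human should confirm.

```python
# Filled-pause spellings listed in the guidelines.
FILLED_PAUSES = {
    "uh", "um", "er", "ah", "mm", "eh", "oh",
    "uhhm", "uhhuh", "mmhm",
}


def classify_token(token: str) -> str:
    """Label one whitespace-separated token of a transcription."""
    if token == "?-":
        # False start whose intended word is not known.
        return "unknown false start"
    if len(token) > 1 and token.endswith("-"):
        # Incomplete word marked with a trailing hyphen, e.g. "Aug-".
        return "false start"
    if token in FILLED_PAUSES:
        return "filled pause"
    return "word"


def classify(transcription: str) -> list[tuple[str, str]]:
    """Label every token in a transcription."""
    return [(tok, classify_token(tok)) for tok in transcription.split()]


if __name__ == "__main__":
    for tok, label in classify("uh the Aug- August ?- meeting"):
        print(tok, label)
```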

Digit Rules

All numbers are to be typed out as words:

The number 0 said as “oh” is typed as 0o (the number zero followed by a lower case letter o).
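
A minimal Python sketch of a check for the digit rules, assuming a transcription is a plain string of space-separated tokens. The function name `find_digit_tokens` is hypothetical. It only flags tokens that still contain digits, treating `0o` as the one allowed exception; it does not try to convert numbers to words, since that depends on what the speaker actually said.

```python
def find_digit_tokens(transcription: str) -> list[str]:
    """Return tokens that contain a digit and therefore break the digit rules.

    The token "0o" (zero followed by a lower-case o) is allowed, because it
    is the prescribed spelling for the number 0 said as "oh".
    """
    return [
        tok
        for tok in transcription.split()
        if tok != "0o" and any(ch.isdigit() for ch in tok)
    ]


if __name__ == "__main__":
    print(find_digit_tokens("room 0o seven has 12 chairs"))
```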

Unintelligible Speech


