Automatic Broadcast News Speech Summarization

Abstract

As the number of speech and video documents available on the web and on handheld devices soars, it becomes increasingly important to enable users to find the relevant, significant and interesting parts of these documents automatically. In this dissertation, I present ConciseSpeech, a system for summarizing Broadcast News (BN) that identifies significant segments of speech using lexical, acoustic/prosodic, and structural information, and combines them while optimizing the significance, length and redundancy of the summary. Many obstacles particular to speech, such as word errors, disfluencies and the lack of segmentation, make speech summarization challenging, and I present methods to address these problems. I show the use of Automatic Speech Recognition (ASR) confidence scores to compensate for word errors; present a phrase-level machine translation approach using weighted finite state transducers for detecting disfluency; and examine the possibility of using intonational phrase segments for summarization. I also describe structural properties of BN used in determining which segments should be selected for a summary, including speaker roles, soundbites and commercials. I present Information Extraction (IE) techniques based on statistical methods such as conditional random fields and decision trees to automatically identify such structural properties. ConciseSpeech was built for handling single spoken documents, but we have extended it to handle user queries by summarizing multiple documents. For the query-focused version of ConciseSpeech we also built a knowledge resource (NE-NET) that finds related entities and significantly improves document retrieval for query-focused summarization. I show how all these techniques improve speech summarization when compared to traditional text-based methods applied to speech transcripts.

The thesis is available here:

Automatic summaries for the stories that were manually evaluated are available here.

Filename Format:

X.story.txt : Story File
X.summaryB.txt : Baseline Summary
X.summaryA.txt : Automatic Summary with ASR transcripts
X.summaryM.txt : Automatic Summary with Manual transcripts
X.summaryH.txt : Human Summary
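
The naming scheme above makes it straightforward to collect all files that belong to one story. The following is a minimal sketch, not part of the released data or tools: the function name `group_by_story`, the `KINDS` table, and the example story IDs are hypothetical, and it assumes that `X` is an arbitrary story identifier followed by exactly one of the five suffixes listed above.

```python
import re
from collections import defaultdict

# Suffix meanings copied from the filename list above.
KINDS = {
    "story": "story file",
    "summaryB": "baseline summary",
    "summaryA": "automatic summary (ASR transcripts)",
    "summaryM": "automatic summary (manual transcripts)",
    "summaryH": "human summary",
}

# Assumption: X may contain any characters; the last two dot-separated
# fields are the file kind and the "txt" extension.
PATTERN = re.compile(r"^(?P<story_id>.+)\.(?P<kind>story|summary[BAMH])\.txt$")


def group_by_story(filenames):
    """Map each story ID (the X prefix) to a {kind: filename} dictionary."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip anything that is not one of the five file types
        groups[match.group("story_id")][match.group("kind")] = name
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical filenames, used only to illustrate the grouping.
    files = ["ex1.story.txt", "ex1.summaryB.txt", "ex1.summaryH.txt"]
    for story_id, kinds in group_by_story(files).items():
        for kind, name in sorted(kinds.items()):
            print(story_id, KINDS[kind], name)
```

Grouping by the `X` prefix puts the story, the baseline, both automatic summaries and the human summary for a given story side by side, which is the natural unit for comparing them.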

dLabel, the labeling tool for speech relations and entities, is available here.

The alignment tool for aligning transcripts to move annotations is available here.

The labeling manual is available here.