The Pipeline

The CPS system consists of two main modules. The first is an extraction module at the front end, which the system assumes is available and which extracts important sentences from an input document. The second is the generation module, which edits the extracted sentences to produce a summary, for example by removing redundant information or combining sentences. It can even send feedback to the extraction module if it judges certain extracted sentences to be unnecessary. The generation module is designed as a general-purpose generation tool, portable to any single-document summarizer that needs an advanced generation component.
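The control flow between the two modules, including the feedback path, can be sketched as below. This is a minimal illustration under stated assumptions, not the CPS implementation: `rank`, `generate`, `GenerationResult`, and `toy_generate` are hypothetical names, and the toy generator only rejects verbatim repeats so that the feedback loop stays visible. A real generation module would also reduce and combine the sentences it keeps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GenerationResult:
    summary: List[str]
    # Indices of extracted sentences the generator judged unnecessary.
    rejected: Set[int] = field(default_factory=set)


def run_pipeline(
    doc: List[str],
    rank: Callable[[List[str]], List[int]],
    generate: Callable[[List[str], List[int]], GenerationResult],
    k: int,
    max_rounds: int = 3,
) -> List[str]:
    """Extraction front end feeding a generation back end, with feedback.

    `rank` and `generate` are hypothetical stand-ins for the two modules.
    """
    ranked = rank(doc)            # extraction: sentence indices, best first
    excluded: Set[int] = set()    # sentences the generator has sent back
    result = GenerationResult(summary=[])
    for _ in range(max_rounds):
        chosen = [i for i in ranked if i not in excluded][:k]
        result = generate(doc, chosen)
        if not result.rejected:   # generator accepted every extracted sentence
            break
        excluded |= result.rejected  # feedback: ask the extractor for others
    return result.summary


def toy_generate(doc: List[str], chosen: List[int]) -> GenerationResult:
    """Toy generator: keeps the first copy of a sentence, rejects repeats."""
    seen: Set[str] = set()
    summary: List[str] = []
    rejected: Set[int] = set()
    for i in chosen:
        if doc[i] in seen:
            rejected.add(i)  # redundant, so report it back to extraction
        else:
            seen.add(doc[i])
            summary.append(doc[i])
    return GenerationResult(summary, rejected)


example_doc = [
    "A storm hit the coast.",
    "A storm hit the coast.",
    "Power was lost in 3 towns.",
    "Officials said repairs take days.",
]
# Hypothetical extractor: rank sentences in document order.
print(run_pipeline(example_doc, lambda d: list(range(len(d))), toy_generate, k=2))
# ['A storm hit the coast.', 'Power was lost in 3 towns.']
```

In the first round the generator rejects the duplicate second sentence; the extractor then supplies the next-ranked sentence in its place. Capping the number of rounds keeps the loop from cycling if the generator keeps rejecting candidates.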

The Generation Module

The generation module consists of two subsystems: a sentence reduction module and a sentence combination module. The sentence reduction module removes non-essential information from a sentence. The sentence combination module merges sentences from a document, either in their original form or in the reduced form produced by sentence reduction.
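The division of labor between the two subsystems can be illustrated with a toy data model. This is a sketch under loud assumptions: sentences are pre-segmented into `Phrase` objects with hypothetical `role` labels, and the `essential` flag is taken as given, whereas the real reduction decision would come from the parser, lexicon, and corpus evidence. The combination rule shown (conjoin two sentences with the same subject) is one illustrative operation, not the module's full repertoire.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Phrase:
    text: str
    role: str        # hypothetical labels: "subj", "pred", "mod", "conj"
    essential: bool  # assumed to be decided upstream; not computed here


Sentence = List[Phrase]


def reduce_sentence(sent: Sentence) -> Sentence:
    """Sentence reduction: drop the phrases marked non-essential."""
    return [p for p in sent if p.essential]


def combine(a: Sentence, b: Sentence) -> Optional[Sentence]:
    """Sentence combination: conjoin two sentences sharing a subject."""
    subj_a = [p for p in a if p.role == "subj"]
    subj_b = [p for p in b if p.role == "subj"]
    if not subj_a or subj_a != subj_b:
        return None  # no shared subject: leave the sentences separate
    rest_b = [p for p in b if p.role != "subj"]
    return a + [Phrase("and", "conj", True)] + rest_b


def render(sent: Sentence) -> str:
    return " ".join(p.text for p in sent) + "."


s1 = [Phrase("The senator", "subj", True),
      Phrase("who spoke on Monday", "mod", False),
      Phrase("opposed the bill", "pred", True)]
s2 = [Phrase("The senator", "subj", True),
      Phrase("called for a new vote", "pred", True)]

merged = combine(reduce_sentence(s1), reduce_sentence(s2))
if merged is not None:
    print(render(merged))
# The senator opposed the bill and called for a new vote.
```

Because `combine` accepts any list of phrases, it works equally on original sentences and on the output of `reduce_sentence`, mirroring the two input forms described above.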

We collected a corpus of documents and their human-written summaries, and then developed a program that automatically analyzes how humans construct summary sentences by cutting and pasting phrases from the original documents. Other resources used in the generation module include a large-scale lexicon that we compiled by combining multiple resources, a syntactic parser for English, and a coreference resolution system.
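To make the cut-and-paste analysis concrete, the sketch below decomposes a summary sentence into phrases copied from document sentences. It swaps in a simple greedy longest-match alignment purely for illustration; it is not the analysis program described above, and the name `decompose` is hypothetical. Tokenization is a crude lowercase word split, and words with no match in the document are labeled `None`.

```python
import re
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    # Crude tokenizer (an assumption): lowercase words, punctuation dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def decompose(summary: str, doc_sentences: List[str]) -> List[Tuple[str, Optional[int]]]:
    """Greedily split a summary sentence into phrases copied from the document.

    Returns (phrase, source sentence index) pairs; the index is None for
    words that do not occur in any document sentence.
    """
    docs = [tokenize(s) for s in doc_sentences]
    toks = tokenize(summary)
    out: List[Tuple[str, Optional[int]]] = []
    i = 0
    while i < len(toks):
        best_len, best_src = 0, None
        for d, words in enumerate(docs):
            for j in range(len(words)):
                n = 0  # length of the match starting at toks[i] and words[j]
                while i + n < len(toks) and j + n < len(words) and toks[i + n] == words[j + n]:
                    n += 1
                if n > best_len:
                    best_len, best_src = n, d
        if best_len == 0:
            out.append((toks[i], None))  # not found in the document
            i += 1
        else:
            out.append((" ".join(toks[i:i + best_len]), best_src))
            i += best_len
    return out


doc = ["The storm hit the northern coast on Tuesday.",
       "Thousands of homes lost power for two days."]
print(decompose("The storm hit the coast, and thousands of homes lost power.", doc))
```

Greedy matching can split phrases at unnatural points (here it pairs "the storm hit the" rather than "the storm hit" plus "the coast"), which is one reason a global alignment that scores whole decompositions would likely be preferable in practice.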