Summary Generation through Intelligent Cutting and Pasting of the Input Document


Almost all automatic summarizers, from the first system built at IBM more than 40 years ago to the dozens of commercial and research products available today, produce summaries by extracting sentences from the original documents. Because the sentences are extracted without their context, the resulting summaries are at best incoherent and at worst misleading or incomprehensible. It is clearly time to develop better summary generation techniques.

We present a framework for the fast construction of concise and coherent summaries of single documents in any domain. The proposed generation approach, called cut-and-paste, generates summaries by reusing text from the input document. Rather than using the extracted sentences verbatim, the cut-and-paste approach edits them so that they are more concise, coherent, and appropriate for a summary.

We investigate two effective techniques, sentence reduction and sentence combination, for transforming extracted sentences into appropriate summary sentences. We have also constructed a reusable, large-scale lexicon to support the generation process. The properties of an appropriate summary are discovered through automated analysis of human-produced summaries. Our cut-and-paste summary generation system can be used with any single-document summarizer, serving as its advanced generation component. We have developed a domain-independent, single-document summarizer based on lexical links and sentence positions, and combined it with the cut-and-paste generator to yield a full-fledged summarization system, CPS.
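
The overall pipeline (extract sentences, then reduce and combine them) can be illustrated with a toy sketch. This is not the CPS implementation: the scoring formula, the regex-based reduction rules, the "and" connective, and every function name below are hypothetical stand-ins chosen only to show how an extraction component and a cut-and-paste generation component might fit together.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the lexicon from the text).
STOPWORDS = {"a", "an", "the", "is", "was", "are", "of", "in", "to", "and", "it"}


def split_sentences(text):
    """Split text on whitespace that follows sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract(sentences, k):
    """Pick k sentences using a toy score: lexical links plus a position bonus.

    Lexical links are approximated as content words shared with other
    sentences; the position bonus favors earlier sentences.
    """
    doc_freq = Counter(w for s in sentences for w in set(content_words(s)))
    scored = []
    for i, s in enumerate(sentences):
        links = sum(doc_freq[w] - 1 for w in set(content_words(s)))
        scored.append((links + 1.0 / (i + 1), i))
    chosen = sorted(i for _, i in sorted(scored, reverse=True)[:k])
    return [sentences[i] for i in chosen]  # keep document order


def reduce_sentence(sentence):
    """Crude sentence reduction: drop parentheticals and ', which ...,' clauses."""
    s = re.sub(r"\s*\([^)]*\)", "", sentence)
    return re.sub(r",\s*which[^,]*,", "", s)


def combine(first, second):
    """Crude sentence combination: join two sentences with ', and'."""
    return first.rstrip(".") + ", and " + second[0].lower() + second[1:]


def summarize(text, k=2):
    """Extract k sentences, then reduce each one."""
    return [reduce_sentence(s) for s in extract(split_sentences(text), k)]
```

For instance, `reduce_sentence("The lexicon, which is large, supports generation.")` drops the relative clause, and `combine` merges two reduced sentences into one. A real reduction component would rely on syntactic and lexical knowledge rather than surface patterns like these.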