Automatic Summary Evaluation by the Pyramid Method

Speaker Name: Aaron Harnly
Speaker Info: Graduate Student, NLP Group; aaron@cs.columbia.edu
Date: Thursday February 17
Time: 10:30am-11:30am
Location: CCLS Conference Room (Interchurch)

Abstract:
Summary evaluation is time-consuming, unreliable -- and essential. Recent methods of content selection evaluation have turned attention to two key challenges of summary evaluation: that no single reference summary is ideal, i.e. that summaries with rather different content can be equally "good"; and that scores for a single summarizer can be quite unstable without a large number of test summaries.

Nenkova and Passonneau recently presented the Pyramid method, which attempts to address these challenges by explicitly requiring multiple reference summaries, identifying sets of text fragments that bear the same meaning (Summary Content Units, or SCUs), using frequency across the references as a marker of importance, and giving higher scores to summaries that incorporate more of the important information. The method shows promise for its stability and reliability; however, like many other manual scoring methods, it is very labor-intensive.
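
For concreteness, the published Pyramid score can be sketched as follows. This is a minimal illustration in Python, not code from the talk; the SCU names and weights are made up, and each SCU's weight is simply the number of reference summaries that express it.

    # Minimal sketch of Pyramid scoring (after Nenkova & Passonneau 2004).
    # Assumes SCU weights and the SCUs expressed by the candidate summary
    # are already known; names here are illustrative only.

    def pyramid_score(scu_weights, expressed_scus):
        """Weight the summary expresses, relative to the maximum weight
        attainable with the same number of SCUs."""
        observed = sum(scu_weights[scu] for scu in expressed_scus)
        # An optimal summary of the same size draws the top-weighted SCUs.
        top_weights = sorted(scu_weights.values(), reverse=True)
        optimal = sum(top_weights[:len(expressed_scus)])
        return observed / optimal if optimal else 0.0

    # Example: a pyramid built from four reference summaries.
    weights = {"scu_1": 4, "scu_2": 3, "scu_3": 2, "scu_4": 1}
    print(pyramid_score(weights, {"scu_1", "scu_3"}))  # (4+2)/(4+3) ~= 0.857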

I will describe initial efforts to automate the Pyramid method. A working algorithm for scoring target summaries against extant annotated pyramids will be explained, and exploratory results presented. Proposed, tested, and possible refinements to the algorithm will be discussed (feedback welcome!). This work ties in to work by several NLP group members, so I look forward to a fruitful discussion.