Introduction

I am a postdoctoral research scientist in the Computer Science Department at Columbia University, where I work with the Natural Language Processing Group and Prof. Kathy McKeown, who was my PhD adviser. I work primarily in two areas: question answering and the automatic discovery of semantic information.

In my semantics project I am constructing a large-scale system that makes automatic methods for obtaining information about the meaning of words and phrases more precise and more useful. Without such information, efforts in most areas of language processing will remain rudimentary. One such area is, of course, question answering. When one can find an answer that echoes the question, the task is trivial, but even with billions or trillions of web pages indexed and libraries digitized, many questions are not so neat and orderly.

Currently, my question-answering work is for the DARPA GALE project, a multilingual, multimodal effort in its third year. I have also done work on questions about opinions and complex topical questions with answers that are short summaries.

My thesis, which I successfully defended on May 11, 2005, concerns how to automatically discover salient differences between documents, in particular how to determine what content is novel as new documents become available. This work was part of the NLP group's work in multidocument summarization.

In summarization, I've worked on a sentence-extraction-based multidocument summarizer that prepares most of the summaries in Columbia Newsblaster, an experimental online news browsing system. At MITRE, I also worked on BioGen, a system that extracts short biographical sketches of people from a large corpus.

In other work, I've looked at problems in information extraction. I'm interested in the amount of discourse processing needed in these kinds of tasks, and in the kinds of machine learning appropriate for complex natural language problems.

You can see a PDF version of my resume here or, if you prefer, a CV here.