Hi there.

You've wandered onto a webpage for a PhD student at Columbia University. I'm part of the natural language processing group and am also involved with the machine learning group and the Center for Computational Learning Systems. My advisor is Kathy McKeown. Here's a brief research CV.


Research

I spend most of my time being fascinated and frustrated by the fields of computational linguistics and machine learning. My research focuses on structured prediction for text-to-text generation tasks. For instance, I'm currently working towards building models that manipulate text under formal set-theoretic operations for combination and decomposition. Specifically, we attempt to generate fluent sentences with valid parses without explicitly relying on any one particular semantic representation of text. This work is relevant to areas like automated paraphrasing, summarization, question answering and translation.

Other projects that I'm currently involved in include the unsupervised retrieval of latent structure from text, topic models for web summarization and semi-automated annotation strategies. I've previously worked on problems like redundancy reduction in text, semi-parametric density estimation, selectional preference discovery and time-series clustering.

Refereed publications

Other publications

Datasets

Miscellany

The papers that I covered for my candidacy exam on text-to-text generation are available here.

My Erdős number is at most 4: Me → Tony Jebara → Tommi Jaakkola → Noga Alon → Paul Erdős
My Bacon number is still woefully undefined.