Hi there.
You've wandered onto a webpage for a PhD student at Columbia University. I'm part of the natural language processing group and am also involved with the machine learning group and the Center for Computational Learning Systems. My advisor is Kathy McKeown. Here's a brief research CV.
- Email:
- Call: +1.212.939.7191
- Drop by: 724 CEPSR, Columbia University, NYC
Research
I spend most of my time being fascinated and frustrated by the fields of computational linguistics and machine learning. My research focuses on structured prediction for text-to-text generation tasks. For instance, I'm currently working towards building models for manipulating text under formal set-theoretic operations for combination and decomposition. Specifically, we attempt to generate fluent sentences with valid parses without explicitly relying on any one particular semantic representation of text. This work is relevant to areas like automated paraphrasing, summarization, question answering and translation.
Other projects that I'm currently involved in include the unsupervised retrieval of latent structure from text, topic models for web summarization and semi-automated annotation strategies. I've previously worked on problems like redundancy reduction in text, semi-parametric density estimation, selectional preference discovery and time-series clustering.
Refereed publications
- In proceedings of the 24th International Conference on Computational Linguistics (COLING), Dec 2012, Mumbai, India.
- In proceedings of Interspeech, Sep 2012, Portland, Oregon.
- In proceedings of IJCNLP, Nov 2011, Chiang Mai, Thailand.
- In proceedings of the Workshop on Monolingual Text-to-Text Generation at ACL-HLT, June 2011, Portland, Oregon.
- In proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-HLT) Short Papers, June 2011, Portland, Oregon.
- In proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-HLT) Short Papers, June 2011, Portland, Oregon.
- In proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT) Short Papers, June 2010, Los Angeles, California.
- In proceedings of the Workshop on Creating Speech and Text Language Data with Amazon's Mechanical Turk at NAACL-HLT, June 2010, Los Angeles, California.
- In proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), May 2010, Valletta, Malta.
- In proceedings of the 22nd International Conference on Computational Linguistics (COLING), August 2008, Manchester, UK.
- In proceedings of the 21st Conference on Neural Information Processing Systems (NIPS), December 2007, Vancouver, Canada.
- In proceedings of the 18th European Conference on Machine Learning (ECML), September 2007, Warsaw, Poland.
Other publications
- Decreasing Textual Redundancy. Master's Thesis, December 2007, New York, New York, USA.
- In proceedings of the 2nd New York Academy of Sciences Symposium on Machine Learning, October 2007, New York, New York, USA.
Datasets
- A corpus of ~300 pairs of related newswire sentences with multiple human-generated fusion annotations (5 intersections, 5 unions) of varying accuracy collected via Mechanical Turk users. Download (91 KB) Citation
- A collection of ~940 prepositional phrase attachment cases over unstructured blog text. Candidates were chosen automatically and final judgments were made by humans responding to multiple-choice questions on Mechanical Turk. Download (130 KB) Citation
Miscellany
The papers that I covered for my candidacy exam on text-to-text generation are available here.
My Erdős number is at most 4: Me → Tony Jebara → Tommi Jaakkola → Noga Alon → Paul Erdős
My Bacon number is still woefully undefined.