I am a Ph.D. Candidate with the Natural Language Processing group in the Computer Science Department of Columbia University.

My research interests are very broad, including natural language processing, artificial intelligence, machine learning and computational genomics.

    However, I am primarily interested in Natural Language Generation. I am currently working with my advisor, Kathy McKeown, on the joint Columbia University and University of Colorado AQUAINT Open Q&A project.

    Previously, I also worked under her supervision on the MAGIC (Multimedia Abstract Generation for Intensive Care) project.

    I am interested in the task of content planning performed during deep generation (i.e., generation from semantic input, as opposed to generation from shallow sources, such as the kind that may be performed for machine translation or summarization).

    The task of content planning involves two subtasks: content selection and ordering. My goal is to automatically learn schemas (a particular type of content planner) from corpora of text and data (text and knowledge resources). The ordering problem seems approachable using biological sequence analysis (ACL'01). Genetic algorithms proved useful for schema learning (INLG'02). My most recent work uses cross-entropy metrics on text clusters induced from data clusters to learn content selection rules (EMNLP'03).

I successfully defended my Thesis Proposal on May 1st, 2003.


My personal interests include graphic design, pen-palling, video games, traveling and foreign languages. During high school, I took Italian and French classes, and nowadays I am quite fluent in the latter. A few years ago I took a semester of Portuguese, and I feel confident about it as well. I also studied (obviously) English. As I am from Argentina, my mother tongue is Spanish.

I just finished taking my first semester of Elementary Chinese.


During my first year at Columbia, I worked on a multi-team GENOME-related project with Vasileios Hatzivassiloglou.

    My contribution to our team was mainly in Word Sense Disambiguation applied to highly technical biological articles, using statistical methods (ISMB'01).




