Daniel Bauer, who has been teaching Data Structures for the past four semesters, is joining the Computer Science Department where he will teach two courses a semester. Bauer is the third teaching-oriented faculty hired in the past two years as the department moves to meet the surging demand for computer science classes from majors and nonmajors alike.
“I feel especially lucky to be teaching at Columbia with its diverse student body where students are super curious and super excited to learn. I hope to help them discover new areas,” says Bauer.
In addition to the honors section of Data Structures and Algorithms (COMS W3137), Bauer is teaching Introduction to Computing for Engineers and Applied Scientists (ENGI E1006) this spring; next fall he is planning to teach Natural Language Processing (NLP), his particular research focus.
A sub-area of artificial intelligence, NLP is concerned with using computers to analyze written or spoken language. NLP researchers design systems that understand and generate natural language, extract information from text or speech, and automatically translate between languages. Demand for NLP is growing as user interfaces become more language-driven and as more and more free-form written and spoken data becomes available digitally.
The inherent ambiguity and nuance of human language pose challenges, however. Says Bauer, “The old Marx Brothers line ‘I shot an elephant in my pajamas’ allows for two interpretations, but human speakers know empirically that pajamas are worn by people, not elephants. This observation or knowledge has to be conveyed somehow to language processing systems.”
One way is to collect enough sample sentences and train machine learning models that predict that pajamas is more likely to be associated with people than with elephants. This machine learning approach works extremely well to disambiguate between multiple possible syntactic structures for a sentence. Bauer, whose research focus is in computational semantics, is looking beyond syntax and tries to train models that can infer the meaning of a sentence. “Large data sets of semantic annotations for sentences have only become available in the last few years and how to best train NLP tools on this data is an open problem. While syntactic representations traditionally use trees, semantic data sets that annotate predicate-argument structure use more general graphs, so the formal machinery we have been using for trees is no longer sufficient,” says Bauer, who developed grammar formalisms and models that can translate between text and graphs in his dissertation.
“Being able to recover the meaning of a sentence will help us build better systems for machine translation, search, text summarization, and information extraction, for example to detect emergent events, identify terror threats, or predict stock price developments.”
Bauer is also interested in incorporating outside and domain knowledge to understand a sentence in the context in which it was uttered. “In a dialog, we use language to refer to objects in our environment and events going on around us. Language processing systems, especially conversational agents, should be able to communicate using language that is grounded in this environment or in other modalities,” says Bauer.
One area of interest is trying to understand the meaning of natural language descriptions as they relate to spatial relationships of objects. As a cofounder of WordsEye, which automatically translates text into 3D scenes, Bauer is interested in extending semantic analyses to the world of objects and their properties. While WordsEye starts with a set of rules defining placement of objects according to user input text (what does “on” mean in a 3D scene?), Bauer wants to combine these rules with more machine–learning focused language processing. “Can we take the corpus of existing WordsEye scenes and automatically extract the text-to-scene generation system from that? Rather than manually engineering these rules, can we learn from how people use and place things in context?” It’s a future research subject he hopes to explore.
In his research, Bauer has always involved undergraduate and master’s students and will continue to do so as he assumes more teaching responsibility, which brings its own challenges.
“When I started out teaching programming classes, I enjoyed seeing the different expectations students bring to the classroom, but it also forces you to think hard about structuring a class that provides something to all these different students with many different backgrounds and different interests.”
This thoughtful approach to both research and educating others is an obvious asset to the department. Says Julia Hirschberg, chair of the Computer Science Department, “We are delighted that Daniel will be joining the CS faculty. Daniel has been an excellent teacher for us already as a graduate student preceptor, and we are extremely lucky to have been able to persuade him to stay on. The combination he brings of teaching expertise in the introductory classes and of deep knowledge of NLP will allow us to increase our teaching capacity in both areas.”
Bauer received an MSc in Language Science and Technology from Saarland University and a BSc in Cognitive Science from the University of Osnabrück, Germany.
This spring he will receive his PhD in Computer Science from Columbia University.
Posted 2/3/2017 – Linda Crane Photo credit: Tim Lee