Julia Hirschberg elected to the National Academy of Engineering

Professor Julia Hirschberg has been elected to the National Academy of Engineering (NAE), one of the highest professional distinctions awarded to an engineer. Hirschberg was cited by the NAE for her “contributions to the use of prosody in text-to-speech and spoken dialogue systems, and to audio browsing and retrieval.” Her research in speech analysis uses machine learning to help experts identify deceptive speech, and even to assess sentiment and emotion across languages and cultures.

“I am thrilled to be elected to such an eminent group of researchers,” said Hirschberg, who is the Percy K. and Vida L.W. Hudson Professor of Computer Science and chair of the Computer Science Department, as well as a member of the Data Science Institute. “It is such a great honor.”

Hirschberg’s main area of research is computational linguistics, with a focus on prosody, or the relationship between intonation and discourse. Her current projects include research into emotional and deceptive speech, spoken dialogue systems, entrainment in dialogue, speech synthesis, text-to-speech synthesis in low-resource languages, and hedging behaviors.

“I was very pleased to learn of Julia’s election for her pioneering work at the intersection of linguistics and computer science,” Mary C. Boyce, Dean of Engineering and Morris A. and Alma Schapiro Professor, said. “She works in an area that is central to the way we communicate, understand, and analyze our world today and is uncovering new paths that make us safer and better connected. As chair of Computer Science, she has also led the department through a period of tremendous growth and exciting changes.”

Hirschberg, who joined Columbia Engineering in 2002 as a professor in the Department of Computer Science and has served as department chair since 2012, earned her PhD in computer and information science from the University of Pennsylvania. She worked at AT&T Bell Laboratories, where in the 1980s and 1990s she pioneered techniques in text analysis for prosody assignment in text-to-speech synthesis, developing corpus-based statistical models that incorporate syntactic and discourse information, models that are in general use today.

Hirschberg serves on numerous technical boards and editorial committees, including the IEEE Speech and Language Processing Technical Committee and the board of the Computing Research Association’s Committee on the Status of Women in Computing Research (CRA-W). Previously she served as editor-in-chief of Computational Linguistics and co-editor-in-chief of Speech Communication and was on the Executive Board of the Association for Computational Linguistics (ACL), the Executive Board of the North American ACL, the CRA Board of Directors, the AAAI Council, the Permanent Council of International Conference on Spoken Language Processing (ICSLP), and the board of the International Speech Communication Association (ISCA). She also is noted for her leadership in promoting diversity, both at AT&T Bell Laboratories and Columbia, and for broadening participation in computing.

Among her many honors, Hirschberg is a fellow of the IEEE (2017), the Association for Computing Machinery (2016), the Association for Computational Linguistics (2011), the International Speech Communication Association (2008), and the Association for the Advancement of Artificial Intelligence (1994); and she is a recipient of the IEEE James L. Flanagan Speech and Audio Processing Award (2011) and the ISCA Medal for Scientific Achievement (2011). In 2007, she received an Honorary Doctorate from the Royal Institute of Technology, Stockholm, and in 2014 was elected to the American Philosophical Society.

Hirschberg joins Dean Boyce and many other Columbia Engineering colleagues who are NAE members; most recently elected were Professors David Yao (Industrial Engineering and Operations Research) in 2015, Gordana Vunjak-Novakovic (Biomedical Engineering) in 2012, and Mihalis Yannakakis (Computer Science) in 2011.

On February 8, the NAE announced 84 new members and 22 foreign members, bringing its total U.S. membership to 2,281 and foreign members to 249. NAE membership honors those who have made outstanding contributions to engineering research, practice, or education, including significant contributions to the engineering literature, and to the pioneering of new and developing fields of technology, making major advancements in traditional fields of engineering, or developing/implementing innovative approaches to engineering education.

– Holly Evarts

Posted 2/7/2017

Steven Bellovin & coauthors call for new surveillance laws to reflect internet’s complexity

Paper published in Harvard Journal of Law & Technology shows how technology blurs line between metadata and private content, eroding protections codified in pre-internet communications.

Startup cofounded by Apoorv Agarwal (PhD’16) is featured in Forbes

Text IQ uses NLP and ML techniques to streamline document review process for attorneys. Already profitable, the company saved customers $3M in legal expenses just this year, not counting what was saved by averting problems.

Daniel Bauer joins department as lecturer, adding strength in NLP and AI

Daniel Bauer, who has been teaching Data Structures for the past four semesters, is joining the Computer Science Department where he will teach two courses a semester. Bauer is the third teaching-oriented faculty hired in the past two years as the department moves to meet the surging demand for computer science classes from majors and nonmajors alike.

“I feel especially lucky to be teaching at Columbia with its diverse student body where students are super curious and super excited to learn. I hope to help them discover new areas,” says Bauer.

In addition to the honors section of Data Structures and Algorithms (COMS W3137), Bauer is teaching Introduction to Computing for Engineers and Applied Scientists (ENGI E1006) this spring; next fall he is planning to teach Natural Language Processing (NLP), his particular research focus.

A sub-area of artificial intelligence, NLP is concerned with using computers to analyze written or spoken language. NLP researchers design systems that understand and generate natural language, extract information from text or speech, and automatically translate between languages. Demand for NLP is growing as user interfaces become more language-driven and as more and more free-form written and spoken data becomes available digitally.

The inherent ambiguity and nuance of human language pose challenges, however. Says Bauer, “The old Marx Brothers line ‘I shot an elephant in my pajamas’ allows for two interpretations, but human speakers know empirically that pajamas are worn by people, not elephants. This observation or knowledge has to be conveyed somehow to language processing systems.”

One way is to collect enough sample sentences and train machine learning models that predict that pajamas is more likely to be associated with people than with elephants. This machine learning approach works extremely well to disambiguate between multiple possible syntactic structures for a sentence. Bauer, whose research focus is in computational semantics, is looking beyond syntax and tries to train models that can infer the meaning of a sentence. “Large data sets of semantic annotations for sentences have only become available in the last few years and how to best train NLP tools on this data is an open problem. While syntactic representations traditionally use trees, semantic data sets that annotate predicate-argument structure use more general graphs, so the formal machinery we have been using for trees is no longer sufficient,” says Bauer, who developed grammar formalisms and models that can translate between text and graphs in his dissertation.

Where tree representations are traditionally used to represent syntactic structure of sentences, graph representations are a compact way of describing semantic information. In meaning representations, participants can play multiple roles (for example, “I” is the agent of shooting and the owner of the pajamas). Meaning representation graphs can contain components that are only implied by the original sentence (for example, wearing is never mentioned explicitly). Bauer develops formal grammar techniques and algorithms to translate between text strings and meaning representation graphs.

“Being able to recover the meaning of a sentence will help us build better systems for machine translation, search, text summarization, and information extraction, for example to detect emergent events, identify terror threats, or predict stock price developments.”

Bauer is also interested in incorporating outside and domain knowledge to understand a sentence in the context in which it was uttered. “In a dialog, we use language to refer to objects in our environment and events going on around us. Language processing systems, especially conversational agents, should be able to communicate using language that is grounded in this environment or in other modalities,” says Bauer.

One area of interest is trying to understand the meaning of natural language descriptions as they relate to spatial relationships of objects. As a cofounder of WordsEye, which automatically translates text into 3D scenes, Bauer is interested in extending semantic analyses to the world of objects and their properties. While WordsEye starts with a set of rules defining placement of objects according to user input text (what does “on” mean in a 3D scene?), Bauer wants to combine these rules with more machine–learning focused language processing. “Can we take the corpus of existing WordsEye scenes and automatically extract the text-to-scene generation system from that? Rather than manually engineering these rules, can we learn from how people use and place things in context?” It’s a future research subject he hopes to explore.

In his research, Bauer has always involved undergraduate and master’s students and will continue to do so as he assumes more teaching responsibility, which brings its own challenges.

“When I started out teaching programming classes, I enjoyed seeing the different expectations students bring to the classroom, but it also forces you to think hard about structuring a class that provides something to all these different students with many different backgrounds and different interests.”

This thoughtful approach to both research and educating others is an obvious asset to the department. Says Julia Hirschberg, chair of the Computer Science Department, “We are delighted that Daniel will be joining the CS faculty. Daniel has been an excellent teacher for us already as a graduate student preceptor, and we are extremely lucky to have been able to persuade him to stay on. The combination he brings of teaching expertise in the introductory classes and of deep knowledge of NLP will allow us to increase our teaching capacity in both areas.”

Bauer received an MSc in Language Science and Technology from Saarland University and a BSc in Cognitive Science from the University of Osnabrück, Germany.
This spring he will receive his PhD in Computer Science from Columbia University.

Posted 2/3/2017
– Linda Crane Photo credit: Tim Lee

Dragomir Radev joins the Yale Computer Science faculty

He received his PhD in 1999 under the supervision of Kathy McKeown. At Yale he will teach courses in Natural Language Processing and Artificial Intelligence and will lead the Language, Information, and Learning Lab.

Month: February 2017