Congratulations to the Class of 2021

The department is extremely proud of all of our students and would like to honor this year’s graduates! We look forward to when we can all come together and celebrate in person.

First, a tribute from our faculty!

During Class Day, awards were given to students who excelled academically, to students who completed independent research projects, and to those with outstanding performance in teaching and service to the department. The list of awardees is in this year’s graduation handout.

Professor Augustin Chaintreau received the Janette and Armen Avanessians Diversity Award for outstanding research and leadership in advancing diversity in departmental, school, and university programs at Columbia.

At this year’s commencement, more than 600 students received a computer science degree.

Student spotlight

More memories from the past four years…

Columbia and Barnard students attend the 2019 Grace Hopper Conference in Orlando.
Lydia Chilton is the best professor at Columbia hands down. No one brings energy to the CS department the way that she does. GOAT. - Dillon Hayes
Ciaran Beckford and The Ngo at the PennApps 2017 Hackathon.
Charles Baird coding in the kitchen.
AI Generated Artwork, JR Carneiro CS SEAS '21 and Caroline Lin CS CC '21
Bryan Lei and John Wang at the AP Hackathon in Spring 2018
Audrey Cheng with fellow SWE interns at a Salesforce intern event in Denver, Colorado.
Cryptography with Tal! This class was so fascinating. - Ari Hirsch
Arkadiy Saakyan with friends at the tree lighting.
Anushri Arora took this photo of Olha Maslova while they were working on an Analysis of Algorithms problem set at the DSI lounge in late February 2020 - just before the "Stay at Home" story began.
Anastasia Dmitrienko and fellow TA Sophia Kolak giving a midterm recitation for CS Theory.
Amir Idris with the Columbia Data Science Society 2018
Alexander Peile and Tracy Chen declaring majors
Alexander Cohen on the right with other (non-Columbia) students in San Francisco during an internship.
Alex Horimbere on the day when he arrived at Columbia.
Left to right: Cesar Ramos Medina, Iliana Cantu, Ecenaz Ozmen, Yefri Gaitan, and Daniel Garces Botero at Google Games, where they won the most energetic table. Go red team!
Daniel Hanoch (in black jacket) and co-workers at the Earth Institute.
Yarden Carmeli at the Low (High) Beach
The Operating Systems group Maya, Ecenaz, and Ahmed grinding hard for OS projects.
Tanvi Hisaria, Manav, and Aeshna making the best of the Zoom year!
Shout out to DSA with Professor Bauer because he is almost always online, super approachable, and really makes an effort to help me understand CS concepts in general. - Andrew Molina
Sarah Radway and Sophia Kolak at Hack Harvard...trying to win "most useless hack" and still losing.
Sharon Jian with former and current TAs for UI Design grading together
Payal Chandak exploring the tunnels on campus
Rediet Bekele sneaking a Post Lab 7 nap in AP class (Sorry, Jae 🙁 )
Romy Zilkha studying at Butler Library
Rupal Gupta at Googleplex, where she interned during her junior summer and will be joining full-time.
Sara Bernstein and Rebecca Narin (BC '21), who became good friends after taking COMS 1004 together, as they finished their last class of freshman year!
Owen Bishop, Dominic Dyer, Jackson Storey, and Jon Lauer at the park.
Omer Fahri Onder kicking an opponent at the MIT taekwondo competition
Huge thanks to Professor Blaer for an engaging, fun, and interesting course in data structures. The knowledge I gained in this course has been the foundation for my coding abilities and a sturdy rock in my journey to becoming a data science graduate. - Aaron Morrill (with his sister, Haley)
Visiting Giphy as a part of Entrepreneurship Via Exploration with Anne Xie, Jenny Li, Monika Francsics, Sungbin Kim, Jeff Huang, Jordan Ramos, and Nico Molina.
Michael Karasev, Chris Mendell, Rodolfo Raimundo, Nathelie Hager, and Sabrina Selleck. After flying to El Paso during finals week and pulling 3 all-nighters, they were finally able to watch their scientific payload launch on board Blue Origin's rocket.
Harrison Qu in matrix code.
Left to Right: Kelsey Namba, Serena Tam, David Ji at the Princeton Hackathon.
Lucas Hahn
Manasi Sharma at the GROWTH 2019 Conference at San Diego State University during a coding challenge.
Mariya Delyakova and COMS 1004 classmates a week before COVID hit.
Marwan Salam in Havemeyer 309 - Concluding AP Lecture
Jonathan Sanabria (center in blue) with other members of the Columbia Robotics Club working on a prototype for an underwater ROV.
Discrete Math got me like...
Jinho Lee on a site tour of Google.
Jin Woo Won and Junyao Shi at the AP hackathon, hacking away as Jae played some music.
Jeffrey Kline - remembering long days in the library.
Malware Analysis is a great class - I love reverse engineering and learning how programs and operating systems work, so this class was particularly fun for me. - Christopher Smith (with his very sleepy pup!)
Samuel Fu's internship memories in 2019
JADE cohort January 2018 at Facebook
Jason Herrera at the Brooklyn Bridge in 2019.
Giovanni Sanchez hanging out with friends at Faculty House.
Gael Zendejas, Vikram Ho, Xiao Lim, and Adam Lin at the AP Hackathon.
Ryan James and fellow classmates in AP
From left: Kelsey Namba, Sarah Leventhal, Serena Tam. Taken in Lerner after a whole evening of working on a lab assignment in Advanced Programming.
Beom Joon Baek with James Valentini during Freshman year.
Danny Parrill at the virtual Columbia GS Honor Society induction.
Dorothee Grant and Serena Killion in class when Professor Sethumadhavan took attendance by having the students send a photo of them holding up their UNIs.
Daniel Halmos presenting the architecture for a group project in Topics for Robotic Learning.
Professor Brian Smith is the best! - Rico Pesce

6 Papers From the Department Accepted to EACL 2021

Six papers from CS researchers were accepted to the 16th conference of the European Chapter of the Association for Computational Linguistics (EACL). As the flagship European conference in the field of computational linguistics, EACL welcomes European and international researchers covering a broad spectrum of research areas that are concerned with computational approaches to natural language.

Below are brief descriptions and links to the papers.

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings
Kailash Karthik Saravanakumar Columbia University, Miguel Ballesteros Amazon AI, Muthu Kumar Chandrasekaran Amazon AI, Kathleen McKeown Columbia University & Amazon AI

This paper presents a new clustering paradigm for news streams, where clusters have a one-to-one correspondence with real-world events (for example, the Suez canal blockage). An important aspect of this problem is that the number of clusters is unknown and varies with time (new events occur and old events cease to be of relevance). The proposed paradigm follows a pipeline approach – where representations are built for each new article, comparisons are made with existing clusters to pick the most compatible one, and finally, a clustering decision is produced.
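The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the bag-of-words representation, the cosine comparison, the running-sum centroid, and the `threshold` value are all assumptions chosen to keep the example short, and the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a simple stand-in for richer representations)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(articles, threshold=0.3):
    """Assign each incoming article to the most similar cluster, or open a new one.

    The number of clusters is not fixed in advance: it grows whenever no
    existing cluster is similar enough (threshold is an assumed value).
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [article index]}
    for idx, text in enumerate(articles):
        vec = vectorize(text)
        best, best_sim = None, 0.0
        # Step 2: compare the new article against every existing cluster.
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        # Step 3: clustering decision.
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["centroid"].update(vec)  # running sum of member term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return [cluster["members"] for cluster in clusters]


articles = [
    "ship blocks suez canal",
    "suez canal ship freed",
    "team wins championship final",
]
print(cluster_stream(articles))  # [[0, 1], [2]]
```

A fixed threshold is the simplest possible decision rule; the paper's point is that the representation and the decision step both benefit from task-specific modeling.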

A surprising observation from this work is that contextual embeddings (from models like BERT), in contrast to their overwhelming success in many NLP problems, achieve sub-par performance by themselves on this clustering problem. However, when combined with other representations (like TF-IDF and timestamps) and fine-tuned with task-specific augmentations, they achieve new state-of-the-art performance. Another interesting observation is that the widely reported B-Cubed metrics are biased towards large clusters and hence don’t capture cluster fragmentation on smaller clusters as well. Since clusters corresponding to emerging events are small and errors made on such clusters are highly undesirable, the authors suggest using an additional metric, CEAF-e, to evaluate models for this task.
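To make the bias toward large clusters concrete, here is a small sketch of B-Cubed precision and recall using the standard per-item definition (averaged over items). The toy labels are invented for illustration and do not come from the paper.

```python
from collections import defaultdict


def bcubed(pred, gold):
    """Return (precision, recall) averaged over items.

    pred and gold are equal-length lists of cluster labels, one per item.
    """
    pred_groups = defaultdict(set)
    gold_groups = defaultdict(set)
    for i, (p, g) in enumerate(zip(pred, gold)):
        pred_groups[p].add(i)
        gold_groups[g].add(i)

    n = len(pred)
    precision = recall = 0.0
    for i in range(n):
        same_pred = pred_groups[pred[i]]
        same_gold = gold_groups[gold[i]]
        overlap = len(same_pred & same_gold)
        precision += overlap / len(same_pred)
        recall += overlap / len(same_gold)
    return precision / n, recall / n


# One large event (8 articles) clustered correctly, and one small event
# (2 articles) fragmented into two singleton clusters.
gold = ["A"] * 8 + ["B"] * 2
pred = [0] * 8 + [1, 2]
precision, recall = bcubed(pred, gold)
print(precision, recall)  # 1.0 0.9
```

Even though the small event is completely fragmented, recall only drops to 0.9, because the eight items in the large cluster dominate the per-item average.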

Segmenting Subtitles for Correcting ASR Segmentation Errors
David Wan Columbia University, Chris Kedzie Columbia University, Faisal Ladakh Columbia University, Elsbeth Turcan Columbia University, Petra Galuszkova University of Maryland, Elena Zotkina University of Maryland, Zhengping Jiang Columbia University, Peter Bell University of Edinburgh, and Kathleen McKeown Columbia University

For the task of spoken language translation, the usual approach is a pipeline consisting of an Automatic Speech Recognition (ASR) system that transforms audio files into words and utterances in the original language and a Machine Translation (MT) system that translates the utterances into the target language. However, this setup may suffer from input-output mismatches: ASR segments utterances by acoustic information such as pauses, and thus may produce run-on sentences or sentence fragments, but MT is usually trained on proper sentences without such issues and may not perform well in such a setting. This paper proposes the use of an intermediate model to segment utterances into sentences to improve performance in MT as well as other downstream tasks.

One crucial problem for developing such models is the lack of suitable training data for segmentation, especially when the languages involved are low-resource. To this end, the paper also proposes using subtitle datasets as proxy speech data and creating synthetic acoustic utterances that mimic common ASR errors for the model to learn to fix. Using a simple neural tagging model, the authors show improvement over the baseline ASR segmentation on MT for Lithuanian, Bulgarian, and Farsi. A surprising finding is that the segmentation model most improves the translation quality of more syntactically complex segments.

“Talk to me with left, right, and angles”: Lexical entrainment in spoken Hebrew dialogue
Andreas Weise CUNY Graduate Center, Vered Silber-Varod The Open University of Israel, Anat Lerner The Open University of Israel, Julia Hirschberg Columbia University, and Rivka Levitan Columbia University

It has been well-documented for several languages that human interlocutors tend to adapt their linguistic productions to become more similar to each other. This behavior, known as entrainment, affects lexical choice as well, both with regard to specific words, such as referring expressions, and overall style.

Lexical entrainment is the behavior that causes the words that speakers use in a conversation to become more similar over time. Entrainment more broadly is a human behavior causing interlocutors to adapt to each other to become more similar. Its effects are measurable but entrainment itself is not a measure.

This paper offers the first investigation of such lexical entrainment in Hebrew.

The analysis of Hebrew speakers interacting in a Map Task, a popular experimental setup, provides rich evidence of lexical entrainment. No clear pattern of differences is found between speaker pairs by the combination of their genders, nor between speakers by their individual gender. However, speakers in a position of less power are found to entrain more than those with greater power, which matches theoretical accounts.

Overall, the results mostly accord with those for American English. There is, however, a surprising lack of entrainment on a list of hedge words that were previously found to be highly entrained in English. This might be due to cultural differences between American and Israeli speakers that render adoption of a more tentative style less appropriate in the Hebrew context.

Entity-level Factual Consistency of Abstractive Text Summarization
Feng Nan Amazon Web Services, Ramesh Nallapati Amazon Web Services, Zhiguo Wang Amazon Web Services, Cicero Nogueira dos Santos Amazon Web Services, Henghui Zhu Amazon Web Services, Dejiao Zhang Amazon Web Services, Kathleen McKeown Amazon Web Services & Columbia University, Bing Xiang Amazon Web Services

A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document. For example, state-of-the-art models trained on existing datasets exhibit entity hallucination, generating names of entities that are not present in the source document.

The paper proposes a set of new metrics to quantify the entity-level factual consistency of generated summaries and shows that the entity hallucination problem can be alleviated by simply filtering the training data. In addition, the paper introduces a summary-worthy entity classification task to the training process as well as a joint entity and summary generation approach, which yields further improvements in entity-level metrics.
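As a rough illustration of what an entity-level consistency check and a data filter might look like, the sketch below scores a summary by the fraction of its entities that also appear in the source, then drops training pairs whose reference summary contains unsupported entities. It is not the paper's implementation: the entity lists are assumed to come from some external named-entity recognizer, matching is a plain case-insensitive substring test, and the function and field names are hypothetical.

```python
def entity_precision(summary_entities, source_text):
    """Fraction of summary entities that can be found in the source text.

    summary_entities: list of entity strings (assumed to come from a NER tool).
    Returns 1.0 for an empty list, on the assumption that a summary with no
    entities has hallucinated none.
    """
    if not summary_entities:
        return 1.0
    source_lower = source_text.lower()
    found = sum(1 for entity in summary_entities if entity.lower() in source_lower)
    return found / len(summary_entities)


def filter_training_pairs(pairs):
    """Keep only pairs whose reference-summary entities all appear in the source."""
    return [
        pair
        for pair in pairs
        if entity_precision(pair["summary_entities"], pair["source"]) == 1.0
    ]


pairs = [
    {"source": "Alice flew to London on Monday.", "summary_entities": ["Alice", "London"]},
    {"source": "Alice flew to London on Monday.", "summary_entities": ["Alice", "Paris"]},
]
print(entity_precision(["Alice", "Paris"], pairs[0]["source"]))  # 0.5
print(len(filter_training_pairs(pairs)))  # 1
```

Substring matching is deliberately crude; it will miss aliases and paraphrased names, which is one reason a real metric would need a more careful matching step.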

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space
Debanjan Ghosh Educational Testing Service, Ritvik Shrivastava MindMeld, Cisco Systems & Columbia University, and Smaranda Muresan Columbia University

Detecting arguments in online interactions is useful to understand how conflicts arise and get resolved. Users often use figurative language, such as sarcasm, either as persuasive devices or to attack the opponent by an ad hominem argument. To further our understanding of the role of sarcasm in shaping the disagreement space, the paper presents a thorough experimental setup using a corpus annotated with both argumentative moves (agree/disagree) and sarcasm. The research exploits joint modeling in terms of (a) applying discrete features that are useful in detecting sarcasm to the task of argumentative relation classification (agree/disagree/none), and (b) multitask learning for argumentative relation classification and sarcasm detection using deep learning architectures (e.g., dual Long Short-Term Memory (LSTM) with hierarchical attention and Transformer-based architectures). The paper shows that modeling sarcasm improves the argumentative relation classification task (agree/disagree/none) in all setups.

A Unified Feature Representation for Lexical Connotations
Emily Allaway Columbia University and Kathleen McKeown Columbia University

Ideological attitudes and stances are often expressed through subtle meanings of words and phrases. Understanding these connotations is critical to recognize the cultural and emotional perspectives of the speaker. In this paper, the researchers use distant labeling to create a new lexical resource representing connotation aspects for nouns and adjectives. Their analysis shows that it aligns well with human judgments. Additionally, they present a method for creating lexical representations that capture connotations within the embedding space and show that using the embeddings provides a statistically significant improvement on the task of stance detection when data is limited.

Shree K. Nayar has been awarded the prestigious Funai Achievement Award from the Information Processing Society of Japan for his seminal work on computer vision and computational imaging. Nayar will receive the award at the Forum on Information Technology (FIT) to be held in Sendai, Japan, in August 2021.
