Adapting Computer Science Education to a Changing Tech Landscape
How Columbia is redesigning CS programming courses for an AI-powered industry.
Experts came together at Columbia to explore how brain science can shape the next generation of AI.
In the new course AI in Context, faculty from across the University teach AI through the lens of philosophy, music, literature, and other domains.
Sethumadhavan joined high-level officials and leading researchers at the Cyber Regulation and Harmonization Conference.
Eugene Wu and Brian Smith share their thoughts on how to ensure that emerging technologies benefit humanity.
Steven Feiner worked with Kaveri Thakoor to create a tool that combines the pattern-recognition power of AI with the domain expertise of human medical experts. The AI-powered decision support tool assists doctors in diagnosing eye disease.
CS researchers won a Best Paper Award at the European Conference on Computer Vision (ECCV) 2024, one of the premier international conferences in computer vision and machine learning. A biennial event, ECCV attracts leading researchers, scholars, and practitioners from around the world to present cutting-edge advances and breakthroughs. This year’s accepted papers from the department showcase high-impact research that pushes the boundaries of computer vision and artificial intelligence.
Minimalist Vision with Freeform Pixels
Jeremy Klotz and Shree K. Nayar (Columbia University)
Abstract
A minimalist vision system uses the smallest number of pixels needed to solve a vision task. While traditional cameras use a large grid of square pixels, a minimalist camera uses freeform pixels that can take on arbitrary shapes to increase their information content. We show that the hardware of a minimalist camera can be modeled as the first layer of a neural network, where the subsequent layers are used for inference. Training the network for any given task yields the shapes of the camera’s freeform pixels, each of which is implemented using a photodetector and an optical mask. We have designed minimalist cameras for monitoring indoor spaces (with 8 pixels), measuring room lighting (with 8 pixels), and estimating traffic flow (with 8 pixels). The performance demonstrated by these systems is on par with a traditional camera with orders of magnitude more pixels. Minimalist vision has two major advantages. First, it naturally tends to preserve the privacy of individuals in the scene since the captured information is inadequate for extracting visual details. Second, since the number of measurements made by a minimalist camera is very small, we show that it can be fully self-powered, i.e., function without an external power supply or a battery.
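The paper’s central idea, treating the camera hardware as the first, learnable layer of a neural network, lends itself to a short sketch. The following is a hedged PyTorch illustration under our own assumptions (the layer sizes, names, and sigmoid mask constraint are ours, not from the paper): each of K freeform pixels is a learnable optical mask whose single measurement is a mask-weighted sum of the scene, and a small inference head consumes only those K measurements.

```python
import torch
import torch.nn as nn

class MinimalistCamera(nn.Module):
    """Hypothetical sketch: K freeform pixels as a learnable first layer."""

    def __init__(self, height=128, width=128, k_pixels=8, n_outputs=4):
        super().__init__()
        # One learnable mask per freeform pixel; a sigmoid keeps each mask
        # in [0, 1] so it is realizable as an optical attenuation pattern.
        self.mask_logits = nn.Parameter(torch.randn(k_pixels, height * width))
        # Inference layers that see only the K photodetector measurements.
        self.head = nn.Sequential(
            nn.Linear(k_pixels, 32), nn.ReLU(), nn.Linear(32, n_outputs)
        )

    def forward(self, image):                    # image: (batch, height*width)
        masks = torch.sigmoid(self.mask_logits)  # (K, H*W)
        measurements = image @ masks.t()          # (batch, K)
        return self.head(measurements)

# After end-to-end training for a task, torch.sigmoid(model.mask_logits)
# gives the mask shapes to fabricate over K physical photodetectors.
```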
How Video Meetings Change Your Expression
Sumit Sarin, Utkarsh Mall, Purva Tendulkar, and Carl Vondrick (Columbia University)
Abstract
Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set. Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient as they are unable to provide insights beyond obvious dataset biases, and the explanations are useful only if humans themselves are good at the task. Instead, we tackle the problem through the lens of generative domain translation: our method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the domains. We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and on video-calls (VCs). We also show the applicability of our method on discovering differences in presidential communication styles. Additionally, we are able to predict temporal change-points in videos that decouple expressions in an unsupervised way, and increase the interpretability and usefulness of our model. Finally, our method, being generative, can be used to transform a video call to appear as if it were recorded in a F2F setting. Experiments and visualizations show our approach is able to discover a range of behaviors, taking a step towards deeper understanding of human behaviors.
Controlling the World by Sleight of Hand
Sruthi Sudhakar, Ruoshi Liu, Basile Van Hoorick, Carl Vondrick, and Richard Zemel (Columbia University)
Abstract
Humans naturally build mental models of object interactions and dynamics, allowing them to imagine how their surroundings will change if they take a certain action. While generative models today have shown impressive results on generating/editing images unconditionally or conditioned on text, current methods do not provide the ability to perform object manipulation conditioned on actions, an important tool for world modeling and action planning. Therefore, we propose to learn an action-conditional generative model from unlabeled videos of human hands interacting with objects. The vast quantity of such data on the internet allows for efficient scaling, which can enable high-performing action-conditional models. Given an image and the shape/location of a desired hand interaction, CosHand synthesizes an image of the future after the interaction has occurred. Experiments show that the resulting model can predict the effects of hand-object interactions well, with strong generalization particularly to translation, stretching, and squeezing interactions of unseen objects in unseen environments. Further, CosHand can be sampled many times to predict multiple possible effects, modeling the uncertainty of forces in the interaction/environment. Finally, the method generalizes to different embodiments, including non-human hands such as robot hands, suggesting that generative video models can be powerful models for robotics.
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Basile Van Hoorick (Columbia University), Rundi Wu (Columbia University), Ege Ozguroglu (Columbia University), Kyle Sargent (Stanford University), Ruoshi Liu (Columbia University), Pavel Tokmakov (Toyota Research Institute), Achal Dave (Toyota Research Institute), Changxi Zheng (Columbia University), Carl Vondrick (Columbia University)
Abstract
Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this paper, we propose GCD, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to, given a video of any scene, generate a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input, and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite being trained on synthetic multi-view video data only, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier, Utkarsh Mall, and Carl Vondrick (Columbia University)
Abstract
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, risk for bias, and inability to discover new visual concepts not written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses a large language model and its in-context learning abilities to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the latest baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information about class names.
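As a rough illustration of the evolutionary loop described above (not the authors’ implementation), the sketch below keeps a population of attribute sets, scores each, and asks an LLM to mutate the survivors. Both propose_mutations (standing in for the LLM’s in-context mutation step) and fitness (e.g., validation accuracy of a classifier over image-attribute similarity scores) are hypothetical callables supplied by the caller.

```python
def evolve_attributes(initial_attributes, propose_mutations, fitness,
                      population_size=10, generations=20):
    """Schematic evolutionary search over attribute sets (concept bottlenecks).

    propose_mutations(attrs, score): hypothetical LLM call returning mutated
        attribute lists (add/drop/reword), conditioned on scored examples.
    fitness(attrs): hypothetical scorer, e.g., validation accuracy.
    """
    population = [initial_attributes]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[:max(1, population_size // 2)]
        children = []
        for attrs in survivors:
            # The LLM sees attribute sets and their scores in-context and
            # proposes more discriminative variants.
            children.extend(propose_mutations(attrs, fitness(attrs)))
        population = survivors + children
    return max(population, key=fitness)
```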
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Ali Zare, Yulei Niu, Hammad Ayyubi, and Shih-Fu Chang (Columbia University)
Abstract
Procedure Planning in instructional videos entails generating a sequence of action steps based on visual observations of the initial and target states. Despite the rapid progress in this task, several critical challenges remain to be solved: (1) Adaptive procedures: Prior works hold an unrealistic assumption that the number of action steps is known and fixed, leading to non-generalizable models in real-world scenarios where the sequence length varies. (2) Temporal relation: Understanding step temporal relations is essential in producing reasonable and executable plans. (3) Annotation cost: Annotating instructional videos with step-level labels (i.e., timestamps) or sequence-level labels (i.e., action categories) is demanding and labor-intensive, limiting generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined. To address these challenges, we introduce the Retrieval-Augmented Planner (RAP) model. Specifically, for adaptive procedures, RAP adaptively determines the conclusion of actions using an auto-regressive model architecture. For temporal relation, RAP establishes an external memory module to explicitly retrieve the most relevant state-action pairs from the training videos and revises the generated procedures. To tackle high annotation cost, RAP utilizes a weakly-supervised learning approach to expand the training dataset to other task-relevant, unannotated videos by generating pseudo labels for action steps. Experiments on CrossTask and COIN benchmarks show the superiority of RAP over traditional fixed-length models, establishing it as a strong baseline solution for adaptive procedure planning.
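At a high level, adaptive, retrieval-augmented planning might look like the loop below. Every name here (model.next_action, model.predict_state, memory.retrieve, the "<eos>" token) is an illustrative placeholder rather than the paper’s API; the point is that an auto-regressive planner can decide for itself when the procedure ends, with retrieved state-action pairs from training videos informing each step.

```python
def plan(start_obs, goal_obs, model, memory, max_steps=20):
    """Schematic adaptive procedure planning loop (illustrative only)."""
    steps, state = [], start_obs
    for _ in range(max_steps):
        # Hypothetical external-memory lookup of relevant state-action pairs.
        retrieved = memory.retrieve(state, goal_obs)
        # Hypothetical auto-regressive step prediction; may emit "<eos>".
        action = model.next_action(state, goal_obs, steps, retrieved)
        if action == "<eos>":          # planner decides the procedure is done
            break
        steps.append(action)
        state = model.predict_state(state, action)   # roll the state forward
    return steps
```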
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang (Massachusetts Institute of Technology), Hong-Xing Yu (Stanford University), Rundi Wu (Columbia University), Brandon Y. Feng (Massachusetts Institute of Technology), Changxi Zheng (Columbia University), Noah Snavely (Cornell University), Jiajun Wu (Stanford University), William T. Freeman (Massachusetts Institute of Technology)
Abstract
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
The app was first developed at a DevFest hackathon.
Columbia Engineering and the Knight First Amendment Institute recently convened multidisciplinary experts to discuss the impact of artificial intelligence on public discourse, free speech, and democracy.
MS Bridge student Ishan Dhanani was interviewed by Semafor on how his startup Agora Labs helps researchers book time to rent GPUs for their projects.
ARNI Director Zemel goes to Washington to explain to Congress how Columbia’s new AI institute will connect major progress made in AI systems to our understanding of the brain.
The projects will explore algorithmic fairness, unified methods for interpreting artistic images found on the internet, and the development of a differentially-private data market system.
David Knowles and collaborators develop RNA-based predictive models.
Manhattan Borough President Mark Levine assembled a group of New York legislators, academics, advocates, and tech policymakers — including a representative from Google — a week ago to discuss how people use artificial intelligence, and whether government regulation is keeping up with the explosive new technology.
The AI Institute for Artificial and Natural Intelligence (ARNI) will be led by CS professors Richard Zemel, Kathleen McKeown, and Christos Papadimitriou, as well as Liam Paninski of the Zuckerman Institute and Xaq Pitkow of Baylor College of Medicine and Rice University.
ChatGPT and other bots have revived conversations on artificial general intelligence. Scientists say algorithms won’t surpass you any time soon.
A PhD candidate who worked for OpenAI and Apple discusses natural language processing, AI hallucinations, and deep fakes.
Michelle Zhou (PhD ’99) explains what no-code AI means and presents five inflection points that led to her current work, including the impact of two professors in graduate school who helped her find her direction in AI.
Teaching AI to distinguish between causation and correlation would be a game changer — well, the game may be about to change.
The J.P. Morgan AI Research Awards 2021 program partners with researchers across artificial intelligence.
Google Research held an online workshop on the conceptual understanding of deep learning. The workshop discussed how new findings in deep learning and neuroscience can help create better artificial intelligence systems. Christos Papadimitriou discussed how our growing understanding of information-processing mechanisms in the brain might help create algorithms that are more robust in understanding and engaging in conversations. Papadimitriou presented a simple and efficient model that explains how different areas of the brain inter-communicate to solve cognitive problems.
Shuran Song and Carl Vondrick are among the awardees chosen for their artificial intelligence (AI) research. The program aims to use AI for societal good.
Professor Kathy McKeown talks with DeepLearning.AI’s Andrew Ng about how she started in artificial intelligence (AI), her research projects, how her understanding of AI has changed through the decades, and AI career advice for learners of NLP.
Almost 400,000 babies were born prematurely—before 37 weeks gestation—in 2018 in the United States. One of the leading causes of newborn deaths and long-term disabilities, preterm birth (PTB) is considered a public health problem with deep emotional and challenging financial consequences to families and society. If doctors were able to use data and artificial intelligence (AI) to predict which pregnant women might be at risk, many of these premature births might be avoided.
The J.P. Morgan AI Research Awards 2019 program partners with researchers across artificial intelligence. It is structured as a gift that funds a year of study for a graduate student.
Prediction semantics and interpretations that are grounded in real data
Principal Investigator: Daniel Hsu, Computer Science Department & Data Science Institute
The importance of transparency in predictive technologies is by now well-understood by many machine learning practitioners and researchers, especially for applications in which predictions may have serious impacts on human lives (e.g., medicine, finance, criminal justice). One common approach to providing transparency is to ensure interpretability in the models and predictions produced by an application, or to accompany predictions with explanations. Interpretations and explanations may help individuals understand predictions that affect them, and also help developers reason about failure cases of their applications.
However, there are numerous possibilities for what constitutes a suitable interpretation or explanation, and the semantics of those provided by existing systems are not always clear.
Suppose, for example, that a bank uses a linear model to predict whether or not a loan applicant will default on a loan. A natural strategy is to seek a sparse linear model, which is often touted as highly interpretable. However, attributing significance to variables with non-zero regression coefficients (e.g., zip code) and not others (e.g., race, age) is suspect when variables may be correlated. Moreover, an explanation based on pointing to individual variables or other parameters of a model ignores the source of the model itself: the training data (e.g., a biased history of borrowers and default outcomes) and the model-fitting procedure. Invalid or inappropriate explanations may create a “transparency fallacy” that creates more problems than it solves.
The researchers propose a general class of mechanisms that provide explanations based on training or validation examples, rather than any specific component or parameters of a predictive model. In this way, the explanation will satisfy two key features identified in successful human explanations: the explanation will be contrastive, allowing an end-user to compare the present data to the specific examples chosen from the training or validation data, and the explanation will be pertinent to the actual causal chain that results in the prediction in question. These features are missing in previous systems that seek to explain predictions based on machine learning methods.
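As a concrete, deliberately naive illustration of example-based explanations (one possible instantiation, not the researchers’ proposed mechanism), the sketch below accompanies each prediction with its nearest supporting training examples and a nearest contrastive example from a different class, so an end-user can compare their own case to both.

```python
import numpy as np

def explain_by_examples(x, train_X, train_y, predict, k=3):
    """Illustrative sketch: explain predict(x) via training examples.

    Returns the k nearest training examples sharing the prediction
    (supporting evidence) and the nearest one with a different label
    (a contrastive example for the end-user to compare against).
    """
    pred = predict(x)
    dists = np.linalg.norm(train_X - x, axis=1)   # distances to training set
    order = np.argsort(dists)
    supporting = [int(i) for i in order if train_y[i] == pred][:k]
    contrastive = next((int(i) for i in order if train_y[i] != pred), None)
    return {"prediction": pred,
            "supporting_examples": supporting,
            "contrastive_example": contrastive}
```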
“We expect this research to lead to new methods for interpretable machine learning,” said Daniel Hsu, the principal investigator of the project. Because the explanations will be based on actual training examples, the methods will be widely applicable, in essentially any domain where examples can be visualized or communicated to a human. He continued, “This stands in contrast to nearly all existing methods for explanatory machine learning, which either require strong assumptions like linearity or sparsity, or do not connect to the predictive model of interest or the actual causal chain leading to a given prediction of interest.”
Efficient Formal Safety Analysis of Neural Networks
Principal Investigators: Suman Jana, Computer Science Department; Jeannette M. Wing, Computer Science Department & Data Science Institute; Junfeng Yang, Computer Science Department
Over the last few years, artificial intelligence (AI), in particular Deep Learning (DL) and Deep Neural Networks (DNNs), has made tremendous progress, achieving or surpassing human-level performance for a diverse set of tasks including image classification, speech recognition, and playing games such as Go. These advances have led to widespread adoption and deployment of DL in critical domains including finance, healthcare, autonomous driving, and security. In particular, the financial industry has embraced AI in applications ranging from portfolio management (“Robo-Advisor”), algorithmic trading, fraud detection, and loan and insurance underwriting to sentiment and news analysis, customer service, and sales.
“Machine learning models are used in more and more safety- and security-critical applications such as autonomous driving and medical diagnosis,” said Suman Jana, one of the principal investigators of the project. “Yet they are known to be fragile and frequently mispredict on edge cases.”
In many critical domains including finance and autonomous driving, such incorrect behaviors can lead to disastrous consequences such as a gigantic loss in automated financial trading or a fatal collision of a self-driving car. For example, in 2016, a Google self-driving car crashed into a bus because it expected the bus to yield under a set of rare conditions but the bus did not. Also in 2016, a Tesla car in autopilot crashed into a trailer because the autopilot system failed to recognize the trailer as an obstacle due to its ‘white color against a brightly lit sky’ and the ‘high ride height.’
Before AI can become the next technological revolution, it must be robust against such corner-case inputs and must not cause disasters. The researchers believe AI robustness is one of the biggest challenges that must be solved in order to fully tame AI for good.
“Our research aims to create novel tools to verify that a machine learning model will not mispredict on certain important input ranges, ensuring safety and security,” said Junfeng Yang, one of the investigators of the research.
The proposed work enables rigorous analysis of autonomous AI systems and machine learning (ML) algorithms, enabling data scientists to (1) verify that their AI models function correctly within certain input regions and violate no critical properties they specify (e.g., bidding price is never higher than a given maximum) or (2) locate all sub-regions where their models misbehave and repair their models accordingly. This capability will also enable data scientists to explain and interpret the outputs from autonomous AI systems and ML algorithms by understanding how different input regions may lead to different output predictions. Said Yang, “If successful, our work will dramatically boost the robustness, explainability, and interpretability of today’s autonomous AI systems and ML algorithms, benefiting virtually every individual, business, and government that relies on AI and ML.”
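To give a flavor of what such verification involves, here is a deliberately simple sketch using interval-bound propagation, a standard technique that is far coarser than the analyses this project pursues. Given elementwise bounds on an input region, it computes sound (if loose) bounds on a ReLU network’s outputs; if the computed upper bound already respects a property such as a bidding-price cap, the property holds for every input in that region.

```python
import numpy as np

def interval_bounds(lows, highs, weights, biases):
    """Propagate an input box [lows, highs] through a ReLU network.

    weights/biases: lists of layer matrices W and bias vectors b.
    Returns sound elementwise lower/upper bounds on the outputs.
    """
    lo, hi = np.asarray(lows, float), np.asarray(highs, float)
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        pos, neg = np.maximum(W, 0), np.minimum(W, 0)
        lo, hi = pos @ lo + neg @ hi + b, pos @ hi + neg @ lo + b
        if i != last:                      # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Example: verify "output never exceeds a max bid of 1.0" on an input box.
# lo, hi = interval_bounds([0, 0], [1, 1], [W1, W2], [b1, b2])
# verified = bool(hi.max() <= 1.0)   # True => property holds on the region
```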
They could have been at the beach enjoying the summer. Instead, high school students gathered from across New York City and New Jersey for the AI4All program hosted by the Columbia community. The students came to learn about artificial intelligence (AI) but this program had a special twist – computer science (CS) and social work concepts were combined for a deeper, more meaningful look at AI.
“We created a space for young people to think critically about the social implications of artificial intelligence for the communities that they live in,” said Desmond Patton, the program co-director and associate professor of the School of Social Work. “We wanted them to understand how things like race, power, privilege and oppression can be baked into algorithms and their adverse effects on communities.”
The program participants, composed of 9th, 10th, and 11th graders, are from racial and ethnic groups underrepresented in AI: Black, Hispanic, and Asian. Girls as well as youth from lower-income backgrounds were particularly encouraged to apply. For three weeks the students attended lectures, went on field trips to visit local companies involved in the program (LinkedIn and Samsung), and visited other AI4All programs, such as the one at Princeton University. Their work culminated in a final project, which they presented to their classmates, mentors, and industry professionals.
“I believe that it is important to bring more ethics to AI,” said Augustin Chaintreau, the program co-director and a CS assistant professor. He sees ethics as something to integrate into technical concepts and teach at the same time, rather than something students learn about only after the fact, once the social consequences have to be fixed. Shared Chaintreau, “It shouldn’t be thought about just in passing but as a central part of why this is a tool and its implications.”
An interdisciplinary approach to AI was even part of how the classes were structured. Technical CS concepts, such as machine learning and Python, were taught in the morning by professors and student volunteers, while in the afternoon guest speakers shared their perspectives on the day’s lesson. So, on the same day that students learned about supervised and unsupervised learning, someone who had formerly been incarcerated described how criminal policing that surveils people on social media played a role in building a case against them.
“We were learning college courses meant to be taught in a month but for us it was just a couple of weeks and that was really impressive,” said Genesis Lopez, who is part of the robotics team at her school. Lopez loves robotics but works more on the mechanical side. She goes back to the team knowing how to use Python and is confident she can step up and code if needed. Continued Lopez, “I learned a lot but my favorite part was the people, we became a family.”
Most people take for granted that when they speak, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in automatic speech recognition (ASR; a.k.a. speech-to-text) technologies, these interfaces can be inaccessible for those with speech impairments. Further, applications that rely on speech recognition as input for text-to-speech synthesis (TTS) can exhibit word substitution, deletion, and insertion errors. Critically, in today’s technological environment, limited access to speech interfaces, such as digital assistants that depend on directly understanding one’s speech, means being excluded from state-of-the-art tools and experiences, widening the gap between what those with and without speech impairments can access.
Kara Schechtman (CC ‘19) has been selected as one of the recipients of the Senator George J. Mitchell Scholarship Program to pursue a one-year postgraduate degree in Ireland. The fellowship is awarded to 12 individuals between the ages of 18 and 30 by the US-Ireland Alliance. Schechtman is headed to Trinity College Dublin where she will study philosophy.
“Artificial intelligence is advancing and we are at a point where ethics has to be considered,” said Schechtman, who is majoring in English and computer science at Columbia. “I believe studying philosophy will help me prepare for further studies in computer science.”
A point of frustration for her has been finding an area in computer science where both the humanities and computing overlap in a way that fits her interests. In artificial intelligence (AI), computing and philosophical questions can overlap, an intersection she finds fulfilling.
The development of AI poses a whole gamut of challenges to humanity, ranging from legislative challenges, to AI bias, to ethical concerns about potential machine consciousness, and even to potential existential threat. But these challenges have persisted even as the technical capabilities of AI have grown by leaps and bounds.
One thing Schechtman hopes to answer through her studies is how society can act responsibly despite all that is unknown. “I think it also demands technical expertise to suggest actionable paths for responsibility now,” continued Schechtman. “Which is why it is so important for computer scientists and philosophers to work together, and for some people to study both.”
The Ireland location of the fellowship is also ideal. Dublin hosts the European Union headquarters of a number of tech giants. And having double majored in English, she looks forward to “nerding out” over Samuel Beckett and James Joyce in their home country.
“More broadly, the circumstances couldn’t be better — the other fellowship recipients seem amazing and I can’t wait to get to know them better, Trinity College Dublin is a wonderful school, and I am sure I will have a lot of fun exploring Ireland. I’m excited to grow from the experience in ways I don’t yet even expect,” said Schechtman.
Kai-Fu Lee (B.S. ’83) included in WIRED’s anniversary issue for his work that brings humanity to artificial intelligence.
Artificial intelligence (AI) has seeped into the daily lives of people in the developed world. From virtual assistants to recommendation engines, AI is in the news, our homes and offices. There is a lot of untapped potential in terms of AI usage, especially in humanitarian areas. The impact could have a multiplier effect in developing countries, where resources are limited. By leveraging the power of AI, businesses, nongovernmental organizations (NGOs) and governments can solve life-threatening problems and improve the livelihood of local communities in the developing world.
President Bollinger announced that Columbia University along with many other academic institutions (sixteen, including all Ivy League universities) filed an amicus brief in the U.S. District Court for the Eastern District of New York challenging the Executive Order regarding immigrants from seven designated countries and refugees. Among other things, the brief asserts that “safety and security concerns can be addressed in a manner that is consistent with the values America has always stood for, including the free flow of ideas and people across borders and the welcoming of immigrants to our universities.”
This recent action provides a moment for us to collectively reflect on our community within Columbia Engineering and the importance of our commitment to maintaining an open and welcoming community for all students, faculty, researchers and administrative staff. As a School of Engineering and Applied Science, we are fortunate to attract students and faculty from diverse backgrounds, from across the country, and from around the world. It is a great benefit to be able to gather engineers and scientists of so many different perspectives and talents – all with a commitment to learning, a focus on pushing the frontiers of knowledge and discovery, and with a passion for translating our work to impact humanity.
I am proud of our community, and wish to take this opportunity to reinforce our collective commitment to maintaining an open and collegial environment. We are fortunate to have the privilege to learn from one another, and to study, work, and live together in such a dynamic and vibrant place as Columbia.
Sincerely,
Mary C. Boyce
Dean of Engineering
Morris A. and Alma Schapiro Professor