Events

Nov 18

AI Site Reliability Engineers: Automating Incident Response in Complex Systems

7:00 PM to 8:30 PM

Davis Auditorium

Anish Agarwal, Traversal

Abstract:
In this presentation, I will describe ongoing work at Traversal, where our team is developing an AI Site Reliability Engineer (SRE) designed to assist enterprises in diagnosing and mitigating production incidents, with the broader goal of improving the resilience of large-scale, mission-critical systems. I will outline why incident troubleshooting is rapidly becoming a central bottleneck in realizing end-to-end automation of the software development lifecycle (SDLC), particularly as organizations adopt increasingly complex cloud-native architectures and integrate AI-driven tooling across their operational workflows.

From a research perspective, automated incident response represents a technically rich and largely underexplored problem space. I will highlight how it brings together challenges at the intersection of agentic system design, large language model evaluation, causal inference on unstructured data, and time-series modeling of high-dimensional telemetry. These domains converge in the task of enabling AI systems to form hypotheses, reason under uncertainty, and propose actionable remediations in environments characterized by incomplete information, noisy signals, and strict reliability constraints. Our goal in this work is not only to build a practical system, but also to surface open problems and novel research opportunities for the broader community.

Bio:
Anish Agarwal is the CEO and Cofounder of Traversal, a startup building AI Site Reliability Engineer agents to help teams diagnose and remediate complex production incidents, and an Assistant Professor of Industrial Engineering and Operations Research at Columbia University. His research focuses on causal machine learning and data-driven decision-making in engineering and social systems.

This event is organized by Columbia's Data, Agents, and Processes Lab (DAPLab). For more information about the series, see https://daplab.cs.columbia.edu/entrepreneurship.

The Columbia Engineering AI Entrepreneurship Series is a bi-weekly speaker series that brings students and faculty at Columbia together with founders, VCs, technologists, and business leaders to learn about the process of transitioning lessons from research and the classroom into products and value.

Nov 19

Monthly Coffee and Questions

2:00 PM to 4:00 PM

CS Lounge

Drop in to ask questions and grab small bites
MS and UG students only

Nov 21

Transcend On-Campus Interview Day

10:00 AM to 4:00 PM

CS Lounge

Transcend is the privacy infrastructure that unleashes growth for the world’s greatest brands. We automate data and consumer preference governance at the systems layer to unlock AI, personalized experiences, and growth with speed and confidence. We integrate directly into data and vendor ecosystems, automating and enforcing privacy, security, and governance policies in real time, across every system and every user interaction.

Transcend is actively recruiting for a Software Engineer role. They will be conducting On-Campus Interviews.

The application link is posted via VMock and email.

*Event Audience: CS Columbia Graduate Students (MS & PhD)

Nov 24

Quantum Networks: A Classical Perspective

11:40 AM to 1:00 PM

CSB 451 CS Auditorium

Don Towsley, University of Massachusetts Amherst

Abstract:
Quantum information processing is at the threshold of having significant impact on technology and society in the form of providing unbreakable security, ultra-high-precision distributed sensing, and polynomial/exponential speed-ups in computing. Many of these applications are enabled by high-rate distributed shared entanglement between pairs and groups of users. A critical missing component that prevents crossing this threshold is a distributed infrastructure in the form of a world-wide “Quantum Internet”. This motivates the study of quantum networks, namely, identifying the right architecture and how it should operate, e.g., dynamic fair allocation of resources. Moreover, the architecture and network operation must account for operation in harsh, noisy environments.

This talk addresses the following question: what ideas can the design of a quantum network borrow from classical networks? At first glance the answer appears to be “very little”. The focus of this talk, however, is to argue that the opposite is true and that much can be borrowed from classical networks. We begin by reviewing two proposed quantum network architectures: two-way and one-way. A two-way network generates and distributes quantum entanglement to pairs or groups of users, whereas a one-way network allows for direct transfer of quantum information from one user to another. We compare these architectures and conclude that a two-way architecture is superior. A two-way architecture appears very different from the classical Internet architecture. However, we will introduce a “connectionless” two-way quantum network architecture that allows one to easily adapt many ideas from classical networks (good and bad

Dec 08

Towards Accountable Conversational Agents for Task Completion

11:40 AM to 12:40 PM

CSB 451 CS Auditorium

Dilek Hakkani-Tür, University of Illinois Urbana-Champaign

Abstract:
Task-oriented dialogue systems are designed to assist users in achieving specific, well-defined goals or tasks through natural language interactions. These systems act as a conversational bridge, connecting users to task-specific APIs or tools. Recent advancements have resulted in a significant paradigm shift with the integration of Large Language Models (LLMs) augmented with tool-calling into task-oriented dialogue systems and user simulators that are used for model evaluation and training. However, several issues remain that hinder the use of these systems in real applications.

In this talk, I will share our latest research towards accountability in multi-turn interactions with Large Language Models (LLMs). Our approach, called Reasoning, Acting, and Speaking (ReSpAct), has demonstrated higher task completion rates by engaging with users. To counter user over-reliance, we developed specialized accountability models for dialogue state tracking errors in task-oriented dialogue systems. We also adapted existing annotated dialogue datasets to train an LLM proficient in both interaction and tool calling, showcasing its performance on dialogue system and agentic LLM benchmarks. Our subsequent work utilized reinforcement learning for tool-based reasoning, introducing novel reward mechanisms. Finally, I will discuss user simulation for model evaluation and training. We observed that LLM-based user simulators can deviate from user goals over multi-turn interactions. To address this, we proposed a novel framework that tracks user goal progression throughout conversations, enabling the creation of user simulators that can autonomously monitor goal progression and reason to generate goal-aligned responses.