Artie Informational Session
10:30 AM to 11:30 AM
Schermerhorn Building
Artie is a commercial infrastructure company built to help creators get more premium projects made. Founded by former Google and YouTube executives, Artie bridges the gap between creative ambition and financial execution by translating scripts, audience engagement, and delivery data into standardized financial inputs. Their data powers investigation, compliance, and market intelligence software that has been used to solve some of the world's most high-profile criminal cases and grow consumer access to cryptocurrency safely. They are actively recruiting for full-time and internship roles. Representatives will conduct an employer information session and offer a presentation about the company, its mission, past and upcoming projects, and future recruitment efforts, followed by a Q&A session. For more information about the company, please visit the company website: https://artie.co/.
Registration information will be posted via email, VMock, and CampusGroups.
*Event Audience: CS Graduate Students (MS/PhD) & Bridge Students
From Data to Intelligence: Shaping Information Systems from their Training Data
11:40 AM to 1:00 PM
CSB 451 CS Auditorium
Shayne Longpre, MIT
Abstract:
The capabilities, risks, and limitations of AI systems are dictated primarily by the data from which they are trained. This talk covers the lifecycle of AI data: from web-crawled origins, to refinement through filtering, mixing, and scaling laws that optimize training. We explore emerging challenges in the AI data supply chain—ethical and legal risks, methods for scalably analyzing large datasets, and techniques to optimize data for building safe and capable systems. We also analyze ecosystem-level forces shaping AI: challenges in post-deployment evaluation, red teaming, and concentration of power. Together, these topics lay the groundwork for building AI systems in a principled, scientific manner—from their data foundations to the broader dynamics of the field.
Bio:
Shayne Longpre is a PhD candidate at MIT, where his research focuses on large-scale data analysis and optimization. He leads the Data Provenance Initiative. His research has received outstanding paper awards at ACL, NAACL, and TMLR, directly influenced the EU AI Act Code of Practice, and been covered by the NYT, Washington Post, Financial Times, and others.
Scalable Image AI via Self-Designing Storage
12:00 PM to 1:00 PM
CSB 453
Utku Sirin, Harvard University
Image AI is very expensive. We show that the root cause of the problem is a long-overlooked and largely unexplored dimension: storage. Storage determines how much data is moved and processed. Most images today are stored as JPEG files. JPEG is designed for the human eye. Image AI applications, however, span a wide range of domains, such as histopathology and robotics, each with very different characteristics and requirements. Using a single file format across all applications and domains results in inefficient and costly AI systems. This talk presents Image Calculator, a self-designing file format that finds the optimal storage for a given image AI task. Image Calculator achieves this by identifying design primitives for image storage and co-designing image storage with application domains. It creates a design space of thousands of candidate formats based on these design primitives, each capable of storing and representing data differently, with varying accuracy and cost trade-offs. It efficiently searches within this design space by using locality among its file formats. It exploits the inherent frequency structure in image data to efficiently serve inference and training requests. We evaluate Image Calculator across diverse datasets, tasks, models, and hardware, and show that it can generate file formats that improve accuracy by up to 8%, reduce end-to-end inference and training time by up to 14.2x, and reduce storage space by up to 8.2x compared to state-of-the-art image file formats.
Utku Sirin is a postdoctoral researcher at the Data and AI Systems Lab at Harvard University, advised by Stratos Idreos. He is interested in making AI systems efficient via vertical integration and principled design. His work on images led to the first image file format designed for AI workloads, Image Calculator, enabling a data-centric view of image AI pipelines and novel system architectures. Utku received the Microsoft Research PhD Fellowship in 2017 and the Swiss National Science Foundation Postdoctoral Fellowship in 2020 and 2023. He is also a winner of the ACM SIGMOD Student Research Competition (2017) and a recipient of distinguished reviewer awards at ICDE 2023, EDBT 2025, and VLDB 2025. Prior to Harvard, Utku earned his PhD from the Data-Intensive Applications and Systems Lab at EPFL, where he worked on hardware-conscious database systems, advised by Anastasia Ailamaki. In his free time, Utku acts in theater and plays classical guitar.