Eugene Wu
Assistant Professor
421 Mudd in the DSI space, 500 W 120th St
Database Group
Computer Science
Columbia University in the City of New York

email @sirrice github


We are currently hiring great graduate and post-doctoral researchers.


Eugene Wu is broadly interested in technologies that help users play with their data. His goal is for users at all technical levels to effectively and quickly make sense of their information. He is interested in solutions that ultimately improve the interface between users and data, and borrows techniques from fields such as data management, systems, crowdsourcing, visualization, and HCI.

Eugene Wu received his Ph.D. from CSAIL at MIT, advised by the esteemed Sam Madden and Michael Stonebraker, in the database group. He spent the first half of 2015 at UC Berkeley before starting at Columbia University in Fall 2015. Formal and less formal (by @mstem) biographies. An obituary.

He is supported by NSF 1527765.

Recent News

  • Talk about what provenance is and how it relates to our projects. MIT BigData Workshop 2016
  • Congrats to Niranjan, Arnab, Yifan, Daniel Haas, Sanjay, and Daniel Alabi for getting four papers accepted at SIGMOD's HILDA workshop!
  • QueryFix explanation demo with Xiaolan and Alexandra accepted to SIGMOD 2016!
  • CLAMShell paper with Daniel Haas for drastically speeding up crowds accepted to VLDB 2016!

Current Research Areas

Data Visualization Management Systems

A Data Visualization Management System (DVMS) integrates visualizations and databases by compiling a declarative visualization language into an end-to-end relational operator pipeline that renders the visualization and is amenable to database-style optimizations. Thus the DVMS can be both expressive, via the visualization language, and performant, by leveraging traditional and visualization-specific optimizations to scale interactive visualizations to massive datasets.

Query Explanation

Instead of explaining and fixing data using data, which is a bit circuitous, we seek to both explain and repair incorrect data values by using the actual queries that modified the database.

Data Exploration and Explanation

Visualizations are excellent for exposing surprising patterns and outliers in data, but existing tools have no way to help explain those patterns and outliers. We are exploring systems to generate sensible explanations for outliers in analytics visualizations.

Data Cleaning Systems

Analysts report spending upwards of 80% of their time on data cleaning problems, including extraction, formatting, handling missing values, and entity resolution. How can knowing the application you actually want to run help speed up the cleaning process?


I am lucky to work, and to have worked, with many remarkable students.


  • Fotis Psallidas
  • Xiaolan Wang (UMass Amherst, advised by Alexandra Meliou)
    • QFix: explaining database errors using query histories
  • Yifan Wu (UC Berkeley, advised by Joe Hellerstein)
    • Consistency in Declarative Visual Interactive Languages (DeVIL)
  • Sanjay Krishnan (UC Berkeley, advised by Michael Franklin, Ken Goldberg)
    • Data cleaning and machine learning
  • Daniel Haas (UC Berkeley, advised by Michael Franklin)
    • Making crowds fast
  • Lilong Jiang (Ohio State, advised by Arnab Nandi)
    • Human graphical perception


  • Daniel Alabi (starting PhD at Harvard)
    • Using human perceptual models to make visualizations faster
  • Zhengjie Miao
    • Predicting user interactions to make visualizations faster and better


Selected Publications




Some Interesting Links