Augustin Chaintreau, Yaniv Erlich, and Daniel Hsu have been promoted from Assistant to Associate Professor without tenure.
Augustin Chaintreau is interested in personal data and what happens to it. Who is using it and for what purposes? Is personalization giving everyone a fair chance? These are not questions Facebook, Google, or other web service companies necessarily want to answer, but large-scale, unchecked data mining of people’s personal data poses multiple risks.
To understand and mitigate these risks, Chaintreau uses mathematical analyses of networks and statistical methods to build tools and models that track how data flows or predict how user behavior affects those flows. Using those techniques, we can learn, for instance, why someone using online services is presented with certain ads and not others, or how a keyword (“sad”) typed in an email or an online search triggers an ad about depression or shamanic healing. Or we can prove that graph-based recommendation reinforces historical prejudice.
Connections between user actions across apps, captured and distilled over complex networks, are generally not known to people using the web. In exposing the downsides of personalization and data sharing, Chaintreau’s research often makes headlines. The New York Times, Washington Post, Economist, and Guardian have all reported on various aspects of his research, such as when he showed how geotagged data from only two apps is enough to identify someone, and that 60% of people will forward a link without reading it.
Chaintreau’s intent is not to prevent data sharing—he sees the benefits for health, energy efficiency, and public policy—but to bring transparency and accountability to the web so people can safely manage and share their personal data. Tools may inform and empower people, keep providers accountable, and sometimes present simple new ways to improve fairness.
An active member of the network and web research community, Chaintreau chaired the conference and program committees of ACM SIGMETRICS, ACM CoNEXT, and the annual meeting of the Data Transparency Lab. He served on the program committees of ACM SIGMETRICS, SIGCOMM, WWW, EC, CoNEXT, MobiCom, MobiHoc, IMC, WSDM, COSN, AAAI ICWSM, and IEEE Infocom, and is currently the founding editor in chief of the Proceedings of the ACM POMACS Series, after editing roles for IEEE TMC, ACM SIGCOMM CCR, and ACM SIGMOBILE MC2R.
For significant contributions to analyzing emerging distributed digital and social networking systems, he received the 2013 ACM SIGMETRICS Rising Star Researcher Award. He also earned an NSF CAREER Award in 2013.
Working in the field of quantitative genomics, Yaniv Erlich is at the forefront of new gene sequencing techniques and the issue of genetic privacy.
He was among the first to flag the ethical complexities of genetic privacy. A paper he spearheaded while a fellow at MIT’s Whitehead Institute showed how easy it is to take apparently anonymized genetic information donated by research participants and cross-reference it with online data sources to obtain surname information. For this, the journal Nature gave him the title “Genome Hacker.”
Since coming to Columbia in 2015 (through a joint appointment with the NYC Genome Center), Erlich has kept up a fast pace of new research, creating new algorithms for examining genetic information at both the molecular level and within large-scale human populations. Among his projects: software that takes only minutes to verify someone’s identity from a DNA sample; a DNA storage strategy 60% more efficient than previous methods (and approaching 90% of the theoretical maximum amount of information per nucleotide); and a finding that short tandem repeats, once thought to be neutral, play an important role in regulating gene expression. In addition, he cofounded DNA.land, a crowd-sourcing site where people can donate their genome data for scientific research. Last year, he was named Chief Scientific Officer at MyHeritage, where he leads scientific development and strategy.
In his teaching, too, he is integrating new genetics research. His computer science class Ubiquitous Genomics was among the first courses to incorporate portable DNA sequencing devices into the curriculum, allowing students to learn DNA sequencing by doing it themselves.
Erlich’s awards include the Burroughs Wellcome Career Award (2013), the Harold M. Weintraub Award (2010), DARPA’s Young Faculty Award (2017), and the IEEE/ACM-CS HPC Award (2008). He was also named one of Genome Technology’s “Tomorrow’s PIs” in 2010.
A central challenge in machine learning is to reliably and automatically discover hidden structure in data with minimal human intervention. Currently, however, machine learning relies heavily on humans to label large amounts of data, an expensive, time-consuming process that Daniel Hsu aims to streamline through new algorithmic approaches that also address the statistical difficulties of the problem at hand. An interactive learning method he created as a PhD student selects a small set of data examples and asks users to label only those. Applied to electrocardiograms, his method reduced the amount of training data required by 90 percent.
There are both theoretical and applied aspects to Hsu’s research. As a theoretician, he has produced the first computationally efficient algorithms for several statistical estimation tasks (including many involving latent variable models such as mixture models, hidden Markov models, and topic models), provided new algorithmic frameworks for solving interactive machine learning problems, and led in the creation of scalable tools for machine learning applications.
On the application side, Hsu has contributed to methods used in web personalization and automated natural language processing, and to efforts to bring greater transparency to how personal data is used on the web. More recently, his research has helped scientists understand the role of gene regulation in disease, infer properties of dark energy in the universe from weak gravitational lensing, and characterize complex materials within a nanoscale structure.
At Columbia since 2013, Hsu has received a Yahoo ACE Award (2014) and a 2016 Sloan Research Fellowship, and was selected as one of “AI’s 10 to Watch” in 2015 by IEEE Intelligent Systems. Just last year, he was made a 2017 Kavli Fellow.
He is a prolific author (10 papers in 2017 alone) and a member of the Data Science Institute (DSI) and the DSI’s TRIPODS Institute, where he collaborates with others to advance machine learning by tying together techniques from statistics, computer science, and applied mathematics.
– Linda Crane