I work on algorithmic statistics and machine learning. My research is part of broader efforts in Foundations of Data Science, Machine Learning, and Theory of Computation at Columbia.
If you are interested in coming to Columbia to pursue education or research in these subjects, please see the CS department's admissions webpage and/or the DSI's admissions webpage for more information, as well as this list of frequent answers to questions.
Some recent things of possible interest:
- A “research vignette” on generalization and interpolation I wrote in 2019.
- Slides for a talk on analyzing contrastive learning in the context of probabilistic models, based on two recent preprints ([1], [2]) with Akshay Krishnamurthy and Chris Tosh.
Papers
Many (but not all) of my papers are available on arXiv.
Statistical query lower bounds for tensor PCA
Journal of Machine Learning Research, 22(83):1-51, 2021.
[ external link | jmlr link | bibtex ]
On the proliferation of support vectors in high dimensions
In Twenty-Fourth International Conference on Artificial Intelligence and Statistics. 2021.
[ external link | reviews | response | pmlr link | bibtex ]
Contrastive learning, multi-view redundancy, and linear models
In Thirty-Second International Conference on Algorithmic Learning Theory. 2021.
[ external link | pmlr link | bibtex ]
Cross-lingual text classification with minimal resources by transferring a sparse teacher
In Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
[ external link | aclweb link | bibtex ]
Interpreting deep learning models for weak lensing
Phys. Rev. D, 102:123506, Dec 2020.
[ external link | aps link | bibtex ]
Ensuring fairness beyond the training data
In Advances in Neural Information Processing Systems 33. 2020.
[ external link | bibtex ]
Classification vs regression in overparameterized regimes: Does the loss function matter?
Preprint, 2020.
[ external link | bibtex ]
Contrastive estimation reveals topic posterior information to linear models
Preprint, 2020.
[ local pdf file | external link | talk slides | bibtex ]
Diameter-based interactive structure discovery
In Twenty-Third International Conference on Artificial Intelligence and Statistics. 2020.
[ external link | pmlr link | bibtex ]
Two models of double descent for weak features
SIAM Journal on Mathematics of Data Science, 2(4):1167–1180, 2020.
[ external link | arxiv link | bibtex ]
On the number of variables to use in principal component regression
In Advances in Neural Information Processing Systems 32. 2019.
[ external link | bibtex ]
Leveraging just a few keywords for fine-grained aspect detection through weakly supervised co-training
In Conference on Empirical Methods in Natural Language Processing. 2019.
[ external link | talk slides | bibtex ]
Privacy accounting and quality control in the Sage differentially private ML platform
In Twenty-Seventh ACM Symposium on Operating Systems Principles. 2019.
[ external link | bibtex ]
Weak lensing cosmology with convolutional neural networks on noisy data
Monthly Notices of the Royal Astronomical Society, 490(2):1843–1860, 2019.
[ external link | arxiv link | bibtex ]
Reconciling modern machine learning practice and the bias-variance trade-off
Proceedings of the National Academy of Sciences, 116(32):15849-15854, 2019.
[ local pdf file | pnas link | arxiv link | bibtex ]
Mixing time estimation in reversible Markov chains from a single sample path
The Annals of Applied Probability, 29(4):2439–2480, 2019.
[ local pdf file | aap link | bibtex ]
Using a machine learning approach to determine the space group of a structure from the atomic pair distribution function
Acta Crystallographica Section A, 75(4):633–643, 2019.
[ external link | arxiv | bibtex ]
Teaching a black-box learner
In Thirty-Sixth International Conference on Machine Learning. 2019.
[ local pdf file | pmlr link | bibtex ]
A gradual, semi-discrete approach to generative network training via explicit Wasserstein minimization
In Thirty-Sixth International Conference on Machine Learning. 2019.
[ external link | pmlr link | bibtex ]
Certified robustness to adversarial examples with differential privacy
In IEEE Symposium on Security and Privacy. 2019.
[ external link | bibtex ]
Correcting the bias in least squares regression with volume-rescaled sampling
In Twenty-Second International Conference on Artificial Intelligence and Statistics. 2019.
[ local pdf file | arxiv link | pmlr link | bibtex ]
Attribute-efficient learning of monomials over highly-correlated variables
In Thirtieth International Conference on Algorithmic Learning Theory. 2019.
[ local pdf file | bibtex ]
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
In Advances in Neural Information Processing Systems 31. 2018.
[ local pdf file | external link | short version | talk slides | bibtex ]
Leveraged volume sampling for linear regression
In Advances in Neural Information Processing Systems 31. 2018.
[ local pdf file | external link | bibtex ]
Benefits of over-parameterization with EM
In Advances in Neural Information Processing Systems 31. 2018.
[ external link | bibtex ]
Learning single-index models in Gaussian space
In Thirty-First Annual Conference on Learning Theory. 2018.
[ local pdf file | pmlr link | bibtex ]
Non-Gaussian information from weak lensing data via deep learning
Phys. Rev. D, 97:103515, May 2018.
[ external link | aps link | bibtex ]
Discovering foodborne illness in online restaurant reviews
Journal of the American Medical Informatics Association, 25(12):1586-1592, 2018.
[ external link | bibtex ]
Coding sets with asymmetric information
Preprint, 2017.
[ external link | bibtex ]
Linear regression without correspondence
In Advances in Neural Information Processing Systems 30. 2017.
[ external link | talk slides | bibtex ]
Subregional nowcasts of seasonal influenza using search trends
Journal of Medical Internet Research, 19(11):e370, 2017.
[ external link | bibtex ]
Greedy approaches to symmetric orthogonal tensor decomposition
SIAM Journal on Matrix Analysis and Applications, 38(4):1210-1226, 2017.
[ local pdf file | arxiv link | siam link | bibtex ]
Parameter identification in Markov chain choice models
In Twenty-Eighth International Conference on Algorithmic Learning Theory. 2017.
[ external link | pmlr link | bibtex ]
Correspondence retrieval
In Thirtieth Annual Conference on Learning Theory. 2017.
[ local pdf file | talk slides | pmlr link | bibtex ]
FairTest: discovering unwarranted associations in data-driven applications
In Second IEEE European Symposium on Security and Privacy. 2017.
[ external link | slides from privacycon | bibtex ]
Kernel ridge vs. principal component regression: minimax bounds and the qualification of regularization operators
Electronic Journal of Statistics, 1(1):1022–1047, 2017.
[ local pdf file | ejs link | bibtex ]
Greedy bi-criteria approximations for \(k\)-medians and \(k\)-means
Preprint, 2016.
[ external link | bibtex ]
Search improves label for active learning
In Advances in Neural Information Processing Systems 29. 2016.
[ local pdf file | arxiv link | video advert | bibtex ]
Global analysis of Expectation Maximization for mixtures of two Gaussians
In Advances in Neural Information Processing Systems 29. 2016.
[ local pdf file | short version | summary | arxiv link | bibtex ]
Do dark matter halos explain lensing peaks?
Phys. Rev. D, 94:083506, Oct 2016.
[ external link | aps link | bibtex ]
Unsupervised part-of-speech tagging with anchor hidden Markov models
Transactions of the Association for Computational Linguistics, 4:245–257, 2016.
[ external link | bibtex ]
Compact kernel models for acoustic modeling via random feature selection
In Forty-First IEEE International Conference on Acoustics, Speech and Signal Processing. 2016.
[ external link | bibtex ]
Loss minimization and parameter estimation with heavy tails
Journal of Machine Learning Research, 17(18):1–40, 2016.
[ external link | slides for related talk | bibtex ]
Mixing time estimation in reversible Markov chains from a single sample path
In Advances in Neural Information Processing Systems 28. 2015.
[ external link | talk slides | bibtex ]
Efficient and parsimonious agnostic active learning
In Advances in Neural Information Processing Systems 28. 2015.
[ external link | bibtex ]
Sunlight: fine-grained targeting detection at scale with statistical confidence
In Twenty-Second ACM Conference on Computer and Communications Security. 2015.
[ local pdf file | project website | bibtex ]
Model-based word embeddings from decompositions of count matrices
In Fifty-Third Annual Meeting of the Association for Computational Linguistics. 2015.
[ local pdf file | acl link | bibtex ]
When are overcomplete topic models identifiable?
Journal of Machine Learning Research, 16(Dec):2643–2694, 2015.
[ external link | bibtex ]
Successive rank-one approximations for nearly orthogonally decomposable symmetric tensors
SIAM Journal on Matrix Analysis and Applications, 36(4):1638–1659, 2015.
[ external link | siam link | bibtex ]
A spectral algorithm for latent Dirichlet allocation
Algorithmica, 72(1):193–214, 2015.
[ local pdf file | springer link | bibtex ]
Method of moments learning for left-to-right hidden Markov models
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 2015.
[ external link | bibtex ]
Learning sparse low-threshold linear classifiers
Journal of Machine Learning Research, 16(Jul):1275–1304, 2015.
[ external link | bibtex ]
Scalable nonlinear learning with adaptive polynomial expansions
In Advances in Neural Information Processing Systems 27. 2014.
[ external link | bibtex ]
The large margin mechanism for differentially private maximization
In Advances in Neural Information Processing Systems 27. 2014.
[ external link | bibtex ]
A spectral algorithm for learning class-based \(n\)-gram models of natural language
In Thirtieth Conference on Uncertainty in Artificial Intelligence. 2014.
[ local pdf file | auai link | code by Karl | more code by Karl | bibtex ]
Taming the monster: a fast and simple algorithm for contextual bandits
In Thirty-First International Conference on Machine Learning. 2014.
[ local pdf file | talk slides | arxiv link | bibtex ]
Heavy-tailed regression with a generalized median-of-means
In Thirty-First International Conference on Machine Learning. 2014.
[ external link | arxiv link | bibtex ]
Tensor decompositions for learning latent variable models
Journal of Machine Learning Research, 15(Aug):2773–2831, 2014.
[ local pdf file | tutorial slides | jmlr link | bibtex ]
Random design analysis of ridge regression
Foundations of Computational Mathematics, 14(3):569–600, 2014.
[ local pdf file | springer link | arxiv link | bibtex ]
A tensor approach to learning mixed membership community models
Journal of Machine Learning Research, 15(Jun):2239–2312, 2014.
[ external link | arxiv link | bibtex ]
When are overcomplete topic models identifiable?
In Advances in Neural Information Processing Systems 26. 2013.
[ external link | bibtex ]
Contrastive learning using spectral methods
In Advances in Neural Information Processing Systems 26. 2013.
[ local pdf file | bibtex ]
A tensor spectral approach to learning mixed membership community models
In Twenty-Sixth Annual Conference on Learning Theory. 2013.
[ external link | journal version with better title | arxiv link | bibtex ]
Learning linear Bayesian networks with latent variables
In Thirtieth International Conference on Machine Learning. 2013.
[ local pdf file | pmlr link | bibtex ]
Learning mixtures of spherical Gaussians: moment methods and spectral decompositions
In Fourth Innovations in Theoretical Computer Science. 2013.
[ local pdf file | talk slides | arxiv link | bibtex ]
Stochastic convex optimization with bandit feedback
SIAM Journal on Optimization, 23(1):213–240, 2013.
[ local pdf file | arxiv link | siam link | bibtex ]
A spectral algorithm for latent Dirichlet allocation
In Advances in Neural Information Processing Systems 25. 2012.
[ external link | journal version | springer link | bibtex ]
Learning mixtures of tree graphical models
In Advances in Neural Information Processing Systems 25. 2012.
[ external link | bibtex ]
Identifiability and unmixing of latent parse trees
In Advances in Neural Information Processing Systems 25. 2012.
[ local pdf file | arxiv link | bibtex ]
Random design analysis of ridge regression
In Twenty-Fifth Annual Conference on Learning Theory. 2012.
[ external link | journal version | arxiv link | bibtex ]
A method of moments for mixture models and hidden Markov models
In Twenty-Fifth Annual Conference on Learning Theory. 2012.
[ external link | talk slides | slides for related talk | arxiv link | bibtex ]
Convergence rates for differentially private statistical estimation
In Twenty-Ninth International Conference on Machine Learning. 2012.
[ local pdf file | bibtex ]
Tail inequalities for sums of random matrices that depend on the intrinsic dimension
Electronic Communications in Probability, 17(14):1–13, 2012.
[ local pdf file | errata | ecp link | bibtex ]
A spectral algorithm for learning hidden Markov models
Journal of Computer and System Sciences, 78(5):1460–1480, 2012.
[ local pdf file | errata | jcss link | arxiv link | bibtex ]
A tail inequality for quadratic forms of subgaussian random vectors
Electronic Communications in Probability, 17(52):1–6, 2012.
[ local pdf file | ecp link | bibtex ]
Stochastic convex optimization with bandit feedback
In Advances in Neural Information Processing Systems 24. 2011.
[ external link | journal version | siam link | bibtex ]
Spectral methods for learning multivariate latent tree structure
In Advances in Neural Information Processing Systems 24. 2011.
[ local pdf file | arxiv link | bibtex ]
Sample complexity bounds for differentially private learning
In Twenty-Fourth Annual Conference on Learning Theory. 2011.
[ local pdf file | pmlr link | bibtex ]
Efficient optimal learning for contextual bandits
In Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 2011.
[ local pdf file | bibtex ]
Robust matrix decomposition with sparse corruptions
IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.
[ local pdf file | arxiv link | ieee link | bibtex ]
Agnostic active learning without constraints
In Advances in Neural Information Processing Systems 23. 2010.
[ local pdf file | arxiv link | bibtex ]
An online learning-based framework for tracking
In Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. 2010.
[ external link | bibtex ]
Algorithms for active learning
Ph.D. dissertation, UC San Diego. 2010.
[ local pdf file | bibtex ]
A parameter-free hedging algorithm
In Advances in Neural Information Processing Systems 22. 2009.
[ local pdf file | note about \(\epsilon\)-quantile regret | bibtex ]
Multi-label prediction via compressed sensing
In Advances in Neural Information Processing Systems 22. 2009.
[ local pdf file | talk slides | arxiv link | bibtex ]
A spectral algorithm for learning hidden Markov models
In Twenty-Second Annual Conference on Learning Theory. 2009.
[ external link | journal version | errata | bibtex ]
Hierarchical sampling for active learning
In Twenty-Fifth International Conference on Machine Learning. 2008.
[ local pdf file | bibtex ]
A general agnostic active learning algorithm
In Advances in Neural Information Processing Systems 20. 2007.
[ local pdf file | video 1 | video 2 | video 3 | video 4 | bibtex ]
On-line estimation with the multivariate Gaussian distribution
In Twentieth Annual Conference on Learning Theory. 2007.
[ local pdf file | bibtex ]
A concentration theorem for projections
In Twenty-Second Conference on Uncertainty in Artificial Intelligence. 2006.
[ local pdf file | bibtex ]
People
I’m lucky to be able to work with several outstanding students and postdocs.
- Students
- Jagdeep Bhatia (High school diploma, 2020; → MIT)
- Rishabh Dudeja (Ph.D. in progress)
- Arushi Gupta (B.S., 2016; M.S., 2018; → Princeton)
- Giannis Karamanolakis (Ph.D. in progress)
- Achille Nazaret (Ph.D. in progress)
- Clayton Sanford (Ph.D. in progress)
- Kevin Shi (Ph.D., 2020; → Facebook)
- Geelon So (M.S., 2019; → UC San Diego)
- Kiran Vodrahalli (Ph.D. in progress)
- Ji Xu (Ph.D., 2020; → Two Sigma)
- Keyang Xu (Ph.D. in progress)
- Postdocs
I’ve also worked with other fantastic students on their thesis research, including: Mathias Lécuyer, Avner May, Cun Mu, Karl Stratos, José Manuel Zorrilla Matilla.
Service
- Associate editor
- Board of directors
- Association for Computational Learning (2017-2020)
- Program chair
- Conference on Learning Theory 2019 (with Alina Beygelzimer)
- Senior program committee / area chair
- Conference on Learning Theory 2011, 2013, 2015, 2016, 2017, 2018, 2020, 2021
- International Conference on Machine Learning 2012, 2013, 2015, 2016, 2017
- Conference on Neural Information Processing Systems 2012, 2013, 2017, 2019
- International Conference on Artificial Intelligence and Statistics 2016, 2017, 2019
- Conference on Algorithmic Learning Theory 2017, 2018, 2019, 2021
- Local organization
- Conference on Learning Theory 2016 (with Satyen Kale)
- Conference on Learning Theory 2020 (with Peter Grünwald, Benjamin Guedj, Satyen Kale, and Wouter Koolen)
- Workshop and seminar organization
- Columbia Year of Statistical Machine Learning (Fall 2019-Spring 2020)
- Columbia DSI/TRIPODS Deep Learning Workshop (March 15, 2019)
- Columbia Foundations of Data Science Seminar (Fall 2015)
- ICML 2014 Method of Moments and Spectral Learning (June 25, 2014)
- DIMACS/CCICADA Systems and Analytics of Big Data (March 17-18, 2014)
- NeurIPS 2013 Spectral Learning (December 10, 2013)
- ICML 2013 Spectral Learning (June 21, 2013)
Teaching
- Courses at Columbia
- Machine Learning (Most recently: COMS 4771 Fall 2020)
- Machine Learning Theory (Most recently: COMS 4995 Spring 2020)
- Topics in Learning Theory (Most recently: COMS 6998 Fall 2017)
- Unsupervised Learning (Most recently: COMS 4774 Spring 2021)
- Tutorials
Thanks
I am grateful for support provided by the National Science Foundation, the National Aeronautics and Space Administration, the Alfred P. Sloan Foundation, the Columbia Data Science Institute, Bloomberg, Google, JP Morgan, NVIDIA, Two Sigma, and Yahoo.
- TRIPODS: From Foundations to Practice of Data Science and Back [website]
- NSF IIS: Adaptive Information Extraction from Social Media [website]
- NSF DMREF: Deblurring our View of Atomic Arrangements in Complex Materials
- Sloan Research Fellowship
- Columbia DSI Data Science Interdisciplinary ROADS Grant
- Bloomberg Data Science Research Grant
- Google Faculty Research Award
- JP Morgan Faculty Award
- Two Sigma Research Gift
- Yahoo Faculty and Research Engagement Program Award