I work on algorithmic statistics and machine learning. My research is part of broader efforts in Foundations of Data Science, Machine Learning, and Theory of Computation at Columbia.
If you are a (current or prospective) student interested in coming to Columbia and/or working with me on research, or if you are generally interested in getting started in machine learning and/or research, please check this page of frequent answers to questions.
Scholars: Consider submitting a paper to TheoretiCS, a new open access journal for theoretical computer science.
Papers
Many of my papers are available on arXiv.
Statistical-computational trade-offs in tensor PCA and related problems via communication complexity
Preprint, 2022.
[ external link | bibtex ]
Near-optimal statistical query lower bounds for agnostically learning intersections of halfspaces with Gaussian marginals
Preprint, 2022.
[ external link | bibtex ]
Masked prediction tasks: a parameter identifiability view
Preprint, 2022.
[ external link | bibtex ]
Learning tensor representations for meta-learning
In Twenty-Fifth International Conference on Artificial Intelligence and Statistics, 2022.
[ external link | bibtex ]
Simple and near-optimal algorithms for hidden stratification and multi-group learning
Preprint, 2021.
[ external link | bibtex ]
Contrastive estimation reveals topic posterior information to linear models
Journal of Machine Learning Research, 22(281):1-31, 2021.
[ external link | talk slides | bibtex ]
Bayesian decision-making under misspecified priors with applications to meta-learning
In Advances in Neural Information Processing Systems 34, 2021.
[ external link | bibtex ]
Support vector machines and linear regression coincide with very high-dimensional features
In Advances in Neural Information Processing Systems 34, 2021.
[ external link | bibtex ]
Classification vs regression in overparameterized regimes: Does the loss function matter?
Journal of Machine Learning Research, 22(222):1-69, 2021.
[ external link | arxiv link | bibtex ]
On the approximation power of two-layer networks of random ReLUs
In Thirty-Fourth Annual Conference on Learning Theory, 2021.
[ external link | talk slides | note about kernels | pmlr link | blog post | bibtex ]
Statistical query lower bounds for tensor PCA
Journal of Machine Learning Research, 22(83):1-51, 2021.
[ external link | jmlr link | bibtex ]
Generalization bounds via distillation
In Ninth International Conference on Learning Representations, 2021.
[ external link | bibtex ]
On the proliferation of support vectors in high dimensions
In Twenty-Fourth International Conference on Artificial Intelligence and Statistics, 2021.
[ external link | reviews | response | pmlr link | bibtex ]
Contrastive learning, multi-view redundancy, and linear models
In Thirty-Second International Conference on Algorithmic Learning Theory, 2021.
[ external link | pmlr link | bibtex ]
Cross-lingual text classification with minimal resources by transferring a sparse teacher
In Conference on Empirical Methods in Natural Language Processing: Findings, 2020.
[ external link | aclweb link | bibtex ]
Interpreting deep learning models for weak lensing
Phys. Rev. D, 102:123506, Dec 2020.
[ external link | aps link | bibtex ]
Ensuring fairness beyond the training data
In Advances in Neural Information Processing Systems 33, 2020.
[ external link | bibtex ]
Diameter-based interactive structure discovery
In Twenty-Third International Conference on Artificial Intelligence and Statistics, 2020.
[ external link | pmlr link | bibtex ]
Two models of double descent for weak features
SIAM Journal on Mathematics of Data Science, 2(4):1167–1180, 2020.
[ external link | arxiv link | bibtex ]
Unbiased estimators for random design regression
Preprint, 2019.
[ external link | bibtex ]
On the number of variables to use in principal component regression
In Advances in Neural Information Processing Systems 32, 2019.
[ external link | bibtex ]
Leveraging just a few keywords for fine-grained aspect detection through weakly supervised co-training
In Conference on Empirical Methods in Natural Language Processing, 2019.
[ external link | talk slides | bibtex ]
Privacy accounting and quality control in the Sage differentially private ML platform
In Twenty-Seventh ACM Symposium on Operating Systems Principles, 2019.
[ external link | bibtex ]
Weak lensing cosmology with convolutional neural networks on noisy data
Monthly Notices of the Royal Astronomical Society, 490(2):1843–1860, 2019.
[ external link | arxiv link | bibtex ]
Reconciling modern machine learning practice and the bias-variance trade-off
Proceedings of the National Academy of Sciences, 116(32):15849-15854, 2019.
[ local pdf file | pnas link | arxiv link | bibtex ]
Mixing time estimation in reversible Markov chains from a single sample path
The Annals of Applied Probability, 29(4):2439–2480, 2019.
[ local pdf file | aap link | bibtex ]
Using a machine learning approach to determine the space group of a structure from the atomic pair distribution function
Acta Crystallographica Section A, 75(4):633–643, 2019.
[ external link | arxiv | bibtex ]
Teaching a black-box learner
In Thirty-Sixth International Conference on Machine Learning, 2019.
[ local pdf file | pmlr link | bibtex ]
A gradual, semi-discrete approach to generative network training via explicit Wasserstein minimization
In Thirty-Sixth International Conference on Machine Learning, 2019.
[ external link | pmlr link | bibtex ]
Certified robustness to adversarial examples with differential privacy
In IEEE Symposium on Security and Privacy, 2019.
[ external link | bibtex ]
Correcting the bias in least squares regression with volume-rescaled sampling
In Twenty-Second International Conference on Artificial Intelligence and Statistics, 2019.
[ local pdf file | arxiv link | pmlr link | bibtex ]
Attribute-efficient learning of monomials over highly-correlated variables
In Thirtieth International Conference on Algorithmic Learning Theory, 2019.
[ local pdf file | bibtex ]
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
In Advances in Neural Information Processing Systems 31, 2018.
[ local pdf file | external link | short version | talk slides | bibtex ]
Leveraged volume sampling for linear regression
In Advances in Neural Information Processing Systems 31, 2018.
[ local pdf file | external link | bibtex ]
Benefits of over-parameterization with EM
In Advances in Neural Information Processing Systems 31, 2018.
[ external link | bibtex ]
Learning single-index models in Gaussian space
In Thirty-First Annual Conference on Learning Theory, 2018.
[ local pdf file | pmlr link | bibtex ]
Non-Gaussian information from weak lensing data via deep learning
Phys. Rev. D, 97:103515, May 2018.
[ external link | aps link | bibtex ]
Discovering foodborne illness in online restaurant reviews
Journal of the American Medical Informatics Association, 25(12):1586-1592, 2018.
[ external link | bibtex ]
Coding sets with asymmetric information
Preprint, 2017.
[ external link | bibtex ]
Linear regression without correspondence
In Advances in Neural Information Processing Systems 30, 2017.
[ external link | talk slides | bibtex ]
Subregional nowcasts of seasonal influenza using search trends
Journal of Medical Internet Research, 19(11):e370, 2017.
[ external link | bibtex ]
Greedy approaches to symmetric orthogonal tensor decomposition
SIAM Journal on Matrix Analysis and Applications, 38(4):1210-1226, 2017.
[ local pdf file | arxiv link | siam link | bibtex ]
Parameter identification in Markov chain choice models
In Twenty-Eighth International Conference on Algorithmic Learning Theory, 2017.
[ external link | pmlr link | bibtex ]
Correspondence retrieval
In Thirtieth Annual Conference on Learning Theory, 2017.
[ local pdf file | talk slides | pmlr link | bibtex ]
FairTest: discovering unwarranted associations in data-driven applications
In Second IEEE European Symposium on Security and Privacy, 2017.
[ external link | slides from privacycon | bibtex ]
Kernel ridge vs. principal component regression: minimax bounds and the qualification of regularization operators
Electronic Journal of Statistics, 1(1):1022–1047, 2017.
[ local pdf file | ejs link | bibtex ]
Greedy bi-criteria approximations for \(k\)-medians and \(k\)-means
Preprint, 2016.
[ external link | bibtex ]
Search improves label for active learning
In Advances in Neural Information Processing Systems 29, 2016.
[ local pdf file | arxiv link | video advert | bibtex ]
Global analysis of Expectation Maximization for mixtures of two Gaussians
In Advances in Neural Information Processing Systems 29, 2016.
[ local pdf file | short version | summary | arxiv link | bibtex ]
Do dark matter halos explain lensing peaks?
Phys. Rev. D, 94:083506, Oct 2016.
[ external link | aps link | bibtex ]
Unsupervised part-of-speech tagging with anchor hidden Markov models
Transactions of the Association for Computational Linguistics, 4:245–257, 2016.
[ external link | bibtex ]
Compact kernel models for acoustic modeling via random feature selection
In Forty-First IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
[ external link | bibtex ]
Loss minimization and parameter estimation with heavy tails
Journal of Machine Learning Research, 17(18):1–40, 2016.
[ external link | slides for related talk | bibtex ]
Mixing time estimation in reversible Markov chains from a single sample path
In Advances in Neural Information Processing Systems 28, 2015.
[ external link | talk slides | bibtex ]
Efficient and parsimonious agnostic active learning
In Advances in Neural Information Processing Systems 28, 2015.
[ external link | bibtex ]
Sunlight: fine-grained targeting detection at scale with statistical confidence
In Twenty-Second ACM Conference on Computer and Communications Security, 2015.
[ local pdf file | project website | bibtex ]
Model-based word embeddings from decompositions of count matrices
In Fifty-Third Annual Meeting of the Association for Computational Linguistics, 2015.
[ local pdf file | acl link | bibtex ]
When are overcomplete topic models identifiable?
Journal of Machine Learning Research, 16(Dec):2643–2694, 2015.
[ external link | bibtex ]
Successive rank-one approximations for nearly orthogonally decomposable symmetric tensors
SIAM Journal on Matrix Analysis and Applications, 36(4):1638–1659, 2015.
[ external link | siam link | bibtex ]
A spectral algorithm for latent Dirichlet allocation
Algorithmica, 72(1):193–214, 2015.
[ local pdf file | springer link | bibtex ]
Learning sparse low-threshold linear classifiers
Journal of Machine Learning Research, 16(Jul):1275–1304, 2015.
[ external link | bibtex ]
Scalable nonlinear learning with adaptive polynomial expansions
In Advances in Neural Information Processing Systems 27, 2014.
[ external link | bibtex ]
The large margin mechanism for differentially private maximization
In Advances in Neural Information Processing Systems 27, 2014.
[ external link | bibtex ]
A spectral algorithm for learning class-based \(n\)-gram models of natural language
In Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014.
[ local pdf file | auai link | code by Karl | more code by Karl | bibtex ]
Taming the monster: a fast and simple algorithm for contextual bandits
In Thirty-First International Conference on Machine Learning, 2014.
[ local pdf file | talk slides | arxiv link | bibtex ]
Heavy-tailed regression with a generalized median-of-means
In Thirty-First International Conference on Machine Learning, 2014.
[ external link | arxiv link | bibtex ]
Tensor decompositions for learning latent variable models
Journal of Machine Learning Research, 15(Aug):2773–2831, 2014.
[ local pdf file | tutorial slides | jmlr link | bibtex ]
Random design analysis of ridge regression
Foundations of Computational Mathematics, 14(3):569–600, 2014.
[ local pdf file | springer link | arxiv link | bibtex ]
A tensor approach to learning mixed membership community models
Journal of Machine Learning Research, 15(Jun):2239–2312, 2014.
[ external link | arxiv link | bibtex ]
When are overcomplete topic models identifiable?
In Advances in Neural Information Processing Systems 26, 2013.
[ external link | bibtex ]
Contrastive learning using spectral methods
In Advances in Neural Information Processing Systems 26, 2013.
[ local pdf file | bibtex ]
A tensor spectral approach to learning mixed membership community models
In Twenty-Sixth Annual Conference on Learning Theory, 2013.
[ external link | journal version with better title | arxiv link | bibtex ]
Learning linear Bayesian networks with latent variables
In Thirtieth International Conference on Machine Learning, 2013.
[ local pdf file | pmlr link | bibtex ]
Learning mixtures of spherical Gaussians: moment methods and spectral decompositions
In Fourth Innovations in Theoretical Computer Science, 2013.
[ local pdf file | talk slides | arxiv link | video advert | bibtex ]
Stochastic convex optimization with bandit feedback
SIAM Journal on Optimization, 23(1):213–240, 2013.
[ local pdf file | arxiv link | siam link | bibtex ]
A spectral algorithm for latent Dirichlet allocation
In Advances in Neural Information Processing Systems 25, 2012.
[ external link | journal version | springer link | bibtex ]
Learning mixtures of tree graphical models
In Advances in Neural Information Processing Systems 25, 2012.
[ external link | bibtex ]
Identifiability and unmixing of latent parse trees
In Advances in Neural Information Processing Systems 25, 2012.
[ local pdf file | arxiv link | bibtex ]
Random design analysis of ridge regression
In Twenty-Fifth Annual Conference on Learning Theory, 2012.
[ external link | journal version | arxiv link | bibtex ]
A method of moments for mixture models and hidden Markov models
In Twenty-Fifth Annual Conference on Learning Theory, 2012.
[ external link | talk slides | slides for related talk | pmlr link | bibtex ]
Convergence rates for differentially private statistical estimation
In Twenty-Ninth International Conference on Machine Learning, 2012.
[ local pdf file | bibtex ]
Tail inequalities for sums of random matrices that depend on the intrinsic dimension
Electronic Communications in Probability, 17(14):1–13, 2012.
[ local pdf file | errata | ecp link | bibtex ]
A spectral algorithm for learning hidden Markov models
Journal of Computer and System Sciences, 78(5):1460–1480, 2012.
[ local pdf file | errata | jcss link | arxiv link | bibtex ]
A tail inequality for quadratic forms of subgaussian random vectors
Electronic Communications in Probability, 17(52):1–6, 2012.
[ local pdf file | ecp link | bibtex ]
Stochastic convex optimization with bandit feedback
In Advances in Neural Information Processing Systems 24, 2011.
[ external link | journal version | siam link | bibtex ]
Spectral methods for learning multivariate latent tree structure
In Advances in Neural Information Processing Systems 24, 2011.
[ local pdf file | arxiv link | bibtex ]
Sample complexity bounds for differentially private learning
In Twenty-Fourth Annual Conference on Learning Theory, 2011.
[ local pdf file | pmlr link | bibtex ]
Efficient optimal learning for contextual bandits
In Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 2011.
[ local pdf file | bibtex ]
Robust matrix decomposition with sparse corruptions
IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.
[ local pdf file | arxiv link | ieee link | bibtex ]
Agnostic active learning without constraints
In Advances in Neural Information Processing Systems 23, 2010.
[ local pdf file | arxiv link | bibtex ]
An online learning-based framework for tracking
In Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2010.
[ external link | bibtex ]
Algorithms for active learning
Ph.D. dissertation, UC San Diego, 2010.
[ local pdf file | bibtex ]
A parameter-free hedging algorithm
In Advances in Neural Information Processing Systems 22, 2009.
[ local pdf file | note about \(\epsilon\)-quantile regret | bibtex ]
Multi-label prediction via compressed sensing
In Advances in Neural Information Processing Systems 22, 2009.
[ local pdf file | talk slides | arxiv link | bibtex ]
A spectral algorithm for learning hidden Markov models
In Twenty-Second Annual Conference on Learning Theory, 2009.
[ external link | journal version | errata | bibtex ]
Hierarchical sampling for active learning
In Twenty-Fifth International Conference on Machine Learning, 2008.
[ local pdf file | bibtex ]
A general agnostic active learning algorithm
In Advances in Neural Information Processing Systems 20, 2007.
[ local pdf file | video 1 | video 2 | video 3 | video 4 | bibtex ]
On-line estimation with the multivariate Gaussian distribution
In Twentieth Annual Conference on Learning Theory, 2007.
[ local pdf file | bibtex ]
A concentration theorem for projections
In Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
[ local pdf file | bibtex ]
People
I’m lucky to be able to work with several outstanding students and postdocs.
- Students
- Navid Ardeshir (Ph.D. in progress)
- Jagdeep Bhatia (High school diploma, 2020; → MIT)
- Samuel Deng (M.S., 2021; Ph.D. in progress)
- Rishabh Dudeja (Ph.D., 2021; → Harvard)
- Arushi Gupta (B.S., 2016; M.S., 2018; → Princeton)
- Giannis Karamanolakis (Ph.D. in progress)
- Clayton Sanford (Ph.D. in progress)
- Kevin Shi (Ph.D., 2020; → Facebook)
- Abhinava Sikdar (M.S. in progress)
- Geelon So (M.S., 2019; → UC San Diego)
- Kiran Vodrahalli (Ph.D. in progress)
- Ji Xu (Ph.D., 2020; → Two Sigma)
- Keyang Xu (Ph.D. in progress)
- Mingyue Xu (M.S. in progress)
- Postdocs
- Debmalya Mandal (→ Max Planck Institute for Software Systems)
- Christopher Tosh (→ Memorial Sloan-Kettering Cancer Center)
I’ve also worked with other fantastic students on their thesis research, including: Mathias Lécuyer, Avner May, Cun Mu, Karl Stratos, José Manuel Zorrilla Matilla.
Service
- Associate editor
- Action editor
- Program chair
- Conference on Learning Theory 2019 (with Alina Beygelzimer)
- Senior program committee / area chair
- Conference on Learning Theory 2011, 2013, 2015, 2016, 2017, 2018, 2020, 2021, 2022
- International Conference on Machine Learning 2012, 2013, 2015, 2016, 2017
- Conference on Neural Information Processing Systems 2012, 2013, 2017, 2019, 2021
- International Conference on Artificial Intelligence and Statistics 2016, 2017, 2019
- Conference on Algorithmic Learning Theory 2017, 2018, 2019, 2021, 2022
- Workshop and seminar organization
- FOCS 2021 Workshop on Machine Learning (February 7-8, 2022)
- Columbia Year of Statistical Machine Learning (Fall 2019-Spring 2020)
- Columbia DSI/TRIPODS Deep Learning Workshop (March 15, 2019)
- Columbia Foundations of Data Science Seminar (Fall 2015)
- ICML 2014 Method of Moments and Spectral Learning (June 25, 2014)
- DIMACS/CCICADA Systems and Analytics of Big Data (March 17-18, 2014)
- NeurIPS 2013 Spectral Learning (December 10, 2013)
- ICML 2013 Spectral Learning (June 21, 2013)
Teaching
- Courses at Columbia
- Computational Linear Algebra (Planned: COMS 3251 Fall 2022)
- Machine Learning (Most recently: COMS 4721 Spring 2022)
- Machine Learning Theory (Most recently: COMS 4995 Spring 2020)
- Topics in Learning Theory (Most recently: COMS 6998 Fall 2017)
- Unsupervised Learning (Most recently: COMS 4774 Spring 2021)
- Tutorials
Thanks
I am grateful for support provided by the National Science Foundation, the National Aeronautics and Space Administration, the Alfred P. Sloan Foundation, the Columbia Data Science Institute, Bloomberg, Google, JP Morgan, NVIDIA, Two Sigma, and Yahoo.
- NSF IIS: Towards Causal Fair Decision-making
- TRIPODS: From Foundations to Practice of Data Science and Back [website]
- NSF IIS: Adaptive Information Extraction from Social Media [website]
- NSF DMREF: Deblurring our View of Atomic Arrangements in Complex Materials
- Sloan Research Fellowship
- Columbia DSI Data Science Interdisciplinary ROADS Grant
- Bloomberg Data Science Research Grant
- Google Faculty Research Award
- JP Morgan Faculty Award
- Two Sigma Research Gift
- Yahoo Faculty and Research Engagement Program Award