Many of my papers are available on arXiv and linked from dblp and Google Scholar.
  
    Fast attention mechanisms: a tale of parallelism
  
  
  
    In Advances in Neural Information Processing Systems 38, 2025.
  
  
    [ external link | bibtex ]
  
  
    Survey on algorithms for multi-index models
  
  
  
    Statistical Science, 40(3):378-391, 2025.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Efficient estimation of the central mean subspace via smoothed gradient outer products
  
  
  
    SIAM Journal on Mathematics of Data Science, 7(3):1241-1264, 2025.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Learning compositional functions with transformers from easy-to-hard data
  
  
  
    In Thirty-Eighth Annual Conference on Learning Theory, 2025.
  
  
    [ external link | talk by Eshaan | bibtex ]
  
  
    Learning Gaussian multi-index models with gradient flow: time complexity and directional convergence
  
  
  
    In Twenty-Eighth International Conference on Artificial Intelligence and Statistics, 2025.
  
  
    [ external link | bibtex ]
  
  
    The piranha problem: large effects swimming in a small pond
  
  
  
    Notices Amer. Math. Soc. 72(1):15-25, 2025.
  
  
    [ external link | bibtex ]
  
  
    Interactive machine teaching by labeling rules and instances
  
  
  
    Transactions of the Association for Computational Linguistics, 12:1441-1459, 2024.
  
  
    [ external link | tacl link | bibtex ]
  
  
    Group-wise oracle-efficient algorithms for online multi-group learning
  
  
  
    In Advances in Neural Information Processing Systems 37, 2024.
  
  
    [ external link | bibtex ]
  
  
    One-layer transformers fail to solve the induction heads task
  
  
  
    Preprint, 2024.
  
  
    [ external link | bibtex ]
  
  
    Transformers provably learn sparse token selection while fully-connected nets cannot
  
  
  
    In Forty-First International Conference on Machine Learning, 2024.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Transformers, parallel computation, and logarithmic depth
  
  
  
    In Forty-First International Conference on Machine Learning, 2024.
  
  
    [ external link | talk slides | pmlr link | bibtex ]
  
  
    Multi-group learning for hierarchical groups
  
  
  
    In Forty-First International Conference on Machine Learning, 2024.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    On the sample complexity of parameter estimation in logistic regression with normal design
  
  
  
    In Thirty-Seventh Annual Conference on Learning Theory, 2024.
  
  
    [ external link | talk slides | pmlr link | talk by Arya | bibtex ]
  
  
    Distribution-specific auditing for subgroup fairness
  
  
  
    In Fifth Symposium on Foundations of Responsible Computing, 2024.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Statistical-computational trade-offs in tensor PCA and related problems via communication complexity
  
  
  
    The Annals of Statistics, 52(1):131-156, 2024.
  
  
    [ local pdf file | external link | talk slides | aos link | bibtex ]
  
  
    Representational strengths and limitations of transformers
  
  
  
    In Advances in Neural Information Processing Systems 36, 2023.
  
  
    [ external link | talk slides | bibtex ]
  
  
    Intrinsic dimensionality and generalization properties of the \(\mathcal{R}\)-norm inductive bias
  
  
  
    In Thirty-Sixth Annual Conference on Learning Theory, 2023.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Masked prediction: a parameter identifiability view
  
  
  
    In Advances in Neural Information Processing Systems 35, 2022.
  
  
    [ external link | bibtex ]
  
  
    Unbiased estimators for random design regression
  
  
  
    Journal of Machine Learning Research, 23(167):1-46, 2022.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Near-optimal statistical query lower bounds for agnostically learning intersections of halfspaces with Gaussian marginals
  
  
  
    In Thirty-Fifth Annual Conference on Learning Theory, 2022.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Learning tensor representations for meta-learning
  
  
  
    In Twenty-Fifth International Conference on Artificial Intelligence and Statistics, 2022.
  
  
    [ external link | bibtex ]
  
  
    Simple and near-optimal algorithms for hidden stratification and multi-group learning
  
  
  
    In Thirty-Ninth International Conference on Machine Learning, 2022.
  
  
    [ external link | talk slides | errata | pmlr link | bibtex ]
  
  
    Contrastive estimation reveals topic posterior information to linear models
  
  
  
    Journal of Machine Learning Research, 22(281):1-31, 2021.
  
  
    [ external link | talk slides | bibtex ]
  
  
    Bayesian decision-making under misspecified priors with applications to meta-learning
  
  
  
    In Advances in Neural Information Processing Systems 34, 2021.
  
  
    [ external link | bibtex ]
  
  
    Support vector machines and linear regression coincide with very high-dimensional features
  
  
  
    In Advances in Neural Information Processing Systems 34, 2021.
  
  
    [ external link | bibtex ]
  
  
    Classification vs regression in overparameterized regimes: Does the loss function matter?
  
  
  
    Journal of Machine Learning Research, 22(222):1-69, 2021.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    On the approximation power of two-layer networks of random ReLUs
  
  
  
    In Thirty-Fourth Annual Conference on Learning Theory, 2021.
  
  
    [ external link | talk slides | note about kernels | pmlr link | blog post | bibtex ]
  
  
    Statistical query lower bounds for tensor PCA
  
  
  
    Journal of Machine Learning Research, 22(83):1-51, 2021.
  
  
    [ external link | jmlr link | bibtex ]
  
  
    Generalization bounds via distillation
  
  
  
    In Ninth International Conference on Learning Representations, 2021.
  
  
    [ external link | bibtex ]
  
  
    On the proliferation of support vectors in high dimensions
  
  
  
    In Twenty-Fourth International Conference on Artificial Intelligence and Statistics, 2021.
  
  
    [ external link | reviews | response | pmlr link | bibtex ]
  
  
    Contrastive learning, multi-view redundancy, and linear models
  
  
  
    In Thirty-Second International Conference on Algorithmic Learning Theory, 2021.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Cross-lingual text classification with minimal resources by transferring a sparse teacher
  
  
  
    In Conference on Empirical Methods in Natural Language Processing: Findings, 2020.
  
  
    [ external link | aclweb link | bibtex ]
  
  
    Interpreting deep learning models for weak lensing
  
  
  
    Phys. Rev. D, 102:123506, Dec 2020.
  
  
    [ external link | aps link | bibtex ]
  
  
    Ensuring fairness beyond the training data
  
  
  
    In Advances in Neural Information Processing Systems 33, 2020.
  
  
    [ external link | bibtex ]
  
  
    Diameter-based interactive structure discovery
  
  
  
    In Twenty-Third International Conference on Artificial Intelligence and Statistics, 2020.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Two models of double descent for weak features
  
  
  
    SIAM Journal on Mathematics of Data Science, 2(4):1167–1180, 2020.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Kernel approximation methods for speech recognition
  
  
  
    Journal of Machine Learning Research, 20(59):1-36, 2019.
  
  
    [ external link | bibtex ]
  
  
    On the number of variables to use in principal component regression
  
  
  
    In Advances in Neural Information Processing Systems 32, 2019.
  
  
    [ external link | bibtex ]
  
  
    Leveraging just a few keywords for fine-grained aspect detection through weakly supervised co-training
  
  
  
    In Conference on Empirical Methods in Natural Language Processing, 2019.
  
  
    [ external link | talk slides | bibtex ]
  
  
    Privacy accounting and quality control in the Sage differentially private ML platform
  
  
  
    In Twenty-Seventh ACM Symposium on Operating Systems Principles, 2019.
  
  
    [ external link | bibtex ]
  
  
    Weak lensing cosmology with convolutional neural networks on noisy data
  
  
  
    Monthly Notices of the Royal Astronomical Society, 490(2):1843–1860, 2019.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Reconciling modern machine learning practice and the bias-variance trade-off
  
  
  
    Proceedings of the National Academy of Sciences, 116(32):15849-15854, 2019.
  
  
    [ local pdf file | pnas link | arxiv link | bibtex ]
  
  
    Mixing time estimation in reversible Markov chains from a single sample path
  
  
  
    The Annals of Applied Probability, 29(4):2439–2480, 2019.
  
  
    [ local pdf file | aap link | bibtex ]
  
  
    Using a machine learning approach to determine the space group of a structure from the atomic pair distribution function
  
  
  
    Acta Crystallographica Section A, 75(4):633–643, 2019.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Teaching a black-box learner
  
  
  
    In Thirty-Sixth International Conference on Machine Learning, 2019.
  
  
    [ local pdf file | pmlr link | bibtex ]
  
  
    A gradual, semi-discrete approach to generative network training via explicit Wasserstein minimization
  
  
  
    In Thirty-Sixth International Conference on Machine Learning, 2019.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Certified robustness to adversarial examples with differential privacy
  
  
  
    In IEEE Symposium on Security and Privacy, 2019.
  
  
    [ external link | bibtex ]
  
  
    Correcting the bias in least squares regression with volume-rescaled sampling
  
  
  
    In Twenty-Second International Conference on Artificial Intelligence and Statistics, 2019.
  
  
    [ local pdf file | arxiv link | pmlr link | bibtex ]
  
  
    Attribute-efficient learning of monomials over highly-correlated variables
  
  
  
    In Thirtieth International Conference on Algorithmic Learning Theory, 2019.
  
  
    [ local pdf file | external link | bibtex ]
  
  
    Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
  
  
  
    In Advances in Neural Information Processing Systems 31, 2018.
  
  
    [ local pdf file | external link | short version | talk slides | bibtex ]
  
  
    Leveraged volume sampling for linear regression
  
  
  
    In Advances in Neural Information Processing Systems 31, 2018.
  
  
    [ local pdf file | external link | bibtex ]
  
  
    Benefits of over-parameterization with EM
  
  
  
    In Advances in Neural Information Processing Systems 31, 2018.
  
  
    [ external link | bibtex ]
  
  
    Learning single-index models in Gaussian space
  
  
  
    In Thirty-First Annual Conference on Learning Theory, 2018.
  
  
    [ local pdf file | pmlr link | bibtex ]
  
  
    Non-Gaussian information from weak lensing data via deep learning
  
  
  
    Phys. Rev. D, 97:103515, May 2018.
  
  
    [ external link | aps link | bibtex ]
  
  
    Discovering foodborne illness in online restaurant reviews
  
  
  
    Journal of the American Medical Informatics Association, 25(12):1586-1592, 2018.
  
  
    [ external link | bibtex ]
  
  
    Coding sets with asymmetric information
  
  
  
    Preprint, 2017.
  
  
    [ external link | bibtex ]
  
  
    Linear regression without correspondence
  
  
  
    In Advances in Neural Information Processing Systems 30, 2017.
  
  
    [ external link | talk slides | bibtex ]
  
  
    Subregional nowcasts of seasonal influenza using search trends
  
  
  
    Journal of Medical Internet Research, 19(11):e370, 2017.
  
  
    [ external link | bibtex ]
  
  
    Greedy approaches to symmetric orthogonal tensor decomposition
  
  
  
    SIAM Journal on Matrix Analysis and Applications, 38(4):1210-1226, 2017.
  
  
    [ local pdf file | arxiv link | siam link | bibtex ]
  
  
    Parameter identification in Markov chain choice models
  
  
  
    In Twenty-Eighth International Conference on Algorithmic Learning Theory, 2017.
  
  
    [ external link | pmlr link | bibtex ]
  
  
    Correspondence retrieval
  
  
  
    In Thirtieth Annual Conference on Learning Theory, 2017.
  
  
    [ local pdf file | talk slides | pmlr link | bibtex ]
  
  
    FairTest: discovering unwarranted associations in data-driven applications
  
  
  
    In Second IEEE European Symposium on Security and Privacy, 2017.
  
  
    [ external link | slides from privacycon | bibtex ]
  
  
    Kernel ridge vs. principal component regression: minimax bounds and the qualification of regularization operators
  
  
  
    Electronic Journal of Statistics, 1(1):1022–1047, 2017.
  
  
    [ local pdf file | ejs link | bibtex ]
  
  
    Greedy bi-criteria approximations for \(k\)-medians and \(k\)-means
  
  
  
    Preprint, 2016.
  
  
    [ external link | bibtex ]
  
  
    Search improves label for active learning
  
  
  
    In Advances in Neural Information Processing Systems 29, 2016.
  
  
    [ local pdf file | arxiv link | video advert | bibtex ]
  
  
    Global analysis of Expectation Maximization for mixtures of two Gaussians
  
  
  
    In Advances in Neural Information Processing Systems 29, 2016.
  
  
    [ local pdf file | short version | summary | arxiv link | bibtex ]
  
  
    Do dark matter halos explain lensing peaks?
  
  
  
    Phys. Rev. D, 94:083506, Oct 2016.
  
  
    [ external link | aps link | bibtex ]
  
  
    Unsupervised part-of-speech tagging with anchor hidden Markov models
  
  
  
    Transactions of the Association for Computational Linguistics, 4:245–257, 2016.
  
  
    [ external link | bibtex ]
  
  
    Compact kernel models for acoustic modeling via random feature selection
  
  
  
    In Forty-First IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
  
  
    [ external link | bibtex ]
  
  
    Loss minimization and parameter estimation with heavy tails
  
  
  
    Journal of Machine Learning Research, 17(18):1–40, 2016.
  
  
    [ external link | slides for related talk | bibtex ]
  
  
    Mixing time estimation in reversible Markov chains from a single sample path
  
  
  
    In Advances in Neural Information Processing Systems 28, 2015.
  
  
    [ external link | talk slides | bibtex ]
  
  
    Efficient and parsimonious agnostic active learning
  
  
  
    In Advances in Neural Information Processing Systems 28, 2015.
  
  
    [ external link | bibtex ]
  
  
    Sunlight: fine-grained targeting detection at scale with statistical confidence
  
  
  
    In Twenty-Second ACM Conference on Computer and Communications Security, 2015.
  
  
    [ local pdf file | project website | bibtex ]
  
  
    Model-based word embeddings from decompositions of count matrices
  
  
  
    In Fifty-Third Annual Meeting of the Association for Computational Linguistics, 2015.
  
  
    [ local pdf file | acl link | bibtex ]
  
  
    When are overcomplete topic models identifiable?
  
  
  
    Journal of Machine Learning Research, 16(Dec):2643–2694, 2015.
  
  
    [ external link | bibtex ]
  
  
    Successive rank-one approximations for nearly orthogonally decomposable symmetric tensors
  
  
  
    SIAM Journal on Matrix Analysis and Applications, 36(4):1638–1659, 2015.
  
  
    [ external link | siam link | bibtex ]
  
  
    A spectral algorithm for latent Dirichlet allocation
  
  
  
    Algorithmica, 72(1):193–214, 2015.
  
  
    [ local pdf file | springer link | bibtex ]
  
  
    Learning sparse low-threshold linear classifiers
  
  
  
    Journal of Machine Learning Research, 16(Jul):1275–1304, 2015.
  
  
    [ external link | bibtex ]
  
  
    Scalable nonlinear learning with adaptive polynomial expansions
  
  
  
    In Advances in Neural Information Processing Systems 27, 2014.
  
  
    [ external link | bibtex ]
  
  
    The large margin mechanism for differentially private maximization
  
  
  
    In Advances in Neural Information Processing Systems 27, 2014.
  
  
    [ external link | bibtex ]
  
  
    A spectral algorithm for learning class-based \(n\)-gram models of natural language
  
  
  
    In Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014.
  
  
    [ local pdf file | auai link | code by Karl | more code by Karl | bibtex ]
  
  
    Taming the monster: a fast and simple algorithm for contextual bandits
  
  
  
    In Thirty-First International Conference on Machine Learning, 2014.
  
  
    [ local pdf file | talk slides | arxiv link | bibtex ]
  
  
    Heavy-tailed regression with a generalized median-of-means
  
  
  
    In Thirty-First International Conference on Machine Learning, 2014.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    Tensor decompositions for learning latent variable models
  
  
  
    Journal of Machine Learning Research, 15(Aug):2773–2831, 2014.
  
  
    [ local pdf file | tutorial slides | jmlr link | bibtex ]
  
  
    Random design analysis of ridge regression
  
  
  
    Foundations of Computational Mathematics, 14(3):569–600, 2014.
  
  
    [ local pdf file | springer link | arxiv link | bibtex ]
  
  
    A tensor approach to learning mixed membership community models
  
  
  
    Journal of Machine Learning Research, 15(Jun):2239–2312, 2014.
  
  
    [ external link | arxiv link | bibtex ]
  
  
    When are overcomplete topic models identifiable?
  
  
  
    In Advances in Neural Information Processing Systems 26, 2013.
  
  
    [ external link | bibtex ]
  
  
    Contrastive learning using spectral methods
  
  
  
    In Advances in Neural Information Processing Systems 26, 2013.
  
  
    [ local pdf file | bibtex ]
  
  
    A tensor spectral approach to learning mixed membership community models
  
  
  
    In Twenty-Sixth Annual Conference on Learning Theory, 2013.
  
  
    [ external link | journal version with better title | arxiv link | bibtex ]
  
  
    Learning linear Bayesian networks with latent variables
  
  
  
    In Thirtieth International Conference on Machine Learning, 2013.
  
  
    [ local pdf file | pmlr link | bibtex ]
  
  
    Learning mixtures of spherical Gaussians: moment methods and spectral decompositions
  
  
  
    In Fourth Innovations in Theoretical Computer Science, 2013.
  
  
    [ local pdf file | talk slides | arxiv link | video advert | bibtex ]
  
  
    Stochastic convex optimization with bandit feedback
  
  
  
    SIAM Journal on Optimization, 23(1):213–240, 2013.
  
  
    [ local pdf file | arxiv link | siam link | bibtex ]
  
  
    A spectral algorithm for latent Dirichlet allocation
  
  
  
    In Advances in Neural Information Processing Systems 25, 2012.
  
  
    [ external link | journal version | springer link | bibtex ]
  
  
    Learning mixtures of tree graphical models
  
  
  
    In Advances in Neural Information Processing Systems 25, 2012.
  
  
    [ external link | bibtex ]
  
  
    Identifiability and unmixing of latent parse trees
  
  
  
    In Advances in Neural Information Processing Systems 25, 2012.
  
  
    [ local pdf file | arxiv link | bibtex ]
  
  
    Random design analysis of ridge regression
  
  
  
    In Twenty-Fifth Annual Conference on Learning Theory, 2012.
  
  
    [ external link | journal version | arxiv link | bibtex ]
  
  
    A method of moments for mixture models and hidden Markov models
  
  
  
    In Twenty-Fifth Annual Conference on Learning Theory, 2012.
  
  
    [ external link | talk slides | slides for related talk | pmlr link | bibtex ]
  
  
    Convergence rates for differentially private statistical estimation
  
  
  
    In Twenty-Ninth International Conference on Machine Learning, 2012.
  
  
    [ local pdf file | bibtex ]
  
  
    Tail inequalities for sums of random matrices that depend on the intrinsic dimension
  
  
  
    Electronic Communications in Probability, 17(14):1–13, 2012.
  
  
    [ local pdf file | errata | ecp link | bibtex ]
  
  
    A spectral algorithm for learning hidden Markov models
  
  
  
    Journal of Computer and System Sciences, 78(5):1460–1480, 2012.
  
  
    [ local pdf file | errata | jcss link | arxiv link | bibtex ]
  
  
    A tail inequality for quadratic forms of subgaussian random vectors
  
  
  
    Electronic Communications in Probability, 17(52):1–6, 2012.
  
  
    [ local pdf file | ecp link | bibtex ]
  
  
    Stochastic convex optimization with bandit feedback
  
  
  
    In Advances in Neural Information Processing Systems 24, 2011.
  
  
    [ external link | journal version | siam link | bibtex ]
  
  
    Spectral methods for learning multivariate latent tree structure
  
  
  
    In Advances in Neural Information Processing Systems 24, 2011.
  
  
    [ local pdf file | arxiv link | bibtex ]
  
  
    Sample complexity bounds for differentially private learning
  
  
  
    In Twenty-Fourth Annual Conference on Learning Theory, 2011.
  
  
    [ local pdf file | pmlr link | bibtex ]
  
  
    Efficient optimal learning for contextual bandits
  
  
  
    In Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 2011.
  
  
    [ local pdf file | bibtex ]
  
  
    Robust matrix decomposition with sparse corruptions
  
  
  
    IEEE Transactions on Information Theory, 57(11):7221–7234, 2011.
  
  
    [ local pdf file | arxiv link | ieee link | bibtex ]
  
  
    Agnostic active learning without constraints
  
  
  
    In Advances in Neural Information Processing Systems 23, 2010.
  
  
    [ local pdf file | arxiv link | bibtex ]
  
  
    An online learning-based framework for tracking
  
  
  
    In Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2010.
  
  
    [ external link | bibtex ]
  
  
    Algorithms for active learning
  
  
  
    Ph.D. dissertation, UC San Diego, 2010.
  
  
    [ local pdf file | bibtex ]
  
  
    A parameter-free hedging algorithm
  
  
  
    In Advances in Neural Information Processing Systems 22, 2009.
  
  
    [ local pdf file | note about \(\epsilon\)-quantile regret | bibtex ]
  
  
    Multi-label prediction via compressed sensing
  
  
  
    In Advances in Neural Information Processing Systems 22, 2009.
  
  
    [ local pdf file | talk slides | arxiv link | bibtex ]
  
  
    A spectral algorithm for learning hidden Markov models
  
  
  
    In Twenty-Second Annual Conference on Learning Theory, 2009.
  
  
    [ external link | journal version | errata | bibtex ]
  
  
    Hierarchical sampling for active learning
  
  
  
    In Twenty-Fifth International Conference on Machine Learning, 2008.
  
  
    [ local pdf file | bibtex ]
  
  
    A general agnostic active learning algorithm
  
  
  
    In Advances in Neural Information Processing Systems 20, 2007.
  
  
    [ local pdf file | video 1 | video 2 | video 3 | video 4 | bibtex ]
  
  
    On-line estimation with the multivariate Gaussian distribution
  
  
  
    In Twentieth Annual Conference on Learning Theory, 2007.
  
  
    [ local pdf file | bibtex ]
  
  
    A concentration theorem for projections
  
  
  
    In Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
  
  
    [ local pdf file | bibtex ]