Below is a list of my papers. $(\alpha\text{-}\beta)$ denotes alphabetical ordering of co-authors.

Publications & Preprints

  1. Sayak Chakrabarti, Josh Alman, Toniann Pitassi.
    Poly-attention: a general scheme for higher-order self-attention
    Submitted, 2025.
    Abstract

    The self-attention mechanism, at the heart of the Transformer model, is able to effectively model pairwise interactions between tokens. However, numerous recent works have shown that it is unable to perform basic tasks involving detecting triples of correlated tokens, or compositional tasks where multiple input tokens need to be referenced to generate a result. Some higher-dimensional alternatives to self-attention have been proposed to address this, including higher-order attention (Sanford et al., 2023) and Strassen attention (Kozachinskiy et al., 2025), which can perform some of these polyadic tasks in exchange for slower, superquadratic running times.

    In this work, we define a vast class of generalizations of self-attention, which we call poly-attention mechanisms. Our mechanisms can incorporate arbitrary higher-order (tensor) computations as well as arbitrary relationship structures between the input tokens, and they include the aforementioned alternatives as special cases. We then systematically study their computational complexity and representational strength, including giving new algorithms and matching complexity-theoretic lower bounds on the time complexity of computing the attention matrix exactly as well as approximately, and tightly determining which polyadic tasks they can each perform. Our results give interesting trade-offs between different desiderata for these mechanisms, including a tight relationship between how expressive a mechanism is, and how large the coefficients in the model may be so that the mechanism can be approximated in almost-linear time.

    Notably, we give a new attention mechanism which can be computed exactly in quadratic time, and which can perform function composition for any fixed number of functions. Prior mechanisms, even for just composing two functions, could only be computed in superquadratic time, and our new lower bounds show that faster algorithms for them are not possible.

  2. $(\alpha\text{-}\beta)$Sayak Chakrabarti, Ashish Dwivedi, Nitin Saxena.
    Solving polynomial systems over non-fields and applications to modular polynomial factoring.
    Journal of Symbolic Computation, 2024.
    [PDF]
    Abstract

    We study the problem of solving a system of n-variate m polynomials over the ring of integers modulo a prime-power pk. The problem over finite fields is well studied in varied parameter settings. For small characteristic p = 2, Lokshtanov et al. (SODA'17) initiated the study, for degree d = 2 systems, to improve the brute force complexity of O(2n) to O(20.8765n); which currently is improved to O(20.6943n) in Dinur (SODA'21). For large p but constant n, Huang and Wong (FOCS'96) gave a randomized poly(d, m, log p) time algorithm. Note that for growing n, system-solving is known to be intractable even with p = 2 and degree d = 2.

    We devise a randomized poly(d, m, log p)-time algorithm to find a root of a given system of m integral polynomials of degrees bounded by d, in n variables, modulo a prime power pk; when n + k is constant. In a way, we extend the efficient algorithm of Huang and Wong (FOCS'96) for system-solving over Galois fields (i.e. characteristic p) to that over Galois rings (i.e. characteristic pk); when k > 1 is constant. The challenge here is to find a lift of singular Fp-roots (exponentially many); as there is no efficient general way known in algebraic-geometry for resolving singularities.

    Our algorithm has application to factoring univariates over Galois rings. Given f ∈ Z[x] and a prime-power pk (k ≥ 2), finding factors of f mod pk has a curious state-of-the-art. It is solved for 'large' k by p-adic factoring algorithms (von zur Gathen, Hartlieb, ISSAC'96); but unsolved for 'small' k. In particular, no nontrivial factoring method is known for k ≥ 5 (Dwivedi, Mittal, Saxena, ISSAC'19). One issue is that degree-δ factors of f (x) mod pk could be exponentially many, as soon as k ≥ 2. We give the first randomized poly( deg (f ), log p)-time algorithm to find a degree-δ factor of f (x) mod pk, when k + δ is constant. Our method has potential application in algebraic coding theory. Algebraic codes, e.g. algebraic-geometric (AG) and Reed-Solomon (RS) codes, are well-studied over Galois rings to achieve better bounds on their parameters.

  3. $(\alpha\text{-}\beta)$Sayak Chakrabarti, Nitin Saxena.
    An effective description of the roots of bivariates mod pk and the related Igusa’s local zeta function.
    International Symposium on Symbolic and Algebraic Computation (ISSAC), 2023.
    [PDF]
    Abstract

    Finding roots of a bivariate polynomial 𝑓 (𝑥1, 𝑥2), over a prime field F𝑝, is a fundamental question with a long history and several practical algorithms are now known. Effective algorithms for describing the roots modulo 𝑝𝑘, 𝑘 ≥ 2, for any general bivariate polynomial, were unknown until the present paper. The main obstruction is lifting the singular F𝑝 roots to Z/𝑝𝑘 Z. Such roots may be numerous and behave unpredictably, i.e., they may or may not lift from Z/𝑝 𝑗 Z to Z/𝑝 𝑗+1Z.

    We give the first algorithm to describe the roots of a bivariate polynomial over Z/𝑝𝑘 Z in a practical way. Notably, when the degree of the polynomial is constant, our algorithm runs in deterministic time which is polynomial in 𝑝 + 𝑘. This is a significant improvement over brute force, which would require exploring 𝑝2𝑘 possible values. Our method also gives the first efficient algorithms for the following problems (which were also open): (1) efficiently representing all the (possibly infinitely-many) 𝑝-adic roots, and (2) computing the underlying Igusa's local zeta function. We also obtain a new, effective method to prove the rationality of this zeta function.

  4. Aditya Gulati, Sayak Chakrabarti, Rajat Mittal.
    On Algorithms to Find p-ordering.
    Conference on Algorithms and Discrete Applied Mathematics (CALDAM), 2021.
    [arXiv]
    Abstract

    The concept of p-ordering for a prime p was introduced by Manjul Bhargava (in his PhD thesis) to develop a generalized factorial function over an arbitrary subset of integers. This notion of p-ordering provides a representation of polynomials modulo prime powers, and has been used to prove properties of roots sets modulo prime powers. We focus on the complexity of finding a p-ordering given a prime p, an exponent k and a subset of integers modulo p^k.

    Our first algorithm gives a p-ordering for set of size n in time O(nk log p), where set is considered modulo p^k. The subsets modulo p^k can be represented succinctly using the notion of representative roots; a natural question would be, can we find a p-ordering more efficiently given this succinct representation. Our second algorithm achieves precisely that, we give a p-ordering in time O(d^2 k log p + n k log p + n d), where d is the size of the succinct representation and n is the required length of the p-ordering. Another contribution that we make is to compute the structure of roots sets for prime powers p^k, when k is small. We explicitly describe all the root sets for p^2, p^3 and p^4.

  5. Soumendu Sundar Mukherjee, Sayak Chakrabarti.
    Graphon estimation from partially observed network data.
    Preprint, 2019.
    [arXiv]
    Abstract

    We consider estimating the edge-probability matrix of a network generated from a graphon model when the full network is not observed—only some overlapping subgraphs are. We extend the neighbourhood smoothing (NBS) algorithm of Zhang et al. (2017) to this missing-data set-up and show experimentally that, for a wide range of graphons, the extended NBS algorithm achieves significantly smaller error rates than standard graphon estimation algorithms such as vanilla neighbourhood smoothing (NBS), universal singular value thresholding (USVT), blockmodel approximation, matrix completion, etc. We also show that the extended NBS algorithm is much more robust to missing data.