Research

Below is a list of my papers. $(\alpha\text{-}\beta)$ denotes alphabetical ordering of co-authors.

Publications & Preprints

6. Every Bit Counts: A Theoretical Study of Precision–Expressivity Tradeoffs in Quantized Transformers.
Sayak Chakrabarti, Toniann Pitassi, Josh Alman.
Submitted, 2026. [arXiv]

Abstract

Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-grained theoretical tradeoff between expressivity and precision: For every p we exhibit a function Γ, inspired by the equality function, and prove that a one-layer softmax Transformer can compute Γ, with p bits of precision, but not with p−1 bits of precision.

This result concretely explains the widely observed phenomenon of empirical loss of expressivity when quantization is used. Practically, it suggests that tasks requiring equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization. Dropping even one bit can cross a threshold where the model cannot represent the needed comparison reliably. Thus, it paves the way for developing heuristics that will help practitioners choose how much quantization is possible: the precision should be chosen as a function of the length of equality to be checked for the specific task.

Our proofs combine explicit finite-precision Transformer constructions with communication- complexity lower bounds, yielding a tight “one-bit” threshold.

5. Poly-attention: a general scheme for higher-order self-attention.
Sayak Chakrabarti, Toniann Pitassi, Josh Alman.
International Conference on Learning Representations (ICLR), 2026. [arXiv]

Abstract

The self-attention mechanism, at the heart of the Transformer model, is able to effectively model pairwise interactions between tokens. However, numerous recent works have shown that it is unable to perform basic tasks involving detecting triples of correlated tokens, or compositional tasks where multiple input tokens need to be referenced to generate a result. Some higher-dimensional alternatives to self-attention have been proposed to address this, including higher-order attention (Sanford et al., 2023) and Strassen attention (Kozachinskiy et al., 2025), which can perform some of these polyadic tasks in exchange for slower, superquadratic running times.

In this work, we define a vast class of generalizations of self-attention, which we call poly-attention mechanisms. Our mechanisms can incorporate arbitrary higher-order (tensor) computations as well as arbitrary relationship structures between the input tokens, and they include the aforementioned alternatives as special cases. We then systematically study their computational complexity and representational strength, including giving new algorithms and matching complexity-theoretic lower bounds on the time complexity of computing the attention matrix exactly as well as approximately, and tightly determining which polyadic tasks they can each perform. Our results give interesting trade-offs between different desiderata for these mechanisms, including a tight relationship between how expressive a mechanism is, and how large the coefficients in the model may be so that the mechanism can be approximated in almost-linear time.

Notably, we give a new attention mechanism which can be computed exactly in quadratic time, and which can perform function composition for any fixed number of functions. Prior mechanisms, even for just composing two functions, could only be computed in superquadratic time, and our new lower bounds show that faster algorithms for them are not possible.

4. Solving polynomial systems over non-fields and applications to modular polynomial factoring.
$(\alpha\text{-}\beta)$Sayak Chakrabarti, Ashish Dwivedi, Nitin Saxena.
Journal of Symbolic Computation, 2024.
[pdf] [epub]

Abstract

We study the problem of solving a system of n-variate m polynomials over the ring of integers modulo a prime-power pk. The problem over finite fields is well studied in varied parameter settings. For small characteristic p = 2, Lokshtanov et al. (SODA'17) initiated the study, for degree d = 2 systems, to improve the brute force complexity of O(2n) to O(20.8765n); which currently is improved to O(20.6943n) in Dinur (SODA'21). For large p but constant n, Huang and Wong (FOCS'96) gave a randomized poly(d, m, log p) time algorithm. Note that for growing n, system-solving is known to be intractable even with p = 2 and degree d = 2.

We devise a randomized poly(d, m, log p)-time algorithm to find a root of a given system of m integral polynomials of degrees bounded by d, in n variables, modulo a prime power pk; when n + k is constant. In a way, we extend the efficient algorithm of Huang and Wong (FOCS'96) for system-solving over Galois fields (i.e. characteristic p) to that over Galois rings (i.e. characteristic pk); when k > 1 is constant. The challenge here is to find a lift of singular Fp-roots (exponentially many); as there is no efficient general way known in algebraic-geometry for resolving singularities.

Our algorithm has application to factoring univariates over Galois rings. Given f ∈ Z[x] and a prime-power pk (k ≥ 2), finding factors of f mod pk has a curious state-of-the-art. It is solved for 'large' k by p-adic factoring algorithms (von zur Gathen, Hartlieb, ISSAC'96); but unsolved for 'small' k. In particular, no nontrivial factoring method is known for k ≥ 5 (Dwivedi, Mittal, Saxena, ISSAC'19). One issue is that degree-δ factors of f (x) mod pk could be exponentially many, as soon as k ≥ 2. We give the first randomized poly( deg (f ), log p)-time algorithm to find a degree-δ factor of f (x) mod pk, when k + δ is constant. Our method has potential application in algebraic coding theory. Algebraic codes, e.g. algebraic-geometric (AG) and Reed-Solomon (RS) codes, are well-studied over Galois rings to achieve better bounds on their parameters.

3. An effective description of the roots of bivariates mod pk and the related Igusa’s local zeta function.
$(\alpha\text{-}\beta)$Sayak Chakrabarti, Nitin Saxena.
International Symposium on Symbolic and Algebraic Computation (ISSAC), 2023.
[pdf] [epub]

Abstract

Finding roots of a bivariate polynomial 𝑓 (𝑥1, 𝑥2), over a prime field F𝑝, is a fundamental question with a long history and several practical algorithms are now known. Effective algorithms for describing the roots modulo 𝑝𝑘, 𝑘 ≥ 2, for any general bivariate polynomial, were unknown until the present paper. The main obstruction is lifting the singular F𝑝 roots to Z/𝑝𝑘 Z. Such roots may be numerous and behave unpredictably, i.e., they may or may not lift from Z/𝑝 𝑗 Z to Z/𝑝 𝑗+1Z.

We give the first algorithm to describe the roots of a bivariate polynomial over Z/𝑝𝑘 Z in a practical way. Notably, when the degree of the polynomial is constant, our algorithm runs in deterministic time which is polynomial in 𝑝 + 𝑘. This is a significant improvement over brute force, which would require exploring 𝑝2𝑘 possible values. Our method also gives the first efficient algorithms for the following problems (which were also open): (1) efficiently representing all the (possibly infinitely-many) 𝑝-adic roots, and (2) computing the underlying Igusa's local zeta function. We also obtain a new, effective method to prove the rationality of this zeta function.

2. On Algorithms to Find p-ordering.
Aditya Gulati, Sayak Chakrabarti, Rajat Mittal.
Conference on Algorithms and Discrete Applied Mathematics (CALDAM), 2021.
[arXiv]

Abstract

The concept of p-ordering for a prime p was introduced by Manjul Bhargava (in his PhD thesis) to develop a generalized factorial function over an arbitrary subset of integers. This notion of p-ordering provides a representation of polynomials modulo prime powers, and has been used to prove properties of roots sets modulo prime powers. We focus on the complexity of finding a p-ordering given a prime p, an exponent k and a subset of integers modulo p^k.

Our first algorithm gives a p-ordering for set of size n in time O(nk log p), where set is considered modulo p^k. The subsets modulo p^k can be represented succinctly using the notion of representative roots; a natural question would be, can we find a p-ordering more efficiently given this succinct representation. Our second algorithm achieves precisely that, we give a p-ordering in time O(d^2 k log p + n k log p + n d), where d is the size of the succinct representation and n is the required length of the p-ordering. Another contribution that we make is to compute the structure of roots sets for prime powers p^k, when k is small. We explicitly describe all the root sets for p^2, p^3 and p^4.

1. Graphon estimation from partially observed network data.
Soumendu Sundar Mukherjee, Sayak Chakrabarti.
Preprint, 2019.
[arXiv]

Abstract

We consider estimating the edge-probability matrix of a network generated from a graphon model when the full network is not observed—only some overlapping subgraphs are. We extend the neighbourhood smoothing (NBS) algorithm of Zhang et al. (2017) to this missing-data set-up and show experimentally that, for a wide range of graphons, the extended NBS algorithm achieves significantly smaller error rates than standard graphon estimation algorithms such as vanilla neighbourhood smoothing (NBS), universal singular value thresholding (USVT), blockmodel approximation, matrix completion, etc. We also show that the extended NBS algorithm is much more robust to missing data.

Reports

My M.Tech thesis can be found here: Multivariate polynomials modulo prime powers: their roots, zeta-function and applications.

Reports of some old works during my undergrad can be found here:

On factorization and root-finding modulo prime powers. [PDF]
Continuous Skolem problem for higher dimensions. [PDF]
An algorithm to find p-ordering of sets in succinct form. [PDF]
Factorization modulo prime powers. [PDF]
Linear cryptanalysis applied to logic locking. [PDF]