
Yet another circle STARK tutorial

Timofey Yaluhin

Recently, there has been a lot of excitement and interest around circle STARKs. This didn't pass me by either, and my interest was piqued, not only by the loud "breakthrough" announcements and the prospects of unlocking verifiable computation on Bitcoin, but also by the subtle elegance of applying algebraic geometry to address the limitations of previous solutions and simplify them.

At this point, we have several excellent introductory and explainer articles by zkSecurity, Vitalik, and LambdaClass. I have tried my best not to repeat their words and instead focus on the unexplored aspects and elements that I personally found very interesting. I wrote this as I was learning, so I cannot rule out inaccuracies. If you spot any mistakes, please comment away in this mirror on HackMD.

Introduction

Circle STARKs enable the prover to work over the finite field modulo the Mersenne-31 prime (M31), which has very efficient arithmetic: namely, it is about 1.3 times faster than BabyBear, the previous most efficient field, used in SP1, Valida, and RISC Zero. For more insight on why working over M31 is so desirable, refer to this article.

However, simply plugging M31 into existing univariate STARKs won't work. The reason is that Mersenne primes are not FFT/FRI friendly. Let's unpack why.

The efficiency of the FFT "divide-and-conquer" algorithm relies on the ability to recursively divide the problem size by powers of 2. When working over a finite field, this requires the evaluation domain to have the "structure" of a multiplicative group whose order's factorization contains a large power of two. We refer to this as high 2-adicity. The reason we want this is that at each step of the FFT, we reduce the size of the domain by a factor of two by squaring each number in it.

For example, consider the domain $[1, 85, 148, 111, 336, 252, 189, 226]$ in the finite field of integers modulo $337$. This domain forms a multiplicative group with $8 = 2^3$ elements. The elements of this group are powers of $\omega = 85$, which is a generator of the group. If you square every number in the domain, you reduce the size of the domain by a factor of two: $[1, 148, 336, 189]$. If you take the same domain but with, say, 6 elements, you don't get this property anymore.

Another relevant way of framing the 2-adicity requirement is to search for a group whose order is a product of small primes. We call such numbers smooth numbers. Since the multiplicative group of a field includes every element except $0$, the order of the group is exactly one less than the number of elements in the field. Fields $\mathbb{F}_p$ that satisfy this condition are referred to as $p-1$ smooth.

An exemplary prime is the BabyBear prime $p = 2^{31} - 2^{27} + 1$. The largest multiplicative group in the BabyBear field has $p - 1 = 2^{27} \cdot 3 \cdot 5$ elements. You can clearly see that it is both smooth and highly 2-adic. It's perfect.

What about the Mersenne-31 prime $p = 2^{31} - 1$? Unfortunately, $2^{31} - 2$ can be divided by 2 only once. Thus, the conventional Cooley-Tukey FFT algorithm would be very inefficient for this field's multiplicative group. The authors solve this by devising a group from a unit circle.
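To see this concretely, here is a quick sanity check (a throwaway Python snippet of my own, not from any library) comparing the 2-adicity of the two fields' multiplicative group orders:

```python
def two_adicity(n: int) -> int:
    """Count how many times n is divisible by 2."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

baby_bear = 2**31 - 2**27 + 1
m31 = 2**31 - 1

print(two_adicity(baby_bear - 1))  # 27 -> FFT domains up to size 2^27
print(two_adicity(m31 - 1))        # 1  -> Cooley-Tukey gets stuck after one halving
```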

Circle as a group

A circle group is generated from a point $(x, y)$ such that $x^2 + y^2 = 1$ (i.e., it lies on a unit circle) by applying a different multiplicative law:

$$(x_0, y_0) \cdot (x_1, y_1) := (x_0 \cdot x_1 - y_0 \cdot y_1, \; x_0 \cdot y_1 + y_0 \cdot x_1)$$

Instead of generating subgroup elements as simply powers of a generator $g$, we move from a point $(x_i, y_i)$ on the circle to the point:

$$(x_{i+1}, y_{i+1}) = (g_x \cdot x_i - g_y \cdot y_i, \; g_x \cdot y_i + g_y \cdot x_i)$$

It turns out that the number of points lying on the circle defined over the Mersenne prime $2^{31} - 1$ is quite large ($2^{31}$). One can generate all $2^{31}$ group elements by starting with $(x_0, y_0) = (1, 0)$ and applying the generator $(g_x, g_y) = (2, 1268011823)$ using the law above.

For the circle FFT/FRI, we need two more group operations: the group squaring map $\pi$ and the inversion map $J$.

Squaring is the quadratic map defined as

$$\pi(x, y) := (x, y) \cdot (x, y) = (x^2 - y^2, \; 2 \cdot x \cdot y) = (2 \cdot x^2 - 1, \; 2 \cdot x \cdot y)$$

This transformation reduces the set size by half: the map is 2-to-1, since a point $P$ and its translate by the order-2 point $(-1, 0)$ have the same square.

Inverse is given by the degree-one map

$$J(x, y) := (x, -y)$$

This operation maps each point $(x, y)$ to its reflection $(x, -y)$.

Map $J$ is an involution, i.e., $J(J(P)) = P$. Maps $J$ and $\pi$ commute, i.e., $\pi(J(P)) = J(\pi(P))$ for every $P \in C(\mathbb{F}_p)$.
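These operations are simple enough to check by hand. Below is a minimal Python sketch of the circle group over M31; the generator $(2, 1268011823)$ is the one quoted above, while the helper names are mine, not Stwo's or Plonky3's:

```python
P = 2**31 - 1  # the Mersenne-31 prime

def mul(p0, p1):
    """Circle group law: (x0, y0) . (x1, y1)."""
    (x0, y0), (x1, y1) = p0, p1
    return ((x0 * x1 - y0 * y1) % P, (x0 * y1 + y0 * x1) % P)

def pi(p):
    """Squaring map: pi(x, y) = (2x^2 - 1, 2xy)."""
    x, y = p
    return ((2 * x * x - 1) % P, (2 * x * y) % P)

def J(p):
    """Inversion map: reflect across the x-axis."""
    x, y = p
    return (x, (-y) % P)

g = (2, 1268011823)                      # generator of the full 2^31-element group
assert (g[0] ** 2 + g[1] ** 2) % P == 1  # g lies on the unit circle
assert mul(g, g) == pi(g)                # squaring agrees with the group law
assert J(J(g)) == g                      # J is an involution
assert pi(J(g)) == J(pi(g))              # pi and J commute
```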

FFT over the circle domain

Same as in the classical STARK, the circle FFT is used to evaluate some polynomial on a special domain. In regular FFT, the domain consists of the $n$-th roots of unity, i.e., $\{1, \omega, \omega^2, \ldots, \omega^{n-1}\}$. In circle FFT, the domain is a set of points on a circle curve, generated as follows.

First, we take a circle group generator $g$ and square it repeatedly to obtain a subgroup $G_{n-1}$ of order $2^{n-1}$. Then, we create two twin cosets, formed by taking an element $Q$ that is not in $G_n$ and creating two disjoint sets: $Q \cdot G_{n-1}$ and $Q^{-1} \cdot G_{n-1}$. The union of these sets forms the circle FFT domain containing $2^n$ elements. A self-descriptive implementation can be found here in Plonky3.
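Here is how that construction might look in code, reusing `mul`, `pi`, `J`, and `g` from the sketch above (again, hypothetical helpers; the production logic lives in Plonky3):

```python
def subgroup(log_size):
    """Enumerate the subgroup of order 2^log_size by squaring the full-group
    generator g (of order 2^31) down to the right order."""
    gen = g
    for _ in range(31 - log_size):
        gen = pi(gen)
    elems, acc = [], (1, 0)  # (1, 0) is the group identity
    for _ in range(2**log_size):
        elems.append(acc)
        acc = mul(acc, gen)
    return elems

def twin_coset_domain(n):
    """Union of Q * G_{n-1} and Q^{-1} * G_{n-1}, giving 2^n points."""
    G = subgroup(n - 1)
    Q = g                    # g itself is not in G_n for any n < 31
    return [mul(Q, h) for h in G] + [mul(J(Q), h) for h in G]  # J(Q) = Q^{-1}

domain = twin_coset_domain(4)
assert len(set(domain)) == 16  # the two cosets are disjoint
```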

Here is an interactive plotter to demonstrate the distribution of domain points, which uses a simple TS implementation of the circle group over the Mersenne-17 field. Even though we work modulo a prime $p$, you can still see regular circular patterns with symmetry about the line $p/2$. This phenomenon also exists in elliptic curves (genus 1), but is much more apparent in circle curves (genus 0) and complex roots of unity.

The FFT works by recursively splitting the larger problem into two smaller ones. In the context of polynomial evaluation, this means decomposing the polynomial into "even" and "odd" parts. In regular FFT, these sub-polynomials are simply formed from the even and odd coefficients of the original polynomial. In circle FFT, this is a bit more involved, as we'll see, but the underlying mechanics are the same — the original polynomial $f$ is a linear combination $f(x) = f_{\operatorname{even}}(x) + x \cdot f_{\operatorname{odd}}(x)$. At each split, we also reduce the domain by a factor of two.

In the first step, we "halve" the domain by simply taking the $x$ projection of each point. This is justified by the use of the inversion map when performing a decomposition $f(x, y) = f_0(x) + y \cdot f_1(x)$ such that:

$$\begin{aligned} \vec{f_0}[\operatorname{index}_{D_1}(x)] &= \frac{\vec{f}[\operatorname{index}_{D_0}(x, y)] + \vec{f}[\operatorname{index}_{D_0}(J(x, y))]}{2} \\ \vec{f_1}[\operatorname{index}_{D_1}(x)] &= \frac{\vec{f}[\operatorname{index}_{D_0}(x, y)] - \vec{f}[\operatorname{index}_{D_0}(J(x, y))]}{2 \cdot y} \end{aligned}$$

I have modified the notation to demonstrate how, in practice, we treat polynomials $f, f_0, f_1$ as vectors of their evaluations. Compare the original notation here with Vitalik's implementation here.

You can think of the inversion map $J(x, y) = (x, -y)$ as a way to identify vertically mirrored pairs of points so that they can be treated as a single entity. This allows us to proceed with only the $x$ coordinate.

In later steps, we continue to halve the domain using the univariate squaring map $\pi(x) = 2 \cdot x^2 - 1$. This is analogous to squaring the roots of unity, where $\omega_{2k}^2 = \omega_k$. The even-odd decomposition — $f(x) = f_0(\pi(x)) + x \cdot f_1(\pi(x))$ — now looks like this:

$$\begin{aligned} f_0(\pi(x)) &= \frac{f(x) + f(-x)}{2} \\ f_1(\pi(x)) &= \frac{f(x) - f(-x)}{2 \cdot x} \end{aligned}$$
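As a sketch (assuming the univariate rounds, where a point is just its $x$-coordinate), one halving step might look like this in Python, reusing `P` from above:

```python
def fft_split(vals):
    """vals: dict {x: f(x)} over a domain closed under x -> -x.
    Returns (f0, f1) as evaluations over the squared domain pi(x) = 2x^2 - 1."""
    inv2 = pow(2, P - 2, P)  # 1/2 mod P, via Fermat's little theorem
    f0, f1 = {}, {}
    for x, fx in vals.items():
        fmx = vals[(-x) % P]      # f(-x), the mirror evaluation
        px = (2 * x * x - 1) % P  # pi(x) = pi(-x): half as many points
        f0[px] = (fx + fmx) * inv2 % P
        f1[px] = (fx - fmx) * pow(2 * x % P, P - 2, P) % P
    # Sanity check: f(x) = f0(pi(x)) + x * f1(pi(x)) on the whole domain
    for x, fx in vals.items():
        px = (2 * x * x - 1) % P
        assert (f0[px] + x * f1[px]) % P == fx
    return f0, f1
```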

Although speeding up the FFT is not the main point of the paper, as the author points out here, it serves as an effective demonstration of the core principles of working over the circle group. When studying the paper, I made the mistake of skipping it and rushing to circle FRI, only to hit a wall of confusion. So, I encourage you to take some time to appreciate this mechanic. If you want to play around with it, I made this Python notebook (trigger warning: shitty code).

Regular FRI

The structure of the FRI algorithm is a lot like that of the FFT. But instead of recursively dividing the polynomial into many smaller ones, in FRI, the prover iteratively reduces the degree of the polynomial until it gets to some small constant-size one. It does so via random linear combination, i.e., by combining "even" and "odd" sub-polynomials against a random weight.

In STARK, we use FRI for something called "low-degree testing"—by knowing the final degree and the number of reduction steps, the verifier can work backwards and check that the degree bound of the original polynomial is as claimed. More formally, FRI enables an untrusted prover to convince a verifier that a committed vector is close to a Reed-Solomon codeword of degree $d$ over the domain $D$.

Here, being "close" is defined by the relative Hamming distance $\delta$ such that the number of points where the committed vector and the codeword disagree is at most $\delta \cdot |D|$. The distance $\delta$ is such that $\delta \in (0, 1-\sqrt{\rho})$, where $0 < \rho < 1$ is the rate of the Reed-Solomon code. In turn, the rate $\rho$ is defined by the blow-up factor $2^B$, $B \geq 1$, so that $\rho = 2^{-B}$. Finally, the domain size is $|D| = 2^{n+B}$, where $2^n$ is the size of the original vector.

To make this definition more concrete, let's assign standard values: a commonly used blow-up factor is $2^B = 4$, so the rate is $\rho = 0.25$. The worst possible distance is $\delta = 0.5$, so for a vector with 1024 elements, the codeword over a $2^{B+n} = 4096$-sized domain can disagree on at most half of the points. In practice, $\delta$ is much smaller to give better soundness guarantees.
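In code, the parameter arithmetic from this paragraph is just:

```python
B, n = 2, 10                   # blow-up 2^B = 4, original vector of 2^n = 1024
rho = 2.0**-B                  # rate = 0.25
domain_size = 2**(n + B)       # |D| = 4096
max_delta = 1 - rho**0.5       # sup of the proximity range: 0.5
print(int(domain_size * max_delta))  # up to 2048 points may disagree
```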

Another interesting property is the decoding regime. In the unique decoding regime, the goal is to identify a single low-degree polynomial that is close to the committed vector. The unique decoding radius is typically defined as $\theta \in [0, \frac{1-\rho}{2}]$.

STARKs are shown to be sound outside this regime, as the goal is to demonstrate the existence of such polynomials, even if multiple polynomials fit the given points. We refer to this simplified requirement as the list decoding regime. The list decoding radius is typically defined as $\theta \in [\frac{1-\rho}{2}, 1-\sqrt{\rho}]$ [HAB23].

Circle FRI

The general mechanics behind low-degree testing over the circle domain remain unchanged from the regular FRI. As a reminder, see this blog post for an in-depth theoretical discussion or this one if you only care about the implementation.

Circle FRI operates on codewords in the circle code $\mathcal{C}$, which is a generalization of the Reed-Solomon code defined over elements of the circle group and special polynomials in the Riemann-Roch space. Oversimplified, these are bivariate polynomials modulo the unit circle equation ($x^2 + y^2 = 1$), so whenever a polynomial has $y^2$, it's replaced by $1 - x^2$.

For a given proximity parameter $\theta \in (0, 1-\sqrt{\rho})$, the interactive oracle proof of a function $f: X \rightarrow \mathbb{F}^D$ (mapping the committed vector) being $\theta$-close to the circle code $\mathcal{C}$ consists of $r$ rounds of a commit phase and a subsequent query phase, which are as follows.

Commit phase

  1. $\mathbf{P}$ decomposes $f$ into $f = g + \lambda \cdot v_n$ and sends $\lambda$ to $\mathbf{V}$
  2. $\mathbf{V}$ picks a random weight $\lambda_j$ for layer $j$
  3. For each $j = 1, \ldots, r$, $\mathbf{P}$ decomposes $g_{j-1}$ into "even" and "odd" parts; sends a commitment $[g_j]$ to $\mathbf{V}$
  4. In the last round, $\mathbf{P}$ sends $g_{r+1}$ in plain.

1. $\mathbf{P}$ decomposes $f$ into $f = g + \lambda \cdot v_n$ and sends $\lambda$ to $\mathbf{V}$

We start with the extended domain coset $D_{n+B}$. First, we want to find the component of $f$ that is aligned with $v_n$. This can be done using vector projection: given two functions (or vectors) $f$ and $v_n$, the projection of $f$ onto $v_n$ is given by:

$$\operatorname{proj}_{v_n}(f) = \frac{\langle f, v_n \rangle_D}{\langle v_n, v_n \rangle_D} \cdot v_n$$

Note: angle brackets denote the inner product $\langle v_n, f \rangle_D = \sum_{x \in D} v_n(x) \cdot f(x)$.

Taking $\lambda$ as the magnitude of this projection will ensure that $\lambda \cdot v_n$ is the part of $f$ that lies along $v_n$:

$$\lambda = \frac{\langle f, v_n \rangle_D}{\langle v_n, v_n \rangle_D}$$

The vanishing polynomial $v_n$ has an alternating behavior over the domain $D$; e.g., if $D$ has size $N = 2^n$, then $v_n$ alternates as $(1, -1, 1, -1, \ldots)$. This significantly simplifies the inner product calculations: since $v_n(x) \in \{1, -1\}$, each term $v_n(x)^2 = 1$, so

$$\langle v_n, v_n \rangle_D = \sum_{x \in D} 1 = |D| = N$$

Knowing $\lambda$, we can now find $g$, which represents the component of $f$ that is orthogonal to $v_n$:

$$g = f - \lambda \cdot v_n$$

This ensures that $g$ lies entirely in the FFT space $\mathcal{L}_N'(F)$, orthogonal to $v_n$. This is because the inner product $\langle g, v_n \rangle_D = 0$, making $g$ and $v_n$ orthogonal by construction.
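Putting step 1 together, assuming the alternating $v_n$ described above (a toy Python sketch over the base field; in practice this happens over the secure field):

```python
def decompose(f_vals):
    """Split f into g + lam * v_n with g orthogonal to v_n.
    f_vals: evaluations of f over a domain of size N = 2^n."""
    N = len(f_vals)
    v_n = [1 if k % 2 == 0 else P - 1 for k in range(N)]  # (1, -1, 1, -1, ...)
    # lam = <f, v_n> / <v_n, v_n>, where <v_n, v_n> = N
    lam = sum(f * v for f, v in zip(f_vals, v_n)) % P * pow(N, P - 2, P) % P
    g_vals = [(f - lam * v) % P for f, v in zip(f_vals, v_n)]
    assert sum(gv * v for gv, v in zip(g_vals, v_n)) % P == 0  # orthogonal
    return lam, g_vals
```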

2. $\mathbf{V}$ picks a random weight $\lambda_j$ for layer $j$

In practice, though, $\mathbf{P}$ can compute $\lambda_j$ as a random linear accumulator, starting with $\lambda$ and using a single random weight $\alpha \in \mathbb{E}$ picked by $\mathbf{V}$:

$$\lambda_j = \lambda_{j-1} \cdot \alpha + \lambda$$

See this done in Stwo.

3. For each $j = 1, \ldots, r$, $\mathbf{P}$ decomposes $g_{j-1}$ into "even" and "odd" parts

The "even-odd" decomposition follows the same progression as in the FFT. In the first round, we work with 2D points $(x, y)$ and use the full squaring $\pi(x, y)$ and inversion $J(x, y)$ maps. The inversion map allows us to identify points $(x, y)$ and their reflections $(x, -y)$ so we can treat them as a single point $x$ on the $x$-axis in subsequent rounds. The squaring map $\pi$ transforms the domain $D_{j-1}$ into $D_j$ by effectively halving the space of points:

$$\begin{aligned} g_{j-1,0}(\pi_j(x)) &= \frac{g_{j-1}(x) + g_{j-1}(J(x))}{2}, \\ g_{j-1,1}(\pi_j(x)) &= \frac{g_{j-1}(x) - g_{j-1}(J(x))}{2 \cdot t_j}, \end{aligned}$$

where $\pi(x) = 2 \cdot x^2 - 1$ and $t_j$ is a twiddle factor. Then, fold into a random linear combination against $\lambda_j$:

$$g_j = g_{j-1,0} + \lambda_j \cdot g_{j-1,1}$$

Commitment $[g_j]$ is a simple Merkle root. Under the Fiat-Shamir transformation, $\mathbf{P}$ also sends openings, i.e., Merkle branches for FRI queries.
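A sketch of one commit-phase round in the univariate setting (so the twiddle $t_j$ is just $x$), reusing `P` from above:

```python
def fri_fold(vals, lam_j):
    """vals: dict {x: g_{j-1}(x)} over a domain closed under x -> -x.
    Returns {pi(x): g_j(pi(x))} where g_j = g_{j-1,0} + lam_j * g_{j-1,1}."""
    inv2 = pow(2, P - 2, P)
    out = {}
    for x, gx in vals.items():
        gmx = vals[(-x) % P]
        px = (2 * x * x - 1) % P
        even = (gx + gmx) * inv2 % P                     # g_{j-1,0}(pi(x))
        odd = (gx - gmx) * pow(2 * x % P, P - 2, P) % P  # g_{j-1,1}(pi(x))
        out[px] = (even + lam_j * odd) % P
    return out
```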

Query phase

1. $\mathbf{V}$ samples $s \geq 1$ queries uniformly from the domain $D$

For each query $Q$, $\mathbf{V}$ considers its "trace" as the sequence of points obtained by repeatedly applying the squaring map and the transformations defined by the protocol. The initial query point $Q_0 = Q$ is transformed through several rounds, resulting in a series of points $Q_j$ in different domains $D_j$:

$$Q_j = \pi_j(Q_{j-1})$$

2. For each $j = 1, \ldots, r$, $\mathbf{V}$ asks for the values of the function $g_j$ at a query point $Q_j$ and its reflection $T_j(Q_j)$

Given the commitment $[g_j]$, as in oracle access, $\mathbf{V}$ can ask the oracle for the values (openings) at query points. In other words, it verifies Merkle proofs at the points:

$$g_j(Q_j) \text{ and } g_j(T_j(Q_j))$$

where at $j = 1$, $T_j = J$, and for $j > 1$, $T_j(x) = -x$.

3. $\mathbf{V}$ checks that the returned values match the expected values according to the folding rules

This involves checking that the even and odd decompositions are correct and that the random linear combinations used to form $g_j$ from $g_{j-1}$ are consistent.
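In other words, per query the verifier recomputes the fold locally from the two claimed openings and compares. A sketch for the univariate rounds, matching `fri_fold` above:

```python
def check_fold(q, g_prev_q, g_prev_mq, g_next, lam_j):
    """q: query point; g_prev_q = g_{j-1}(q); g_prev_mq = g_{j-1}(-q);
    g_next: claimed opening of g_j at pi(q). Returns True if consistent."""
    even = (g_prev_q + g_prev_mq) * pow(2, P - 2, P) % P
    odd = (g_prev_q - g_prev_mq) * pow(2 * q % P, P - 2, P) % P
    return (even + lam_j * odd) % P == g_next
```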

DEEP quotients

Quotienting is the fundamental polynomial identity check used in STARK and PLONKish systems. Leveraging the polynomial remainder theorem, it allows one to prove the value of a polynomial at a given point. Vitalik's overview of quotienting is well on point, so I will focus on the DEEP-FRI over the circle group.

DEEP (Domain Extension for Eliminating Pretenders) is a method for checking consistency between two polynomials by sampling a random point from a large domain. In layman's terms, it lets FRI be secure with fewer Merkle branches.

In STARK, we use the DEEP method on a relation between the constraint composition polynomial (one that combines all the constraints) and the trace column polynomials, all evaluated at a random point $z$. For more context, see this post and the ethSTARK paper.

The first part is DEEP algebraic linking: it allows us to reduce the STARK circuit satisfiability checks (for many columns and many constraints) to a low-degree test (FRI) on single-point quotients. By itself, DEEP-ALI is insecure because the prover can cheat and send inconsistent evaluations at $z$. We fix this with another instance of quotienting—DEEP quotienting.

We construct DEEP quotients with a single-point vanishing function $v_z$ in the denominator to show that a certain polynomial, say a trace column $t$, evaluates to the claimed $y_j$ at $z$, i.e., $\frac{t - t(z)}{v_z}$.

In classical STARKs, $v_z$ is simply a line function $X - z$. In circle STARKs, in addition to not having line functions, we also run into the problem that single-point (or any odd-degree) vanishing polynomials don't exist. To get around this, we move to the complex extension, i.e., decompose into real and imaginary parts:

$$\frac{t - t(z)}{v_z} = \operatorname{Re}\left(\frac{t - t(z)}{v_z}\right) + i \cdot \operatorname{Im}\left(\frac{t - t(z)}{v_z}\right)$$

With this quirk handled, we follow the standard procedure of constructing a composite DEEP quotient polynomial as a random linear combination. We use different random weights for the real and imaginary parts:

$$Q = \sum_{i=0}^{L-1} \gamma^i \cdot \operatorname{Re}_z(t_i) + \sum_{i=0}^{L-1} \gamma^{L+i} \cdot \operatorname{Im}_z(t_i) = \operatorname{Re}_z\left(\sum_{i=0}^{L-1} \gamma^i \cdot t_i\right) + \gamma^L \cdot \operatorname{Im}_z\left(\sum_{i=0}^{L-1} \gamma^i \cdot t_i\right)$$
FYI

In the ethSTARK paper, the prover does batching using $L$ random weights $\gamma_1, \ldots, \gamma_L$ provided by the verifier (affine batching). Here, the prover uses powers of a single random weight $\gamma^1, \ldots, \gamma^L$ (parametric batching).

Computing $Q$ naïvely would incur overhead from the complex extension, essentially doubling the work due to the real and imaginary decompositions. The authors solve this by exploiting the linearity in the above equation. The prover now computes

$$Q = \left(\operatorname{Re}\left(\frac{1}{v_z}\right) + \gamma^L \cdot \operatorname{Im}\left(\frac{1}{v_z}\right)\right) \cdot (\bar{g} - \bar{v}_z) = \frac{\operatorname{Re}(v_z) - \gamma^L \cdot \operatorname{Im}(v_z)}{\operatorname{Re}(v_z)^2 + \operatorname{Im}(v_z)^2} \cdot (\bar{g} - \bar{v}_z)$$

where $\bar{g} = \sum_{k=0}^{L-1} \gamma^k \cdot g_k$ and $\bar{v}_z = \sum_{k=0}^{L-1} \gamma^k \cdot g_k(z)$.
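The saving comes from the usual complex-reciprocal identity $\frac{1}{a + bi} = \frac{a - bi}{a^2 + b^2}$: one base-field inversion yields both $\operatorname{Re}(1/v_z)$ and $\operatorname{Im}(1/v_z)$. A sketch:

```python
def inv_re_im(a, b):
    """For v = a + b*i over F_P, return (Re(1/v), Im(1/v)) using a single
    base-field inversion of a^2 + b^2."""
    d_inv = pow((a * a + b * b) % P, P - 2, P)
    return (a * d_inv % P, (-b) * d_inv % P)
```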

Interestingly, Stwo and Plonky3 actually use different quotienting approaches: Plonky3 implements the method described in the paper and here, but Stwo chooses instead to use a 2-point vanishing polynomial, as described in Vitalik's note.

Field work

In circle STARK, work is done over the finite field $\mathbb{F}_p$ and its extension $\mathbb{E} = \mathbb{F}_{p^k}$, where $k$ is the extension degree. For $\mathbb{F}_p$, we choose M31, but any other $p+1$ smooth prime will suffice. The extension field $\mathbb{E}$ used in Stwo and Plonky3 is QM31, a degree-4 extension of M31, a.k.a. the secure field.

In rare cases, it will be useful to work with the complex extension $\mathbb{F}(i)$, the field that results from $\mathbb{F}$ by extending it with the imaginary unit $i$. Note that $i$ may trivially be contained in some fields; e.g., since $-1 \equiv 4 \pmod 5$, $i$ in $\mathbb{F}_5$ is both $\sqrt{4} = 2$ and $3$ (since $3^2 \bmod 5 \equiv 4$). For those fields where this isn't the case, such as $\mathbb{F}_3$, we can devise a quadratic extension, $\mathbb{F}_9 = \{ a + bi \mid a, b \in \mathbb{F}_3 \}$, which extends the original field in the same way complex numbers extend the reals.

FYI

A more common notation is $\mathbb{F}[X]/(X^2 + 1)$, which means forming a field extension where $X^2 + 1 = 0$. This results in a new field with elements of the form $a + bi$, where $a, b \in \mathbb{F}$ and $i$ is a root of $X^2 + 1$, such that $i^2 = -1$.
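A minimal sketch of this arithmetic (pairs $(a, b)$ standing for $a + bi$; this mirrors how a complex extension of M31 works in principle, not any particular library's layout):

```python
def cext_mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

i_unit = (0, 1)
assert cext_mul(i_unit, i_unit) == (P - 1, 0)  # i^2 = -1 mod P
```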

  • Trace values are over the base field, and the trace domain consists of points on the circle over the base field $C(\mathbb{F})$.
  • The evaluation domain likewise consists of points over the base field $C(\mathbb{F})$.
  • All random challenges and weights for linear combinations are drawn from the secure field.
  • When computing DEEP quotients, we move from the base field to the secure field, while briefly using the complex extension for vanishing denominators.
  • Circle FRI works mainly over the secure field. However, query values are base field elements since we Merkle commit to values in the base field.

The results

That's cool and all, but let's see how all this translates into actual performance differences. Below are profiling charts comparing Plonky3's regular STARK over BabyBear with its circle STARK over M31.

[Profiling charts: prover time breakdown for regular STARK over BabyBear vs. circle STARK over M31]

The first thing to note is that the difference in prover time is indeed about 1.3x. You can see that the circle STARK spends less time committing to the trace, as it does so over a more efficient base field. Consequently, since computing quotients and FRI involves extension field work, these now account for a larger percentage of the total work.

Thanks to Shahar Papini for feedback and discussion.

References

  • [HLP24] Haböck et al. (2024) "Circle STARKs"
  • [HAB23] Ulrich Haböck (2023) "A summary on the FRI low degree test"
  • [ethSTARK] StarkWare Team (2023) "ethSTARK Documentation — Version 1.2"