The Longfellow Zero-knowledge Scheme

Google

matteof@google.com

Google

shelat@google.com

Internet Network Working Group

This document defines an algorithm for generating and verifying a succinct non-interactive zero-knowledge argument that, for a given input x and a circuit C, there exists a witness w such that C(x,w) evaluates to 0. The technique combines the MPC-in-the-head approach for constructing ZK arguments described in Ligero with a verifiable computation protocol based on sumcheck for proving that C(x,w)=0.

Introduction A zero-knowledge (ZK) scheme allows a Prover who holds an arithmetic circuit C defined over a finite field F and two inputs (x,w) to convince a Verifier who holds only (C,x) that the Prover knows w such that C(x,w) = 0, without revealing any extra information to the Verifier. The concept of a zero-knowledge scheme was introduced by Goldwasser, Micali, and Rackoff, and has since been rigorously explored and optimized in the academic literature. Different ZK schemes target different models and efficiency goals, such as reducing prover time, reducing verifier time, or reducing proof size. Some ZK schemes also impose other requirements to achieve their efficiency goals. This document considers the scenario in which no common reference string or trusted parameter setup is available to the parties. This immediately rules out several succinct ZK schemes from the literature. In addition, this document focuses on schemes that can be instantiated from a collision-resistant hash function and require no other complexity-theoretic assumption. Again, this rules out several schemes in the literature. All of the ZK schemes from the literature that remain can be defined in the Interactive Oracle Proof (IOP) model, and this document specifies a family of them that enjoys both efficiency and simplicity.

The Longfellow system This document specifies the Longfellow ZK scheme described in the paper. The scheme is constructed from two components. The first is the Ligero scheme, which provides a cryptographic commitment scheme together with an efficient ZK argument system for proving linear and quadratic constraints on the committed witness. The second is a public-coin interactive protocol (IP) for producing an argument that C(x,w)=0, where C is a circuit, x is a public input, and w is a private witness. The overall scheme works as follows: the Prover commits to the witness w as well as to a pad used to commit the transcript of the IP; the Prover then runs the IP with the Verifier in a way that produces a commitment to the transcript of the IP; finally, the Prover runs the Ligero proof system to prove that the transcript in the commitment induces the IP verifier to accept.

Basic Operations and Notation The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Additionally, the key words "MIGHT", "COULD", "MAY WISH TO", "WOULD PROBABLY", "SHOULD CONSIDER", and "MUST (BUT WE KNOW YOU WON'T)" in this document are to be interpreted as described in RFC 6919. Unless stated otherwise, random choices in this specification refer to drawing with uniform distribution from a given set (i.e., "random" is short for "uniformly random"). Random choices can be replaced with fresh outputs from a cryptographically strong pseudorandom generator, according to the requirements in , or pseudorandom function.

Array primitives The notation A[0..N] refers to the array of size N that contains A[0],A[1],...,A[N-1], i.e., the right-boundary in the notation X..Y is an exclusive index bound. The following functions are used throughout the document:

copy(n, Dst, Src): copies n elements from Src to Dst with different strides
axpy(n, Y, A, X): sets Y[i] += A*X[i] for 0 <= i < n.
sum(n, A): computes the sum of the first n elements in array A
dot(n, A, Y): computes the dot product of length n between arrays A and Y.
add(n, A, Y): returns the array [A[0]+Y[0], A[1]+Y[1], ..., A[n-1]+Y[n-1]].
prod(n, A, Y): returns the array [A[0]*Y[0], A[1]*Y[1], ..., A[n-1]*Y[n-1]].
equal(n, A, Y): true if A[i]==Y[i] for 0 <= i < n and false otherwise.
gather(n, A, I): returns the array [A[I[0]], A[I[1]], ..., A[I[n-1]]].
A[n][m] = [0]: initializes the 2-dimensional n x m array A to all zeroes.
A[0..NREQ] = X : array assignment, this operation copies the first NREQ elements of X into the corresponding indices of the A array.
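As an illustration of the primitives listed above, the following is a minimal sketch over a small prime field. The modulus P = 97 and the use of plain Python lists are assumptions made only for this example; they are not parameters of this specification.

```python
# Illustrative field: integers modulo a small prime (an assumption of this sketch).
P = 97

def axpy(n, Y, A, X):
    """In place: Y[i] += A * X[i] for 0 <= i < n."""
    for i in range(n):
        Y[i] = (Y[i] + A * X[i]) % P

def dot(n, A, Y):
    """Dot product of the first n elements of A and Y."""
    return sum(A[i] * Y[i] for i in range(n)) % P

def add(n, A, Y):
    """Elementwise sum of the first n elements."""
    return [(A[i] + Y[i]) % P for i in range(n)]

def prod(n, A, Y):
    """Elementwise product of the first n elements."""
    return [(A[i] * Y[i]) % P for i in range(n)]

def equal(n, A, Y):
    """True if A and Y agree on their first n elements."""
    return all(A[i] == Y[i] for i in range(n))

def gather(n, A, I):
    """Returns [A[I[0]], ..., A[I[n-1]]]."""
    return [A[I[i]] for i in range(n)]

A = [1, 2, 3, 4]
Y = [10, 20, 30, 40]
axpy(4, Y, 2, A)  # Y is updated in place
```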

Polynomial operations This section describes operations on and associated with polynomials that are used in the main protocol.

Extend method in Field F_p The extend(f, n, m) method interprets the array f[0..n] as the evaluations of a polynomial P of degree less than n at the points 0,...,n-1, and returns the evaluations of the same P at the points 0,...,m-1. For sufficiently large fields |F_p| = p >= m, polynomial P is uniquely determined by the input, and thus extend is well defined. As there are several algorithms for efficiently performing the extend operation, the implementor can choose a suitable one. In some cases, the brute force method of using Lagrange interpolation formulas to compute each output point independently may suffice. One can employ a convolution to implement the extend operation, and in some cases, either the Number Theoretic Transform or Nussbaumer's algorithm can be used to efficiently compute a convolution.
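The brute-force variant mentioned above can be sketched as follows. This is only the simple Lagrange-interpolation approach, not an optimized algorithm, and the prime P = 97 is an assumption chosen for illustration (it must satisfy P >= m).

```python
# Small illustrative prime; an assumption of this sketch, with P >= m required.
P = 97

def extend(f, n, m):
    """Given f[i] = Poly(i) for 0 <= i < n, where Poly has degree < n,
    return [Poly(0), ..., Poly(m-1)] using Lagrange interpolation."""
    out = []
    for x in range(m):
        acc = 0
        for i in range(n):
            num, den = 1, 1
            for j in range(n):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            # pow(den, -1, P) is the modular inverse (Python 3.8+)
            acc = (acc + f[i] * num * pow(den, -1, P)) % P
        out.append(acc)
    return out

# Evaluations of x^2 + 1 at 0, 1, 2, extended to the points 0..5
extended = extend([1, 2, 5], 3, 6)
```

The cost is quadratic in n per output point, which is why the text above points to convolution-based methods for larger sizes.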

Extend method in Field GF 2^k The previous section described an extend method that applies to odd prime-order finite fields which contain the elements 0,1,2,...,m. In the special case of GF(2^k), the extend operator is defined in an opinionated way inspired by the Additive FFT algorithm by Lin et al. Lin et al. define a novel polynomial basis as an alternative to the usual monomial basis x^i, and give an algorithm for evaluating a degree-(d-1) polynomial, expressed in the novel basis, at all d points in a subspace, for d=2^ell.

Specifically, this document implements GF(2^128) as GF(2)[x] / (Q(x)) where

   Q(x) = x^128 + x^7 + x^2 + x + 1

With this choice of Q(x), x is a generator of the multiplicative group of the field. Next, choose GF(2^16) as the subfield of GF(2^128) with g = x^{(2^{128}-1) / (2^{16}-1)} as its generator, and beta_i = g^i for 0 <= i < 16 as the basis of the subfield. For relevant problem sizes, this allows encoding elements in a commitment scheme with 16 bits instead of 128.

Writing j_i for the i-th bit of the binary representation of j, inject integer j into a field element inj(j) by interpreting the bits of j as coordinates in terms of the basis:

   inj(j) = sum_{0 <= i < 16} j_i * beta_i

In this setting, define the extend operator to interpret the array f[0..n] as the evaluations of a polynomial p(x) of degree at most n-1 at the n points x \in { inj(i) : 0 <= i < n }, and to return the set { p(inj(i)) : 0 <= i < m }, which consists of the evaluations of the same polynomial p(x) at the injected points 0,...,m-1. This convention allows this operation to be completed efficiently using various forms of the additive FFT.
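The subfield basis and the injection can be sketched with plain integer bit operations. The sketch assumes the reduction polynomial X^128 + X^7 + X^2 + X + 1 that this document names for the binary field, and it represents a field element as a Python integer whose bit i is the coefficient of x^i; that representation and the helper names gf_mul and gf_pow are choices made for this example.

```python
# Q(x) = x^128 + x^7 + x^2 + x + 1, with bit i holding the coefficient of x^i
Q = (1 << 128) | (1 << 7) | (1 << 2) | (1 << 1) | 1

def gf_mul(a, b):
    """Multiply two elements of GF(2)[x] / (Q(x)) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 128:      # degree reached 128: reduce modulo Q
            a ^= Q
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in the field."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

X = 2  # the field element x
g = gf_pow(X, ((1 << 128) - 1) // ((1 << 16) - 1))   # generator of the subfield
beta = [gf_pow(g, i) for i in range(16)]             # beta_i = g^i

def inj(j):
    """Map a 16-bit integer j to the field element sum_i j_i * beta_i."""
    r = 0
    for i in range(16):
        if (j >> i) & 1:
            r ^= beta[i]   # addition in GF(2^k) is XOR
    return r
```

Because beta_0 = 1, the injection maps 0 to 0 and 1 to 1, and every injected value lies in the 16-bit subfield.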

Fiat-Shamir primitives A ZK protocol must in general be interactive, whereby the Prover and Verifier engage in multiple rounds of communication. However, in practice, it is often more convenient to deploy so-called "non-interactive" protocols that only require a single message from Prover to Verifier. It is possible to apply the Fiat-Shamir heuristic to transform a special class of interactive protocols into single-message protocols from Prover to Verifier. The Fiat-Shamir transform is a method for generating a verifier's public-coin challenges by processing the concatenation of all of the Prover's messages. The transform can be proven to be sound when applied to an interactive protocol that is round-by-round sound and when the oracle is implemented with a hash function that satisfies a correlation-intractability property with respect to the state function implied by the round-by-round soundness. See Theorem 5.8 of for details.

In practice, whether an implementation of the random oracle satisfies this correlation-intractability property becomes an implicit assumption. Towards that, this document adopts best practices in selecting the oracle implementation. First, the random oracle should have higher circuit depth and require more gates to compute than the circuit C that the protocol is applied to. Furthermore, the size of the messages which are used as input to the oracle to generate the Verifier's challenges should be larger than C. These choices are easy to implement and add very little processing time to the protocol. At the same time, they seemingly avoid attacks against correlation-intractability in which the random oracle is computed within the ZK protocol, thereby allowing the output of the circuit to be related to the verifier's challenge.

As an additional property, each query to the random oracle should be able to be uniquely mapped into a protocol transcript. To facilitate this property, the type and length of each message is incorporated into the query string.

Implementation Let H be a collision-resistant hash function. A protocol consists of multiple rounds in which a Prover sends a message, and a verifier responds with a public-coin or random challenge. The Fiat-Shamir transform for such a protocol is implemented by maintaining a transcript object.

Initialization At the beginning of the protocol, the transcript object must be initialized.

transcript.init(session_id): The initialization begins by selecting an oracle, which concretely consists of selecting a fresh session identifier. This process is handled by the encapsulating protocol; for example, the transcript that is used for key exchange for a session can be used as the session identifier, as it is guaranteed to be unique.

Writing to the transcript The transcript object supports a write method that is used to record the Prover's messages. To produce the verifier's challenge message, the transcript object internally maintains a Fiat-Shamir Pseudo-random Function (FSPRF) object that generates a stream of pseudo-random bytes. Each invocation of write creates a new FSPRF object, which we denote by fs.

transcript.write(msg): appends the Prover's next message to the transcript.

There are three types of messages that can be appended to the transcript: a field element, an array of bytes, or an array of field elements.

To append a field element, first the byte designator 0x1 is appended, and then the canonical byte serialization of the field element is appended.
To append an array of bytes, first the byte designator 0x2 is appended, an 8-byte little-endian encoding of the number of bytes in the array is appended, and then the bytes of the array are appended.
To append an array of field elements, the byte designator 0x3 is added, an 8-byte little-endian encoding of the number of field elements is appended, and finally, all of the field elements in array order are serialized and appended.
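The three encodings above can be sketched as follows. The canonical serialization of a field element is not defined in this section, so the sketch assumes a fixed-width little-endian encoding of FIELD_BYTES bytes; both the width and the helper names are hypothetical.

```python
import struct

# Assumed width of a serialized field element (hypothetical, for illustration)
FIELD_BYTES = 32

def ser_field(e):
    """Assumed canonical serialization: fixed-width little-endian integer."""
    return e.to_bytes(FIELD_BYTES, "little")

def enc_field_element(e):
    """Designator 0x1 followed by the serialized element."""
    return b"\x01" + ser_field(e)

def enc_bytes(b):
    """Designator 0x2, 8-byte little-endian length, then the bytes."""
    return b"\x02" + struct.pack("<Q", len(b)) + b

def enc_field_array(a):
    """Designator 0x3, 8-byte little-endian count, then the elements in order."""
    return b"\x03" + struct.pack("<Q", len(a)) + b"".join(ser_field(e) for e in a)

example = enc_bytes(b"ab")
```

Prefixing each message with its type and length is what lets a query string be parsed back into a unique sequence of messages.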

Special rules for the first message The write method for the first prover message incorporates additional steps that enhance the correlation-intractability property of the oracle. To process the Prover's first message (which is usually a commitment):

1. The Prover message is appended to the transcript. Specifically, the length of the message, as per the above convention, is appended, and then the bytes of the message are appended.
2. Next, an encoding of the statement to be proven is appended; it consists of the circuit identifier and a serialization of the input and output of the statement. Each of these three messages is added as a byte sequence, with its length appended as per convention.
3. Finally, the transcript is augmented by the byte-array 0^|C|, which consists of |C| bytes of zeroes.

One might at first think of performing steps 2 and 3 first so as to simplify the description of the protocol, and moreover step 3 may appear to be unnecessary. Performing the steps in the indicated order protects against the attack described in , under the assumption that it is infeasible for a circuit C that contains |C| arithmetic gates to compute the hash of a string of length |C|. Subsequent calls to the write method are used to record the Prover's response messages msg. In this case, the message is appended following the conventions described above.

The FSPRF object Each write internally creates an FSPRF object fs that is seeded with the hash digest of the transcript at the end of the write operation. The FSPRF object is defined to produce an infinite stream of bytes that can be used to sample all of the verifier's challenges in this round. The stream is organized in blocks of 16 bytes each, numbered consecutively starting at 0. Block i is computed from KEY and ID(i), where KEY is the seed of the FSPRF object, and ID(i) is the 16-byte little-endian representation of integer i. The FSPRF object supports a bytes method:

b = fs.bytes(n) returns the next n bytes in the stream.

Thus, fs implicitly maintains an index into the next position in the stream. Calls to bytes without an intervening write read pseudo-random bytes from the same stream.

Generating challenges When the prover has finished sending messages for a round in the interactive protocol, it can make a sequence of calls to transcript.generate_{nat,field_element,challenge} to obtain the Verifier's random challenges. The bytes method of the FSPRF is used by the transcript object to sample pseudo-random field elements and pseudo-random integers via rejection sampling as follows:

transcript.generate_nat(m) generates a random natural between 0 and m-1 inclusive, as follows.

Let l be minimal such that 2^l >= m. Let nbytes = ceil(l / 8). Let b = fs.bytes(nbytes). Interpret bytes b as a little-endian integer k. Let r = k mod 2^l, i.e., mask off the high 8 * nbytes - l bits of k. If r < m return r, otherwise start over.

transcript.generate_field_element(F) generates a field element.

If the field F is Z / (p), return transcript.generate_nat(p) interpreted as a field element. If the field is GF(2)[X] / (X^128 + X^7 + X^2 + X + 1), obtain b = fs.bytes(16) and interpret the 128 bits of b as a little-endian polynomial. This document does not specify the generation of a field element for other binary fields, but extensions SHOULD follow a similar pattern.

a = transcript.generate_challenge(F, n) generates an array of n field elements in the straightforward way: for 0 <= i < n in ascending order, set a[i] = transcript.generate_field_element(F).
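The rejection-sampling procedure above can be sketched as follows. The class StandInStream is a hypothetical placeholder that produces a deterministic byte stream from SHA-256 in counter mode; it is NOT the FSPRF block function of this document and serves only to make the example runnable. The functions take the stream as an explicit argument instead of going through a transcript object.

```python
import hashlib

class StandInStream:
    """Hypothetical deterministic byte stream (not the FSPRF of this document)."""
    def __init__(self, seed):
        self.seed, self.buf, self.ctr = seed, b"", 0

    def bytes(self, n):
        """Return the next n bytes of the stream."""
        while len(self.buf) < n:
            block = hashlib.sha256(self.seed + self.ctr.to_bytes(16, "little"))
            self.buf += block.digest()
            self.ctr += 1
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

def generate_nat(fs, m):
    """Uniform natural in [0, m) by rejection sampling."""
    l = (m - 1).bit_length()          # minimal l such that 2^l >= m
    nbytes = (l + 7) // 8             # ceil(l / 8)
    while True:
        k = int.from_bytes(fs.bytes(nbytes), "little")
        r = k & ((1 << l) - 1)        # mask off the high 8 * nbytes - l bits
        if r < m:
            return r                  # otherwise start over with fresh bytes

def generate_challenge(fs, p, n):
    """Array of n elements of Z / (p), sampled in ascending index order."""
    return [generate_nat(fs, p) for _ in range(n)]

fs = StandInStream(b"example seed")
challenge = generate_challenge(fs, 97, 4)
```

Masking to l bits before the comparison keeps the rejection probability below one half, so the expected number of iterations is less than two.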

Ligero ZK Proof This section specifies the construction and verification method for a Ligero commitment and zero-knowledge argument. The Ligero system, as described by Ames, Hazay, Ishai, and Venkitasubramaniam, consists of a commitment scheme and a method for proving linear and quadratic constraints on the committed values in zero-knowledge. The latter interface is sufficient to prove arbitrary circuits, but in the Longfellow scheme, it suffices to describe how to use such constraints to directly verify an IP transcript.

Merkle trees This section describes how to construct a Merkle tree from a sequence of n strings, and how to verify that a given string x was placed at leaf i in a Merkle tree. These methods do not assume that n is a power of two. This construction is parameterized by a cryptographic hash function such as SHA-256. In this application, a leaf in a tree is a message digest instead of an arbitrary string; for example, if the hash function is SHA-256, then the leaf is a 32-byte string. A tree that contains n leaves is represented by an array of 2 * n message digests in which the input digests are written at indices n..(2*n - 1). The tree is constructed by iteratively hashing the concatenation of the values at indices 2*j and 2*j+1, starting at j=n-1, and continuing until j=1. The root is at index 1. In this specification, the prover and verifier will already know the value of n when they produce or verify a Merkle tree.

Constructing a Merkle tree from n digests
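A minimal sketch of the construction described in the previous section is shown below. It assumes SHA-256 as the hash function and represents the tree as a Python list M of length 2 * n, with index 0 unused; the function names are chosen for this example.

```python
import hashlib

def H(data):
    """Hash function of the tree; SHA-256 is assumed in this sketch."""
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    """Return the array M of 2 * n digests; leaves sit at n..2n-1, root at M[1]."""
    n = len(leaves)
    M = [b""] * (2 * n)
    M[n:2 * n] = leaves
    for j in range(n - 1, 0, -1):      # j = n-1 down to 1
        M[j] = H(M[2 * j] + M[2 * j + 1])
    return M

# Three leaf digests; n does not need to be a power of two
leaves = [H(bytes([i])) for i in range(3)]
M = build_merkle_tree(leaves)
root = M[1]
```

With n = 3, node 2 hashes leaves 1 and 2, and the root hashes node 2 together with leaf 0, which shows how the array layout handles sizes that are not powers of two.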

Constructing a proof of inclusion This section describes how to construct a Merkle proof that k input digests at indices i[0],...,i[k-1] belong to the tree. The simplest way to generate such a proof is to produce independent proofs for each of the k leaves. However, this turns out to be wasteful in that internal nodes may be included multiple times along different paths, and some nodes may not need to be included at all because they are implied by nodes that have already been included. To address these inefficiencies, this section explains how to produce a batch proof of inclusion for k leaves. The main idea is to start from the requested set of leaves and build all of the implied internal nodes given the leaves. For example, if sibling leaves are included, then their parent is implied, and the parent need not be included in the compressed proof. Then it suffices to revisit the same tree and include the necessary siblings along all of the Merkle paths. It is assumed that the verifier already has the leaf digests that are at the indices, and thus the proof only contains the necessary internal nodes of the Merkle tree that are used to verify the claim. It is important in this formulation to treat the input digests as a sequence, i.e., with a given order. Both the prover and verifier of this batch proof must use the same order of the requested_leaves array.
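The batch-proof idea can be sketched as follows: first mark every node implied by the requested leaves, then walk the internal nodes from n-1 down to 1 and emit each unmarked child of a marked node. The helper names mark_tree and batch_proof, the traversal order, and the use of SHA-256 are assumptions of this sketch; a verifier has to consume the proof in the same order.

```python
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    """Array of 2 * n digests with leaves at n..2n-1 and the root at index 1."""
    n = len(leaves)
    M = [b""] * (2 * n)
    M[n:2 * n] = leaves
    for j in range(n - 1, 0, -1):
        M[j] = H(M[2 * j] + M[2 * j + 1])
    return M

def mark_tree(indices, n):
    """marked[j] is true when node j is a requested leaf or is implied by one."""
    marked = [False] * (2 * n)
    for i in indices:
        marked[i + n] = True
    for j in range(n - 1, 0, -1):
        marked[j] = marked[2 * j] or marked[2 * j + 1]
    return marked

def batch_proof(M, n, indices):
    """Siblings needed to recompute the root from the requested leaves."""
    marked = mark_tree(indices, n)
    proof = []
    for j in range(n - 1, 0, -1):
        if marked[j]:
            for child in (2 * j, 2 * j + 1):
                if not marked[child]:   # not derivable by the verifier
                    proof.append(M[child])
    return proof

leaves = [H(bytes([i])) for i in range(4)]
M = build_merkle_tree(leaves)
proof_pair = batch_proof(M, 4, [0, 1])    # sibling leaves: parent is implied
proof_single = batch_proof(M, 4, [0])
```

Requesting the two sibling leaves 0 and 1 needs a single digest, whereas two independent paths would carry four.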

Verifying a proof of inclusion This section describes how to verify a compressed Merkle proof. The claim to verify is that "the commitment root defines an n-leaf Merkle tree that contains k digests s[0],...,s[k-1] at corresponding indices i[0],...,i[k-1]." The strategy of this verification procedure is to deduce which nodes are needed along the k verification paths from index to root, then read these values from the purported proof, and then recompute the Merkle tree and check the consistency of the root digest. As an optimization, the defined[] array avoids recomputing internal portions of the Merkle tree that are not relevant to the verification. By convention, a proof for the degenerate case of k=0 digests is defined to fail. It is assumed that the indicies[] array does not contain duplicates.

            |proof| { return false }
            tmp[child] = proof[proof_index++]
            defined[child] = true
          }
        }
      FOR 0 <= i < k DO
        tmp[indicies[i] + n] = s[i]
        defined[indicies[i] + n] = true
      FOR n > i >= 1 DO
        if defined[2 * i] && defined[2 * i + 1] {
          tmp[i] = hash(tmp[2 * i] || tmp[2 * i + 1])
          defined[i] = true
        }
      return defined[1] && tmp[1] == root
    }
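The verification strategy can be sketched end to end as follows. This is a self-contained illustration, not the normative procedure: the helper names, the marking pass, the proof ordering (internal nodes visited from n-1 down to 1), and SHA-256 are assumptions of the sketch, and the prover side is included only so that the example runs.

```python
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def build_merkle_tree(leaves):
    n = len(leaves)
    M = [b""] * (2 * n)
    M[n:2 * n] = leaves
    for j in range(n - 1, 0, -1):
        M[j] = H(M[2 * j] + M[2 * j + 1])
    return M

def mark_tree(indices, n):
    marked = [False] * (2 * n)
    for i in indices:
        marked[i + n] = True
    for j in range(n - 1, 0, -1):
        marked[j] = marked[2 * j] or marked[2 * j + 1]
    return marked

def needed_children(indices, n):
    """Nodes the verifier cannot derive, in the assumed proof order."""
    marked = mark_tree(indices, n)
    return [c for j in range(n - 1, 0, -1) if marked[j]
            for c in (2 * j, 2 * j + 1) if not marked[c]]

def batch_proof(M, n, indices):
    return [M[c] for c in needed_children(indices, n)]

def verify_batch(root, n, s, indices, proof):
    k = len(indices)
    if k == 0:
        return False                      # degenerate case fails by convention
    tmp = [None] * (2 * n)
    defined = [False] * (2 * n)
    pos = 0
    for child in needed_children(indices, n):
        if pos >= len(proof):
            return False                  # proof is too short
        tmp[child], defined[child] = proof[pos], True
        pos += 1
    for i in range(k):                    # place the claimed leaf digests
        tmp[indices[i] + n], defined[indices[i] + n] = s[i], True
    for j in range(n - 1, 0, -1):         # recompute only where both children exist
        if defined[2 * j] and defined[2 * j + 1]:
            tmp[j], defined[j] = H(tmp[2 * j] + tmp[2 * j + 1]), True
    return defined[1] and tmp[1] == root

leaves = [H(bytes([i])) for i in range(5)]
M = build_merkle_tree(leaves)
idx = [1, 4]
proof = batch_proof(M, 5, idx)
ok = verify_batch(M[1], 5, [leaves[1], leaves[4]], idx, proof)
```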

Common parameters The Prover and Verifier in Ligero must agree on the following parameters. These parameters can be agreed upon out of band.

F: The finite field over which the commit is produced.
NREQ: The number of columns of the commitment matrix that the Verifier requests to be revealed by the Prover.
rate: The inverse rate of the error correcting code. This parameter, along with NREQ and the field size, determines the soundness of the scheme.
BLOCK: The size of each row, in terms of number of field elements.
DBLOCK: 2 * BLOCK - 1
WR: the number of witness values included in each row.
QR: the number of quadratic constraints written in each row
IW: Row index at which the witness values start, usually IW = 3 (rows 0, 1, and 2 hold the ILDT, IDOT, and IQD rows).
IQ: Row index at which the quadratic constraints begin, it is the first row after all of the witnesses have been encoded.
NL: Number of linear constraints.
NQ: Number of quadratic constraints.
NWROW: Number of rows used to encode witnesses.
NQT: Number of row triples needed to encode the quadratic constraints.
NQW: NWROW + NQT, rows needed to encode witnesses and quadratic constraints.
NROW: Total number of rows in the witness matrix, NQW + 2
NCOL: Total number of columns in the tableau matrix.

Constraints on parameters

BLOCK < |F| The block size must be smaller than the field size.
BLOCK > NREQ The block size must be larger than the number of columns requested.
BLOCK = NREQ + WR
BLOCK >= 2 * (NREQ + QR) + (NREQ + WR) - 2
WR >= QR.
BLOCK >= 2 * (NREQ + WR) - 1.
QR >= NREQ (and thus WR >= NREQ) to avoid wasting too much space.

Ligero commitment The first step of the proof procedure requires the Prover to commit to a witness vector W. The witness vector is assumed to be padded with zeros at the end so that its length is an even multiple of WR. The commitment is the root of a Merkle tree. The leaves of the Merkle tree are a sequence of columns of the tableau matrix T[][]. This tableau matrix is constructed row-by-row by applying the extend procedure to arrays that are formed from random field elements and elements copied from the witness vector. Matrix T[][] has size NROW x NCOL and has the following structure:

The first ILDT row is defined by selecting BLOCK random field elements and applying extend.
The second IDOT row is defined by first selecting DBLOCK random field elements such that the subarray at indices NREQ..NREQ + WR sums to 0, and then applying extend. The first step can be performed by selecting DBLOCK - 1 random field elements, and then setting the one remaining element of the specified range to the additive inverse of the sum of the other elements in that range.
The third IQD row is defined by first selecting DBLOCK random field elements, then setting the portion corresponding to the witness values to 0, and then applying extend:

   ZQ = RANDOM[DBLOCK]
   ZQ[NREQ..NREQ + WR] = 0
   extend(ZQ, DBLOCK, NCOL)

The next rows, starting at IW = 3 and ending just before IQ, are padded witness rows that contain random elements and portions of the witness vector. Specifically, row i is formed by applying extend to an array that consists of NREQ random elements followed by WR elements from the vector W. When the finite field contains a subfield, and if all of the witness elements in a given row are elements from this subfield, then the randomness for that row can also be chosen from the subfield. Consequently, the extend method for that row produces polynomial evaluations that are elements of the subfield. When these elements are serialized, they will require less space. The simplest way to apply this optimization is for the committing process to maintain an index SF such that witnesses at indices 0..SF belong to the subfield, and the rest do not. This value SF can be conveyed to the verifier as part of the proof, or part of the circuit.
The final portion of the witness matrix consists of padded quadratic rows, each of which consists of NREQ random elements followed by WR quadratic constraint elements. The specific elements in the QX, QY, and QZ arrays are determined by the quadratic constraints on the witness values that are verified by the proof.

The second step of the procedure is to compute a Merkle tree on columns of the tableau matrix. Specifically, the leaves of the tree correspond to columns DBLOCK..NCOL of the tableau T, so that the i-th leaf is derived from column DBLOCK + i of T.

Input:

The witness vector W.
Array of quadratic constraints lqc[], which consists of triples (x,y,z) that represent the constraint that W[x] * W[y] = W[z].

Output:

A digest; root of a Merkle tree formed from columns of the tableau.
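The row constructions of the tableau can be sketched over a small prime field. The parameter values (P = 97, NREQ = 2, WR = 3, NCOL = 12) are toy assumptions that ignore the soundness constraints of this document, extend is the brute-force Lagrange variant, and the function names are chosen for this example. The quadratic rows and the Merkle tree over the columns are omitted.

```python
import secrets

P = 97                      # toy prime field (assumption)
NREQ, WR = 2, 3             # toy parameters (assumption)
BLOCK = NREQ + WR
DBLOCK = 2 * BLOCK - 1
NCOL = 12

def extend(f, n, m):
    """Brute-force Lagrange extension from points 0..n-1 to points 0..m-1."""
    out = []
    for x in range(m):
        acc = 0
        for i in range(n):
            num = den = 1
            for j in range(n):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            acc = (acc + f[i] * num * pow(den, -1, P)) % P
        out.append(acc)
    return out

def rand(n):
    return [secrets.randbelow(P) for _ in range(n)]

def ildt_row():
    return extend(rand(BLOCK), BLOCK, NCOL)

def idot_row():
    z = rand(DBLOCK)
    # fix one element so that z[NREQ..NREQ + WR] sums to zero
    z[NREQ] = (-sum(z[NREQ + 1:NREQ + WR])) % P
    return extend(z, DBLOCK, NCOL)

def iqd_row():
    zq = rand(DBLOCK)
    zq[NREQ:NREQ + WR] = [0] * WR       # zero the witness positions
    return extend(zq, DBLOCK, NCOL)

def witness_row(w_chunk):
    # NREQ random pad elements followed by WR witness values
    return extend(rand(NREQ) + list(w_chunk), BLOCK, NCOL)

T = [ildt_row(), idot_row(), iqd_row(), witness_row([7, 8, 9])]
```

Since extend interpolates its input, the first entries of each row reproduce the array it was built from, so the witness values sit at columns NREQ..BLOCK of their row.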

Ligero Prove This section specifies how a Ligero proof for a given sequence of linear constraints and quadratic constraints on the committed witness vector W is constructed. The proof consists of a low-degree test on the tableau, a linearity test, and a quadratic constraint test.

Low-degree test In the low-degree test, the verifier sends a challenge vector consisting of NROW field elements, u[0..NROW]. This challenge is generated via the Fiat-Shamir transform. The prover computes the sum of u[i]*T[i] where T[i] is the i-th row of the tableau, and returns the first BLOCK elements of the result. The verifier applies the extend method to this response, and then verifies that the extended row is consistent with the positions of the Merkle tree that the verifier will later request from the Prover. The Prover's task is therefore to compute a summation. For efficiency, set u[0]=1 because this first row corresponds to a random row meant to "pad" the witnesses for zero-knowledge.
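The check can be illustrated with a toy example. The sketch assumes that every row entering the combination was produced by extend from BLOCK values, so that the combination is itself determined by its first BLOCK entries; the field, the sizes, and the function names are assumptions made for this example.

```python
import secrets

P = 97                      # toy prime field (assumption)
BLOCK, NCOL = 4, 10         # toy sizes (assumption)

def extend(f, n, m):
    """Brute-force Lagrange extension from points 0..n-1 to points 0..m-1."""
    out = []
    for x in range(m):
        acc = 0
        for i in range(n):
            num = den = 1
            for j in range(n):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            acc = (acc + f[i] * num * pow(den, -1, P)) % P
        out.append(acc)
    return out

def ldt_response(T, u):
    """Prover: first BLOCK entries of sum_i u[i] * T[i]."""
    return [sum(ui * row[c] for ui, row in zip(u, T)) % P for c in range(BLOCK)]

def ldt_check(response, u, opened):
    """Verifier: extend the response and compare it at the opened columns."""
    full = extend(response, BLOCK, NCOL)
    return all(full[c] == sum(ui * v for ui, v in zip(u, col)) % P
               for c, col in opened.items())

rows = [[secrets.randbelow(P) for _ in range(BLOCK)] for _ in range(3)]
T = [extend(r, BLOCK, NCOL) for r in rows]
u = [1] + [secrets.randbelow(P) for _ in range(2)]   # u[0] = 1 for the pad row
resp = ldt_response(T, u)
opened = {c: [row[c] for row in T] for c in (6, 8)}  # columns revealed later
accepted = ldt_check(resp, u, opened)
```

The check works because extend is linear: a linear combination of extended rows equals the extension of the same combination of their first BLOCK entries.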

Linear and Quadratic constraints The linear test is represented by a matrix A, and a vector b, and aims to verify that A*W = b. The constraint matrix A is given as input in a sparse form: it is an array of triples (c,j,k) in which c indicates the constraint number or row of A, j represents the index of the witness or column of A, and k represents the constant factor. For example, if the first constraint (at index 0) is W[2] + 2W[3] = 3, then the linear constraints array contains the triples (0,2,1), (0,3,2) and the b vector has b[0]=3. The quadratic constraints are given as input in an array lqc[] that contains triples (x,y,z); one such triple represents the constraint that W[x] * W[y] = W[z]. To process quadratic constraints, tableau T is augmented with 3 extra rows, called Qx, Qy, and Qz which hold copied witnesses and their products. If the i-th quadratic constraint is (x,y,z), then the prover sets Qx[i] = W[x], Qy[i] = W[y] and Qz[i] = W[x] * W[y]. Next, the prover adds a linear constraint that Qx[i] - W[x] = 0, Qy[i] - W[y] = 0 and Qz[i] - W[z] = 0 to ensure that the copied witness is consistent. In this sense, the quadratic constraints are reduced to linear constraints, and the additional requirement for the verifier to check that each index of the Qz row is the product of its counterpart in the Qx and Qy row.
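The sparse representation and the reduction of quadratic constraints can be sketched directly on a witness vector. The sketch checks the constraints in the clear over a toy field P = 97, which is an assumption for illustration; in the protocol these checks are carried out on the committed tableau. The function names are chosen for this example.

```python
P = 97  # toy prime field (assumption)

def linear_ok(W, A, b):
    """Check A * W = b, with A given as sparse triples (c, j, k)."""
    lhs = [0] * len(b)
    for c, j, k in A:
        lhs[c] = (lhs[c] + k * W[j]) % P
    return all(lhs[c] == b[c] % P for c in range(len(b)))

def quad_rows(W, lqc):
    """Build the copied rows Qx, Qy and the product row Qz."""
    Qx = [W[x] for (x, y, z) in lqc]
    Qy = [W[y] for (x, y, z) in lqc]
    Qz = [W[x] * W[y] % P for (x, y, z) in lqc]
    return Qx, Qy, Qz

def quad_ok(W, lqc, Qx, Qy, Qz):
    """Copy constraints (linear) plus the elementwise product check."""
    copies = all(Qx[i] == W[x] and Qy[i] == W[y] and Qz[i] == W[z]
                 for i, (x, y, z) in enumerate(lqc))
    products = all(Qz[i] == Qx[i] * Qy[i] % P for i in range(len(lqc)))
    return copies and products

W = [5, 7, 1, 1, 35]
A = [(0, 2, 1), (0, 3, 2)]   # W[2] + 2 * W[3] = 3
b = [3]
lqc = [(0, 1, 4)]            # W[0] * W[1] = W[4]
Qx, Qy, Qz = quad_rows(W, lqc)
```

The copy constraint Qz[i] - W[z] = 0 is what ties the product row back to the witness: a witness with a wrong product passes the elementwise check but fails the copy check.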

Selection of challenge indices The last step of the prove method is for the verifier to select a subset of unique indices (i.e., they are sampled without replacement) from the range DBLOCK...NCOL and request that the prover open these columns of tableau T. These opened columns are then used to verify consistency with the previous messages sent by the prover.

Ligero Prover procedure

Ligero verification procedure This section specifies how to verify a Ligero proof with respect to a common set of linear and quadratic constraints.

Overview of the Longfellow protocol The Longfellow ZK protocol utilizes two primitive operations. The first is a variant of the sumcheck protocol, modified to support zero knowledge. Informally, the non-padded sumcheck prover takes the description of a circuit and the concrete values of all the wires in the circuit, and produces a proof that all wires have been computed correctly. The proof itself is a sequence of field elements. The padded variant of the sumcheck prover used in this document also takes as input a random and secret one-time pad, and it outputs a "padded" proof such that each element in the padded proof is the difference of the element in the non-padded proof and of the element in the pad. (The choice of "difference" instead of "sum" is a matter of convention.)

In this padded sumcheck variant, the verifier cannot check the proof directly, because it cannot access the pad. Instead of running the sumcheck verifier directly, a commitment scheme is used to hide the pad, and the sumcheck verifier is translated into a sequence of linear and quadratic constraints on the inputs and the pad. The commitment scheme then produces a proof that the constraints are satisfied.

Some of the wires of the circuit are inputs, i.e., set outside the circuit and not computed by the circuit itself. Some of the inputs are public, i.e., known to both parties, and some are private, i.e., known only to the prover. Sumcheck does not use the distinction between public and private inputs, but this document distinguishes inputs from the pad. Conversely, the commitment scheme does not use public inputs at all, but it does treat private inputs and the pad equally. These constraints motivate the following terminology.

public inputs: inputs to the circuit known to both parties.
private inputs: inputs to the circuit known to the prover but not to the verifier.
inputs: both public and private inputs. When forming an array of all inputs, the public inputs come first, followed by the private inputs.
witnesses: the private inputs and the pad. When forming an array of all witnesses, the private inputs come first, followed by the pad.

Thus, at a high level, the sequence of operations in the ZK protocol is the following:

1. The prover commits to all witness values.
2. The prover runs the padded sumcheck prover on the witness values to produce a padded proof, and sends the padded proof to the verifier.
3. Both the prover and the verifier take the public inputs and the padded proof and produce a sequence of constraints.
4. Using the commitment scheme and the witnesses, the prover generates a proof that the constraints from step 3 are satisfied.
5. The verifier uses the proof from step 4 and the constraints from step 3 to check the constraints.

Steps 2 and 3 are referred to as "sumcheck", and the rest as "commitment scheme". While the classification of step 3 as "sumcheck" is arbitrary, there are situations where one might want to use a commitment scheme other than the Ligero protocol specified in this document. In this case, the "commitment scheme" can change while the "sumcheck" remains unaffected.

Sumcheck

Special conventions for sumcheck arrays The square brackets A[j] denote generic array indexing. For the arrays of field elements used in the sumcheck protocol, however, it is convenient to use the conventions that follow. The sumcheck array A[i] is implicitly assumed to be defined for all nonnegative integers i, padding with zeroes as necessary. Here, "zero" is well defined because A[] is an array of field elements. Arrays can be multi-dimensional, as in the three-dimensional array Q[g, l, r]. It is understood that the array is padded with infinitely many zeroes in each dimension. Given array A[] and field element x, the function bind(A, x) returns the array B such that

   B[i] = (1 - x) * A[2 * i] + x * A[2 * i + 1]

In case of multiple dimensions such as Q[g, l, r], always bind across the first dimension. For example,

   bind(Q, x)[g, l, r] = (1 - x) * Q[2 * g, l, r] + x * Q[2 * g + 1, l, r]

This bind can be generalized to an array X of field elements by binding with the elements of X one after another; the result is written bindv(A, X). Two-dimensional arrays can be transposed in the usual way:

   transpose(Q)[l, r] = Q[r, l]
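The binding operation can be sketched on finite lists that stand for zero-padded arrays. The sketch assumes the binding rule B[i] = (1 - x) * A[2i] + x * A[2i + 1], which is the form compatible with arrays that extend indefinitely with zeroes, and it assumes that bindv applies the elements of X in order starting with X[0]. The toy field P = 97 is also an assumption.

```python
P = 97  # toy prime field (assumption)

def bind(A, x):
    """Assumed rule: B[i] = (1 - x) * A[2i] + x * A[2i + 1], A zero-padded."""
    def get(i):
        return A[i] if i < len(A) else 0
    n = (len(A) + 1) // 2
    return [((1 - x) * get(2 * i) + x * get(2 * i + 1)) % P for i in range(n)]

def bindv(A, X):
    """Bind with X[0], then X[1], and so on (assumed order)."""
    for x in X:
        A = bind(A, x)
    return A

A = [1, 2, 3, 4]
selected = bindv(A, [0, 1])   # binary point: low bit 0, high bit 1, i.e. index 2
```

Binding with values in {0, 1} selects an array element, with X[0] acting as the least significant bit of the index; general field values interpolate between elements.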

The EQ[] array EQ_{n}[i, j] is a special 2D array defined as

   EQ_{n}[i, j] = 1 if i = j and i < n, and 0 otherwise

The sumcheck literature usually assumes that n is a power of 2, but this document allows n to be an arbitrary integer. When n is clear from context or unimportant, the subscript is omitted, as in EQ[i, j]. EQ[] is important because the general expansion

   V[i] = SUM_{j} EQ[i, j] * V[j]

commutes with binding, yielding

   bindv(V, X) = SUM_{j} bindv(EQ, X)[j] * V[j]

That is, one way to compute bindv(V, X) is via dot product of V with bindv(EQ, X). This strategy may or may not be advantageous in practice, but it becomes mandatory when bindv(V, X) must be computed via a commitment scheme that supports linear constraints but not binding. This document only uses bindings of EQ and never EQ itself, and therefore the whole array never needs to be stored explicitly. For n = 2^l and X of size l, bindv(EQ_{n}, X) can be computed recursively in linear time by a procedure bindeq(l, X), so that bindv(EQ_{n}, X) = bindeq(l, X). For m <= n, bindv(EQ_{n}, X)[i] and bindv(EQ_{m}, X)[i] agree for 0 <= i < m, and thus bindv(EQ_{m}, X)[i] can be computed by padding m to the next power of 2 and ignoring the extra elements. With some care, it is possible to compute bindeq() in-place on a single array of arbitrary size m and eliminate the recursion completely.
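The dot-product strategy can be sketched as follows. The sketch rests on assumptions: the binding rule B[i] = (1 - x) * A[2i] + x * A[2i + 1] with X[0] applied first, and a doubling construction for bindeq in which X[0] controls the least significant index bit. Under those assumptions, entry j of bindeq(l, X) is the product over the bits of j of either X[k] or 1 - X[k].

```python
P = 97  # toy prime field (assumption)

def bind(A, x):
    """Assumed rule: B[i] = (1 - x) * A[2i] + x * A[2i + 1], A zero-padded."""
    def get(i):
        return A[i] if i < len(A) else 0
    n = (len(A) + 1) // 2
    return [((1 - x) * get(2 * i) + x * get(2 * i + 1)) % P for i in range(n)]

def bindv(A, X):
    for x in X:
        A = bind(A, x)
    return A

def bindeq(l, X):
    """Array of size 2^l, built by doubling, in time linear in its size."""
    B = [1]
    for k in reversed(range(l)):       # X[0] is applied last: lowest index bit
        B = [v for a in B for v in ((1 - X[k]) * a % P, X[k] * a % P)]
    return B

def dot(n, A, Y):
    return sum(A[i] * Y[i] for i in range(n)) % P

X = [2, 3, 5]
V = [1, 2, 3, 4, 5]                    # size 5: not a power of two
direct = bindv(V, X)[0]
via_eq = dot(len(V), V, bindeq(3, X))  # extra entries of bindeq are ignored
```

The second computation only uses V in a linear way, which is the form required when V is hidden behind a commitment scheme that supports linear constraints.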

Remark Let m <= n, A = bindv(EQ_{m}, X) and B = bindv(EQ_{n}, X). It is true that A[i] = B[i] for i < m. However, it is also true that A[i] = 0 for i >= m, whereas B[i] is in general nonzero. Thus, care must be taken when computing a further binding bindv(A, Y), which is in general not the same as bindv(B, Y). A second binding is not needed in this document, but certain closed-form expressions for the binding found in the literature agree with these definitions only when m is a power of 2.

Circuits

Layered circuits A circuit consists of NL layers. By convention, layer j computes wires V[j] given wires V[j + 1], where each V[j] is an array of field elements. A wire is an element V[j][w] for some j and w. Thus, V[0] denotes the output wires of the entire circuit, and V[NL] denotes the input wires. A circuit is intended to check that some property of the input holds, and by convention, the check is considered successful if all output wires are 0, that is, if V[0][w] = 0 for all w.

Quad representation The computation of a circuit is defined by a set of quads Q[j], one per layer. Given the output of layer j + 1, the output of layer j is given by the following equation:

   V[j][g] = SUM_{l, r} Q[j][g, l, r] V[j + 1][l] V[j + 1][r]

The quad Q[j][] is thus a three-dimensional array in the indices g, l, and r where 0 <= g < NW[j] and 0 <= l, r < NW[j + 1]. In practice, Q[j][] is sparse. The specification of the circuit contains an auxiliary vector of quantities LV[j] with the property that V[j][w] = 0 for all w >= 2^{LV[j]}. Informally, LV[j] is the number of bits needed to name a wire at layer j, but LV[j] may be larger than the minimum required value.
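As an illustration, a layer with a sparse quad can be evaluated as in the following Python sketch. The list-of-tuples representation, the function name eval_layer, and the small modulus P = 257 are assumptions made for the example only.

```python
P = 257  # small illustrative prime (assumption)

def eval_layer(quad, v_next, nw):
    """Compute V[j] from V[j + 1].

    quad   : sparse list of (g, l, r, coeff) entries of Q[j]
    v_next : list of wire values V[j + 1]
    nw     : number of output wires of the layer
    """
    out = [0] * nw
    for g, l, r, coeff in quad:
        out[g] = (out[g] + coeff * v_next[l] * v_next[r]) % P
    return out

# Hypothetical layer with two output wires:
#   V[j][0] = V[j+1][1] * V[j+1][2]
#   V[j][1] = 2 * V[j+1][0] * V[j+1][1] - V[j+1][2]^2   (256 = -1 mod 257)
quad = [(0, 1, 2, 1), (1, 0, 1, 2), (1, 2, 2, 256)]
result = eval_layer(quad, [1, 3, 5], 2)
```

Entries of Q[j] that are absent from the list contribute nothing to the sum, so the cost is proportional to the number of nonzero entries rather than to the full size of the three-dimensional array.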

In-circuit assertions In the libzk system, a theorem is represented by a circuit such that the theorem is true if and only if all outputs of the circuit are zero. It happens in practice that many output wires are computed early in the circuit (i.e., in a layer closer to the input), but because of layering, they need to be copied all the way to the output layer in order to be compared against zero. This copy seems to introduce large overheads in practice.

A special convention can mitigate this problem. Abstractly, a layer is represented by two quads Q and Z, and the operation of the layer is described by the two equations

   V[j][g] = SUM_{l, r} Q[j][g, l, r] V[j + 1][l] V[j + 1][r]
         0 = SUM_{l, r} Z[j][g, l, r] V[j + 1][l] V[j + 1][r]

Thus, the Z quad asserts that, for a given layer j and output wire g, a certain quadratic combination of the input wires is zero. The actual protocol verifies a random linear combination of those two equations, effectively operating on a combined quad QZ = Q + beta * Z for some random beta. To allow for a compact representation of the two quads without losing any real generality, the following conditions are imposed:

The two quads Q and Z are disjoint: for all layers j and output wire g, if any Q[j][g, ., .] are nonzero, then all Z[j][g, ., .] are zero, and vice versa.
Z is binary: Z[j][g, l, r] is in {0, 1} for all j, g, l, and r.

With these choices, the two quads allow a compact sparse representation as a single list of 4-tuples (g, l, r, v) with the following conventions:

If v = 0, the 4-tuple represents an element of Z, and Z[j][g, l, r] = 1.
If v != 0, the 4-tuple represents an element of Q, and Q[j][g, l, r] = v.
All other elements of Q and Z not specified by the list are zero.

Moreover, this compact representation can be transformed into a representation of QZ = Q + beta * Z by replacing all v = 0 with v = beta.
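The conventions above can be sketched in Python as follows. The function names and the use of plain integers for the values v are assumptions made for illustration; they are not part of the circuit format.

```python
def split_quads(tuples):
    """Recover sparse Q and Z from the compact list of (g, l, r, v)."""
    q = {(g, l, r): v for (g, l, r, v) in tuples if v != 0}
    z = {(g, l, r): 1 for (g, l, r, v) in tuples if v == 0}
    return q, z

def is_disjoint(tuples):
    """Check that no output wire g has entries in both Q and Z."""
    q, z = split_quads(tuples)
    q_outputs = {g for (g, _, _) in q}
    z_outputs = {g for (g, _, _) in z}
    return q_outputs.isdisjoint(z_outputs)

def combine_quads(tuples, beta):
    """Sparse QZ = Q + beta * Z: replace every v = 0 by v = beta."""
    return [(g, l, r, beta if v == 0 else v) for (g, l, r, v) in tuples]

# Hypothetical layer: output wire 0 is computed by Q, output wire 1 is an
# assertion that V[j+1][0] * V[j+1][0] = 0.
layer = [(0, 1, 2, 5), (1, 0, 0, 0)]
qz = combine_quads(layer, beta=9)
```

Because Z is binary and disjoint from Q, the value v = 0 is free to serve as the marker for an assertion, and no separate flag is needed in the list.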

Representation of polynomials In a generic sumcheck protocol, the prover sends to the verifier polynomials of a degree specified in advance. In the present document, the polynomials are always of degree 2, and are represented by their evaluations at three points P0 = 0, P1 = 1, and P2, where 0 and 1 are the additive and multiplicative identities in the field. The choice of P2 depends upon the field. For fields of characteristic greater than 2, set P2 = 2 (= 1 + 1 in the field). For GF(2^128) expressed as GF(2)[X] / (X^128 + X^7 + X^2 + X + 1), set P2 = X. This document does not prescribe a choice of P2 for binary fields other than GF(2^128), but other binary fields represented as GF(2)[X] / (Q(X)) SHOULD choose P2 = X for consistency.
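For a field of characteristic greater than 2, a polynomial given by its evaluations at 0, 1, and 2 can be evaluated at any other point by Lagrange interpolation. The following Python sketch shows this for the prime 2^64 - 59 from the FieldID table; the function name is hypothetical, and the GF(2^128) case with P2 = X is not covered.

```python
P = 2**64 - 59  # prime field listed in the FieldID table

def eval_from_points(y0, y1, y2, x):
    """Evaluate at x the degree-2 polynomial p with
    p(0) = y0, p(1) = y1, p(2) = y2, using Lagrange interpolation."""
    inv2 = pow(2, -1, P)                   # 1/2 in the field (Python 3.8+)
    l0 = (x - 1) * (x - 2) % P * inv2 % P  # equals 1 at 0, and 0 at 1 and 2
    l1 = (-x * (x - 2)) % P                # equals 1 at 1, and 0 at 0 and 2
    l2 = x * (x - 1) % P * inv2 % P        # equals 1 at 2, and 0 at 0 and 1
    return (y0 * l0 + y1 * l1 + y2 * l2) % P

# p(t) = 3 t^2 + 5 t + 7 has p(0) = 7, p(1) = 15, p(2) = 29.
value = eval_from_points(7, 15, 29, 10)
```

The division by 2 in the Lagrange basis is the reason this choice of points does not work in characteristic 2, where a different P2 is required.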

Transform circuit and wires into a padded proof

Generate constraints from the public inputs and the padded proof This section defines a procedure constraints_circuit for transforming the proof returned by sumcheck_circuit into constraints for the commitment scheme. Specifically, each layer produces one linear constraint and one quadratic constraint. The main difficulty in describing the algorithm is that it operates not on concrete witnesses, but on expressions in which the witnesses are symbolic quantities. Symbolic manipulation is necessary because the verifier does not have access to the witnesses. To avoid overspecifying the exact representation of such symbolic expressions, the convention is that the prefix sym_ indicates not a concrete value, but a symbolic representation of the value. Thus, w[3] is the fourth concrete witness in the w array, and sym_w[3] is a symbolic representation of the fourth element in the w array. The algorithm does not need arbitrarily complex symbolic expressions. It suffices to keep track of affine symbolic expressions of the form k + SUM_{i} a[i] sym_w[i] for some (concrete, nonsymbolic) field elements k and a[].
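One possible representation of affine symbolic expressions is sketched below in Python. The class name Affine, its methods, and the sparse dictionary of coefficients are assumptions chosen for illustration; this document deliberately leaves the representation unspecified.

```python
P = 2**64 - 59  # illustrative choice of prime field from the FieldID table

class Affine:
    """Represents k + SUM_{i} a[i] sym_w[i] with concrete k and a[]."""

    def __init__(self, k=0, a=None):
        self.k = k % P
        # Keep only nonzero coefficients, reduced modulo P.
        self.a = {i: c % P for i, c in (a or {}).items() if c % P}

    @staticmethod
    def sym_w(i):
        """Symbolic representation of the i-th witness."""
        return Affine(0, {i: 1})

    def __add__(self, other):
        a = dict(self.a)
        for i, c in other.a.items():
            a[i] = a.get(i, 0) + c
        return Affine(self.k + other.k, a)

    def scale(self, c):
        """Multiply the expression by a concrete field element c."""
        return Affine(self.k * c, {i: v * c for i, v in self.a.items()})

    def eval(self, w):
        """Substitute concrete witnesses w; only the prover can do this."""
        return (self.k + sum(c * w[i] for i, c in self.a.items())) % P

# 5 + 2 * sym_w[3] + sym_w[0]
expr = Affine(5) + Affine.sym_w(3).scale(2) + Affine.sym_w(0)
```

Affine expressions are closed under addition and multiplication by concrete field elements, which is all that the sketch provides; a product of two symbolic quantities is not affine and is outside its scope.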

Serializing objects This section explains how a proof consists of smaller, related objects, and how to serialize each such component. First, the standard methods for serializing integers and arrays are used:

write_size(n): serializes an integer in [0, 2^{24} - 1] that represents the size of an array or an index into an array. The integer is serialized in little-endian order.
write_array(arr): A variable-sized array is represented as type array[] and serialized by first writing its length as a size element, and then serializing each element of the array in order.
write_fixed_array(arr): When the length of the array is explicitly known to be n, it is specified as type array[n] and in this case, the array length is not written first.
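The three methods can be sketched in Python as follows. The 3-byte width of a size element is an assumption: it is the smallest width that holds 2^{24} - 1, and it appears consistent with the 3-byte groups such as 060000 in the circuit test vector. The write_item parameter is a hypothetical callback that serializes one array element.

```python
def write_size(n):
    """Serialize a size or index in [0, 2^24 - 1], little-endian.
    The 3-byte width is an assumption of this sketch."""
    if not 0 <= n < 2**24:
        raise ValueError("size out of range")
    return n.to_bytes(3, "little")

def write_fixed_array(arr, write_item):
    """Serialize array[n]: the elements in order, without a length prefix."""
    return b"".join(write_item(x) for x in arr)

def write_array(arr, write_item):
    """Serialize array[]: the length as a size element, then the elements."""
    return write_size(len(arr)) + write_fixed_array(arr, write_item)

encoded = write_array([1, 2], write_size)
```

In this sketch, encoded is the 9-byte string 020000 010000 020000: the length 2 followed by the two elements.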

Serializing structs When a section includes just a struct definition, the struct is serialized in the natural way: each component is serialized in order, starting from the top-most component and proceeding to the last one.

Serializing Field elements This section describes a method to serialize field elements, particularly when the field structure allows efficient encoding for elements of subfields. Before a field element can be serialized, the context must specify the finite field. In most cases, the Circuit structure will specify the finite field, and all other aspects of the protocol will be defined by this field. A finite field is identified by a FieldID, which is specified using a variable-length encoding. Common finite fields have been assigned special 1-byte codes. An arbitrary prime-order finite field can be specified using the special 0xF_ byte followed by a variable number of bytes to specify the prime in little-endian order. For example, the 3-byte sequence f11001 specifies F₂₅₇. Similarly, a quadratic extension using the polynomial x^2 + 1 can be specified using the 0xE_ designators.

Finite field identifiers:

Finite field	FieldID
p256	0x01
p384	0x02
p521	0x03
GF(2¹²⁸)	0x04
GF(2¹⁶)	0x05
2¹²⁸ - 2¹⁰⁸ + 1	0x06
2^64 - 59	0x07
2^64 - 2^32 + 1	0x08
F_{2^64 - 59}²	0x09
secp256	0x0a
F_{2^{0--15}-byte prime}²	0xe{0--f}
F_{2^{0--15}-byte prime}	0xf{0--f}

The GF(2¹²⁸) field uses the irreducible polynomial x¹²⁸ + x⁷ + x² + x + 1. The p256 prime is equal to 115792089210356248762697446949407573530086143415290314195533631308867097853951, which is the base field used by the NIST P256 elliptic curve. The p384 prime is equal to 39402006196394479212279040100143613805079739270465446667948293404245721771496870329047266088258938001861606973112319, which is the base field used by the NIST P384 curve. The p521 prime is equal to 2⁵²¹ - 1. The F_p64^2 field is the quadratic field extension of the base field defined by the prime 18446744073709551557 using the polynomial x^2 + 1, i.e., by adjoining a square root of -1 to the field.

Serializing a single field element Unless specified otherwise, a field element, referred to as an Elt, is serialized to bytes in little-endian order. For example, a 256-bit element of the finite field F_p256 is serialized into 32 bytes starting with the least-significant byte.

write_elt(e, F): produces a byte encoding of a field element e in field F.
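For a prime field, write_elt can be sketched in Python as below. The sketch assumes that the element is given as an integer in [0, p) and that the field is identified by its prime p; extension fields and binary fields are not handled.

```python
def write_elt(e, p):
    """Little-endian encoding of e in the prime field of order p.
    Assumes e is an integer representative in [0, p)."""
    if not 0 <= e < p:
        raise ValueError("element out of range")
    nbytes = (p.bit_length() + 7) // 8  # bytes needed to hold any element
    return e.to_bytes(nbytes, "little")

P128 = 2**128 - 2**108 + 1  # field with FieldID 0x06
P256 = 115792089210356248762697446949407573530086143415290314195533631308867097853951

minus_two = write_elt(P128 - 2, P128)
one = write_elt(1, P256)
```

Here minus_two is the 16-byte string ffffffffffffffffffffffffffefffff, a byte pattern that also occurs in the serialized circuit test vector, and one is a 32-byte string whose first byte is 01.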

Serializing an element of a subfield When the larger field F is a field extension of a smaller field, and both Prover and Verifier can explicitly conclude that a field element belongs to the smaller subfield, both parties can use a more efficient subfield serialization method.

write_subfield(Elt e, F2, F1): produce a byte encoding of a field element e that belongs to a subfield F2 of field F1.

Serializing a Sumcheck Transcript The padded transcript incorporates the optimization in which the eval at 1 is omitted and reconstructed from the expected value of the previous challenge.
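The reconstruction can be sketched in Python under the usual sumcheck round invariant, namely that the round polynomial p satisfies p(0) + p(1) = claim, where claim is the value expected from the previous round. The function names are hypothetical, the sketch covers only fields of characteristic greater than 2 with P2 = 2, and the exact layout of the padded transcript is not reproduced.

```python
P = 2**64 - 59  # prime field listed in the FieldID table

def reconstruct_p1(claim, p0):
    """Recover the omitted evaluation at 1 from p(0) + p(1) = claim."""
    return (claim - p0) % P

def next_claim(claim, p0, p2, r):
    """Evaluate the round polynomial at the challenge r, given only the
    transmitted evaluations at 0 and 2 and the expected value claim."""
    p1 = reconstruct_p1(claim, p0)
    inv2 = pow(2, -1, P)
    l0 = (r - 1) * (r - 2) % P * inv2 % P
    l1 = (-r * (r - 2)) % P
    l2 = r * (r - 1) % P * inv2 % P
    return (p0 * l0 + p1 * l1 + p2 * l2) % P

# p(t) = 3 t^2 + 5 t + 7: p(0) = 7, p(1) = 15, p(2) = 29, claim = 22.
claim_after_round = next_claim(22, 7, 29, 10)
```

Omitting the evaluation at 1 saves one field element per round without loss of information, since the verifier would otherwise have to check the same invariant explicitly.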

Serializing a Ligero Proof The concept of a run allows saving space when a long sequence of consecutive field elements belongs to a subfield of the finite field. A run consists of a 4-byte size element followed by that many Elt elements, all of which are serialized either as full field elements or as subfield elements. Runs alternate, beginning with full field elements. In this way, rows that consist of subfield elements can save space. The maximum run length is set to 2²⁵.
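The partition of a sequence of elements into alternating runs can be sketched in Python as follows. The function name, the in_subfield predicate, and the use of empty runs to preserve alternation are assumptions of this sketch; the byte-level encoding of each run is not shown.

```python
def split_runs(elts, in_subfield, max_run=2**25):
    """Partition elts into alternating runs, starting with a run of full
    field elements (possibly empty). Returns (is_subfield, elements) pairs."""
    runs = []
    want_subfield = False
    i = 0
    while i < len(elts):
        run = []
        while (i < len(elts) and in_subfield(elts[i]) == want_subfield
               and len(run) < max_run):
            run.append(elts[i])
            i += 1
        runs.append((want_subfield, run))
        want_subfield = not want_subfield
    return runs

# Hypothetical predicate: elements below 256 are treated as subfield elements.
small = lambda e: e < 256
runs = split_runs([300, 1, 2, 3, 400], small)
```

In this sketch a sequence that begins with a subfield element yields an empty leading run of full field elements, and a run that reaches the maximum length is followed by an empty run of the other kind.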

Serializing a Sequence of proofs For the multi-field optimization, the proof string consists of a sequence of two proofs. This is handled by using the circuit identifier to specify the sequence of proofs to parse.

Serializing a Circuit A circuit structure consists of size metadata, a table of constants, and an array of structures that represent the layers of the circuit as follows. The const_table structure contains an array of Elt constants that can be referred to by any of the CircuitLayer structures. This feature saves space because a typical circuit uses only a handful of constants, each of which can be referred to by a small index into this table. The quads array stores the main portion of the circuit. Each Quad structure contains g, h0, h1, and a constant v, which is represented as an index into the const_table array in the Circuit. Each of g, h0, and h1 is stored as a difference from the corresponding item in the previous quad. In other words, these three values are delta-encoded in order to improve the compressibility of the circuit representation. The Delta encoding uses the least-significant bit (LSB) as a sign bit to indicate negative numbers.
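A delta encoding with the sign in the LSB can be sketched in Python as below. The sketch assumes a sign-magnitude layout in which the magnitude of the difference occupies the remaining upper bits; this layout and the function names are assumptions and may differ from the actual Delta format.

```python
def encode_delta(prev, cur):
    """Encode cur relative to prev: magnitude in the upper bits,
    LSB set to 1 when the difference is negative (assumed layout)."""
    d = cur - prev
    return (abs(d) << 1) | (1 if d < 0 else 0)

def decode_delta(prev, code):
    """Inverse of encode_delta."""
    magnitude = code >> 1
    return prev - magnitude if code & 1 else prev + magnitude

def encode_sequence(values):
    """Delta-encode a sequence, taking 0 as the implicit first predecessor."""
    out, prev = [], 0
    for v in values:
        out.append(encode_delta(prev, v))
        prev = v
    return out

codes = encode_sequence([3, 7, 5, 5])
```

Consecutive quads tend to have nearby indices, so the encoded differences are small numbers with many zero bytes, which is what makes the representation compress well.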

Security Considerations Both the Ligero and Longfellow systems satisfy the standard properties of a zero-knowledge argument system: completeness, soundness, and zero-knowledge. Frigo and shelat provide an analysis of the soundness of the system, as it derives from the soundness of the Ligero proof system and the sumcheck protocol. Similarly, the zero-knowledge property derives almost entirely from the analysis of Ligero. It is a goal to provide a mechanically verifiable proof for a high-level statement of the soundness.

IANA Considerations This document does not make any requests of IANA.

References

Normative References

Informative References

The Knowledge Complexity of Interactive Proof Systems
Novel polynomial basis and its application to Reed-Solomon erasure codes
How to Prove False Statements: Practical Attacks on Fiat-Shamir
Ligero: Lightweight Sublinear Arguments Without a Trusted Setup
Anonymous credentials from ECDSA
Fiat-Shamir From Simpler Assumptions

Acknowledgements

Test Vectors This section contains test vectors. Each test vector specifies the configuration information and inputs. All values are encoded as hexadecimal strings.

Test Vectors for Merkle Tree

Vector 1

Leaves: 4bf5122f344554c53bde2ebb8cd2b7e3d1600ad631c385a5d7cce23c7785459a dbc1b4c900ffe48d575b5da5c638040125f65db0fe3e24494b76ea986457d986 084fed08b978af4d7d196a7446a86b58009e636b611db16211b65a9aadff29c5 e52d9c508c502347344d8c07ad91cbd6068afc75ff6292f062a09ca381c89e71 e77b9a9ae9e30b0dbdb6f510a264ef9de781501d7b6b92ae89eb059c5ab743db
Root: f22f4501ffd3bdffcecc9e4cd6828a4479aeedd6aa484eb7c1f808ccf71c6e76
Proof for leaves (0,1): 084fed08b978af4d7d196a7446a86b58009e636b611db16211b65a9aadff29c5 f03808f5b8088c61286d505e8e93aa378991d9889ae2d874433ca06acabcd493
Proof for leaves (1,3): e77b9a9ae9e30b0dbdb6f510a264ef9de781501d7b6b92ae89eb059c5ab743db 084fed08b978af4d7d196a7446a86b58009e636b611db16211b65a9aadff29c5 4bf5122f344554c53bde2ebb8cd2b7e3d1600ad631c385a5d7cce23c7785459a

Test Vectors for Circuit

Vector 1

Description: Circuit C(n, m, s) = 0 if and only if n is the m-th s-gonal number in F_p128. This circuit verifies that 2 * n = (s - 2) * m^2 - (s - 4) * m.
Field: 2¹²⁸ - 2¹⁰⁸ + 1 (Field ID 6)
Depth: 3 Quads: 11 Terms: 11
Serialization: 01060000010000010000020000040000020000040000ffffffffffffffffffffffffffefffff00000000000000000000000000f0ffff01000000000000000000000000000000fdffffffffffffffffffffffffefffff030000060000030000000000020000000000000000000000080000040000010000000000030000020000020000020000040000080000000000000000000000020000060000000000000000000000040000000000000000030000090000020000000000020000020000020000000000020000020000020000000000020000040000000000000000020000030000030000040000020000
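The relation checked by this circuit can be reproduced directly in Python. The sketch below evaluates the polynomial identity in the field with FieldID 6 and tries the values n = 45, m = 5, s = 6, which appear in the witness vector of the Ligero test vector; the assignment of those entries to n, m, and s in that order is an assumption, and the function name is hypothetical.

```python
P = 2**128 - 2**108 + 1  # field with FieldID 6

def polygonal_check(n, m, s):
    """Return 2n - ((s - 2) m^2 - (s - 4) m) in the field;
    the result is 0 exactly when the relation holds modulo P."""
    return (2 * n - ((s - 2) * m * m - (s - 4) * m)) % P

ok = polygonal_check(45, 5, 6)   # the 5th hexagonal number is 45
bad = polygonal_check(46, 5, 6)
```

The check is performed on 2n rather than n so that the circuit avoids a division by 2.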

Test Vectors for Sumcheck

Vector 1

Description: Circuit C(n, m, s) = 0 if and only if n is the m-th s-gonal number in F_p128. This circuit verifies that 2 * n = (s - 2) * m^2 - (s - 4) * m.
Field: 2¹²⁸ - 2¹⁰⁸ + 1 (Field ID 6)
Fiat-Shamir initialized with
Serialization: 90e734c42b5f14ee432a0ed95ba2ada05c3f9ecc9b026ded61f00bf57434f93c6f70e9c8b6e3de005ba8b4da93b5fa35fc3efae1e6068399c7f7d009ab5a2711084c97cd5a6e28dd30c598907b328d81915e487c34dbf80aa5da14f0621011a33d838a7b0d9a03533c63c6606f5360f88cf97c728630afdcb9755894a6f5c9068e1fc29f97efc125ba580de64089c6e72433de2a3267b90daeaf418ac8a3df3bbddc6cb141c764c8262346baac2e28033778b1a71f153ba571e80ab29951f9440ba93fede225a35accf6e0114d5240ae92df02d2870e5258ebba416f3d815e1554b05627998fc9d3bf354b89394b27b39f69c6538dbc968a779369e47f214252e0955624e9f4d6dc2a95cf41c57703b8749b959315458d4076f0daf5fdbde23e16c10394ac884ab9cad0782e8f472cb4edb69682d17465363691aafc31b83cd764fb909b50e2fe907fd2137566ddb8c47cc13974957e7f76180860571035f7a4d2658a82e1be8fe155353bc10feae9541365926f0646b4a5351907cbd5d9dbb4

Test Vectors for Ligero

Vector 1

Description: Circuit C(n, m, s) = 0 if and only if n is the m-th s-gonal number in F_p128. This circuit verifies that 2 * n = (s - 2) * m^2 - (s - 4) * m.
Field: 2¹²⁸ - 2¹⁰⁸ + 1 (Field ID 6)
Witness vector: [1, 45, 5, 6]
Pad elements: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4]
Parameters:
- NREQ: 6
- RATE: 4
- WR: 20
- QR: 2
- NROW: 7
- NQ: 1
- BLOCK: 51
Commitment: 738d2ffb3a8bf24e7aedb94be59041fb2dc13da30fe6b05ebe5126ef8fc36ec2
Proof size: 3180 bytes
Proof: fa8d88a73b3a0f9c067658c45bb394a602000000000000000000000000000000fa8d8...2cd5f61cd2b2eb84c79e1707cbad0048fcd820c716584f31991cf1628fb041

Test Vectors for libzk