A Flow for PLONK

Update #

Due to personal circumstances, I’m currently unable to devote adequate time for this project. I’ll leave this post up.

Series TL;DR #

PLONKish arithmetizations look like¹ FPGAs made out of polynomials. We have a rectangular array that ingests signals (elements of a finite field) from the outside world, highly-parametrizable computations that get evaluated on subsections of the array, and routing wires (also carrying elements of a finite field) connecting different parts of the array – and cost factors for all of these.

I am going to learn about arithmetic circuits, non-interactive zero-knowledge proving systems, Boolean logic synthesis, and place-and-route methods by trying to create a flow that ingests an arithmetic circuit (in some suitable intermediate representation) and outputs an optimized, placed, and routed version of the circuit, suitable for feeding into a PLONKish proof system like Halo2.

This first post is intended as a high-level summary of FPGA logic synthesis and of non-interactive zero-knowledge proof systems.

Logic synthesis tl;dr #

Logic-synthesis and place-and-route algorithms help FPGA designers convert high[ish] level descriptions of hardware – boolean circuits – into a configuration bitstream that gets shifted into an FPGA’s configure bits. A lot of these optimization problems live in NP, but they are realistically feasible thanks to industry and academic research that’s given the field a lot of clever data structures and algorithms that leverage appropriate local heuristics for the task in question. Lots of CPU time helps too.

A generic FPGA is composed of:

Input/output cells
Routing: a fabric of programmable wires / switch boxes between the logic blocks
Logic blocks, which usually come in multiple flavors and are tesselated in various patterns on the FPGA’s fabric. For instance, a generic FPGA might have:
- 100,000 4-bit LUTs, each of which is a little box that can implement any arbitrary 4-input 1-output boolean function (and is therefore parametrized with 16 configuration bits)
- 100 32-bit multiplier-accumulator blocks (usually² referred to as Digital Signal Processing (DSP) blocks), each of which has a multiplier / accumulator / preadder / etc, to implement rapid convolutions or filters or FFTs. Likewise, each DSP block has a bunch of configuration bits associated with it that will determine how that specific DSP block acts, and it’s the flow’s responsibility to decide what those configuration bits will be.

A generic FPGA flow behaves somewhat like this:

Ingest the hardware description language and convert it to an IR
Simplify the IR
Cut the now-simplified IR into appropriate pieces, and map each of the pieces to logic block types
- What’s an appropriate piece?
- Ideally, each cut corresponds nicely to a specific flavor of logic block that lives on the target FPGA.
- For instance, if your FPGA has lots of 4-bit input LUTs, we’re gonna try and cut up our combinatorial logic into 4-bit input Boolean functions.
- Hard multiplier-accumulate blocks are more efficient than implementing those functions out of LUTs, so there’s code that looks for any section of the IR that can be substituted with appropriate parametrization of a hard DSP block.
  - This doesn’t always work perfectly so there’s available directives in the HDL to explicitly specify that you want a specific type of DSP block with a specific parametrization, rather than using the generic multiply/add operators in the HDL.
Now that we have turned the circuit into a connected graph of logic blocks (along with specific parametrization bits for each specific logic block), we assign a physical location (on the FPGA die) to each logic block in question
Find a configuration for the routing fabric that connects all the logic blocks properly (and connects the input/output cells to the appropriate logic blocks)
Move the logic blocks around on the fabric until the delays in the routing fabric are minimized

Non-interactive zero-knowledge tl;dr #

The mathematical inner workings of non-interactive zero-knowledge proofs are well beyond me at the moment, but I’ll try to explain why they’re useful, describe high-level moving parts, and explain how arithmetic circuits are involved.

It’s easy to prove that a piece of data corresponds to a hash value. You look up the specification of the hash, implement it, and feed the data in. For instance, the SHA1 of “eggs” (without a newline at the end) is bd111dcb4b343de4ec0a79d2d5ec55a3919c79c4.

You can make a committment scheme out of this; by getting some salt (dd if=/dev/urandom bs=1 count=8|xxd -p), pouring the salt on the eggs, and hashing it. SHA1(“eggs c13aa680”) = 0f0feae06f795589c1d22f8e0efd774af631144d. By publishing 0f0feae06f795589c1d22f8e0efd774af631144d, I force myself to commit to “eggs” (and not, say, “lettuce”), because there’s no way I can feasibly come up with a value for salt that’ll make “lettuce [salt]” hash to 0f0feae06f795589c1d22f8e0efd774af631144d. Moreover, my argument that there’s “eggs” in 0f0feae06f795589c1d22f8e0efd774af631144d is ironclad. Anyone who can run and understand SHA1 is convinced.

In order to make a convincing argument about the preimage of a given hash, it seems like I would need to disclose the full preimage. For instance, if I want prove that a certain hash output has a preimage that starts with one of {"apple", "butter", "cheese", "eggs"} (without telling you which one), there’s no straightforward way to show that 14f84ddd83c41687598752c95564e6f5676b8c5c meets the specification: it’s indistinguishable from afba1413f970af24337c61208390a755b6791d4c, which does not.

Ex nihilo, this is indeed the case. A random cryptographic hash by itself is indistinguishable³ from any other random hash. However, there’s a set of generic mathematical constructions (non-interactive zero-knowledge proofs) that allow us to generate a proof (which is really just Special Data with certain properties) – to accompany the cryptographic hash output – and will convince anyone that our secret preimage starts with one of {"apple", "butter", "cheese", "eggs"} without disclosing which one.

An NIZK system has certain properties that make cryptographers happy, like “no information about the secret data can be derived from the proof” and “trying to prove something false or tampering with a valid proof always renders the proof invalid”.

The overall picture is this:

We have some public data and some private data:
- Here, the public data is the hash output.
- Here, the private data is the preimage.
We have a statement, made of a set of constraints, which expresses relations across the ensemble of the public data and the private data:
- Here, constraint 1 is “The secret preimage starts with {"apple", "butter", "cheese", "eggs"}”
- Here, constraint 2 is “Running SHA1 on the secret preimage gives the public hash output”

The proof system (and there are many, they have names like Groth16 and Halo2) is generic for all possible statements that can be expressed in a finite size. You can use a proof system to prove, for instance statements like “the Merkle tree with root hash \(X\) has a leaf that meets a set of properties \(Y\)” without telling anyone which leaf it is!

It’s like a compiler, except instead of running code to produce a value, the proof system produces code that’ll prove/verify that a certain agreed-upon computation got executed faithfully and that certain agreed-upon constraints hold. To do so, you write up your statement as constraints in a specialized constraint language, and in exchange for that hard work, the proof system generates two pieces of software, both of which are generic for all (valid) sets of public data and private data:

a prover, which ingests a specific set of public/private data and outputs a proof that the specific statement holds
a verifier, which ingests a specific set of public data and a proof; and tells us whether the proof of that statement is valid

Why are arithmetic circuits involved? #

The constraints and computations that we want to prove are described (by the user) by a specialized constraint language and get rendered (by the proving system) into a set of arithmetic circuits⁴ with a specific form. The specific form depends on the proving system in question. Certain proving systems use rank-1 constraint system (R1CS), which has a relatively simple form:

The constraint system is composed of multiple constraints and is satisfied by a single solution.
The solution vector \(\vec{s}\) contains the values of all the public, private, and intermediate values in the specific instance we are trying to prove.
Each constraint has three vectors \((\vec{a}, \vec{b}, \vec{c})\)
To satisfy each constraint with the solution vector \(\vec{s}\), we must have \((\vec{s} \cdot \vec{a}) (\vec{s} \cdot \vec{b}) - (\vec{s} \cdot \vec{c}) = 0\).

PLONK-style arithmetization schemes are more complex and in fact look like little integrated circuits (the Halo2 documentation and code explicitly use this metaphor!) or slices of FPGA fabric rather than a grab-bag of variables and quadratic constraints.

Not “look implementable on FPGAs”; though presumably there are clever ways to implement PLONKish proving/verification on FPGAs.↩︎
If you abbreviate “Multiply-ACcumulate” as “MAC” without context it’s confusing because “Media Access Controller” is another thing that lives on some FPGAs, goes by “MAC” too, and there’s no common alternate acronym for those!↩︎
Assuming your hash function isn’t broken, that the inputs aren’t identical, and that you are not an immortal who has counted to \(2^128\).↩︎
Over a large finite field, because this is cryptography↩︎