A zero-knowledge database has to do something that sounds impossible before you see the machinery: take a relational query — SELECT, JOIN, GROUP BY, SUM — and turn it into a system of polynomial equations whose satisfiability is the correctness of the query. That translation is called arithmetization, and the choice you make at this layer determines whether a verifiable database is a research toy or a production system.
The decisive choice is PLONKish arithmetization, and the reason is custom gates.
The arithmetization problem
Every general-purpose proof system needs a way to express a computation as constraints over a finite field. For a decade, the dominant format was R1CS (Rank-1 Constraint Systems) — every step of the computation reduced to constraints of the form (A · w)(B · w) = (C · w), a single multiplication per row. R1CS is universal, but it is also a straitjacket: a range check, a comparison, a sort — operations that are trivial in a CPU — explode into hundreds or thousands of multiplication constraints.
A database query is full of exactly those operations. WHERE amount BETWEEN x AND y is a range check. GROUP BY is a sort. JOIN is a relational match across two tables. Express these in raw R1CS and the circuit becomes enormous; the prover slows to a crawl.
PLONKish changes the unit of expression.
Why PLONKish, not R1CS
PLONK — and the broader family of "PLONKish" systems it spawned — replaces the fixed (A·w)(B·w)=(C·w) shape with two far more powerful primitives:
- Custom gates. Instead of one multiplication per row, you define an arbitrary low-degree polynomial identity across a configurable window of columns and rows. A single custom gate can encode a whole operation — a 64-bit range decomposition, an accumulator step, a sort comparison — in one constraint family.
- Lookup arguments. You can assert that a value (or tuple of values) appears in a precomputed table, in time independent of the table's size. Range checks, set membership, and bitwise operations that are catastrophic in R1CS become a single lookup.
Together with PLONK's permutation argument (the copy-constraint mechanism that proves two cells hold the same value), these three tools — custom gates, lookups, permutations — are exactly the vocabulary a relational query needs.
| Database need | In raw R1CS | In PLONKish |
|---|---|---|
| Range check (WHERE) | Bit-by-bit decomposition; ~hundreds of constraints | One lookup into a range table |
| Set membership | Linear scan in-circuit | One lookup argument |
| Equality across tables (JOIN) | Manual, costly | Permutation argument |
| Custom per-operator logic | Not expressible directly | A custom gate, by design |
A gate library for SQL
The work of building a zero-knowledge database is, in large part, the work of designing one carefully-tuned gate per relational operator — and then proving each gate sound. This is the firm's craft. The library, at a high level:
WHERE / filters → range-check gates
A predicate like salary BETWEEN 50000 AND 90000 becomes a range constraint. Rather than decomposing each number into bits and constraining each bit (the R1CS approach), the value is checked against a lookup table of the valid range, decomposed into a few limbs. One lookup replaces hundreds of constraints.
Projection → copy constraints
SELECT a, c FROM … carries specific columns forward. The permutation argument proves that the projected cells are exactly the corresponding input cells — no substitution, no fabrication — without re-encoding the data.
Equi-JOIN → permutation + lookup
A join asserts that for every row in the output, matching keys exist in both input tables, and that the output is a faithful combination of the matched tuples. This is expressed as a permutation argument over the key columns plus a lookup that binds matched payloads, with dummy tuples padding the output to a fixed power-of-two cardinality so the circuit shape stays static.
GROUP BY → sort + boundary-marker gate
Grouping is implemented by sorting rows on the group key (a sorting network proven correct in-circuit), then marking the boundary between groups with a custom gate that fires a "new group" flag whenever the key changes. Aggregates then accumulate within each segment.
SUM / COUNT / AVG / MIN / MAX → accumulator gates
Aggregation is a running-accumulator custom gate: a polynomial identity relating each row's accumulator cell to the previous one, reset at group boundaries. AVG composes a SUM accumulator with a COUNT accumulator and a single division check at the end.
ORDER BY → sorting argument
Ordering reuses the same verifiable sorting machinery as GROUP BY: prove that the output is a permutation of the input (permutation argument) and that adjacent rows are correctly ordered (a comparison custom gate).
SQL query plan PLONKish circuit
────────────── ────────────────
Scan(payroll) ──▶ commitment opening
│
Filter(dept='risk') ──▶ range-check gate + lookup
│
Aggregate(AVG) ──▶ sort gate → boundary gate
│ → SUM accumulator
│ → COUNT accumulator → ÷ check
▼
Result + proof ──▶ one short artifact
Halo2 as the substrate
In practice this library is built on Halo2 — the PLONKish framework with first-class support for custom gates, lookup arguments, and a transparent (no trusted-setup) polynomial commitment via Inner Product Arguments, with KZG available where a succinct, constant-size proof is preferred. Halo2's configuration API is what makes a gate library tractable: each gate is declared once as a set of column relationships and selector polynomials, then instantiated wherever a query plan calls for it. Recursive composition (the "Halo" in Halo2) collapses long query circuits into a single short proof.
This is why the modern verifiable-database literature — ZKSQL and especially PoneglyphDB — converges on PLONKish circuits rather than generic SNARK front-ends. The operator-at-a-time decomposition maps cleanly onto a gate-at-a-time circuit.
From query plan to circuit
The pipeline mirrors a traditional database almost exactly:
- The SQL is parsed and planned, just as in Postgres — filters pushed down, joins ordered, aggregates identified.
- Each physical operator in the plan is replaced by its corresponding gate from the library.
- The gates are wired together with permutation arguments so that one operator's output columns are the next operator's input columns.
- The composed circuit is proven once; the verifier checks the single artifact.
A query optimizer composes operators for speed. A zero-knowledge database composes gates for provability — but the structure is the same, which is exactly why an engineering team that knows databases can reason about it.
The trade-offs an architect should know
- Gate count drives proving time. The art is minimizing the number and degree of gates a given query family needs. A poorly-arithmetized
JOINcan dominate the entire circuit. - Lookup tables cost commitment. Large lookup tables (e.g. full 64-bit ranges) are decomposed into smaller limbs to keep the committed table size sane.
- Fixed circuit shape vs. dynamic data. Circuits are static; row counts are padded to powers of two with dummy tuples. Choosing the padding strategy is a real design decision with cost implications.
- Setup model. IPA-based Halo2 needs no trusted setup (better for adversarial/regulated settings); KZG yields smaller proofs but consumes public parameters from a ceremony. The right choice is engagement-specific.
Why this is the enterprise advantage
Here is the part that matters to a CTO. Because each SQL operator is a reusable, independently-proven gate, a zero-knowledge database is not rebuilt from scratch for every query. The gate library is the durable asset; each new regulated workflow — a CCAR aggregate, a HIPAA cohort statistic, a Scope 3 rollup — is a new composition of existing gates, not new cryptography.
This is what makes the bespoke-yet-disciplined engagement model possible: the cryptographic primitives are stable and reviewed; the per-institution work is the arithmetization of that institution's query set onto the shared gate library. The hard, novel, error-prone part — proving each gate sound — is done once and reused. The fitted part — which gates, in which order, against which committed schema — is the engagement.
That division is the whole game. Get the gate library right, and verifiable SQL stops being a research result and becomes infrastructure.
Related
- How a verifiable query actually works — the end-to-end pipeline at a higher altitude.
- What is a zero-knowledge database? — the plain-English foundation.
- The trust topology of a zkDB — who has to trust whom, once the math is in place.
Gu, Fang, Nawab. PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification. arXiv:2411.15031, SIGMOD 2025.
Li, Weng, Xu, Wang, Rogers. ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs. PVLDB Vol. 16 (2023).
Gabizon, Williamson, Ciobotaru. PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge. IACR ePrint 2019/953.