Skip to content
Concept

Custom gates for SQL: how a query becomes a circuit

Why PLONKish arithmetization — not generic R1CS — is the right substrate for verifiable databases, and how each SQL operator compiles to a hand-tuned custom gate in Halo2.

Published 2026-05-29· 8 min read

A zero-knowledge database has to do something that sounds impossible before you see the machinery: take a relational query — SELECT, JOIN, GROUP BY, SUM — and turn it into a system of polynomial equations whose satisfiability is the correctness of the query. That translation is called arithmetization, and the choice you make at this layer determines whether a verifiable database is a research toy or a production system.

The decisive choice is PLONKish arithmetization, and the reason is custom gates.

The arithmetization problem

Every general-purpose proof system needs a way to express a computation as constraints over a finite field. For a decade, the dominant format was R1CS (Rank-1 Constraint Systems) — every step of the computation reduced to constraints of the form (A · w)(B · w) = (C · w), a single multiplication per row. R1CS is universal, but it is also a straitjacket: a range check, a comparison, a sort — operations that are trivial in a CPU — explode into hundreds or thousands of multiplication constraints.

A database query is full of exactly those operations. WHERE amount BETWEEN x AND y is a range check. GROUP BY is a sort. JOIN is a relational match across two tables. Express these in raw R1CS and the circuit becomes enormous; the prover slows to a crawl.

PLONKish changes the unit of expression.

Why PLONKish, not R1CS

PLONK — and the broader family of "PLONKish" systems it spawned — replaces the fixed (A·w)(B·w)=(C·w) shape with two far more powerful primitives:

  • Custom gates. Instead of one multiplication per row, you define an arbitrary low-degree polynomial identity across a configurable window of columns and rows. A single custom gate can encode a whole operation — a 64-bit range decomposition, an accumulator step, a sort comparison — in one constraint family.
  • Lookup arguments. You can assert that a value (or tuple of values) appears in a precomputed table, in time independent of the table's size. Range checks, set membership, and bitwise operations that are catastrophic in R1CS become a single lookup.

Together with PLONK's permutation argument (the copy-constraint mechanism that proves two cells hold the same value), these three tools — custom gates, lookups, permutations — are exactly the vocabulary a relational query needs.

Database needIn raw R1CSIn PLONKish
Range check (WHERE)Bit-by-bit decomposition; ~hundreds of constraintsOne lookup into a range table
Set membershipLinear scan in-circuitOne lookup argument
Equality across tables (JOIN)Manual, costlyPermutation argument
Custom per-operator logicNot expressible directlyA custom gate, by design
The same query is an order of magnitude cheaper to prove when each operator has a purpose-built gate.

A gate library for SQL

The work of building a zero-knowledge database is, in large part, the work of designing one carefully-tuned gate per relational operator — and then proving each gate sound. This is the firm's craft. The library, at a high level:

WHERE / filters → range-check gates

A predicate like salary BETWEEN 50000 AND 90000 becomes a range constraint. Rather than decomposing each number into bits and constraining each bit (the R1CS approach), the value is checked against a lookup table of the valid range, decomposed into a few limbs. One lookup replaces hundreds of constraints.

Projection → copy constraints

SELECT a, c FROM … carries specific columns forward. The permutation argument proves that the projected cells are exactly the corresponding input cells — no substitution, no fabrication — without re-encoding the data.

Equi-JOIN → permutation + lookup

A join asserts that for every row in the output, matching keys exist in both input tables, and that the output is a faithful combination of the matched tuples. This is expressed as a permutation argument over the key columns plus a lookup that binds matched payloads, with dummy tuples padding the output to a fixed power-of-two cardinality so the circuit shape stays static.

GROUP BY → sort + boundary-marker gate

Grouping is implemented by sorting rows on the group key (a sorting network proven correct in-circuit), then marking the boundary between groups with a custom gate that fires a "new group" flag whenever the key changes. Aggregates then accumulate within each segment.

SUM / COUNT / AVG / MIN / MAX → accumulator gates

Aggregation is a running-accumulator custom gate: a polynomial identity relating each row's accumulator cell to the previous one, reset at group boundaries. AVG composes a SUM accumulator with a COUNT accumulator and a single division check at the end.

ORDER BY → sorting argument

Ordering reuses the same verifiable sorting machinery as GROUP BY: prove that the output is a permutation of the input (permutation argument) and that adjacent rows are correctly ordered (a comparison custom gate).

A query plan compiles to a composition of pre-proven SQL gates — the same way a query optimizer composes physical operators.

Halo2 as the substrate

In practice this library is built on Halo2 — the PLONKish framework with first-class support for custom gates, lookup arguments, and a transparent (no trusted-setup) polynomial commitment via Inner Product Arguments, with KZG available where a succinct, constant-size proof is preferred. Halo2's configuration API is what makes a gate library tractable: each gate is declared once as a set of column relationships and selector polynomials, then instantiated wherever a query plan calls for it. Recursive composition (the "Halo" in Halo2) collapses long query circuits into a single short proof.

This is why the modern verifiable-database literature — ZKSQL and especially PoneglyphDB — converges on PLONKish circuits rather than generic SNARK front-ends. The operator-at-a-time decomposition maps cleanly onto a gate-at-a-time circuit.

From query plan to circuit

The pipeline mirrors a traditional database almost exactly:

  1. The SQL is parsed and planned, just as in Postgres — filters pushed down, joins ordered, aggregates identified.
  2. Each physical operator in the plan is replaced by its corresponding gate from the library.
  3. The gates are wired together with permutation arguments so that one operator's output columns are the next operator's input columns.
  4. The composed circuit is proven once; the verifier checks the single artifact.

A query optimizer composes operators for speed. A zero-knowledge database composes gates for provability — but the structure is the same, which is exactly why an engineering team that knows databases can reason about it.

The trade-offs an architect should know

  • Gate count drives proving time. The art is minimizing the number and degree of gates a given query family needs. A poorly-arithmetized JOIN can dominate the entire circuit.
  • Lookup tables cost commitment. Large lookup tables (e.g. full 64-bit ranges) are decomposed into smaller limbs to keep the committed table size sane.
  • Fixed circuit shape vs. dynamic data. Circuits are static; row counts are padded to powers of two with dummy tuples. Choosing the padding strategy is a real design decision with cost implications.
  • Setup model. IPA-based Halo2 needs no trusted setup (better for adversarial/regulated settings); KZG yields smaller proofs but consumes public parameters from a ceremony. The right choice is engagement-specific.

Why this is the enterprise advantage

Here is the part that matters to a CTO. Because each SQL operator is a reusable, independently-proven gate, a zero-knowledge database is not rebuilt from scratch for every query. The gate library is the durable asset; each new regulated workflow — a CCAR aggregate, a HIPAA cohort statistic, a Scope 3 rollup — is a new composition of existing gates, not new cryptography.

This is what makes the bespoke-yet-disciplined engagement model possible: the cryptographic primitives are stable and reviewed; the per-institution work is the arithmetization of that institution's query set onto the shared gate library. The hard, novel, error-prone part — proving each gate sound — is done once and reused. The fitted part — which gates, in which order, against which committed schema — is the engagement.

That division is the whole game. Get the gate library right, and verifiable SQL stops being a research result and becomes infrastructure.

Gu, Fang, Nawab. PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification. arXiv:2411.15031, SIGMOD 2025.

Li, Weng, Xu, Wang, Rogers. ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs. PVLDB Vol. 16 (2023).

Gabizon, Williamson, Ciobotaru. PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge. IACR ePrint 2019/953.