
A decade of verifiable databases, read and annotated

From IntegriDB (2015) to PoneglyphDB (2025), every system that has shaped how we think about cryptographic verifiability for SQL — and what we learned reading them in sequence.

By zkDB Editorial · 6 min read

Anyone walking into the zero-knowledge database literature today walks into a field that has been quietly compounding for a decade. The headline systems get the press; the references underneath those papers are the actual map. We read the whole sequence over the winter of 2025. These are the notes.

2015 — IntegriDB lays the foundation

Zhang, Katz, Papamanthou (CCS 2015) introduced the first general system for proving the integrity of SQL queries over an outsourced database. It did not yet hide the data (that property came later), but it established the architectural pattern that every subsequent system has followed: commit to the dataset once, prove each query individually, and verify in time sublinear in the size of the data.
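The commit-once, prove-per-query, verify-sublinearly pattern can be illustrated with a plain Merkle tree. This is a deliberately swapped-in technique, not IntegriDB's actual construction, and it only proves that a given row is in the committed table rather than that a SQL answer is correct. The function names (`build_tree`, `commit`, `prove`, `verify`) are our own, chosen for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(rows: list[bytes]) -> list[list[bytes]]:
    """Return every level of a Merkle tree over the rows, leaves first."""
    if not rows:
        raise ValueError("need at least one row")
    level = [_h(b"\x00" + r) for r in rows]      # domain-separated leaf hashes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                        # odd width: duplicate the last node
            level = level + [level[-1]]           # (toy padding rule, not production-safe)
            levels[-1] = level
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def commit(levels: list[list[bytes]]) -> bytes:
    """The commitment is the single root hash, published once."""
    return levels[-1][0]


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root: bytes, row: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from one row and its path: O(log n) hashes."""
    node = _h(b"\x00" + row)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root
```

The verifier holds only the 32-byte root and checks a logarithmic number of hashes per row, which is the sense in which verification is sublinear in the data.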

The contribution that aged best was not the specific cryptographic construction but the recognition that database operators (projection, filter, equi-join, GROUP BY) compose cleanly into a verifiable pipeline. Every modern zkDB inherits that decomposition.

2017 — vSQL generalizes, and adds zero-knowledge

The same Maryland group followed up with vSQL, which verifies arbitrary SQL queries over dynamic outsourced databases, with a zero-knowledge variant in the companion paper (IACR ePrint 2017/1146). This is where the field became zero-knowledge databases in the modern sense: the verifier learns nothing about the underlying rows beyond what the answer itself reveals.

Two technical decisions from vSQL are worth noting. First, the system used interactive proofs (specifically GKR-style sumcheck protocols) — efficient but requiring the prover and verifier to be online together, which limited transferability. Second, the system handled updates to the underlying dataset, anticipating a problem that remains research-grade today.
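For readers who have not met sumcheck before, here is a minimal sketch of the protocol's core loop for a multilinear polynomial given by its values on the Boolean hypercube. It is not vSQL's protocol: prover and verifier are collapsed into one function, the prover is honest, and the field modulus `P` and the helper names `fold`, `mle_eval`, and `sumcheck` are our own assumptions. It does show why the interaction matters, since each verifier challenge must be chosen after the prover's message for that round.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, chosen here only for convenience


def fold(table: list[int], r: int) -> list[int]:
    """Fix the first variable of a multilinear table to the field element r."""
    half = len(table) // 2
    return [(table[i] + r * (table[half + i] - table[i])) % P
            for i in range(half)]


def mle_eval(table: list[int], point: list[int]) -> int:
    """Evaluate the multilinear extension of `table` at `point`."""
    current = [v % P for v in table]
    for r in point:
        current = fold(current, r)
    return current[0]


def sumcheck(table: list[int], claimed_sum: int) -> bool:
    """Return True if the verifier accepts the claim sum(table) == claimed_sum."""
    n = len(table)
    if n == 0 or n & (n - 1):
        raise ValueError("table length must be a power of two")
    claim = claimed_sum % P
    current = [v % P for v in table]
    challenges = []
    while len(current) > 1:
        half = len(current) // 2
        # Prover message: the round polynomial is linear, so two values suffice.
        g0 = sum(current[:half]) % P   # g(0)
        g1 = sum(current[half:]) % P   # g(1)
        # Verifier check: g(0) + g(1) must match the running claim.
        if (g0 + g1) % P != claim:
            return False
        # Verifier challenge, drawn only after seeing the prover's message.
        r = secrets.randbelow(P)
        challenges.append(r)
        claim = (g0 + r * (g1 - g0)) % P   # new claim is g(r)
        current = fold(current, r)
    # Final check: one evaluation of the polynomial at the random point.
    return claim == mle_eval(table, challenges)
```

The verifier's work is one consistency check per variable plus a single evaluation at the end, instead of summing the whole table. In a real protocol that final evaluation comes from an oracle or commitment rather than from the raw table, as it does in this sketch.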

2021 — VeriDB takes the hardware path

VeriDB (SIGMOD 2021) used Intel SGX enclaves rather than pure cryptography. We include it because it represents the road not taken: enclave-based verifiability is plausible, but it roots the entire trust chain in a hardware vendor. Subsequent SGX vulnerabilities — Foreshadow, ÆPIC, the long list — vindicated the choice the cryptographic side made.

2023 — ZKSQL hits practical TPC-H

Li, Weng, Xu, Wang, Rogers (VLDB 2023) shipped the first system that put the full TPC-H operator set into zero-knowledge with a two-order-of-magnitude speedup over the garbled-circuit baseline. They used VOLE-based interactive proofs via the EMP toolkit — same trade-off as vSQL: fast per-statement, but interactive.

ZKSQL is the moment the field clicked. The benchmarks ran on 60k-, 120k-, and 240k-row TPC-H instances: small relative to a Fortune 500 ledger, but real. Operator-at-a-time evaluation made the implementation tractable. The paper's ablation studies for each operator are still the most useful single artifact in the literature.

The companion code lives at github.com/vaultdb/zksql. It is worth reading.

2024 — Workshop on Verifiable Database Systems

The first workshop, co-located with SIGMOD in 2023, was the field's community-formation moment. The 2024 follow-up is where the gap between academic and applied work narrowed visibly, with talks from Northwestern, UC Irvine, Maryland, ECNU, and the Halo2 / Plonky3 implementation teams.

2024 — PoneglyphDB removes the last barrier

Gu, Fang, Nawab (arXiv 2411.15031, SIGMOD 2025) shipped what is, today, the most consequential single paper in the verifiable-database literature. PoneglyphDB delivered the combination that had been missing: non-interactive (no live verifier), transparent setup (no toxic waste, no trusted ceremony), and arbitrary SQL (not just a fixed query family). It is built on PLONKish circuits in the Halo2 framework, using the IPA polynomial commitment scheme.

The architectural moves worth lifting:

  • A small library of basic operation gates (range check, sort, group-by, equi-join, aggregation) composes into any TPC-H query. Future engineers will add gates for vector search, full-text search, recursive CTEs — the architecture admits it.
  • Recursive composition collapses long computations into a single short proof, so verifier cost is logarithmic in the size of the underlying data.
  • Permutation arguments + dummy tuples make joins tractable in zero knowledge, at the cost of padding cardinality to a power of two.
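The third bullet can be made concrete with a small sketch of the idea behind a permutation argument: two tables are the same multiset if a randomized grand product over their rows agrees. This is an illustration under our own assumptions, not PoneglyphDB's gate design. The modulus `P`, the all-zero dummy tuple, and the helper names are hypothetical, and a real circuit would enforce the check with constraints rather than compute it in the clear.

```python
import secrets

P = 2**61 - 1  # illustrative prime field, not the field Halo2 uses


def next_pow2(n: int) -> int:
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def pad(rows: list[tuple[int, ...]], width: int) -> list[tuple[int, ...]]:
    """Pad a table with dummy tuples up to the next power-of-two cardinality.

    Assumption: an all-zero tuple is the dummy. A real system would need a
    way to tell dummies apart from genuine all-zero rows.
    """
    dummy = (0,) * width
    return list(rows) + [dummy] * (next_pow2(len(rows)) - len(rows))


def compress(row: tuple[int, ...], beta: int) -> int:
    """Fold a tuple into one field element: sum of row[j] * beta**j."""
    acc = 0
    for v in reversed(row):
        acc = (acc * beta + v) % P
    return acc


def grand_product(rows: list[tuple[int, ...]], beta: int, gamma: int) -> int:
    prod = 1
    for row in rows:
        prod = prod * ((gamma + compress(row, beta)) % P) % P
    return prod


def same_multiset(a, b, beta=None, gamma=None) -> bool:
    """Probabilistic check that b is a permutation of a.

    Equal multisets always pass; unequal ones pass only with negligible
    probability over random beta and gamma.
    """
    if len(a) != len(b):
        return False
    beta = secrets.randbelow(P) if beta is None else beta
    gamma = secrets.randbelow(P) if gamma is None else gamma
    return grand_product(a, beta, gamma) == grand_product(b, beta, gamma)
```

A sort operator, for example, could be checked by pairing this test (the output is a permutation of the input) with a range check that adjacent output rows are in order. The padding is what keeps the table size fixed and independent of the true row count.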

The performance numbers in §5 of the paper are honest about what does and does not work. TPC-H Q1 in seconds; Q5 / Q9 in minutes. Verifier cost sub-second across the board. Proof artifacts in the tens of kilobytes. This is not yet the throughput of a production warehouse, but it is the first system for which a credible engineering roadmap to production is visible.

2025 — The field branches: graphs, AI, provenance

Two papers published in 2025 are early signals of where the field goes next.

Wu et al. (arXiv 2507.00427) extended verifiable querying to graph databases via expansion-centric operator decomposition. The same architectural pattern that worked for SQL works, with care, for graph traversal — and the graph-database market is large enough to absorb a substantial commercial follow-on.

ZKPROV (arXiv 2506.20915) applied the techniques to LLM training-corpus provenance — proving that a specific record was or was not in a model's training set. This is the zk-AI bridge that the EU AI Act will eventually demand at scale.

What we took away

Four observations, in declining order of confidence.

One — the architecture is converging, not diverging. Every system in the sequence (IntegriDB → vSQL → ZKSQL → PoneglyphDB) decomposes the database query into the same basic algebraic operators. The deltas are in the proof system (interactive vs. non-interactive, trusted vs. transparent setup) and the engineering quality of the implementation. The decade has been about removing constraints, not picking sides.

Two — performance is on a steep curve. Halo2 → Plonky3 → SP1 / RISC Zero zkVMs → GPU/FPGA proving — the prover-cost curve has compressed by ~10× per year for three years. We expect this to continue for two more, then plateau as hardware acceleration becomes the dominant variable.

Three — the open problems are mostly engineering, not cryptography. Updates, multi-tenant performance, integration with existing query planners (Postgres, Snowflake), DP composition for sensitive answers, key-management UX for verifiers — these are hard engineering problems with known shapes. They will be solved in industrial settings over the next 24–36 months.

Four: the academic literature is now ahead of the commercial deployments. That is an unusual situation. The crypto-native side (Orochi, Aleo, Space and Time) is shipping, but enterprise-grade deployment is still wide open. That is the window we work in.

The reading list, if you are starting today

  1. PoneglyphDB (arXiv 2411.15031) — read first for the contemporary architecture.
  2. ZKSQL (VLDB 2023) — for the operator-at-a-time decomposition.
  3. IntegriDB + vSQL (2015, 2017) — for the historical context and the trust-assumption trade space.
  4. PLONK (Gabizon, Williamson, Ciobotaru, 2019) — for the proof system underneath.
  5. Halo2 — for the implementation framework actually shipping today.
  6. Zero Knowledge Canon, Parts 1 & 2 (a16z crypto) — for the broader cryptographic context, accessible to non-cryptographers.

The work compounds. Each paper assumes the previous. Read in order.

Gu, Fang, Nawab. PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification. arXiv:2411.15031, SIGMOD 2025.

Li, Weng, Xu, Wang, Rogers. ZKSQL: Verifiable and Efficient Query Evaluation with Zero-Knowledge Proofs. PVLDB Vol. 16, Issue 8 (2023).
