I built a Harvey-style tabular review app, then open sourced the code

by afistfullof · Apr 9, 2026 · 4 points · 0 comments

AI Analysis

Mid · Big Brain · Niche Gem

Encoder-based extraction rules out generative hallucinations, in contrast to Harvey's generative approach.

Strengths
  • Architectural choice of encoder models eliminates generative hallucination risks entirely.
  • Provides grounded extraction with confidence scores for every annotation.
  • Detailed guide enables replication without relying on black-box generative APIs.
Weaknesses
  • Requires Isaacus API keys, making it a lead gen tool rather than standalone OSS.
  • Legal tabular review is a narrow workflow compared to general document analysis.
Target Audience

Legal tech developers and lawyers needing accurate document review

Similar To

Harvey · Legora · Clio

Post Description

I spent the past couple of weekends building an open-source alternative to Harvey/Legora's popular tabular review application for lawyers.

The project was sparked by a viral LinkedIn post from lawyer Joshua Upin, who described being shown a hallucinated citation by Harvey that was falsely attributed to one of Harvey’s competitors. Seeing such a basic failure emerge from their architecture made me ask a simple question: how much of a similar product could I recreate without using a single generative model, and in doing so make hallucinations architecturally impossible?

As it turns out, quite a lot.

In building the app, I did not use a single external or generative model. The entire system uses models my organisation trained and owns. More specifically, it uses a combination of Kanon 2 Enricher, Kanon 2 Embedder, and Kanon Answer Extractor. All three are encoder-based, and there are no generative models anywhere in the stack.

That means hallucinations are architecturally impossible. It also means the system can retrieve, classify, extract, and link information in a much more structured and interactive way than products that lean heavily on generation.
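To make the "architecturally impossible" point concrete, here is a minimal sketch of what an extractive result can look like. It does not use the Kanon models or the Isaacus API; the `Span` class, `make_span` helper, sample contract text, and score value are all hypothetical and only illustrate the general idea that an encoder-style extractor returns offsets into the source rather than newly written text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical extraction result: offsets into the source plus a score."""
    start: int
    end: int
    score: float  # assumed model confidence in [0, 1]

    def text(self, source: str) -> str:
        # The answer is always a slice of the source, never generated text.
        return source[self.start:self.end]


def make_span(source: str, start: int, end: int, score: float) -> Span:
    """Reject offsets that do not point inside the source document."""
    if not (0 <= start < end <= len(source)):
        raise ValueError("span offsets fall outside the source text")
    if not (0.0 <= score <= 1.0):
        raise ValueError("score must be between 0 and 1")
    return Span(start, end, score)


# Illustrative contract sentence and a made-up score, not real model output.
contract = "This Agreement is made on 1 March 2024 between Acme Pty Ltd and Beta LLC."
start = contract.index("1 March 2024")
date_span = make_span(contract, start, start + len("1 March 2024"), score=0.97)
print(date_span.text(contract))
```

Under this shape, the extracted text can only ever be something that appears in the document. A model can still pick the wrong span, which is where a per-annotation score becomes useful.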

At its core, the app turns contracts into a wiki-style, interconnected knowledge graph: a network of entities, annotations, spans, and relations that users can explore interactively. Key features like parties, locations, dates, signatures, and terms are extracted on the first pass. From there, users can define custom spans and relations, extending the graph as they go.
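As a rough illustration of that graph, the sketch below models entities as text-anchored annotations and relations as labelled edges. The class names, field names, labels, and sample sentence are assumptions made for this example and are not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A labelled span of contract text (hypothetical shape)."""
    start: int
    end: int
    label: str  # e.g. "party", "date", "location"


@dataclass
class ContractGraph:
    """Entities are nodes; relations are labelled edges between them."""
    text: str
    entities: dict[str, Annotation] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity_id: str, phrase: str, label: str) -> None:
        # Locate the phrase in the source so every node stays anchored to the text.
        start = self.text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} does not occur in the contract")
        self.entities[entity_id] = Annotation(start, start + len(phrase), label)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Relations may only connect entities that already exist in the graph.
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both ends of a relation must be known entities")
        self.relations.append((subject, predicate, obj))

    def neighbours(self, entity_id: str) -> list[tuple[str, str]]:
        # Outgoing edges only, as (predicate, target) pairs.
        return [(p, o) for s, p, o in self.relations if s == entity_id]


graph = ContractGraph("Acme Pty Ltd leases the premises at 1 George Street to Beta LLC.")
graph.add_entity("acme", "Acme Pty Ltd", "party")
graph.add_entity("beta", "Beta LLC", "party")
graph.add_entity("premises", "1 George Street", "location")
graph.relate("acme", "leases_to", "beta")    # a user-defined relation
graph.relate("acme", "leases", "premises")
print(graph.neighbours("acme"))
```

Keeping every node tied to character offsets means each cell in a review table could link back to the exact passage it came from.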

The end result is a tabular review system that matches the core experience offered by the market and, in several meaningful respects, goes beyond it.

I embedded a static version of the app at the top of the linked page so people can try it directly. The static version has real public contracts processed using the application. These contracts relate to public figures like Mark Zuckerberg, Elon Musk, and Jensen Huang, making it easy to verify the accuracy of the stack. The linked page also works as a step-by-step guide for anyone who wants to build something similar themselves.

Similar Projects

OpenRevise is the Harvey for all industries

The repo nails the governance bits: MECE decomposition, a strict source‑gate, and JSON patch specs so changes are only made when verifiable full text exists. It emits true DOCX tracked edits and a Q→source audit mapping, exactly the kind of deterministic audit trail regulated teams want, but the project is still early (few stars, light demos) and it’s unclear how it integrates with verification or LLM orchestration out of the box.

Niche Gem · Solve My Problem
alfredray
303mo ago
AI/ML · ●●● · Banger

Lavern: an open-source multi-agent legal system (Apache 2.0)

67 agents debate documents with 10-pass verification, at a time when single-LLM wrappers dominate.

Big Brain · Wizardry
anttihero
429d ago