518K Vietnamese legal documents (1924–2026)
518k Vietnamese legal documents fill a massive gap in Southeast Asian NLP datasets.
The open-source standard for deterministic process logging in human-AI collaboration. Moving past AI detection toward transparency.
Deterministic audit trail vs. probabilistic detection, but adoption depends on ecosystem buy-in.
Academic researchers, writers, and publishers who need verifiable AI attribution without black-box detection
EPUB · PDF metadata standards
Instead of relying on an external model to guess whether a text is AI-generated, TWFF uses a ZIP-based container (similar to an EPUB) that stores the document alongside a Process Transcript (JSON).
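To make the container idea concrete, here is a minimal Python sketch using only the standard library. The member names (`content/document.xhtml`, `meta/process-log.json`) and the `{"events": [...]}` wrapper are assumptions for illustration; the TWFF spec defines the real layout.

```python
import io
import json
import zipfile


def build_container(document_xhtml: str, events: list[dict]) -> bytes:
    """Pack a document and its process transcript into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical member names, not taken from the TWFF spec.
        archive.writestr("content/document.xhtml", document_xhtml)
        archive.writestr(
            "meta/process-log.json",
            json.dumps({"events": events}, indent=2),
        )
    return buffer.getvalue()


def read_transcript(container: bytes) -> list[dict]:
    """Read the event list back out of a container built above."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return json.loads(archive.read("meta/process-log.json"))["events"]
```

Because the result is an ordinary ZIP file, any archive tool can open it and inspect the transcript next to the document, much as with an EPUB.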
How it works:
1) It captures Revision Velocity: the delta between human drafting and AI injections.
2) It intercepts paste and AI-interaction events, wrapping them in deterministic metadata.
3) It’s local-first. The audit trail stays with the author until they choose to export the signed container.
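One way to picture the event capture described above is the sketch below. The `EditEvent` fields, the origin labels (`"human"`, `"paste"`, `"ai"`), and the characters-per-minute reading of Revision Velocity are all assumptions made for illustration, not the actual TWFF event schema or metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    """One logged editing event (hypothetical fields)."""
    timestamp: float   # seconds since the session started
    origin: str        # "human", "paste", or "ai" (assumed labels)
    chars_added: int


def chars_per_minute(events: list[EditEvent], origin: str) -> float:
    """Characters contributed by one origin per minute of session time."""
    if not events:
        return 0.0
    timestamps = [event.timestamp for event in events]
    minutes = (max(timestamps) - min(timestamps)) / 60
    if minutes == 0:
        return 0.0
    total = sum(e.chars_added for e in events if e.origin == origin)
    return total / minutes


session = [
    EditEvent(timestamp=0, origin="human", chars_added=120),
    EditEvent(timestamp=60, origin="ai", chars_added=600),
    EditEvent(timestamp=120, origin="human", chars_added=60),
]
human_rate = chars_per_minute(session, "human")
ai_rate = chars_per_minute(session, "ai")
```

Comparing the two rates gives a rough picture of how much of a session was drafted by hand versus injected, and the whole computation is reproducible from the logged events alone.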
This is a v0.1 reference implementation built in Python/NiceGUI. I’m looking for feedback on:
- The container structure (XHTML vs. Markdown).
- The JSON event schema.
- The Revision Distance logic: can we create a fingerprint for human effort that is as difficult to fake as the writing itself?
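As a starting point for the Revision Distance question, here is one naive measure, sketched with Python's `difflib`: sum the characters changed between successive snapshots of a draft. The function names and the snapshot-based approach are assumptions for discussion, not the logic in the reference implementation.

```python
import difflib


def changed_chars(before: str, after: str) -> int:
    """Count characters inserted, deleted, or replaced between two drafts."""
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Use the larger side of the change so replacements count once.
            total += max(i2 - i1, j2 - j1)
    return total


def revision_distance(snapshots: list[str]) -> int:
    """Sum of per-step changes across an ordered list of draft snapshots."""
    return sum(
        changed_chars(earlier, later)
        for earlier, later in zip(snapshots, snapshots[1:])
    )
```

A text pasted in one step yields a distance equal to its length, while a heavily reworked draft accumulates far more. A measure this simple would be easy to game by replaying scripted edits, so it illustrates the problem more than it solves it.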
MVP Demo: https://demo.firl.nl/
TWFF spec: https://github.com/Functional-Intelligence-Research-Lab/TWFF...
Interactive fiction tooling with portable format—but audience is writers, not developers.
Translates docs in Slack with layout preservation, but native translation covers most needs.
Catches the stale-write bug every AI+database pipeline hits: version checking before mutation.
Git-versioned contracts in Markdown that compile to Word—lawyers can finally use version control.
12ms container startup beats Docker's 500ms with Nix-native declarative config.