GitHub Repository

Agentic synthetic-data generation framework inspired by Meta FAIR's Autodata / Agentic Self-Instruct.

1 star · Python

Autosynth – generating synthetic data with strong/weak model filtering

Author: ahmadbabdallah

by ahmadbabdallah · Jul 4, 2026 · 1 point · 0 comments

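The listing does not say how Autosynth's strong/weak filter actually works. One plausible reading of "strong/weak model filtering" is to keep only generated tasks that a strong model answers correctly and a weak model gets wrong, so the surviving data is neither trivial nor unsolvable. The Python sketch below assumes that reading. `Candidate`, `is_correct`, and `strong_weak_filter` are hypothetical names, not Autosynth's API, and the "models" are plain functions standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical stand-in for an LLM call: prompt in, answer string out.
Model = Callable[[str], str]


@dataclass
class Candidate:
    prompt: str     # generated task
    reference: str  # expected answer produced alongside the task


def is_correct(answer: str, reference: str) -> bool:
    # Simplistic exact match after trimming whitespace and lowercasing;
    # a real pipeline would likely use a grader model or task-specific checker.
    return answer.strip().lower() == reference.strip().lower()


def strong_weak_filter(
    candidates: Iterable[Candidate], strong: Model, weak: Model
) -> list[Candidate]:
    """Keep candidates the strong model solves but the weak model does not."""
    kept = []
    for c in candidates:
        strong_ok = is_correct(strong(c.prompt), c.reference)
        weak_ok = is_correct(weak(c.prompt), c.reference)
        if strong_ok and not weak_ok:
            kept.append(c)
    return kept


if __name__ == "__main__":
    data = [
        Candidate("2 + 2", "4"),
        Candidate("capital of France", "Paris"),
        Candidate("17 * 23", "391"),
    ]
    answers = {"2 + 2": "4", "capital of France": "Paris", "17 * 23": "391"}
    strong = lambda p: answers[p]                         # toy model: always right
    weak = lambda p: "4" if p == "2 + 2" else "unsure"    # toy model: only solves the easy one
    print([c.prompt for c in strong_weak_filter(data, strong, weak)])
    # → ['capital of France', '17 * 23']
```

The "2 + 2" task is dropped because both models solve it, which is the point of the filter: it discards items that carry no signal about the gap between the two models.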

Similar Projects

Data · Solid

MedSynth – Multi-lingual synthetic healthcare data with OCR artifacts

This isn't another clean, English-only faker: it deliberately models script-specific OCR errors (Hebrew/Arabic/Latin character confusions), per-hospital schema variance, and country-specific ID formats, so models see the kind of mess real systems produce. Output is NDJSON and the tool is usable from the CLI, which makes it straightforward to plug into pipelines. However, the repo looks very new and documentation and examples are thin: the concept is promising, but you'll still need to tinker to use it at scale.
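To make the OCR-artifact idea concrete, here is a minimal Python sketch, assuming a per-script table of visually confusable characters applied at a fixed rate, with records written out as NDJSON (one JSON object per line). The confusion pairs, `inject_ocr_noise`, and `to_ndjson` are illustrative inventions, not MedSynth's code, tables, or CLI.

```python
import json
import random

# Illustrative confusion tables (assumption): substrings an OCR engine might
# misread as a visually similar substring. Not MedSynth's actual data.
CONFUSIONS = {
    "latin": {"0": "O", "1": "l", "5": "S", "rn": "m"},
    "hebrew": {"ו": "ן", "ה": "ח", "ד": "ר"},
    "arabic": {"ب": "ت", "ح": "خ", "ر": "ز"},
}


def inject_ocr_noise(text: str, script: str, rate: float, rng: random.Random) -> str:
    """Swap each confusable substring for its look-alike with probability `rate`."""
    table = CONFUSIONS[script]
    out = []
    i = 0
    while i < len(text):
        for src, dst in table.items():
            if text.startswith(src, i) and rng.random() < rate:
                out.append(dst)
                i += len(src)  # skip the whole matched substring (e.g. "rn")
                break
        else:
            # No substitution applied at this position: copy the character.
            out.append(text[i])
            i += 1
    return "".join(out)


def to_ndjson(records) -> str:
    """Serialize records as newline-delimited JSON, keeping non-ASCII text readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    record = {"patient_id": "A1050", "name": "Sarah Cohen"}
    noisy = {k: inject_ocr_noise(v, "latin", 0.5, rng) for k, v in record.items()}
    print(to_ndjson([record, noisy]), end="")
```

Emitting one object per line is what makes NDJSON easy to stream into downstream tools line by line, which matches the listing's point about plugging the output into pipelines.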

Niche Gem · Wizardry

Alechko

114mo ago

Developer Tools · Mid