GitHub Repository

Zero-dependency PII + quality + noise audit for LLM datasets (TR/EU/US)

2 stars · Python

Flexorch-audit – quality scoring and PII detection for LLM pipelines

by flexorch·Jun 18, 2026·2 points·0 comments

AI Analysis

●● Solid · Solve My Problem · Cozy

Regex-only, zero-dependency PII detection in a space where Presidio already exists.

Strengths
  • 30+ PII types across 8 countries with checksum validation, not just pattern matching
  • Composite quality grade (A/B/C/D) gives instant LLM-readiness signal
  • Pure stdlib means no supply chain risk for sensitive data processing
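
To illustrate what "checksum validation, not just pattern matching" can mean in practice, here is a minimal stdlib-only sketch. It is not taken from Flexorch-audit: the pattern `CARD_RE` and the functions `luhn_valid` and `find_card_numbers` are hypothetical names chosen for this example. It uses the Luhn check for payment card numbers, and the project's own detectors and supported ID types may work differently.

```python
import re

# Hypothetical pattern (an assumption, not the project's): 13 to 19 digits,
# optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return card-like numbers that match the pattern AND pass Luhn."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "card 4111 1111 1111 1111, order id 4111111111111112"
print(find_card_numbers(sample))
```

Both numbers in `sample` match the regex, but only the first passes the checksum, so a pattern-only scanner would report one more hit than this sketch does.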
Weaknesses
  • Regex-based detection will miss contextual PII that NLP models catch
  • No cloud API option for teams wanting managed compliance scanning
Category
Target Audience

ML engineers building RAG pipelines or fine-tuning datasets

Similar To

Microsoft Presidio · Amazon Comprehend · Privacera

Similar Projects

AI/ML · Mid

Pipevals – a visual pipeline builder for evaluation-driven AI

Early learning project in a crowded eval space dominated by LangSmith and Arize.

Ship It · Bold Bet
tilt
623mo ago