RAG-Ready Extractor – Structure-aware ingestion with semantic scoring
Noise-filtered PDF/web extraction for RAG, but Jina and Firecrawl already solve this.
Local-first hybrid semantic code search tool. Indexes codebases into PostgreSQL with pgvector, using embeddings generated via Ollama, and combines vector similarity with keyword search through reciprocal rank fusion (RRF). Supports 30+ languages. Includes a CLI, an MCP server, a web dashboard, and an interactive REPL.
Hybrid vector+keyword search beats single-mode retrieval, but Cursor and Cody already own this.
Backend developers, engineering teams managing large codebases
Cursor · Sourcegraph Cody · Continue.dev
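The RRF step mentioned in the code search description above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function name rrf_fuse, the file names, and the constant k=60 (a commonly used default for RRF) are all assumptions made for the example. Each result list contributes 1 / (k + rank) per document, and documents are ordered by the summed score.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrieval modes.
vector_hits = ["auth.py", "session.py", "db.py"]
keyword_hits = ["session.py", "routes.py", "auth.py"]

fused = rrf_fuse([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only rank positions, it does not need the vector distances and keyword scores to be on a comparable scale, which is one reason it is a popular choice for hybrid search.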
Chat-with-codebase, but Cursor, Sourcegraph, and Continue already own this space.
AST + embeddings for codebase search—but Sourcegraph Cody, Cursor, and Continue already solve this.
Tree-sitter + FTS5 + MCP = tokens saved for AI agents to actually code, not search.
Honest benchmark shows RAG overhead on trivial queries; 63% token savings on complex tasks.
Memory-aware video chunking with IoU tracking lets SAM 3 run without GPU limits.