GitHub Repository

Real-time pattern detection

1 star · Python

Detecting coordinated financial narratives with embeddings and AVX2

by ManuelGomes · Feb 12, 2026 · 5 points · 1 comment

AI Analysis

●● Solid · Big Brain · Wizardry · Niche Gem

Embedding-based news coordination detector with AVX2 cosine similarity. Novel domain, but unproven for trading signals.

Strengths
  • AVX2 cosine similarity with INT8 quantization; the post reports 1.4s per query versus ~12s for the Python-only version (roughly 8.5×)
  • Multi-source narrative divergence quantification is genuinely non-obvious
  • Clear pipeline from ingestion through credibility weighting; production-aware design (Docker, SQLite)
Weaknesses
  • No validation that coordination signals predict market moves or detect actual manipulation
  • Depends on NewsAPI coverage; financial relevance unproven in real trading
  • Skepticism warranted: 'coordination score' without ground truth is correlation theater
Category
Target Audience

Quant researchers, financial analysts, market microstructure researchers exploring narrative coordination

Similar To

Refinitiv sentiment analysis · Bloomberg narrative tracking · Academic financial linguistics research

Post Description

I built an open-source system called Horaculo that analyzes coordination and divergence across financial news sources. The goal is to quantify narrative alignment, entropy shifts, and historical source reliability.

Pipeline
  • Fetch 50–100 articles (NewsAPI)
  • Extract claims (NLP preprocessing)
  • Generate sentence embeddings (HuggingFace)
  • Compute cosine similarity in C++ (AVX2 + INT8 quantization)
  • Cluster narratives
  • Compute entropy + coordination metrics
  • Weight results using historical source credibility
  • Output structured JSON signals

Example Output (query: “oil”)

    {
      "verdict": {
        "winner_source": "Reuters",
        "intensity": 0.85,
        "entropy": 1.92
      },
      "psychology": {
        "mood": "Fear",
        "is_trap": true,
        "coordination_score": 0.72
      }
    }

What it measures
  • Intensity → narrative divergence
  • Entropy → informational disorder
  • Coordination score → cross-source alignment
  • Credibility weighting → historical consensus accuracy per source

Performance
  • 1.4s per query (~10 sources)
  • ~100 queries/min
  • ~150MB memory footprint
  • Python-only version was ~12s

C++ optimizations:
  • INT8 embedding quantization (4x size reduction)
  • AVX2 SIMD vectorized cosine similarity
  • PyBind11 integration layer

Storage
  • SQLite (local memory)
  • Optional Postgres

Each source builds a rolling credibility profile:

    {
      "source": "Reuters",
      "total_scans": 342,
      "consensus_hits": 289,
      "credibility": 0.85
    }

Open Source (MIT)
GitHub: https://github.com/ANTONIO34346/HORACULO

I'm particularly interested in feedback on:
  • The entropy modeling approach
  • Coordination detection methodology
  • Whether FAISS would be a better fit than the current SIMD engine
  • Scalability strategies for 100k+ embeddings
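
To make the INT8 quantization, entropy, and credibility steps from the post more concrete, here is a minimal NumPy sketch. It is not taken from the Horaculo repository: every function name is hypothetical, NumPy stands in for the C++/AVX2 kernel, entropy is assumed to be Shannon entropy (in bits) over cluster sizes, and credibility is assumed to be the plain ratio `consensus_hits / total_scans`. That ratio happens to reproduce the 0.85 in the Reuters example, but the project's actual formulas may differ.

```python
import math
from collections import Counter

import numpy as np


def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization (assumed scheme).

    Returns the int8 codes and the float scale needed to dequantize.
    """
    vec = np.asarray(vec, dtype=np.float32)
    max_abs = float(np.max(np.abs(vec)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(vec / scale), -127, 127).astype(np.int8)
    return codes, scale


def cosine_int8(a_codes, b_codes):
    """Cosine similarity computed directly on INT8 codes.

    The per-vector scales cancel in the ratio, so they are not needed here.
    """
    a = a_codes.astype(np.int32)  # widen first so products do not overflow int8
    b = b_codes.astype(np.int32)
    denom = math.sqrt(int(a @ a)) * math.sqrt(int(b @ b))
    return float(a @ b) / denom if denom else 0.0


def narrative_entropy(cluster_labels):
    """Shannon entropy in bits of how articles spread across narrative clusters."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def credibility(consensus_hits, total_scans):
    """Assumed credibility: fraction of scans where the source matched consensus."""
    return consensus_hits / total_scans if total_scans else 0.0
```

Storing float32 embeddings as int8 codes accounts for the 4x size reduction mentioned in the post. Under these assumptions, articles split evenly across two clusters give an entropy of 1 bit, and the value rises as the narratives fragment further.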

Similar Projects

AI/ML · Mid

Extract (financial) data from emails with local LLM

Local LLM email parsing when Plaid and receipt scanners already exist.

Ship It
brainless
103mo ago

CLI tool to analyze your Vector Embeddings!

Embedding auditor with 5 checks and pretty plots, but crowded niche with unclear novelty.

Ship It
gauravvij137
213mo ago