Detecting coordinated financial narratives with embeddings and AVX2

Author: ManuelGomes

by ManuelGomes · Feb 12, 2026 · 5 points · 1 comment


AI Analysis

Solid · Big Brain · Wizardry · Niche Gem

Embedding-based news coordination detector with AVX2 cosine similarity. Novel domain, but unproven for trading signals.

Strengths

•AVX2 cosine similarity on INT8-quantized embeddings; the post reports ~1.4s per query versus ~12s for the Python-only version (roughly 8.5× faster)
•Multi-source narrative divergence quantification is genuinely non-obvious
•Clear pipeline from ingestion through credibility weighting; production-aware design (Docker, SQLite)

Weaknesses

•No validation that coordination signals predict market moves or detect actual manipulation
•Depends on NewsAPI coverage; financial relevance unproven in real trading
•Skepticism warranted: 'coordination score' without ground truth is correlation theater

Post Description

I built an open-source system called Horaculo that analyzes coordination and divergence across financial news sources. The goal is to quantify narrative alignment, entropy shifts, and historical source reliability.

Pipeline

•Fetch 50–100 articles (NewsAPI)
•Extract claims (NLP preprocessing)
•Generate sentence embeddings (HuggingFace)
•Compute cosine similarity in C++ (AVX2 + INT8 quantization)
•Cluster narratives
•Compute entropy + coordination metrics
•Weight results using historical source credibility
•Output structured JSON signals

Example Output (query: “oil”)

    {
      "verdict": {
        "winner_source": "Reuters",
        "intensity": 0.85,
        "entropy": 1.92
      },
      "psychology": {
        "mood": "Fear",
        "is_trap": true,
        "coordination_score": 0.72
      }
    }

What it measures

•Intensity → narrative divergence
•Entropy → informational disorder
•Coordination score → cross-source alignment
•Credibility weighting → historical consensus accuracy per source

Performance

•1.4s per query (~10 sources)
•~100 queries/min
•~150MB memory footprint
•Python-only version was ~12s

C++ optimizations

•INT8 embedding quantization (4x size reduction)
•AVX2 SIMD vectorized cosine similarity
•PyBind11 integration layer

Storage

•SQLite (local memory)
•Optional Postgres

Each source builds a rolling credibility profile:

    {
      "source": "Reuters",
      "total_scans": 342,
      "consensus_hits": 289,
      "credibility": 0.85
    }

Open Source (MIT)

GitHub: https://github.com/ANTONIO34346/HORACULO

I'm particularly interested in feedback on:

•The entropy modeling approach
•Coordination detection methodology
•Whether FAISS would be a better fit than the current SIMD engine
•Scalability strategies for 100k+ embeddings
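For readers who want a concrete picture of the similarity, entropy, and credibility steps, here is a minimal pure-Python sketch. It is not taken from the Horaculo repository, and every function name in it is hypothetical. It assumes symmetric per-vector INT8 quantization, Shannon entropy in bits over the distribution of claims across narrative clusters, and credibility as consensus_hits / total_scans (which happens to match the 289 / 342 ≈ 0.85 profile shown above). The post does not say which definitions the project actually uses.

```python
import math
from collections import Counter


def quantize_int8(vec):
    """Assumed scheme: symmetric per-vector scaling of floats into [-127, 127]."""
    peak = max(abs(x) for x in vec)
    if peak == 0.0:
        return [0] * len(vec)
    scale = 127.0 / peak
    return [round(x * scale) for x in vec]


def cosine_int8(a, b):
    """Cosine similarity computed directly on quantized integer vectors.

    Cosine is invariant to a positive scale on either vector, so the
    per-vector scale factors cancel and no dequantization is needed.
    A SIMD kernel would compute the same three sums with integer
    multiply-accumulate instructions instead of Python loops.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def shannon_entropy(cluster_labels):
    """Entropy in bits of how claims are spread across narrative clusters."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def credibility(consensus_hits, total_scans):
    """Assumed definition: fraction of scans where the source matched consensus."""
    return consensus_hits / total_scans if total_scans else 0.0


if __name__ == "__main__":
    a = quantize_int8([0.12, -0.80, 0.33, 0.05])
    b = quantize_int8([0.10, -0.75, 0.40, 0.00])
    print("similarity:", cosine_int8(a, b))
    print("entropy (bits):", shannon_entropy(["supply", "supply", "demand", "sanctions"]))
    print("credibility:", round(credibility(289, 342), 2))  # 0.85
```

Under these assumptions, entropy is 0 when every claim falls into one cluster and grows as claims spread evenly across more clusters, which is one plausible reading of "informational disorder". Quantization error makes the INT8 cosine a close approximation of the float result rather than an exact match.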