GitHub Repository

Scrape and ingest HKEx (Hong Kong Stock Exchange) regulatory filings into SurrealDB with full-text extraction and graph linking.

8 stars · Python

Scrape 25 yrs of HKEx filings into SurrealDB (graph database)

by simonmak·Feb 14, 2026·1 point·0 comments

AI Analysis

Mid · Niche Gem · Wizardry
The Take

This repo skips brittle browser scraping and instead calls HKEx's undocumented JSON endpoints to pull decades of filings quickly. It then extracts text and tables from PDF, HTML, and Excel files (PyMuPDF + Camelot) and can optionally create graph edges in SurrealDB to connect companies and filings. The engineering choices (batching, parallel downloads, and recursive retries) show it was built for scale rather than as a one-off demo. I'd like to see example SurrealDB query patterns or export hooks, but as a bootstrap for financial-data pipelines this cuts a ton of grunt work.
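To make the batching, parallel-download, and recursive-retry pattern concrete, here is a minimal sketch using only the Python standard library. It is not taken from the repo: the function names (`batched`, `fetch_with_retry`, `download_all`), the default batch size, worker count, and backoff are all assumptions for illustration, and `fetch` stands in for whatever callable actually retrieves a filing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url); on failure, recurse with one fewer retry left.

    `fetch` is a hypothetical callable supplied by the caller.
    The delay doubles on each retry (simple exponential backoff).
    """
    try:
        return fetch(url)
    except Exception:
        if retries <= 0:
            raise  # out of retries: surface the last error
        time.sleep(delay)
        return fetch_with_retry(fetch, url, retries - 1, delay * 2)


def download_all(fetch, urls, batch_size=50, workers=8):
    """Download `urls` one batch at a time, each batch in parallel.

    Returns a dict mapping each URL to the payload `fetch` returned.
    """
    results = {}
    for batch in batched(urls, batch_size):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves input order, so zip pairs URLs with payloads.
            payloads = pool.map(lambda u: fetch_with_retry(fetch, u), batch)
            results.update(zip(batch, payloads))
    return results
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; in practice it might wrap an HTTP client call, and a URL that keeps failing after the last retry raises out of `download_all` rather than being silently skipped.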

Category
Target Audience

Data engineers, quantitative researchers, fintech developers, and analysts interested in HK market filings

Similar Projects

AI/ML · ●● Solid

AgentKV – SQLite for AI agent memory (MMAP vector+graph DB)

Single-file mmap storage plus an HNSW vector index and explicit graph edges is an elegant, practical combo: think "SQLite for agent memory" with CRC-32 crash recovery and zero-server convenience. The C++20 core + nanobind gives zero-copy NumPy views and GIL-free searches, and the claimed FAISS-like throughput makes this genuinely interesting for local setups. The main caveats are build/toolchain friction and how rich the surrounding ecosystem becomes.

Wizardry · Niche Gem
shiwang_khera
104mo ago