GitHub Repository

Agent layer for observability

138 stars · Python

ML condenses billions of logs into a tiny snapshot your LLM can debug

by kvaranasi_·Jun 17, 2026·7 points·2 comments

AI Analysis

●● Solid · Solve My Problem · Big Brain

ML log clustering that sits beside Datadog and Loki without replacing your stack.

Strengths
  • Fingerprints logs into structural templates, condensing millions into actionable patterns.
  • Self-hosted deployment means logs never leave your VPC, no SaaS tier required.
  • Works with existing sources like Loki, CloudWatch, ClickHouse without parallel ingest.
Weaknesses
  • Observability is a crowded category with established players already doing anomaly detection.
  • 17 commits and 138 stars suggest an early-stage project compared to mature observability tools.
Target Audience

Backend developers, DevOps engineers, SREs

Similar To

Datadog · New Relic · Grafana Loki

Post Description

Hi HN, I'm Kaushik, and I built Rocketgraph. I believe that while other spaces have caught up to the AI wave, the observability space is still lagging behind, using the same tools and dashboards that we use to analyse logs from human-written code. But now that code is written and debugged by AI, we need to rethink observability for a world where the observer itself is an AI.

The problem I run into is that when an alert fires, I have to manually check the Grafana dashboards and write LogQL queries, which is pretty much like grepping. But production usually breaks due to a schema mismatch, a DB connection issue, or a log line I haven't seen before that's buried under millions of other log lines. Even worse, sometimes the alert never fires, and I don't know when to grep.

Rocketgraph fixes that. It turns your logs into patterns by fingerprinting them, then uses ML to assign each pattern an anomaly score based on features like frequency, text similarity and other vectors. This usually condenses a million logs into 200-300 patterns with anomaly scores and feature vectors that your LLM can easily analyse without being sent the entire firehose. It runs at specific points in time, so it works like online anomaly detection based on logs.
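To make the idea concrete, here is a minimal sketch of fingerprinting plus frequency-based scoring. It is not Rocketgraph's actual code: the masking rules, the `fingerprint` and `score_patterns` names, and the scoring formula (a log-ratio of smoothed frequencies against a baseline window, standing in for the ML model) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical masking rules: replace variable tokens with placeholders.
# Order matters: specific patterns run before the generic number mask.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<UUID>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.I), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def fingerprint(line):
    """Reduce a raw log line to a structural template."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def score_patterns(baseline, current):
    """Rank templates in `current` by how unusual their frequency is.

    Returns (template, count, score) tuples, most anomalous first.
    The score is log(p_current / p_baseline) with add-one smoothing,
    so templates never seen in the baseline score high.
    """
    base = Counter(fingerprint(l) for l in baseline)
    cur = Counter(fingerprint(l) for l in current)
    base_total = sum(base.values())
    cur_total = sum(cur.values())
    vocab = len(set(base) | set(cur))

    scored = []
    for template, count in cur.items():
        p_base = (base[template] + 1) / (base_total + vocab)
        p_cur = (count + 1) / (cur_total + vocab)
        scored.append((template, count, math.log(p_cur / p_base)))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Made-up sample data: a quiet baseline window and a window with one new line.
baseline = ["GET /users/1 200 in 12ms", "GET /users/2 200 in 15ms"] * 50
current = ["GET /users/7 200 in 11ms"] * 50 + ["db connection refused to 10.0.0.5:5432"]

ranked = score_patterns(baseline, current)
for template, count, score in ranked:
    print(f"{score:+.2f}  x{count}  {template}")
```

In this toy run, 51 lines collapse into two templates, and the single never-before-seen connection error ranks above the routine request pattern. A ranked list of templates like this, rather than the raw lines, is the kind of compact snapshot you could hand to an LLM.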

Some companies do anomaly detection on metrics, but this is done for logs.

Other approaches in this space bolt an AI on top of existing Grafana dashboards, but that's the same thing as manually grepping with extra steps.

Please check out the example setups to host it locally and run it on your log files. Let me know what you guys think!

Similar Projects