GitHub Repository

Tokenlens is an open-source AI prompt and agent workflow analyzer that finds token waste, repeated context, and prompt caching opportunities to reduce LLM cost and latency.

4 stars · Python

CacheLens – Local-first cost tracking proxy for LLM APIs

by stephenlthorn · Mar 13, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Solve My Problem · Cozy

Local budget caps block requests before provider dashboards even update the bill.

Strengths
  • Budget caps block requests locally instead of just sending email alerts later.
  • Unified view across Anthropic, OpenAI, and Google APIs without sharing your prompts.
  • Zero-config daemon installation sets environment variables automatically for all your SDKs.
Weaknesses
  • LLM observability is crowded with established players like LangSmith and Helicone.
  • Local-only storage means no team collaboration or shared usage analytics across devs.
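The zero-config strength above depends on SDKs picking up a proxy address from the environment. As a minimal sketch: the official OpenAI and Anthropic Python SDKs read OPENAI_BASE_URL and ANTHROPIC_BASE_URL when no base_url argument is passed, so a daemon can reroute traffic without code changes. The port (8420) and the per-provider path prefixes below are hypothetical; which variables CacheLens actually sets, and how it handles Google's SDK, is not stated here.

```python
import shlex

# Hypothetical proxy address and route layout; CacheLens's real values may differ.
PROXY_URL = "http://127.0.0.1:8420"


def proxy_env(proxy_url: str = PROXY_URL) -> dict[str, str]:
    """Base-URL overrides that send SDK traffic through a local proxy.

    The official OpenAI and Anthropic Python SDKs read these variables when
    no base_url argument is given to the client constructor.
    """
    return {
        "OPENAI_BASE_URL": f"{proxy_url}/openai/v1",
        "ANTHROPIC_BASE_URL": f"{proxy_url}/anthropic",
    }


def shell_exports(env: dict[str, str]) -> str:
    """Render the overrides as POSIX shell export lines (e.g. for ~/.bashrc)."""
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in sorted(env.items())
    )


if __name__ == "__main__":
    print(shell_exports(proxy_env()))
```

Writing exports into a shell profile is only one option; a daemon could equally set the variables per process.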
Target Audience

Developers building LLM applications

Similar To

Helicone · LangSmith · LiteLLM

Post Description

I built CacheLens because I was burning through $200+/month on Claude API calls and had no idea where it was going.

It's a local HTTP proxy that sits between your app and the AI provider (Anthropic, OpenAI, Google). Every request flows through it, and it records token usage, cost, cache hit rates, latency — everything. Then there's a dashboard to visualize it all.
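The post's technical notes say the proxy tees the byte stream of streaming responses instead of buffering them. A minimal sketch of that idea with plain asyncio, assuming the upstream body arrives as an async iterator of byte chunks (as httpx exposes it through Response.aiter_bytes()): each chunk is relayed as soon as it arrives, and a side copy is kept so usage can be parsed once the stream ends. tee_stream and on_complete are hypothetical names, not CacheLens's API.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def tee_stream(
    upstream: AsyncIterator[bytes],
    on_complete: Callable[[bytes], Awaitable[None]],
) -> AsyncIterator[bytes]:
    """Relay upstream chunks unchanged while keeping a side copy.

    Each chunk is yielded before any accounting work happens, so the client
    never waits for the full response. on_complete receives the whole body
    only after the last chunk (or after a client disconnect closes the stream).
    """
    captured: list[bytes] = []
    try:
        async for chunk in upstream:
            captured.append(chunk)  # side copy for token/cost accounting
            yield chunk             # forwarded immediately, not buffered
    finally:
        await on_complete(b"".join(captured))


async def demo() -> None:
    async def fake_upstream() -> AsyncIterator[bytes]:
        for part in (b'data: {"delta": "Hel"}\n\n', b'data: {"delta": "lo"}\n\n'):
            await asyncio.sleep(0.01)  # simulate gaps between SSE events
            yield part

    async def record(body: bytes) -> None:
        print(f"captured {len(body)} bytes for accounting")

    async for chunk in tee_stream(fake_upstream(), record):
        print("client got", chunk)


if __name__ == "__main__":
    asyncio.run(demo())
```

In a FastAPI app, a generator like this could be handed to fastapi.responses.StreamingResponse. The side copy still holds the whole body in memory, and on_complete runs before the stream formally closes; a leaner variant would parse the usage event incrementally or push accounting onto a background task.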

What makes it different from just checking your provider dashboard:

  • It's real-time (WebSocket live feed of every call as it happens).
  • It works across all three major providers in one view.
  • It runs 100% locally: your prompts never leave your machine.
  • It has budget caps that actually block requests before you overspend.
  • It identifies optimization opportunities (cache misses, model downgrades, repeated prompts).

Tech stack: Python, FastAPI, SQLite, vanilla JS. No React, no build step, no external dependencies beyond pip. The whole thing is ~3K lines of Python.
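Blocking a request before overspending implies the proxy estimates its cost before forwarding it. A minimal sketch, assuming a worst-case estimate (prompt tokens at the input rate plus max_tokens at the output rate) priced from a built-in table with user overrides, which the post lists among its technical decisions. The model names and prices are made up, and BudgetGuard, reserve and settle are hypothetical names; a real proxy would turn BudgetExceeded into an error response instead of forwarding the call.

```python
import threading
from datetime import date
from typing import Optional

# Made-up USD prices per million tokens; real provider rates change often.
BUILTIN_PRICES = {
    "example-large": {"input": 3.00, "output": 15.00},
    "example-small": {"input": 0.25, "output": 1.25},
}


def price_table(overrides: Optional[dict] = None) -> dict:
    """Built-in prices with per-model, per-field user overrides layered on top."""
    table = {model: dict(p) for model, p in BUILTIN_PRICES.items()}
    for model, fields in (overrides or {}).items():
        table.setdefault(model, {}).update(fields)
    return table


def worst_case_cost(table: dict, model: str, prompt_tokens: int, max_tokens: int) -> float:
    """Upper bound on cost: assumes every allowed output token is generated."""
    p = table[model]
    return (prompt_tokens * p["input"] + max_tokens * p["output"]) / 1_000_000


class BudgetExceeded(Exception):
    """Raised instead of forwarding a call that could break the cap."""


class BudgetGuard:
    """Daily spend cap enforced locally, before a request reaches the provider."""

    def __init__(self, daily_cap_usd: float):
        self.daily_cap_usd = daily_cap_usd
        self._day = date.today()
        self._spent = 0.0
        self._lock = threading.Lock()  # a proxy serves concurrent requests

    def _roll_over(self) -> None:
        if date.today() != self._day:
            self._day, self._spent = date.today(), 0.0

    @property
    def spent(self) -> float:
        with self._lock:
            self._roll_over()
            return self._spent

    def reserve(self, estimate_usd: float) -> None:
        """Hold the estimate against the cap, or refuse the call outright."""
        with self._lock:
            self._roll_over()
            if self._spent + estimate_usd > self.daily_cap_usd:
                raise BudgetExceeded(
                    f"daily cap ${self.daily_cap_usd:.2f} would be exceeded "
                    f"(already ${self._spent:.2f})"
                )
            self._spent += estimate_usd

    def settle(self, estimate_usd: float, actual_usd: float) -> None:
        """Swap the reservation for the real cost once usage is known."""
        with self._lock:
            self._roll_over()
            self._spent = max(0.0, self._spent + actual_usd - estimate_usd)


if __name__ == "__main__":
    table = price_table({"example-large": {"output": 12.00}})  # user override
    guard = BudgetGuard(daily_cap_usd=1.00)
    est = worst_case_cost(table, "example-large", prompt_tokens=20_000, max_tokens=4_000)
    guard.reserve(est)       # held before forwarding
    guard.settle(est, 0.07)  # actual usage reported by the provider
    print(f"spent today: ${guard.spent:.2f}")
```

Reserving the estimate up front, rather than only counting spend after responses return, keeps concurrent requests from all slipping under the cap at once.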

Interesting technical decisions:

  • The proxy captures streaming responses without buffering: it tees the byte stream so the client sees zero added latency.
  • Cost calculation uses a built-in pricing table with override support (providers change rates constantly).
  • There's a Prometheus /metrics endpoint so you can plug it into existing monitoring.
  • Cacheability analysis uses diff-based detection across multiple API calls to identify what's actually static vs dynamic in your prompts.

Limitations I'm honest about:

  • The cacheability scorer is heuristic-based: solid for multi-call traces (~85% accurate), rougher for single prompts (~65%).
  • Token counting uses cl100k_base for everything, which drifts ~10% for non-OpenAI models.
  • Three features (smart routing, scheduled reports, multi-user auth) are on the roadmap but not shipped yet.

Would love feedback, especially from anyone managing LLM costs at scale.
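The diff-based cacheability idea from the post can be sketched with the standard library, assuming prompts are compared line by line: a line of the first prompt counts as static only if a diff matches it in every other call, and the shared leading text is what a prefix-based prompt cache could reuse. classify_lines and cacheable_prefix are hypothetical names, and this is not CacheLens's scorer. Provider caches work on token prefixes, so the character prefix here is only an approximation.

```python
import os
from difflib import SequenceMatcher


def classify_lines(prompts: list[str]) -> list[tuple[str, bool]]:
    """Label each line of prompts[0] as static (True) or dynamic (False).

    A line is static only if a line-level diff matches it in every other prompt.
    """
    base = prompts[0].splitlines()
    static = [True] * len(base)
    for other in prompts[1:]:
        matcher = SequenceMatcher(None, base, other.splitlines(), autojunk=False)
        matched = [False] * len(base)
        for a, _b, size in matcher.get_matching_blocks():
            for i in range(a, a + size):
                matched[i] = True
        static = [s and m for s, m in zip(static, matched)]
    return list(zip(base, static))


def cacheable_prefix(prompts: list[str]) -> str:
    """Longest leading text shared by every call (compared character by character)."""
    return os.path.commonprefix(prompts)


if __name__ == "__main__":
    calls = [
        "You are a support bot.\nPolicy: refunds within 30 days.\nUser: where is my order?",
        "You are a support bot.\nPolicy: refunds within 30 days.\nUser: can I get a refund?",
        "You are a support bot.\nPolicy: refunds within 30 days.\nUser: change my address",
    ]
    for line, is_static in classify_lines(calls):
        print("STATIC " if is_static else "DYNAMIC", line)
    prefix = cacheable_prefix(calls)
    print(f"{len(prefix) / len(calls[0]):.0%} of the first prompt is a shared prefix")
```

In this sketch, a single prompt leaves nothing to diff against, so every line trivially looks static; any single-prompt score has to fall back on other heuristics.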

Similar Projects

Developer Tools · ●● Solid

Personal AI gateway for OpenClaw – tokenomics

OpenAI-compatible proxy with PII masking and token budgets, but LiteLLM and Helicone already do this.

Solve My Problem · Big Brain
crawdog
20 · 3mo ago
Developer Tools · ●● Solid

TokenMeter – Open-source observability layer for LLM token costs

Proxying every LLM call to log tokens is the right kind of blunt instrument — you get per-developer, per-model cost telemetry immediately. Smart routing and the built-in semantic cache (claims 45–80% savings) are the most useful ideas here, but the default SQLite backend and admin/admin creds scream MVP rather than production-ready scale.

Solve My Problem · Niche Gem
Mohit8880
13 · 3mo ago