LLMxRay an open-source observability tool for LLMs

by lognebudo·Mar 18, 2026·1 point·0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Compares tokenization across Arabic, Chinese, and French, a multilingual angle that LangSmith ignores.

Strengths
  • Side-by-side token streaming reveals how different models tokenize the same prompt
  • Tool Workshop generates and tests function code with full agent graph visibility
  • 100% local with browser-native storage — no telemetry or cloud calls
Weaknesses
  • LLM observability is crowded with LangSmith, Arize Phoenix, and Helicone
  • Early stage with limited model provider integrations beyond Ollama
Category
Target Audience

LLM developers and prompt engineers

Similar To

LangSmith · Arize Phoenix · Helicone

Post Description

LLMxRay is an open-source tool to inspect how different LLMs handle the same prompt. It focuses on three things:
  • showing prompts and responses side by side for multiple models
  • exposing token counts and tokenization details
  • comparing behavior across languages and model families

It works with local models (e.g. via Ollama/LM Studio) and API-based models. The interface lets you run the same prompt against several models and see how length, phrasing, and token usage differ.

LLMxRay currently supports four languages (English, French, Arabic, Chinese) so you can see how tokenization and expression change across writing systems (Latin, RTL Arabic, and Chinese characters). This makes it useful for understanding multilingual behavior, cost differences, and prompt design across languages.

The project is early but usable. I’d be interested in feedback on the concept, the UI, and what kinds of comparisons or visualizations would be most useful to you.
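
For readers who want a feel for this kind of comparison outside the UI, here is a minimal Python sketch. It is not LLMxRay's code. It assumes a local Ollama server on its default port and uses the `/api/generate` endpoint, reading the `prompt_eval_count` and `eval_count` fields from the reply. The helper names (`run_prompt`, `summarize`, `compare`) are hypothetical, and the reported prompt count may include prompt-template overhead or be missing when the prompt is cached.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(model: str, language: str, reply: dict) -> dict:
    """Reduce one reply to the fields worth comparing side by side."""
    return {
        "model": model,
        "language": language,
        # Either count can be absent from a reply, so default to None.
        "prompt_tokens": reply.get("prompt_eval_count"),
        "response_tokens": reply.get("eval_count"),
        "response_chars": len(reply.get("response", "")),
    }


def compare(models, prompts, runner=run_prompt):
    """Run every prompt (keyed by language) against every model."""
    rows = []
    for language, prompt in prompts.items():
        for model in models:
            rows.append(summarize(model, language, runner(model, prompt)))
    return rows
```

Calling `compare(["llama3", "mistral"], {"en": "Explain what a token is.", "fr": "Explique ce qu'est un token."})` would return one row per model and language; the model names are placeholders for whatever is pulled locally. The `runner` parameter exists so the comparison logic can be exercised with a stub instead of a live server.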

GitHub: https://lognebudo.github.io/llmxray/

Similar Projects

SaaS · ●● Solid

Read-only LLM cost observability

The read-only correlation of request → model → token → $ is smart — you get a shot at answering 'why did our bill spike?' without routing traffic through a proxy. It also claims workflow-level analysis (long prompts, retries, agent loops) and concrete recommendations like model routing and context trimming; useful, but the pitch leaves open how reliable the heuristics and team-attribution are across providers.

Solve My Problem · Slick
jappleseed987
203mo ago