Composable middleware for LLM inference Optimization Passes
Tower-style middleware stacking for inference guardrails beats bolted-on if-statements.
LLM conversation buffer with cache optimization and dynamic context.
Byte-stable prefix pattern achieves >90% cache hits despite dynamic context injection.
LLM application developers, AI agent builders
LangChain · LlamaIndex · LiteLLM
There are a wide range of agent prompting strategies so I'd love to hear where this library works well and where there are patterns that don't fit well into the current API!
Tower-style middleware stacking for inference guardrails beats bolted-on if-statements.
Cache-aware LLM eval with self-hosted model support beats Ragas on flexibility.
Karpathy's LLM-Wiki concept packaged for ChatGPT, Claude, and Gemini exports.
Cuts token bills 68% by swapping full history for vector-retrieved signals.
Multi-tier caching + tree-sitter indexing, but lacks agent autonomy competitors ship today.
O(1) fork latency makes tree search 1000x faster than vLLM for agentic workloads.