Back to browse
Architecture question: running an LLM as core infrastructure

Architecture question: running an LLM as core infrastructure

by senza1dio·Mar 14, 2026·1 point·0 comments

AI Analysis

MidShip ItNiche Gem

Industrial chatbot with circuit breakers, but still just a customer service bot at the end.

Strengths
  • Real production deployment with token budgeting and timeout guards for safety
  • Semantic caching with pgvector reduces redundant LLM calls and costs
  • Parallel tool execution with loop detection prevents infinite agent loops
Weaknesses
  • Fundamentally a chatbot interface — no differentiation from Intercom or Drift
  • Architecture questions post suggests unfinished thinking, not a polished product
Category
Target Audience

Developers building production LLM systems with tool orchestration

Similar To

Intercom · Drift · Zendesk Answer Bot

Post Description

I've been experimenting with running an LLM not as a chatbot but as the core runtime of a business system, and I'm curious how others approach this.

The idea is that the model doesn't just answer questions but orchestrates tools and interacts with real application logic.

The architecture I'm currently testing includes:

Runtime

tool orchestration parallel tool execution loop detection circuit breaker / timeout guards token budgeting Context

context compression dynamic token ceiling Caching

deterministic LLM response cache semantic cache using pgvector Memory

short-term session memory longer-term semantic memory Evaluation

prompt evaluation set to test tool reasoning and failures I'm trying to figure out which parts are actually necessary in production and which ones are over-engineering.

For people building LLM systems beyond simple chat interfaces:

how do you handle tool orchestration? do you implement memory layers or just rely on context? are semantic caches worth it in practice? Curious to hear how others structure this.

Similar Projects