Prompts are coupled to LLMs and nobody builds tooling for it

Author: abhishekfordel

by abhishekfordel · Feb 18, 2026 · 3 points · 0 comments


AI Analysis

●●● · Banger · Zero to One · Big Brain

Tackles an open problem: prompt-format incompatibility between model families such as Claude, GPT, and LLaMA, as documented in published research.

Strengths

•Identifies a genuine gap: of 11 tools checked, none adapts the input prompt format per model
•Two-pass pipeline (deterministic + semantic) is clever without being overengineered
•Backed by a research paper (arXiv preprint) on the prompt-coupling phenomenon

Weaknesses

•Tooling is early-stage: 9 commits and minimal production validation
•Depends on a local Ollama instance for the semantic pass, which adds friction compared with a simple proxy
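The analysis names the two passes but not what each one does. Below is a hypothetical sketch of how such a pipeline could be wired. It assumes the deterministic pass is a regex rewrite (XML-style section tags to Markdown headers) and the semantic pass is a call to a local Ollama server's `/api/generate` endpoint. The `llama3` model name, the rewrite instruction, and all function names are placeholders, not promptc's actual code:

```python
import json
import re
import urllib.request
from typing import Callable, Optional

# Assumed deterministic rule: <tag>body</tag> blocks become "## Tag" Markdown sections.
_TAG_BLOCK = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def deterministic_pass(prompt: str) -> str:
    """Rewrite every <tag>body</tag> block as a '## Tag' header followed by the body."""
    def to_header(m: re.Match) -> str:
        title = m.group(1).replace("_", " ").title()
        return f"## {title}\n{m.group(2)}"
    return _TAG_BLOCK.sub(to_header, prompt)


def ollama_request(prompt: str, model: str = "llama3",
                   host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    # Placeholder instruction for the semantic rewrite; not taken from promptc.
    instruction = ("Rewrite the following prompt so it reads naturally for the target model. "
                   "Keep every instruction and fact.\n\n" + prompt)
    body = json.dumps({"model": model, "prompt": instruction, "stream": False}).encode()
    return urllib.request.Request(f"{host}/api/generate", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def ollama_semantic_pass(prompt: str) -> str:
    """Send the request to a running Ollama server and return its 'response' text."""
    with urllib.request.urlopen(ollama_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


def two_pass(prompt: str, semantic: Optional[Callable[[str], str]] = None) -> str:
    """Run the cheap deterministic pass, then the optional semantic pass."""
    rewritten = deterministic_pass(prompt)
    return semantic(rewritten) if semantic else rewritten
```

Because `semantic` is optional, the deterministic rewrite still runs when no Ollama server is available (e.g. `two_pass(p)` versus `two_pass(p, semantic=ollama_semantic_pass)`), which is one way to limit the friction noted in the weaknesses.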

Post Description

I went down a rabbit hole trying to understand why my Claude prompts turn to garbage on GPT-4 and vice versa. Not just "slightly worse", but fundamentally broken. Turns out researchers have already measured this: removing colons from a prompt template swings LLaMA-2-13B accuracy by 78 percentage points (Sclar et al., ICLR 2024). The format that works best on one model family overlaps less than 20% with what works best on another (He et al., 2024).
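To make the colon example concrete, here is a hypothetical sketch that enumerates surface-format variants of one prompt (separator, field joiner, casing) so each variant could be scored against the same model. The dimensions and their values are illustrative assumptions, not the exact format grammar used by Sclar et al.:

```python
from itertools import product
from typing import Callable, Iterator, List, Tuple

# Illustrative formatting dimensions (assumed, not taken from the paper).
SEPARATORS = [": ", " - ", ":: ", " "]   # descriptor/value separator; " " means no colon
FIELD_JOINERS = ["\n", " ", " || "]      # placed between fields
CASINGS: List[Callable[[str], str]] = [str.title, str.upper, str.lower]


def render(fields: List[Tuple[str, str]], sep: str, joiner: str,
           case: Callable[[str], str]) -> str:
    """Render (descriptor, value) pairs under one surface format."""
    return joiner.join(f"{case(name)}{sep}{value}" for name, value in fields)


def format_variants(fields: List[Tuple[str, str]]) -> Iterator[str]:
    """Yield one rendering per combination of separator, joiner, and casing."""
    for sep, joiner, case in product(SEPARATORS, FIELD_JOINERS, CASINGS):
        yield render(fields, sep, joiner, case)


fields = [("question", "What is 2+2?"), ("answer", "4")]
variants = list(format_variants(fields))
print(len(variants))  # 36 (4 separators x 3 joiners x 3 casings)
```

Dropping the colon corresponds to choosing `" "` instead of `": "` as the separator. Evaluating every variant on a fixed task and comparing best and worst accuracy is roughly the kind of spread measurement the cited numbers describe.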

So I went looking for a tool that handles this. Checked DSPy, Guidance, Outlines, PromptLayer, LMQL, Braintrust, Humanloop, Maxim, MLflow, Prompty, Promptomatix. Eleven tools. Zero of them adapt input prompt format per model. They all either optimize what the prompt says or constrain what the model outputs. The actual structural packaging of the input? Manual everywhere.

Then I looked at how production tools deal with it today. Aider has a 2,718-line YAML file with 313 model configs. Some models get "you NEVER leave comments without implementing" and Claude gets the literal opposite instruction. Claude Code only works with Anthropic models — third parties have built LiteLLM proxies and Node.js fetch interceptors to hack around it. Cursor's docs say "switch to a different model and try again."

The paper maps this to Constantine's coupling taxonomy from 1974 (content, common, control, stamp, data coupling). Same structural problem, different domain. I called it "prompt coupling" because that's what it is — your prompt is coupled to your model the same way a module can be coupled to another module's internals.

I also built promptc (https://github.com/shakecodeslikecray/promptc), a transparent HTTP proxy that rewrites prompt structure per model with zero code changes to existing tools. It's a proof of concept, not a product. The paper is the actual contribution.
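The post doesn't describe promptc's rewrite rules. As a hypothetical sketch of the request-rewriting step inside such a proxy, assume clients send OpenAI-style chat-completions bodies (`model`, `messages`). Assume also, purely for illustration, that Claude-bound prompts should use XML-tagged sections instead of Markdown headers. `REWRITES`, `md_to_xml`, and `rewrite_body` are made-up names, and the HTTP forwarding itself is omitted:

```python
import copy
import re
from typing import Callable, Dict

# A "## Heading" line, then everything up to the next heading or the end of the text.
_MD_SECTION = re.compile(r"^## ([^\n]+)\n(.*?)(?=^## |\Z)", re.DOTALL | re.MULTILINE)


def md_to_xml(text: str) -> str:
    """Turn each '## Heading' section into a <heading>...</heading> block."""
    def repl(m: re.Match) -> str:
        tag = m.group(1).strip().lower().replace(" ", "_")
        return f"<{tag}>\n{m.group(2).strip()}\n</{tag}>\n"
    return _MD_SECTION.sub(repl, text).rstrip("\n")


# Hypothetical rule table: model-name prefix -> rewrite applied to message text.
# A single illustrative rule; models with no matching prefix pass through untouched.
REWRITES: Dict[str, Callable[[str], str]] = {"claude": md_to_xml}


def rewrite_body(body: dict) -> dict:
    """Return a copy of a chat request whose string message contents suit body['model']."""
    model = body.get("model", "").lower()
    rule = next((fn for prefix, fn in REWRITES.items() if model.startswith(prefix)), None)
    if rule is None:
        return body  # unknown model: forward unchanged
    new_body = copy.deepcopy(body)  # never mutate the client's original request
    for msg in new_body.get("messages", []):
        if isinstance(msg.get("content"), str):
            msg["content"] = rule(msg["content"])
    return new_body
```

Keeping the rules in a prefix-keyed table means unknown models pass through unchanged, so the proxy stays transparent when it has no rule for a model.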

First paper. Independent researcher. If the framing is wrong, I'd rather hear it here than after it's indexed.