Optimize_anything: A Universal API for Optimizing Any Text Parameter


by LakshyAAAgrawal·Feb 20, 2026·8 points·0 comments

AI Analysis

●●● Banger · Big Brain · Zero to One · Wizardry

One API unifies prompt tuning, code optimization, and blackbox search—beats domain-specific tools.

Strengths
  • Genuine abstraction: reformulates disparate optimization problems (prompts, CUDA kernels, scheduling) as text search—elegant unification.
  • Actionable Side Information feedback loop: the evaluator returns diagnostics (stack traces, rendered images, profiler output) that the LLM uses to make targeted proposals.
  • Credible validation: reported to beat Optuna, AlphaEvolve, and a GPT baseline across agent skills, math reasoning, cost optimization, and mathematical blackbox tasks.
Weaknesses
  • Requires well-defined evaluator function; won't help if measurement is vague or expensive to compute at scale.
  • Early-stage API (v0.1.0); ecosystem and integrations likely thin compared to Optuna, Ray Tune, or prompt engineering platforms.
Target Audience

ML researchers, prompt engineers, system designers optimizing code/configs/agents; anyone with a measurable objective they want to improve.

Similar To

Optuna · OpenEvolve · AlphaEvolve

Post Description

We built optimize_anything, an API that optimizes any artifact representable as text — code, prompts, agent architectures, configs, even SVGs. It extends GEPA (our prompt optimizer, discussed here previously: https://arxiv.org/abs/2507.19457) far beyond prompts.

The API is deliberately minimal. You provide what to optimize and how to measure it:

import gepa.optimize_anything as oa

def evaluate(candidate: str) -> tuple[float, dict]:
    result = run_my_system(candidate)
    return result.score, {"error": result.stderr, "runtime": f"{result.time_ms}ms"}

result = oa.optimize_anything(
    seed_candidate="<your artifact>",
    evaluator=evaluate,
)
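To make the evaluator contract concrete, here is a minimal sketch of one possible `evaluate` function, assuming the candidate is a piece of Python source code. It uses only the standard library and does not call gepa. The scoring rule (0 on failure, higher for faster runs) and the 10-second timeout are illustrative assumptions, not something the post specifies.

```python
import subprocess
import sys
import time


def evaluate(candidate: str) -> tuple[float, dict]:
    """Run candidate Python source in a subprocess; return (score, side info)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, "-c", candidate],
            capture_output=True,
            text=True,
            timeout=10,  # assumed budget, chosen for illustration
        )
    except subprocess.TimeoutExpired:
        return 0.0, {"error": "timed out after 10s", "runtime": ">10000ms"}
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    # Hypothetical scoring rule: failures score 0, successes score higher when faster.
    score = 0.0 if proc.returncode != 0 else 1.0 / (1.0 + elapsed_ms / 1000.0)

    # Side information: the stack trace (if any) and the wall-clock runtime.
    side_info = {"error": proc.stderr.strip(), "runtime": f"{elapsed_ms:.0f}ms"}
    return score, side_info
```

A failing candidate such as `raise ValueError('boom')` would come back with a score of 0 and the traceback in `side_info["error"]`, which is the kind of diagnostic a proposer could act on.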

The evaluator returns a score plus diagnostic feedback (we call it "Actionable Side Information" — stack traces, rendered images, profiler output, whatever helps diagnose failures). An LLM proposer reads this feedback during a reflection step and proposes targeted fixes, not blind mutations. Candidates are selected via a Pareto frontier across metrics/examples, so a candidate that's best at one thing survives even if its average is mediocre.
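The Pareto idea can be illustrated with a small sketch. It uses the generic textbook definition of a non-dominated set over per-example scores; gepa's actual selection rule may differ, and the candidate names and numbers below are made up for illustration.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """True if a is at least as good as b on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: dict[str, list[float]]) -> list[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
    ]


# Hypothetical per-example scores for three candidates.
scores = {
    "generalist": [0.7, 0.7, 0.7],  # best average
    "specialist": [1.0, 0.2, 0.1],  # mediocre average, best on example 0
    "dominated": [0.6, 0.2, 0.1],   # never better than "specialist"
}
frontier = pareto_frontier(scores)
print(frontier)  # ['generalist', 'specialist']
```

Here "specialist" stays on the frontier despite its lower average because nothing matches it on the first example, while "dominated" is dropped.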

Two ideas distinguish this from AlphaEvolve/OpenEvolve/ShinkaEvolve-style LLM evolution: (1) diagnostic feedback is a first-class API concept rather than a framework-specific mechanism, and (2) the API unifies three optimization modes — single-task search (solve one hard problem), multi-task search (solve related problems with cross-transfer), and generalization (build artifacts that transfer to unseen inputs). Prior frameworks only express mode 1.

We tested across 8 domains. Selected results:

  • Coding agent skills: Learned repo-specific skills push Claude Code to near-perfect task completion and make it 47% faster
  • Cloud scheduling: Discovered algorithms that cut costs 40%, topping the ADRS leaderboard over expert heuristics and other LLM-evolution frameworks
  • Agent architecture: Evolved a 10-line stub into a 300+ line ARC-AGI agent, improving Gemini Flash from 32.5% → 89.5%
  • Circle packing (n=26): Outperforms AlphaEvolve's published solution
  • Blackbox optimization: Generated problem-specific solvers matching or exceeding Optuna across 56 EvalSet problems
  • CUDA kernels: 87% match or beat baseline; multi-task mode outperforms dedicated single-task runs

```
pip install gepa
```

Blog with full results and runnable code for all 8 case studies: https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-o...

GitHub: https://github.com/gepa-ai/gepa
