GitHub Repository

Evaluation harness for Apodex-1.0 on public deep-research benchmarks.

80 stars · Python

Apodex-1.0 – Deep research with independent verifier (90.3 BrowseComp)

by wuqiaocauc · Jun 9, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem

Scores 90.3 on BrowseComp with a verification-centric model architecture.

Strengths
  • Public benchmark results across four deep-research suites with verifiable numbers
  • Multiple model variants from 0.8B to 35B for different compute budgets
Weaknesses
  • This repo contains only the evaluation harness, not the model weights
  • Deep research agents already have many competing benchmarks and frameworks
Category
Target Audience

AI researchers evaluating deep research models

Similar To

GAIA Benchmark · AgentBench · WebArena

Similar Projects

AI/ML · ●● Solid

EleutherAI / Lm-Evaluation-Harness

Industry standard benchmark harness refactored with lighter installs and new SGLang support.

Crowd Pleaser · marvinified · 1027d ago