GitHub Repository

A Python toolkit for studying AI agent behavior without looking inside the model.

10 stars · Jupyter Notebook

Aft, a Python toolkit to study agent behavior

by chse_cake · Mar 2, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem · Rabbit Hole

Behavioral field approximation via trajectory sampling; clever framing, limited adoption signals.

Strengths
  • Original theoretical framing: a behavioral "field" defined as the distribution of reachable behaviors conditioned on context
  • 400-line API surface; the rest is docs and reasoning, favoring craft over bloat
  • Empirical methodology enables hypothesis testing for prompt/tool/environment changes
Weaknesses
  • Hobby project with no examples, no live agents, no community adoption demonstrated
  • Narrow audience: mainly useful for researchers; practitioners want observability, not trajectory analysis
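To make the "field" framing concrete: if a behavior is whatever a user-supplied function says it is, the field under one context can be approximated by sampling many trajectories under that context and counting behavior labels. This is a minimal sketch, assuming trajectories are plain lists of step names; `empirical_field`, `total_variation`, and that trajectory format are hypothetical illustrations, not aft's API.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical trajectory format (not aft's): one string per agent step,
# e.g. the name of the tool called at that step.
Trajectory = List[str]


def empirical_field(
    trajectories: Sequence[Trajectory],
    behavior: Callable[[Trajectory], str],
) -> Dict[str, float]:
    """Estimate P(behavior label | context) from trajectories sampled under one context.

    `behavior` is the user-defined labelling function: it maps a whole
    trajectory to a discrete behavior label.
    """
    if not trajectories:
        raise ValueError("need at least one sampled trajectory")
    counts = Counter(behavior(t) for t in trajectories)
    n = len(trajectories)
    return {label: c / n for label, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)


# Usage: how often does the agent search before doing anything else,
# under two made-up sets of sampled runs?
def searches_first(t: Trajectory) -> str:
    return "search_first" if t and t[0] == "search" else "other"


runs_a = [["search", "edit"], ["search", "test"], ["edit", "test"], ["search"]]
runs_b = [["edit", "test"], ["edit"], ["search", "edit"], ["edit", "search"]]
field_a = empirical_field(runs_a, searches_first)
field_b = empirical_field(runs_b, searches_first)
print(total_variation(field_a, field_b))  # 0.5
```

Total variation is just one convenient way to compare two discrete distributions; any divergence would do, and with few samples per context the estimate is noisy.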
Target Audience

AI researchers, agentic system developers studying agent behavior shifts

Post Description

aft was my stab at building a way to understand what Claude is doing, and at having the language to reason about differences in model behavior when we make it do long agentic runs, change prompts, alter tools, etc. The intention of the toolkit is to provide an empirical measure of how agent behavior differs as things like environments, tools, and prompts change.

It gives you the tools to measure changes in "behaviors that the user defines." This makes it more of a hypothesis-testing framework for what the agent is doing than a way of predicting what the agent might do.
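One standard way to test whether a change (prompt, tool, environment) shifts a user-defined behavior is a two-sample permutation test on a scalar metric computed per trajectory. The following is a generic sketch of that technique, not aft's implementation; `permutation_test`, the string-list trajectory format, and the step-count metric are all hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical trajectory format: one string per agent step.
Trajectory = List[str]


def permutation_test(
    runs_a: Sequence[Trajectory],
    runs_b: Sequence[Trajectory],
    metric: Callable[[Trajectory], float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in mean metric (A minus B).

    Returns (observed difference, p-value). H0: the condition label does not
    affect the metric, so labels are exchangeable across the pooled runs.
    """
    if not runs_a or not runs_b:
        raise ValueError("both conditions need at least one trajectory")
    xs = [metric(t) for t in runs_a]
    ys = [metric(t) for t in runs_b]
    observed = mean(xs) - mean(ys)
    pooled = xs + ys
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign condition labels
        diff = mean(pooled[: len(xs)]) - mean(pooled[len(xs):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Usage: does prompt A make the agent take more steps than prompt B? (made-up runs)
def step_count(t: Trajectory) -> float:
    return float(len(t))


prompt_a = [["search", "edit", "test"], ["search", "edit"], ["search", "edit", "test", "test"]]
prompt_b = [["edit"], ["edit", "test"], ["edit"]]
diff, p = permutation_test(prompt_a, prompt_b, step_count, n_permutations=2000)
print(f"difference in mean steps: {diff:.2f}, p = {p:.3f}")
```

A permutation test makes no distributional assumptions about the metric, which suits arbitrary user-defined behaviors; with only a handful of runs per condition it has little power, so in practice you would sample many trajectories per condition.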

The reasoning and derivations behind these tools are given here: https://technoyoda.github.io/agent-science.html

I'd be very happy to hear feedback and questions. (Please ignore the names I gave to parts of the theory; they were just for shits and giggles.)

Similar Projects

AI/ML · ●● Solid

Agentify - A Declarative, AI agent building toolkit

Defines agents as YAML specs and provides a simple CLI + Python API so you can iterate on agent behavior without committing to a heavyweight framework. It supports many model providers (OpenAI, Anthropic, Google, Mistral, Bedrock, etc.) and includes Colab examples and a quickstart, which makes experimenting frictionless. However, it deliberately avoids workflow/orchestration features, so it's clearly aimed at prototyping rather than production-grade agent pipelines.

Niche Gem · Ship It
lewissheridan
113mo ago