Back to browse
GitHub Repository

A laboratory for studying how LLMs behave when offered a set of fake tools

0 starsTypeScript

A laboratory to study how LLMs behave when offered a set of fake tools

by vivganes·Jun 13, 2026·3 points·0 comments

AI Analysis

●●SolidNiche GemBig Brain

Tests whether LLMs will call fake tools like 'slap_bad_human' — clever safety research angle.

Strengths
  • Fake tool library with static and dynamic responses enables controlled behavioral experiments
  • Versioned testing plans with live event streaming for session inspection
  • Self-hostable with Docker, CI pipeline, and test coverage included
Weaknesses
  • Dynamic code execution runs in-process without sandboxing — security risk for shared instances
  • Niche research use case limits broader adoption beyond LLM safety testing
Category
Target Audience

AI researchers, LLM safety testers, developers building tool-using agents

Similar To

LangSmith · Arize Phoenix · Braintrust

Post Description

This is a tool I built to feed my curiosity about how LLMs behave when they see tools. For example, when they see a tool named 'slap_bad_human', will they actually use it, or not?

Similar Projects

AI/ML●●Solid

Layered retrieval beats grep alone for LLM-generated engineering docs

Layered retrieval beats semantic search alone for engineering docs, saving 5x model costs.

Big BrainNiche Gem
rduffyuk
3021d ago