Digest AI vs HN About

GitHub Repository

A laboratory for studying how LLMs behave when offered a set of fake tools

0 starsTypeScript

A laboratory to study how LLMs behave when offered a set of fake tools

by vivganes·Jun 13, 2026·3 points·0 comments

Visit Project View on HN

AI Analysis

●●SolidNiche GemBig Brain

Tests whether LLMs will call fake tools like 'slap_bad_human' — clever safety research angle.

Strengths

•Fake tool library with static and dynamic responses enables controlled behavioral experiments
•Versioned testing plans with live event streaming for session inspection
•Self-hostable with Docker, CI pipeline, and test coverage included

Weaknesses

•Dynamic code execution runs in-process without sandboxing — security risk for shared instances
•Niche research use case limits broader adoption beyond LLM safety testing

Category

Target Audience

AI researchers, LLM safety testers, developers building tool-using agents

Similar To

LangSmith · Arize Phoenix · Braintrust

Post Description

This is a tool I built to feed my curiosity about how LLMs behave when they see tools. For example, when they see a tool named 'slap_bad_human', will they actually use it, or not?

Similar Projects

Data●●Solid

I logged Gemini's stock predictions for 38 days to study LLM drift

Rigorous 38-day Gemini drift study with citation-mapped predictions and confidence scores.

Big BrainRabbit HoleNiche Gem

clsia

513mo ago

Security●●Solid

SEDManager – GUI Application for Setting Up Self-Encrypting Drives

GUI for TCG self-encrypting drives with pre-boot auth, finally usable.

Niche GemSolve My Problem

pregnenolone

102mo ago

Security●●Solid

Hexlock – Replace PII in text with fake data that has the same format

Format-preserving PII replacement lets LLMs process data without seeing real values.

Solve My ProblemShip It

lemaudit

501mo ago

Developer Tools●●Solid

Torrix, self hosted, LLM Observability,(no Postgres, no Redis)

Single Docker container with SQLite beats LangSmith's heavy Postgres dependency.

Solve My ProblemCozy

AdarshRao23

7441mo ago

Health●●Solid

Agent Arnold – Gym tracker 100% vibe-coded from my phone between sets

Dishonored shameboard for fake lifters sets this apart from Hevy and Strong.

Ship ItCozy

bojanstef4

502mo ago

AI/ML●●Solid

Layered retrieval beats grep alone for LLM-generated engineering docs

Layered retrieval beats semantic search alone for engineering docs, saving 5x model costs.

Big BrainNiche Gem

rduffyuk

3021d ago