I logged Gemini's stock predictions for 38 days to study LLM drift
Rigorous 38-day Gemini drift study with citation-mapped predictions and confidence scores.
A laboratory for studying how LLMs behave when offered a set of fake tools
Tests whether LLMs will call fake tools like 'slap_bad_human', a clever angle on safety research.
AI researchers, LLM safety testers, developers building tool-using agents
LangSmith · Arize Phoenix · Braintrust
GUI for TCG self-encrypting drives with pre-boot auth, finally usable.
Format-preserving PII replacement lets LLMs process data without seeing real values.
Single Docker container with SQLite beats LangSmith's heavy Postgres dependency.
Dishonored shameboard for fake lifters sets this apart from Hevy and Strong.
Layered retrieval beats semantic search alone for engineering docs, cutting model costs 5x.