How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs
Duplicating transformer layers boosts benchmark scores without a single step of training.
AI and machine learning projects from Show HN — LLM tools, agent frameworks, computer vision, NLP, and more.
First LLM with per-token interpretability, tracing inputs, concepts, and training provenance.
Full transformer with backpropagation running in HyperCard on a 1989 Mac — 1,216 parameters, all inspectable.
SETI@home for LLMs where agents coordinate hyperparameter searches across volunteer GPUs.
Streams LLM weights from CD-ROM during inference to fit 77MB models in 32MB RAM.
Agents fail completely at rebuilding binaries from scratch without source code.
Real-robot production benchmarks proving AI is still 20x slower than humans.
Runtime safety net for LLM agents. Detects token spirals, kills doomed tasks early, tells you exactly why. Rust core, Python SDK. pip install state-harness
Lyapunov stability theory catches token spirals before your budget explodes.
Formally verifies ResNet and ViT architectures using Lean 4 proofs.
Seven-pass enrichment pipeline solves character consistency across 100+ generated pages.
Census-grounded synthetic people living in real time. Why didn't this exist before?
Intel TDX attestation proves the agent runs unmodified inside a secure enclave.
Public live feed of an autonomous Lean 4 proof attempt on Ramsey numbers.
Cross-verifies across multiple sources before the LLM sees context — stops hallucinations at the source.
Semantic primitives show up in activation patterns across Qwen, Gemma, LLaMA, SmolLM2.
Fired an AI CTO for lying; file-based memory enforces real institutional accountability.
100% sycophancy detection on Psychosis-Bench, running locally on a gaming GPU.
Tamper-proof memory + cryptographic audit trail for AI agents. HIPAA, SOC 2, and GDPR compliance built in. Trust score for every response. Python & TypeScript SDKs. Rust-powered.
Content-addressed memory + Merkle-chained ops = tamper-proof AI agent audit trail.
AI-Driven Topological Optimization of Elastocaloric Metamaterials: Resolving the Fatigue-Porosity Paradox in Solid-State Cooling
AI-driven lattice design circumvents SIMP's degeneracy, solving a real physics paradox.
20x faster MoE inference on existing hardware with hash-verified output correctness.
7MB binary-weight LLM runs entirely on integer math, with no floating-point unit.
An AI wrote meta-commentary about other AIs performing an unscripted play, which is genuinely unprecedented.
Recovers Newton's gravity from raw signal prediction using a bandwidth-limited GRU.
Distilled Gemini tool-calling into a 26M model that runs at 1200 tok/s on phones.
Unlocks Apple's locked-down LLM through an OpenAI-compatible server, so existing SDKs work with it.
SOTA expressivity at 14M parameters beats cloud models for on-device TTS.
Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read.
Static Model2Vec embeddings beat transformer retrieval quality while running entirely on CPU.
Direct video-to-vector embedding skips transcription entirely: like Twelve Labs, but self-hosted.
1-bit weights match 8B-model performance while running at 132 tokens/sec on an M4 Pro.
Very-low-latency speech-to-text, intent recognition, and text-to-speech for building voice agents and interfaces.
Beats Whisper v3 accuracy on a $100K budget; shipping on six platforms now.
On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.
Runs Gemma 4 E2B and Kokoro TTS locally with barge-in and vision.
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
Custom Metal shaders beat llama.cpp and MLX, running 1.67x faster on an M4 Max.
Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.
The only Apple Silicon toolkit that streams GCS data during audio fine-tuning without running out of memory.
Screeps-style RTS where LLMs code their way to victory through real iterative learning.
Runs PPO training entirely in-browser via TinyJit WebGPU kernels.
Runs a 4B-parameter image-to-3D model on a Mac without CUDA; Microsoft's original required NVIDIA hardware.
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
Background macOS automation without cursor theft beats VM workarounds.
Runs Apple's 2.4GB SHARP model entirely in-browser using ONNX Runtime Web.
TurboQuant WASM SIMD vector compression — 3 bits/dim with fast dot product. Requires relaxed SIMD (Chrome 114+, Firefox 128+, Safari 18+, Node 20+)
Google's ICLR 2026 quantization paper running client-side with SIMD-accelerated dot products.
50-token compact code output beats raw 5,000-token Excalidraw JSON — clever compression.
Nightly REM sleep pipeline consolidates AI memory without a database.
Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine.
Local LLM agent with DOM tools running entirely in-browser via WebGPU.
Solves entity resolution across Salesforce and Zendesk so agents stop hallucinating relationships.
Deterministic browser automation. Works out of the box with Claude/Codex/OpenCode.
Forks Chromium to freeze execution state, solving the stale-state problem that breaks most browser agents.
Git as the agent standard — version prompts like code before LangChain locks you in.
Browser Harness | Self-healing harness that enables LLMs to complete any task.
The agent edits helpers.py mid-task, whereas LangChain locks you into predefined tools.
Single Rust binary, zero runtime deps, self-extending skills, local or routed LLMs.
Biologically-inspired memory for AI agents. Decay, retrieval strengthening, consolidation. Zero dependencies.
Biological decay mechanics beat vector search for agent memory that actually forgets.
2840 projects