We built Talos – a full CNN inference engine running on silicon

by luthiraabeykoon·Feb 23, 2026·1 point·0 comments

AI Analysis

●●●● Gem · Wizardry · Zero to One · Bold Bet

CNN inference fully hardcoded as silicon logic, not software optimized for hardware.

Strengths

•Strips runtime overhead entirely—every multiply, buffer, and data path is deterministic digital logic on FPGA, not scheduler-bound.
•Built in two weeks under extreme constraint; demonstrates genuine hardware debugging craft (nanosecond timing closure, waveform hunting).
•Flips the conventional wisdom: hardware accelerators usually adapt software logic, but Talos rethinks inference from the circuit level up.

Weaknesses

•Unclear production viability: the two-week timeline and Show HN framing suggest a proof of concept, not a shipping product with real benchmark comparisons.
•No public performance claims against GPU baselines; latency and throughput numbers are needed to evaluate any practical advantage.

Similar Projects

Hardware · ●●●● Gem

We built Talos – a full CNN inference engine running on silicon

Strips away PyTorch flexibility entirely; full CNN inference as deterministic hardware logic in SystemVerilog.

Wizardry · Zero to One · Big Brain

luthiraabeykoon

1 point · 0 comments · 3mo ago

Hardware · ●● Solid

ML accelerator on a RISC-V FPGA SoC – zero-cycle matmul, boots Linux

Zero-cycle matrix multiplication in combinatorial logic on Lattice ECP5 is genuinely wild.

Wizardry · Big Brain · Niche Gem

dstrbad

4 points · 0 comments · 2mo ago

Hardware · ●● Solid

Low-rank approximation for 3x3 FPGA convolutions (33% less DSP usage)

Clever ML+hardware co-design, but a blog post without open-source code, benchmarks, or deployment examples.

Wizardry · Niche Gem

el_dockerr

1 point · 1 comment · 3mo ago

Infrastructure · ●●● Banger

Open-source logic synthesis – formal logic to FPGA

Open-source logic synthesis targeting FPGAs, in a space that Yosys dominates.

Zero to One · Wizardry · Bold Bet

major4x

1 point · 0 comments · 2mo ago

Data · ●● Solid

Benchmarking Apple Silicon unified mem for GPU-accelerated SQL analysis

The repo does one practical thing well: quantify the real-world impact of Apple Silicon's unified memory on analytics by running six TPC-H queries plus a GPU-favorable QX and shipping the raw charts and code. It's specific and empirical — you get MLX vs NumPy vs DuckDB numbers and PNGs, not just hand-wavy claims — but it's narrowly scoped to M4 hardware and small-ish scales, so its conclusions are useful for experimentation rather than sweeping generalization.

Wizardry · Niche Gem

sadopc

3 points · 1 comment · 3mo ago

AI/ML · ●●● Banger

RunAnywhere – Faster AI Inference on Apple Silicon

Custom Metal shaders beat llama.cpp and MLX—1.67x faster on M4 Max.

Wizardry · Slick · Zero to One

sanchitmonga22

240 points · 153 comments · 3mo ago