We built Talos – a full CNN inference engine running on silicon

by luthiraabeykoon · Feb 23, 2026 · 1 point · 0 comments

AI Analysis

●●●● Gem · Wizardry · Zero to One · Big Brain

Strips away PyTorch's flexibility entirely and implements full CNN inference as deterministic hardware logic in SystemVerilog.

Strengths

• Radical design philosophy: inverts the typical hardware-accelerator approach by eliminating the runtime and scheduler entirely, achieving true cycle-accurate determinism.
• Technical depth is visible in the writeup: an honest breakdown of hardware debugging constraints (nanosecond timing, waveform analysis) demonstrates genuine craft.
• Solves a real problem: production inference pays hidden PyTorch overhead (dynamic graphs, autograd scheduler) even for frozen models.
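
To make the "frozen model as fixed logic" idea concrete, here is a minimal software sketch, not Talos's actual design. It assumes a single 3x3 integer kernel with a ReLU; the names `KERNEL` and `conv3x3_fixed` and the weight values are hypothetical. The point it illustrates is that with frozen weights and no runtime dispatch, the number of multiply-accumulates depends only on the input shape, never on the data.

```python
# Hypothetical illustration only: names, shapes, and weights are assumptions,
# not taken from the Talos source.

KERNEL = ((1, 0, -1),
          (2, 0, -2),
          (1, 0, -1))  # frozen 3x3 integer weights (Sobel-like, for illustration)


def conv3x3_fixed(image):
    """Valid 3x3 cross-correlation with frozen integer weights, then ReLU.

    Returns (output, mac_count). The loop bounds depend only on the image
    shape, so the multiply-accumulate count is identical for every input
    of that shape.
    """
    height, width = len(image), len(image[0])
    out = []
    macs = 0
    for r in range(height - 2):
        row = []
        for c in range(width - 2):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += KERNEL[dr][dc] * image[r + dr][c + dc]
                    macs += 1
            row.append(max(acc, 0))  # ReLU: clamp negatives to zero
        out.append(row)
    return out, macs
```

Two different 4x4 inputs both cost 36 multiply-accumulates here, which is the software analogue of the fixed cycle count that the review credits to the hardware.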

Weaknesses

• No quantitative benchmarks against GPU inference or existing FPGA accelerators; latency, throughput, and power claims are unverified.
• Scope is unclear: the engine covers only CNN architectures, and it is unclear which layer types are supported or how memory bandwidth constraints affect real workloads.

Similar Projects

Hardware · ●●●● Gem

We built Talos – a full CNN inference engine running on silicon

CNN inference fully hardcoded as silicon logic, not software optimized for hardware.

Wizardry · Zero to One · Bold Bet

luthiraabeykoon

1 point · 0 comments · 3mo ago

Hardware · ●● Solid

ML accelerator on a RISC-V FPGA SoC – zero-cycle matmul, boots Linux

Zero-cycle matrix multiplication in combinational logic on a Lattice ECP5 is genuinely wild.

Wizardry · Big Brain · Niche Gem

dstrbad

4 points · 0 comments · 3mo ago

Hardware · ●● Solid

Low-rank approximation for 3x3 FPGA convolutions (33% less DSP usage)

Clever ML+hardware co-design, but a blog post without open-source code, benchmarks, or deployment examples.

Wizardry · Niche Gem

el_dockerr

1 point · 1 comment · 4mo ago

Infrastructure · ●●● Banger

Open-source logic synthesis – formal logic to FPGA

Open-source logic synthesis running on FPGAs when Yosys dominates the space.

Zero to One · Wizardry · Bold Bet

major4x

1 point · 0 comments · 2mo ago

Data · ●● Solid

Benchmarking Apple Silicon unified mem for GPU-accelerated SQL analysis

The repo does one practical thing well: it quantifies the real-world impact of Apple Silicon's unified memory on analytics by running six TPC-H queries plus a GPU-favorable QX and shipping the raw charts and code. It's specific and empirical: you get MLX vs NumPy vs DuckDB numbers and PNGs, not just hand-wavy claims. But it's narrowly scoped to M4 hardware and small-ish scales, so its conclusions are useful for experimentation rather than sweeping generalization.

Wizardry · Niche Gem

sadopc

3 points · 1 comment · 4mo ago

AI/ML · ●●● Banger

RunAnywhere – Faster AI Inference on Apple Silicon

Custom Metal shaders beat llama.cpp and MLX: 1.67x faster on M4 Max.

Wizardry · Slick · Zero to One

sanchitmonga22

2401533mo ago
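
As a rough illustration of the low-rank 3x3 convolution entry in the list above: this listing does not describe the linked post's actual decomposition, so the sketch below assumes the simplest case, a rank-1 (separable) kernel written as the outer product of a vertical and a horizontal 3-tap filter. The names `COL`, `ROW`, `FULL`, `conv_direct`, and `conv_separable` are hypothetical. With the vertical pass shared between neighbouring outputs, the cost tends toward 6 multiplies per output instead of 9, which is one way a saving of about one third could arise.

```python
# Illustrative assumption: a rank-1 (separable) 3x3 kernel. The linked post's
# real method and its DSP mapping are not described in this listing.
COL = (1, 2, 1)    # vertical 3-tap factor
ROW = (1, 0, -1)   # horizontal 3-tap factor
FULL = tuple(tuple(v * h for h in ROW) for v in COL)  # outer product, 3x3


def conv_direct(image):
    """Valid 3x3 cross-correlation: 9 multiplies per output pixel."""
    h, w = len(image), len(image[0])
    out, mults = [], 0
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += FULL[dr][dc] * image[r + dr][c + dc]
                    mults += 1
            row.append(acc)
        out.append(row)
    return out, mults


def conv_separable(image):
    """Same result via a vertical pass followed by a horizontal pass."""
    h, w = len(image), len(image[0])
    mults = 0
    # Vertical pass: each column sum is computed once and reused by up to
    # three neighbouring output pixels.
    vert = []
    for r in range(h - 2):
        vrow = []
        for c in range(w):
            acc = 0
            for dr in range(3):
                acc += COL[dr] * image[r + dr][c]
                mults += 1
            vrow.append(acc)
        vert.append(vrow)
    # Horizontal pass: combine three adjacent column sums per output pixel.
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for dc in range(3):
                acc += ROW[dc] * vert[r][c + dc]
                mults += 1
            row.append(acc)
        out.append(row)
    return out, mults
```

On a 10x32 test image both functions return the same output, with 2160 multiplies for the direct form and 1488 for the separable one. The ratio approaches 6/9 as the image gets wider. Kernels that are not exactly rank-1 would need an approximation step (for example a truncated SVD), which this sketch does not cover.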