Running BitNet b1.58 inside DRAM by breaking DDR4 timing rules

by pcdeni · May 23, 2026 · 6 points · 0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Rabbit Hole

Running AI inference by breaking DDR4 timing rules on off-the-shelf memory is wild.

Strengths
  • Demonstrates compute-in-memory on commercial hardware without silicon changes.
  • Visual explainer makes complex DRAM timing violations accessible.
  • Uncovers previously undocumented DDR behavior via FPGA testing.
Weaknesses
  • Currently too slow for production due to full-row data movement overhead.
  • Requires custom FPGA memory controller, limiting immediate adoption.
Target Audience

Computer architects, hardware researchers, low-level systems engineers

Similar To

Upmem PIM · Samsung HBM-PIM · Mythic AI

Post Description

I have been working on running BitNet b1.58 inside DRAM by intentionally breaking DDR4 timing rules. I also made a visual explainer: https://pcdeni.github.io/CaSA/explainer/ It is tested and works on commercial off-the-shelf memory, driven by a custom memory controller on an FPGA. The underlying effect is well characterized in academic papers (CMU SAFARI, SiMRA, DRAM Bender, etc.). In the process of getting this to work I also made a previously undocumented discovery about DDR behaviour: https://pcdeni.github.io/CaSA/explainer/xor-spread.html Overall it is a bit slow, since data has to be moved in full rows even when all that is actually needed is the count of '1' bits (popcount). To make it competitive, changes to the memory die would be needed, though nothing as drastic as merging compute and memory into one piece of silicon. That would avoid the memory wall the industry is currently facing.
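To make the popcount remark concrete, here is a minimal software sketch of how a ternary dot product can reduce to bitwise ANDs plus popcounts. It is an illustration under stated assumptions, not the project's actual method: it assumes weights in {-1, 0, +1} are split into a positive and a negative bit mask, that activations are unsigned integers processed one bit-plane at a time, and that the row-wide AND is the part that would run inside DRAM. The names `pack_bits`, `popcount`, and `ternary_dot` are hypothetical.

```python
def popcount(word: int) -> int:
    """Count the '1' bits in a non-negative integer."""
    return bin(word).count("1")


def pack_bits(bits) -> int:
    """Pack an iterable of truthy/falsy values into one integer, element i at bit i."""
    word = 0
    for i, bit in enumerate(bits):
        if bit:
            word |= 1 << i
    return word


def ternary_dot(weights, activations, act_bits: int = 8) -> int:
    """Dot product of ternary weights with unsigned activations.

    Assumption: weights are stored as two masks (where w == +1, where w == -1).
    Each activation bit-plane is ANDed with both masks; only the two popcounts
    per plane are needed to form the result.
    """
    pos_mask = pack_bits(w == 1 for w in weights)
    neg_mask = pack_bits(w == -1 for w in weights)
    total = 0
    for k in range(act_bits):
        # Bit k of every activation, packed into one "row".
        plane = pack_bits((a >> k) & 1 for a in activations)
        # Stand-in for the row-wide AND that would happen in memory.
        diff = popcount(pos_mask & plane) - popcount(neg_mask & plane)
        total += diff * (1 << k)
    return total


weights = [1, -1, 0, 1]
activations = [3, 5, 7, 2]
print(ternary_dot(weights, activations))  # 3 - 5 + 0 + 2 = 0
```

In this sketch each AND yields a full row of bits, yet only a single small integer per plane (its popcount) feeds the final sum, which is the data-movement overhead described above.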
