TRiP – a complete transformer engine in C, built from scratch by me alone
From-scratch C transformer engine with training and vision, built by one person.
Official PyTorch implementation of NeuroFlow: EMA-Gated Temporal Sequence Compression for Vision Transformers. Achieves up to 55.8x wall-clock speedup for video inference via semantic surprise routing and a training-free Dual-Memory Reconstruction Protocol.
Training-free dual-memory protocol cuts 1792p SigLIP inference from 678 ms to 11.9 ms.
ML researchers and engineers working on video inference optimization
Token Merging (ToMe) · DynamicViT · Sparse ViT approaches
LibTorch bindings bring CUDA and MPS backends to Java with LLaMA-3 inference included.
30x faster cold start than vLLM with zero PyTorch dependencies.
Automates the painful torch.compile and mixed-precision tuning loop with measured 3x speedups.
Explicit kernel control over TVM-style black boxes, but benchmarks show mixed wins vs Transformers.js.
XAI-driven model improvement loop, but Weights & Biases already tracks experiments better.