We built an OCR server that processes 270 dense images/s on an RTX 5090.
50x faster than PaddleOCR Python with real TensorRT benchmarks.
A lightweight and modular Gumbel MCTS implementation
Validated 2-15x speedup over an AlphaZero baseline with identical policy output.
ML researchers building game AI or reinforcement learning systems
AlphaZero · MuZero · Leela Chess Zero
Over the past few months, I built an efficient MCTS implementation in Python/Numba.
While building a self-play environment from scratch (for learning purposes), I realized there were few efficient implementations of this algorithm.
I spent a lot of time validating it against a gold-standard baseline [1].
My PUCT implementation is 2-15x faster than the baseline while producing exactly the same policy.
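PUCT selects, at each node, the child that maximizes its mean value Q plus a prior-weighted exploration bonus. The sketch below is a plain NumPy illustration of the standard AlphaZero-style rule, not the author's Numba code; the function name `puct_select`, the default `c_puct = 1.25`, and treating unvisited children as Q = 0 are assumptions (implementations differ in how they initialize unvisited Q values).

```python
import numpy as np


def puct_select(prior: np.ndarray, visit_counts: np.ndarray,
                value_sums: np.ndarray, c_puct: float = 1.25) -> int:
    """Return the index of the child with the highest PUCT score (hypothetical helper).

    prior:        policy-network probabilities P(s, a), one per child
    visit_counts: visit counts N(s, a), one per child
    value_sums:   summed backed-up values W(s, a), one per child
    c_puct:       exploration constant (illustrative default)
    """
    n_parent = visit_counts.sum()
    # Mean action value Q(s, a) = W / N; unvisited children are assumed to have Q = 0.
    q = np.divide(value_sums, visit_counts,
                  out=np.zeros(value_sums.shape, dtype=np.float64),
                  where=visit_counts > 0)
    # Exploration term U(s, a) = c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    u = c_puct * prior * np.sqrt(n_parent) / (1.0 + visit_counts)
    return int(np.argmax(q + u))


# Usage: a well-explored strong child vs. a barely explored weak one.
best = puct_select(prior=np.array([0.5, 0.5]),
                   visit_counts=np.array([10, 1]),
                   value_sums=np.array([9.0, -1.0]))
```

Here the first child wins because its high Q outweighs the larger exploration bonus of the rarely visited second child.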
I also implemented Gumbel MCTS in both dense and sparse variants. The sparse variant is useful for games with large action spaces, such as chess.
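Gumbel MCTS begins by sampling k distinct root actions with the Gumbel-Top-k trick: add independent Gumbel(0, 1) noise to the policy logits and keep the k largest, which is equivalent to sampling k actions without replacement from the softmax policy. The sketch below contrasts a dense variant, which stores a logit for every action in the action space (illegal ones masked to -inf), with a sparse variant that stores only legal actions, so its cost tracks the number of legal moves rather than the full action-space size. Both function names and signatures are hypothetical, not the author's API.

```python
import numpy as np


def gumbel_top_k_dense(logits: np.ndarray, k: int, rng: np.random.Generator):
    """Dense variant: `logits` covers the whole action space, illegal actions = -inf.

    Returns k distinct action ids and their perturbed scores g(a) + logits(a),
    sorted from highest to lowest. Assumes k <= number of legal actions.
    """
    perturbed = logits + rng.gumbel(size=logits.shape)  # -inf stays -inf
    top = np.argsort(-perturbed)[:k]                     # highest perturbed scores first
    return top, perturbed[top]


def gumbel_top_k_sparse(legal_actions: np.ndarray, legal_logits: np.ndarray,
                        k: int, rng: np.random.Generator):
    """Sparse variant: only legal actions and their logits are stored."""
    perturbed = legal_logits + rng.gumbel(size=legal_logits.shape)
    top = np.argsort(-perturbed)[:min(k, len(legal_actions))]
    return legal_actions[top], perturbed[top]


# Usage: 10-action space where only actions 2, 5 and 7 are legal.
rng = np.random.default_rng(0)
logits = np.full(10, -np.inf)
logits[[2, 5, 7]] = [0.1, 0.5, -0.3]
dense_actions, dense_scores = gumbel_top_k_dense(logits, 3, rng)
sparse_actions, sparse_scores = gumbel_top_k_sparse(
    np.array([2, 5, 7]), np.array([0.1, 0.5, -0.3]), 5, rng)
```

The perturbed scores are returned because Gumbel MCTS reuses them later when ranking the sampled actions.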
Gumbel MCTS makes much better use of low simulation budgets than PUCT.
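One reason Gumbel MCTS copes well with small budgets is sequential halving at the root: the simulation budget is split into about log2(k) phases, every remaining sampled action gets an equal share of visits in each phase, and the lower-ranked half is dropped after each phase (ranked by Gumbel noise + logit + a transformed Q value, as described by Danihelka et al., 2022). The sketch below only computes the visit schedule; the function name and the rounding choices are assumptions, and real implementations may hand out leftover simulations differently.

```python
import math


def sequential_halving_schedule(n_simulations: int, m: int) -> list[tuple[int, int]]:
    """Return (num_remaining_actions, visits_per_action) for each phase (hypothetical helper).

    n_simulations: total simulation budget at the root
    m:             number of actions sampled with Gumbel-Top-k
    """
    phases = max(1, math.ceil(math.log2(m)))
    schedule = []
    remaining = m
    for _ in range(phases):
        # Each phase gets roughly budget / phases, shared equally by the survivors.
        visits = max(1, n_simulations // (phases * remaining))
        schedule.append((remaining, visits))
        remaining = max(1, remaining // 2)  # keep the better half for the next phase
    return schedule
```

For example, `sequential_halving_schedule(200, 16)` gives four phases of 16, 8, 4 and 2 actions receiving 3, 6, 12 and 25 visits each, so every phase spends roughly the same share of the 200-simulation budget and the strongest candidates receive the deepest searches.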
Overall, I think this could be useful for the community. I used coding agents to help me along the way, but did a significant amount of manual work to validate everything myself.
Feedback welcome.
[1] https://github.com/michaelnny/alpha_zero/blob/main/alpha_zer...
Using a sparse voxel octree (SVO) to voxelize Gaussian splats is a sensible way to prune overlap checks, since hierarchical voxels fit the problem and should cut costly pairwise collision tests. The execution is hard to judge: the Reddit thread is blocked, with no visible code, benchmarks, or demos, so this currently reads like an intriguing sketch rather than a drop-in tool.
SAE feature explorer, but limited to tweet analysis with unclear research value.
Minimalist cva alternative that splits variants into standalone typed functions.
Per-query α fusion beats fixed hybrid weights on FiQA and FEVER benchmarks.
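Hybrid retrieval usually combines a dense (embedding) score and a sparse (e.g. BM25) score as a convex combination weighted by α. The blurb does not say how the per-query α is predicted, so the sketch below only shows the fusion step with α supplied by the caller; min-max normalization is an assumed choice, and `fuse_scores` is a hypothetical name. A fixed hybrid uses one α for every query, while a per-query scheme calls the same function with a different α for each query.

```python
import numpy as np


def min_max(scores: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    scores = np.asarray(scores, dtype=np.float64)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def fuse_scores(dense: np.ndarray, sparse: np.ndarray, alpha: float) -> np.ndarray:
    """Convex combination: alpha weights the dense score, 1 - alpha the sparse one."""
    return alpha * min_max(dense) + (1.0 - alpha) * min_max(sparse)


# Usage: scores for the same three candidate documents from each retriever.
dense = np.array([0.9, 0.1, 0.5])    # e.g. cosine similarities
sparse = np.array([1.0, 12.0, 3.0])  # e.g. BM25 scores
ranking_dense_heavy = np.argsort(-fuse_scores(dense, sparse, alpha=1.0))
ranking_sparse_heavy = np.argsort(-fuse_scores(dense, sparse, alpha=0.0))
```

The two extreme α values produce opposite rankings here, which is the kind of query-dependent disagreement a per-query α is meant to resolve.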
Native ternary training beats post-training quantization for memory efficiency.
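The blurb does not specify the quantizer. One widely used choice is the absmean scheme from BitNet b1.58: divide the weights by their mean absolute value, round, and clip to {-1, 0, +1}, so each weight carries about log2(3) ≈ 1.58 bits of information. The sketch below assumes that scheme; `ternarize_absmean` is a hypothetical name, and the training loop is only described in comments.

```python
import numpy as np


def ternarize_absmean(w: np.ndarray, eps: float = 1e-8):
    """Map weights to {-1, 0, +1} with a per-tensor absmean scale (BitNet b1.58 style)."""
    gamma = np.abs(w).mean()                          # per-tensor scale
    q = np.clip(np.round(w / (gamma + eps)), -1, 1)   # round, then clip to ternary range
    return q.astype(np.int8), gamma                   # dequantize as q * gamma


# In native (quantization-aware) ternary training, the forward pass uses
# q * gamma while gradients update the latent full-precision w through a
# straight-through estimator. Post-training quantization instead applies this
# once to the finished weights, so the model never adapts to the rounding.
w = np.array([0.5, -0.05, -1.2, 0.02])
q, gamma = ternarize_absmean(w)
```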