Next Event
Can't make it? Sign up for our mailing list to hear about future events.
What is Inference Mode?
Inference Mode is a reading group for performance-focused engineers building inference-heavy systems, all to deliver the outputs of large neural networks at scale and under SLO.
It is named by analogy to CUDA MODE and in reference to PyTorch's @inference_mode decorator, which disables gradient tracking for code that only runs inference.
Past Events
On June 7th, we met to discuss Stas Bekman's Open ML Engineering Book.
On May 24th, we met to discuss the DeepSeek-V4 tech report.
On May 3rd, we met to discuss the DFlash paper by Z-Lab.