Inference Mode Reading Group

Next Event

On June 28th, we will meet to discuss the DDTree paper.


Sign up to attend here.


Can't make it? Sign up for our mailing list to hear about future events.

What is Inference Mode?

Inference Mode is a reading group for performance-focused engineers building inference-heavy systems.

"Inference mode" denotes a state of mind: ruthless optimization of kernels, tireless removal of host-side bottlenecks, unhinged hacks to skip computations or compress tensors, and making GPUs go brrrt.

All to deliver the outputs of large neural networks at scale and under SLO.

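It is named by analogy to CUDA MODE and in reference to PyTorch's torch.inference_mode, which can be applied as a decorator (@torch.inference_mode()) or as a context manager to disable autograd tracking for code that only runs inference.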

Past Events

On June 7th, we met to discuss Stas Bekman's Open ML Engineering Book.

On May 24th, we met to discuss the DeepSeek-V4 tech report.

On May 3rd, we met to discuss the DFlash paper by Z-Lab.