THUDM

slime

AI · LLM · Reinforcement Learning · Megatron · SGLang · Post-training
5,113 stars (+340)

// summary

slime is an LLM post-training framework designed to scale reinforcement learning, integrating Megatron for high-performance training with SGLang for efficient rollout generation. A data buffer bridges training and generation, enabling flexible, asynchronous workflows across model architectures. The framework has been used in research and production projects including physics reasoning, agentic RL, and kernel generation.
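The buffer-mediated design described above can be sketched with a producer/consumer loop: a rollout engine (standing in for SGLang) pushes generated batches into a shared buffer while a trainer (standing in for Megatron) consumes them asynchronously. This is a minimal illustration using only the Python standard library; all names here are hypothetical, not slime's actual API.

```python
import queue
import threading

def rollout_worker(buffer: queue.Queue, num_batches: int) -> None:
    """Generate rollout batches and push them into the shared buffer."""
    for step in range(num_batches):
        # Stand-in for rollouts produced by a generation engine.
        batch = [f"sample-{step}-{i}" for i in range(4)]
        buffer.put(batch)
    buffer.put(None)  # sentinel: generation finished

def train_loop(buffer: queue.Queue) -> int:
    """Consume batches from the buffer; return number of training steps taken."""
    steps = 0
    while True:
        batch = buffer.get()
        if batch is None:
            break
        steps += 1  # stand-in for one optimizer step on `batch`
    return steps

# Bounded buffer: generation blocks when training falls behind (backpressure).
buffer: queue.Queue = queue.Queue(maxsize=2)
producer = threading.Thread(target=rollout_worker, args=(buffer, 8))
producer.start()
steps = train_loop(buffer)
producer.join()
print(steps)  # 8
```

The bounded queue is the key design point: it decouples the two engines' schedules while keeping generation from running arbitrarily far ahead of training.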

// use cases

01 — High-performance RL training by connecting Megatron with SGLang
02 — Flexible data generation workflows using custom interfaces and server-based engines
03 — Support for large-scale model training including Qwen, DeepSeek, and Llama series
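Use case 02 implies a pluggable generation interface: the framework resolves a user-supplied rollout function and calls it without the training loop knowing its internals. A hypothetical sketch of such a plugin loader, assuming a "module:function" path convention (the real slime interface may differ):

```python
import importlib
import sys
import types

def load_rollout_fn(path: str):
    """Resolve a 'module:function' string into a callable rollout function."""
    module_name, fn_name = path.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, fn_name)

# Example user-defined rollout function (would normally live in its own file).
def my_rollout(prompts):
    # Stand-in for querying a server-based engine for completions.
    return [{"prompt": p, "response": p.upper()} for p in prompts]

# Register it on an importable module so the loader can find it in this sketch.
mod = types.ModuleType("my_rollouts")
mod.my_rollout = my_rollout
sys.modules["my_rollouts"] = mod

rollout_fn = load_rollout_fn("my_rollouts:my_rollout")
samples = rollout_fn(["hello", "world"])
print(samples[0]["response"])  # HELLO
```

Loading the function by import path keeps custom workflows (e.g. multi-turn agentic rollouts) swappable via configuration rather than code changes to the trainer.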