THUDM

slime

AI · LLM · Reinforcement Learning · Megatron · SGLang · Post-training
5,113 stars (+340)

// summary

slime is an LLM post-training framework designed to scale reinforcement learning, integrating Megatron for high-performance training with SGLang for efficient rollout generation. A data buffer bridges training and generation, enabling flexible, asynchronous workflows across model architectures. The framework has been used in research and production projects including physics reasoning, agentic RL, and kernel generation.
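The buffer-mediated design described above can be sketched with a producer/consumer loop: a rollout engine (standing in for SGLang) pushes generated batches into a shared buffer while a trainer (standing in for Megatron) consumes them asynchronously. This is a minimal illustration using only the Python standard library; all names here are hypothetical, not slime's actual API.

```python
import queue
import threading

def rollout_worker(buffer: queue.Queue, num_batches: int) -> None:
    """Generate rollout batches and push them into the shared buffer."""
    for step in range(num_batches):
        # Stand-in for rollouts produced by a generation engine.
        batch = [f"sample-{step}-{i}" for i in range(4)]
        buffer.put(batch)
    buffer.put(None)  # sentinel: generation finished

def train_loop(buffer: queue.Queue) -> int:
    """Consume batches from the buffer; return number of training steps taken."""
    steps = 0
    while True:
        batch = buffer.get()
        if batch is None:
            break
        steps += 1  # stand-in for one optimizer step on `batch`
    return steps

# Bounded buffer: generation blocks when training falls behind (backpressure).
buffer: queue.Queue = queue.Queue(maxsize=2)
producer = threading.Thread(target=rollout_worker, args=(buffer, 8))
producer.start()
steps = train_loop(buffer)
producer.join()
print(steps)  # 8
```

The bounded queue is the key design point: it decouples the two engines' schedules while keeping generation from running arbitrarily far ahead of training.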

// use cases

01 — High-performance RL training by connecting Megatron with SGLang
02 — Flexible data generation workflows using custom interfaces and server-based engines
03 — Support for large-scale model training including Qwen, DeepSeek, and Llama series
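Use case 02 implies a pluggable generation interface: the framework resolves a user-supplied rollout function and calls it without the training loop knowing its internals. A hypothetical sketch of such a plugin loader, assuming a "module:function" path convention (the real slime interface may differ):

```python
import importlib
import sys
import types

def load_rollout_fn(path: str):
    """Resolve a 'module:function' string into a callable rollout function."""
    module_name, fn_name = path.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, fn_name)

# Example user-defined rollout function (would normally live in its own file).
def my_rollout(prompts):
    # Stand-in for querying a server-based engine for completions.
    return [{"prompt": p, "response": p.upper()} for p in prompts]

# Register it on an importable module so the loader can find it in this sketch.
mod = types.ModuleType("my_rollouts")
mod.my_rollout = my_rollout
sys.modules["my_rollouts"] = mod

rollout_fn = load_rollout_fn("my_rollouts:my_rollout")
samples = rollout_fn(["hello", "world"])
print(samples[0]["response"])  # HELLO
```

Loading the function by import path keeps custom workflows (e.g. multi-turn agentic rollouts) swappable via configuration rather than code changes to the trainer.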