deepseek-ai/DeepEP

AI · Deep Learning · GPU · RDMA · NVLink · Distributed Training
9,095
+340

// summary

DeepEP is a communication library for Mixture-of-Experts (MoE) models and expert parallelism (EP), built around high-throughput, low-latency GPU kernels for the MoE dispatch and combine steps. It targets both training and inference, with kernels optimized for asymmetric-domain bandwidth forwarding (for example, from the NVLink domain to the RDMA domain) and support for low-precision operations such as FP8. It also includes a hook-based method for overlapping communication with computation that does not occupy Streaming Multiprocessor (SM) resources.

// use cases

01
High-throughput MoE dispatch and combine operations for model training and inference prefilling.
02
Low-latency kernels utilizing pure RDMA to minimize delays during inference decoding tasks.
03
Communication-computation overlapping to hide network latency and improve overall system performance.
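
For readers unfamiliar with the terms, the dispatch and combine steps named in the first use case can be sketched in plain Python. This is a conceptual, single-process illustration only: it is not DeepEP's API, it involves no GPUs or networking, and the function names (`dispatch`, `combine`, `run_moe`) and the toy experts are hypothetical. Dispatch groups each token under the experts it was routed to; combine sums the expert outputs back into the original token order using the routing weights.

```python
# Conceptual sketch only: plain Python, no GPUs, and NOT DeepEP's API.
# All names here are hypothetical and chosen for illustration.
from collections import defaultdict


def dispatch(tokens, topk_experts):
    """Group tokens by destination expert.

    Returns {expert_id: [(token_index, token_vector), ...]}.
    In a real system each expert lives on a different GPU, so this
    grouping step is an all-to-all exchange over NVLink or RDMA.
    """
    per_expert = defaultdict(list)
    for idx, (tok, experts) in enumerate(zip(tokens, topk_experts)):
        for e in experts:
            per_expert[e].append((idx, tok))
    return dict(per_expert)


def combine(expert_outputs, topk_experts, topk_weights, dim):
    """Weighted sum of expert outputs, restored to original token order."""
    out = [[0.0] * dim for _ in topk_experts]
    for idx, (experts, weights) in enumerate(zip(topk_experts, topk_weights)):
        for e, w in zip(experts, weights):
            y = expert_outputs[e][idx]
            for d in range(dim):
                out[idx][d] += w * y[d]
    return out


def run_moe(tokens, topk_experts, topk_weights, expert_fns):
    """Dispatch tokens, apply each expert to its bucket, then combine."""
    dim = len(tokens[0])
    buckets = dispatch(tokens, topk_experts)
    expert_outputs = {
        e: {idx: expert_fns[e](tok) for idx, tok in items}
        for e, items in buckets.items()
    }
    return combine(expert_outputs, topk_experts, topk_weights, dim)


if __name__ == "__main__":
    # Three toy "experts": double, negate, identity.
    experts = {
        0: lambda v: [2.0 * x for x in v],
        1: lambda v: [-x for x in v],
        2: lambda v: list(v),
    }
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    routes = [[0, 1], [1, 2]]             # top-2 experts per token
    weights = [[0.5, 0.5], [0.25, 0.75]]  # routing weights per token
    print(run_moe(tokens, routes, weights, experts))
```

In an expert-parallel deployment the two grouping steps become all-to-all transfers between GPUs, which is the communication that a library of this kind is built to accelerate.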