// summary
DeepEP is a specialized communication library that accelerates Mixture-of-Experts (MoE) models and expert parallelism (EP) through high-throughput, low-latency GPU kernels. It supports both training and inference workloads with features such as asymmetric-domain bandwidth forwarding (e.g., NVLink-to-RDMA) and low-precision operations such as FP8 dispatch. The library also provides a hook-based communication-computation overlapping method that hides latency without occupying any Streaming Multiprocessor (SM) resources.
// use cases
01
High-throughput MoE dispatch and combine operations for model training and the inference prefilling phase.
02
Low-latency kernels utilizing pure RDMA to minimize delays during inference decoding tasks.
03
Communication-computation overlapping to hide network latency and improve overall system performance.
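The dispatch and combine steps in use case 01 can be illustrated with a minimal single-process sketch. This is not DeepEP's API: DeepEP implements these steps as fused GPU all-to-all kernels over NVLink and RDMA, while the NumPy version below (function name, arguments, and the local loop are all illustrative assumptions) only shows the routing pattern — each token is sent to its top-k experts, processed, then gathered back as a weighted sum.

```python
import numpy as np

def moe_dispatch_combine(x, topk_idx, topk_weight, num_experts, expert_fn):
    """Single-process sketch of MoE dispatch/combine routing.

    DeepEP performs the equivalent of this loop as a fused all-to-all
    across GPUs; here everything stays on one host for clarity.
    x:           (num_tokens, hidden) token activations
    topk_idx:    (num_tokens, k) expert index per token
    topk_weight: (num_tokens, k) gating weight per routed copy
    expert_fn:   callable (expert_id, tokens) -> tokens (stand-in expert)
    """
    num_tokens, k = topk_idx.shape
    flat_idx = topk_idx.reshape(-1)                      # routed copies
    flat_w = topk_weight.reshape(-1)
    token_ids = np.repeat(np.arange(num_tokens), k)      # source token of each copy
    out = np.zeros_like(x)
    for e in range(num_experts):
        mask = flat_idx == e
        if mask.any():
            rows = token_ids[mask]
            # "Dispatch": gather this expert's tokens; "compute": run the expert
            expert_out = expert_fn(e, x[rows])
            # "Combine": weighted scatter-add back to each source token
            np.add.at(out, rows, expert_out * flat_w[mask][:, None])
    return out
```

With identity experts and gating weights that sum to 1 per token, the combined output reproduces the input, which is a convenient sanity check for any routing implementation.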