Comet

MinWoo(Daniel) Park | Tech Blog

  • Related Project: Private
  • Category: Paper Review
  • Date: 2025-03-01

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

  • url: https://arxiv.org/abs/2502.19811
  • pdf: https://arxiv.org/pdf/2502.19811
  • html: https://arxiv.org/html/2502.19811v1
  • abstract: Mixture-of-experts (MoE) has been extensively employed to scale large language models to trillion-plus parameters while maintaining a fixed computational cost. The development of large MoE models in the distributed scenario encounters the problem of large communication overhead. The inter-device communication of a MoE layer can occupy 47% time of the entire model execution with popular models and frameworks. Therefore, existing methods suggest the communication in a MoE layer to be pipelined with the computation for overlapping. However, these coarse grained overlapping schemes introduce a notable impairment of computational efficiency and the latency concealing is sub-optimal. To this end, we present COMET, an optimized MoE system with fine-grained communication-computation overlapping. Leveraging data dependency analysis and task rescheduling, COMET achieves precise fine-grained overlapping of communication and computation. Through adaptive workload assignment, COMET effectively eliminates fine-grained communication bottlenecks and enhances its adaptability across various scenarios. Our evaluation shows that COMET accelerates the execution of a single MoE layer by 1.96× and for end-to-end execution, COMET delivers a 1.71× speedup on average. COMET has been adopted in the production environment of clusters with ten-thousand-scale of GPUs, achieving savings of millions of GPU hours.
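
To make the abstract's overlap argument concrete, here is a minimal back-of-the-envelope sketch. It is not COMET's scheduling algorithm. It is a toy two-stage pipeline model that assumes communication and computation split evenly into chunks and that chunking costs no computational efficiency (the abstract notes that coarse-grained schemes do pay such a cost). The function names and the 47/53 time split are illustrative; only the 47% communication share comes from the abstract.

```python
def pipelined_time(comm: float, comp: float, chunks: int) -> float:
    """Toy latency of a two-stage pipeline (communicate, then compute).

    Assumption: both stages split evenly into `chunks` pieces, and a chunk's
    computation starts once its data has arrived and the previous chunk's
    computation has finished. chunks=1 means no overlap at all.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    c = comm / chunks  # per-chunk communication time
    p = comp / chunks  # per-chunk computation time
    # The first transfer cannot be hidden, then the slower stage sets the
    # pace for the remaining chunks, then the last computation drains.
    return c + (chunks - 1) * max(c, p) + p


def max_speedup_if_comm_hidden(comm_fraction: float) -> float:
    """Upper bound on speedup if all communication were perfectly hidden."""
    if not 0.0 <= comm_fraction < 1.0:
        raise ValueError("comm_fraction must be in [0, 1)")
    return 1.0 / (1.0 - comm_fraction)


if __name__ == "__main__":
    # Illustrative split: 47 time units of communication, 53 of computation.
    for k in (1, 2, 4, 16, 64):
        print(f"chunks={k:3d}  time={pipelined_time(47.0, 53.0, k):.2f}")
    print(f"bound={max_speedup_if_comm_hidden(0.47):.2f}x")
```

Under these assumptions, finer chunks leave less communication exposed, and the total time approaches the computation time alone. With a 47% communication share the idealized bound is about 1.89×. The speedups reported in the abstract are measured against existing systems, so they are not directly comparable with this bound.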