
MinWoo(Daniel) Park | Tech Blog


Sparse Upcycling

  • Related Project: Private
  • Category: Paper Review
  • Date: 2024-01-15

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

  • url: https://arxiv.org/abs/2212.05055
  • pdf: https://arxiv.org/pdf/2212.05055
  • abstract: Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large scale regime. In this work, we propose sparse upcycling – a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.
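The core of sparse upcycling is mechanical: each MLP block of the dense checkpoint is replaced by an MoE layer whose experts are all initialized as copies of that dense MLP, while the router is initialized near zero so the upcycled model starts out behaving like the dense one. A minimal sketch of that initialization step, using plain nested lists as stand-ins for real tensors (the function name and weight-dict layout are illustrative assumptions, not the paper's code):

```python
import copy

def upcycle_mlp_to_moe(dense_mlp_weights, num_experts):
    """Sparse-upcycling initialization (sketch): turn one dense MLP
    block into an MoE layer by copying the dense weights into every
    expert. `dense_mlp_weights` maps parameter name -> list-of-lists
    (a stand-in for real tensors). The router is zero-initialized so
    all experts start with equal routing probability and the upcycled
    model initially matches the dense checkpoint's behavior."""
    experts = [copy.deepcopy(dense_mlp_weights) for _ in range(num_experts)]
    d_model = len(dense_mlp_weights["w_in"])  # rows of the input projection
    router = [[0.0] * num_experts for _ in range(d_model)]
    return {"experts": experts, "router": router}

# Usage: upcycle a toy 2-dim MLP block into a 4-expert MoE layer.
dense = {"w_in": [[0.1, 0.2], [0.3, 0.4]], "w_out": [[0.5], [0.6]]}
moe = upcycle_mlp_to_moe(dense, num_experts=4)
```

After this initialization, training continues on the usual pretraining objective; because every expert starts identical, only the router (and subsequent gradient noise) breaks the symmetry between experts.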

Contents

TL;DR


Sparse Upcycling is a predecessor paper to SOLAR, which scales model weights efficiently using Depth Up-Scaling (DUS).

  • The foundational paper for the MoE layer itself is Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.
  • Unlike Sparse Upcycling (this paper), SOLAR does not scale the model up with MoE; instead it uses a depthwise scaling method, similar to EfficientNet (introduced at PMLR) but adapted to the LLM architecture.
  • DUS consists of scaling a base model up along the depth dimension and then continually pre-training the expanded model.
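Following the SOLAR paper's recipe (a detail beyond this post's summary), DUS duplicates the layer stack, drops the last m layers of the first copy and the first m layers of the second, and concatenates the two, taking n layers to 2*(n - m). A minimal sketch, treating layers as an ordinary Python list:

```python
def depth_up_scale(layers, m):
    """Depth Up-Scaling (sketch, following SOLAR's recipe): duplicate
    the layer stack, drop the last m layers of the first copy and the
    first m layers of the second, then concatenate. From n layers this
    yields 2*(n - m) layers; the expanded model is then continually
    pre-trained to recover and surpass base-model quality."""
    n = len(layers)
    assert 0 < m < n, "must remove fewer layers than the base depth"
    return layers[: n - m] + layers[m:]

# Usage: a stand-in 32-block stack, as in SOLAR's 32-layer base.
base = list(range(32))
scaled = depth_up_scale(base, m=8)   # 2 * (32 - 8) = 48 layers
```

Removing the seam layers (rather than naively stacking two full copies) is SOLAR's way of softening the discontinuity where the two copies meet, at the cost of a smaller final depth.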