PeriFlow

MinWoo(Daniel) Park | Tech Blog

  • Related Project: Private
  • Category: Paper Review
  • Date: 2024-01-04

PeriFlow: How to Serve Large-scale Transformer Models

  • url: https://medium.com/friendliai/orca-how-to-serve-large-scale-transformer-models-b7130e5a9cd6
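  • abstract: FriendliAI’s PeriFlow (Orca) is a distributed serving system for Transformer-based generative models. Evaluated on a GPT-3 175B model, it achieves a 36.9× throughput improvement over NVIDIA FasterTransformer at the same latency level. PeriFlow combines two techniques to overcome the limitations of request-level batching in existing serving systems: iteration-level scheduling, which re-forms the batch after every model iteration so that finished requests return immediately and newly arrived requests join without waiting for the whole batch to finish; and selective batching, which batches only operations such as the linear layers while computing attention separately per request, so that requests with different sequence lengths can share a batch. Together, these techniques improve GPU utilization and responsiveness.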
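The iteration-level scheduling idea can be illustrated with a toy simulation that contrasts it with request-level (static) batching. This is a minimal sketch under simplifying assumptions of our own, not Orca's or FriendliAI's implementation. Each iteration costs one time unit regardless of batch size. Every running request emits exactly one token per iteration. Prefill cost, KV-cache memory limits, and selective batching are ignored. The names `Request`, `request_level`, and `iteration_level` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    arrival: int     # iteration at which the request arrives (toy time unit)
    num_tokens: int  # tokens to generate (>= 1); one token per iteration here


def request_level(requests, max_batch):
    """Static batching: a batch occupies the engine until every member finishes,
    and all results are returned together (assumed toy model)."""
    queue = deque(sorted(requests, key=lambda r: r.arrival))
    t, finish = 0, {}
    while queue:
        t = max(t, queue[0].arrival)  # idle until a request is available
        batch = []
        while queue and len(batch) < max_batch and queue[0].arrival <= t:
            batch.append(queue.popleft())
        t += max(r.num_tokens for r in batch)  # bounded by the longest request
        for r in batch:
            finish[r.rid] = t  # short requests wait for the longest one
    return finish


def iteration_level(requests, max_batch):
    """Iteration-level scheduling: the batch is re-formed after every iteration."""
    queue = deque(sorted(requests, key=lambda r: r.arrival))
    running = {}  # rid -> tokens still to generate
    t, finish = 0, {}
    while queue or running:
        if not running:
            t = max(t, queue[0].arrival)  # idle until a request is available
        # Admit newly arrived requests into any free batch slots.
        while queue and len(running) < max_batch and queue[0].arrival <= t:
            r = queue.popleft()
            running[r.rid] = r.num_tokens
        # Run one iteration: every running request emits one token.
        t += 1
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]
                finish[rid] = t  # returned as soon as it finishes
    return finish


reqs = [Request(0, 0, 8), Request(1, 0, 2), Request(2, 1, 2)]
print(request_level(reqs, max_batch=2))    # {0: 8, 1: 8, 2: 10}
print(iteration_level(reqs, max_batch=2))  # {1: 2, 2: 4, 0: 8}
```

With a batch size of 2, the short request 1 finishes at iteration 2 instead of waiting for request 0 until iteration 8, and request 2 takes the freed slot immediately. In a real system, the batch then mixes requests at different positions and sequence lengths, which is the situation selective batching is meant to handle.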
