abstract: FriendliAI’s PeriFlow (also known as Orca) is a distributed serving system for Transformer-based generative models. In an evaluation on a GPT-3 175B model, it achieves 36.9X higher throughput than NVIDIA FasterTransformer at the same latency level. PeriFlow combines iteration-level scheduling and selective batching to overcome limitations of existing serving systems, improving both resource utilization and responsiveness.
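
The idea behind iteration-level scheduling can be illustrated with a toy simulation. This is a minimal sketch and not PeriFlow's actual implementation: the `Request` class, both function names, and the workload are hypothetical, and each model iteration is assumed to generate exactly one token per running request. Selective batching is not modeled. The sketch compares a baseline that holds a batch until its longest request finishes with a scheduler that revisits the batch after every iteration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request that only tracks how many tokens remain."""
    name: str
    tokens_left: int


def iteration_level_schedule(requests, max_batch):
    """Return {request name: iteration at which the request finished}."""
    waiting = deque(requests)
    running = []
    finished_at = {}
    iteration = 0
    while waiting or running:
        # Fill free batch slots before every iteration, not once per batch.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        iteration += 1
        # One model iteration: every running request generates one token.
        for req in running:
            req.tokens_left -= 1
        # Finished requests return immediately and free their slots.
        for req in running:
            if req.tokens_left <= 0:
                finished_at[req.name] = iteration
        running = [req for req in running if req.tokens_left > 0]
    return finished_at


def request_level_schedule(requests, max_batch):
    """Baseline: each batch runs until its longest request is done."""
    finished_at = {}
    iteration = 0
    for start in range(0, len(requests), max_batch):
        batch = requests[start:start + max_batch]
        iteration += max(req.tokens_left for req in batch)
        for req in batch:
            finished_at[req.name] = iteration
    return finished_at


def make_workload():
    """Hypothetical workload: three requests of different output lengths."""
    return [Request("A", 2), Request("B", 5), Request("C", 1)]


if __name__ == "__main__":
    print(request_level_schedule(make_workload(), max_batch=2))
    print(iteration_level_schedule(make_workload(), max_batch=2))
```

With a batch size of 2, the baseline makes request A wait for the longer request B and keeps C queued until both are done, while the iteration-level scheduler returns A as soon as it finishes and gives its slot to C. The toy model only captures this responsiveness effect; it says nothing about the throughput figure reported above.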