Block Diffusion

MinWoo(Daniel) Park | Tech Blog

  • Related Project: Private
  • Category: Paper Review
  • Date: 2025-03-18

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

  • url: https://arxiv.org/abs/2503.09573
  • pdf: https://arxiv.org/pdf/2503.09573
  • html: https://arxiv.org/html/2503.09573v1
  • abstract: Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and blog post, on the project page.
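
To make the interpolation described in the abstract concrete, here is a toy sketch of block-wise generation: blocks are produced one after another (the autoregressive part), while the tokens inside a block start fully masked and are filled in over a few parallel denoising steps (the diffusion part). This is not the authors' implementation. The names `MASK`, `toy_denoiser`, and `generate`, the random stand-in for the model, and the even unmasking schedule are all assumptions chosen for illustration.

```python
import random

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)


def toy_denoiser(context, block, rng, vocab_size):
    """Stand-in for a trained model: proposes a token for every block position.

    A real model would condition on `context` (the finished blocks, whose
    keys/values could be cached) and on the partially unmasked `block`.
    """
    return [rng.randrange(vocab_size) for _ in block]


def generate(num_blocks, block_size, steps, vocab_size=100, seed=0):
    """Generate num_blocks * block_size tokens, one block at a time."""
    rng = random.Random(seed)
    sequence = []  # finished blocks, left to right
    for _ in range(num_blocks):
        block = [MASK] * block_size  # each new block starts fully masked
        for step in range(steps):
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            if not masked:
                break
            proposals = toy_denoiser(sequence, block, rng, vocab_size)
            # Assumed schedule: unmask an even share of the remaining
            # positions per step, so the last step unmasks everything left.
            remaining_steps = steps - step
            k = -(-len(masked) // remaining_steps)  # ceiling division
            for i in rng.sample(masked, k):
                block[i] = proposals[i]
        sequence.extend(block)  # the block is final; move on to the next one
    return sequence


tokens = generate(num_blocks=3, block_size=4, steps=2)
print(len(tokens))
```

In this sketch, `block_size=1` reduces to token-by-token generation, and a single block covering the whole sequence reduces to fixed-length masked diffusion. Because `num_blocks` is a free parameter, the output length is not fixed in advance, which mirrors the flexible-length generation the abstract mentions.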