
Context Length

  • Related Project: Private
  • Category: Paper Review
  • Date: 2024-12-22

Bootstrap Your Own Context Length

  • url: https://arxiv.org/abs/2412.18860
  • pdf: https://arxiv.org/pdf/2412.18860
  • abstract: We introduce a bootstrapping approach to train long-context language models by exploiting their short-context capabilities only. Our method utilizes a simple agent workflow to synthesize diverse long-context instruction tuning data, thereby eliminating the necessity for manual data collection and annotation. The proposed data synthesis workflow requires only a short-context language model, a text retriever, and a document collection, all of which are readily accessible within the open-source ecosystem. Subsequently, language models are fine-tuned using the synthesized data to extend their context lengths. In this manner, we effectively transfer the short-context capabilities of language models to long-context scenarios through a bootstrapping process. We conduct experiments with the open-source Llama-3 family of models and demonstrate that our method can successfully extend the context length to up to 1M tokens, achieving superior performance across various benchmarks.