Contents
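Mathematical Structure and Training of the Alpaca Model
The Alpaca 7B model is based on LLaMA 7B, a large language model. LLaMA uses the transformer architecture, which stacks multiple attention and feed-forward layers.
Model architecture: for an input sequence \(x_1, x_2, \ldots, x_n\), each attention layer of the transformer computes the following operation.
\[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\]
Here \(Q\), \(K\), and \(V\) are the query, key, and value matrices, respectively, and \(d_k\) is the dimension of the key vectors.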
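To make the attention formula concrete, the following is a minimal sketch in plain Python, assuming a single attention head and no causal mask (a real LLaMA layer applies a mask and uses multiple heads). The function names `softmax` and `attention` and the toy matrices are illustrative assumptions, not taken from the Alpaca code release.

```python
import math

def softmax(row):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    out = []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example (hypothetical values): two positions, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
```

Each output row is a convex combination of the rows of `V`, weighted toward the value whose key best matches the query.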
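Data generation: the instruction-following demonstrations were generated with text-davinci-003, whose output can be described by the following autoregressive model.
\[p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)\]
Here \(x\) is the input instruction and \(y\) is the output generated by the model.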
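The factorization above can be illustrated with a short sketch: the probability of a whole output is the product of the per-token conditional probabilities, usually accumulated in log space. The function name `sequence_log_prob` and the per-token probabilities are hypothetical values chosen for illustration.

```python
import math

def sequence_log_prob(step_probs):
    """log p(y | x) = sum over t of log p(y_t | y_<t, x)."""
    return sum(math.log(p) for p in step_probs)

# Hypothetical per-token probabilities for a three-token output.
step_probs = [0.5, 0.25, 0.8]
log_p = sequence_log_prob(step_probs)
prob = math.exp(log_p)  # equals 0.5 * 0.25 * 0.8
```

Summing logs rather than multiplying probabilities avoids underflow for long outputs.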
Fine-tuning: given \(N\) instruction-output pairs, Alpaca is fine-tuned by minimizing the following objective, where \(\theta\) denotes the model parameters.
\[\mathcal{L}(\theta) = -\sum_{i=1}^{N} \log p_\theta(y^{(i)} \mid x^{(i)})\]
Datasets and Benchmarks
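Alpaca's training dataset consists of 52,000 instruction-following demonstrations, generated through the OpenAI API for less than $500. Performance was evaluated with a blind pairwise comparison against text-davinci-003 on the self-instruct evaluation set, where Alpaca performed nearly on par with text-davinci-003.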
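The fine-tuning objective stated earlier in this summary, the negative log-likelihood summed over the instruction-output pairs, can be sketched as follows. This is a toy illustration under stated assumptions: `nll_loss` is a hypothetical helper, and the token probabilities stand in for what a model \(p_\theta\) would assign to each target token.

```python
import math

def nll_loss(batch_token_probs):
    """Negative log-likelihood summed over all pairs and all target tokens.

    batch_token_probs[i] holds p_theta(y_t | y_<t, x) for each token of
    the i-th target output (hypothetical values here).
    """
    return -sum(math.log(p) for probs in batch_token_probs for p in probs)

# Two hypothetical instruction-output pairs.
pairs = [[0.5, 0.5], [0.25]]
loss = nll_loss(pairs)
```

Raising the probability the model assigns to the demonstrated outputs lowers this loss, which is what supervised fine-tuning optimizes.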
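Conclusion and Future Research Directions
The successful implementation of Alpaca highlights the importance of understanding the transformer architecture and of high-quality training data. Future work should cover more diverse and complex instruction-following tasks to improve the model's generality.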
Introduction
We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. On our preliminary evaluation of single-turn instruction following, Alpaca behaves qualitatively similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<$600). Check out our code release on GitHub.
Update: The public demo is now disabled. The original goal of releasing a demo was to disseminate our research in an accessible way. We feel that we have mostly achieved this goal, and given the hosting costs and the inadequacies of our content filters, we decided to bring down the demo.
Stanford-Alpaca Overview
Instruction-following models such as GPT-3.5 (text-davinci-003), ChatGPT, Claude, and Bing Chat have become increasingly powerful. Many users now interact with these models regularly and even use them for work. However, despite their widespread deployment, instruction-following models still have many deficiencies: they can generate false information, propagate social stereotypes, and produce toxic language.
To make maximum progress on addressing these pressing problems, it is important for the academic community to engage. Unfortunately, doing research on instruction-following models in academia has been difficult, as there is no easily accessible model that comes close in capabilities to closed-source models such as OpenAI’s text-davinci-003.
We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model. We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003. On the self-instruct evaluation set, Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, but is also surprisingly small and easy/cheap to reproduce.
We are releasing our training recipe and data, and intend to release the model weights in the future. We are also hosting an interactive demo to enable the research community to better understand the behavior of Alpaca. Interaction can expose unexpected capabilities and failures, which will guide us for the future evaluation of these models. We also encourage users to report any concerning behaviors in our web demo so that we can better understand and mitigate these behaviors. As any release carries risks, we discuss our thought process for this open release later in this blog post.
We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.
Training Recipe
Training a high-quality instruction-following model under an academic budget poses two important challenges: obtaining a strong pretrained language model and obtaining high-quality instruction-following data. The first challenge is addressed with the recent release of Meta’s new LLaMA models. For the second challenge, the self-instruct paper suggests using an existing strong language model to automatically generate instruction data. In particular, Alpaca is a language model fine-tuned using supervised learning from a LLaMA 7B model on 52K instruction-following demonstrations generated from OpenAI’s text-davinci-003.
The figure below illustrates how we obtained the Alpaca model. For the data, we generated instruction-following demonstrations by building upon the self-instruct method. We started with the 175 human-written instruction-output pairs from the self-instruct seed set. We then prompted text-davinci-003 to generate more instructions using the seed set as in-context examples. We improved over the self-instruct method by simplifying the generation pipeline (see details in GitHub) and significantly reduced the cost. Our data generation process results in 52K unique instructions and the corresponding outputs, which cost less than $500 using the OpenAI API.
Figure: Alpaca pipeline.
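The idea of using seed tasks as in-context examples can be sketched as below. This is only an illustration under stated assumptions: the prompt wording, the `build_prompt` helper, and the example seed tasks are hypothetical and are not the prompt or code used for Alpaca (see the GitHub release for the actual pipeline). No API call is made here.

```python
import random

def build_prompt(seed_tasks, num_examples=3, rng=None):
    """Format a few sampled seed tasks as in-context examples (hypothetical format)."""
    if rng is None:
        rng = random.Random(0)
    examples = rng.sample(seed_tasks, num_examples)
    lines = ["Come up with a new task in the same format as the examples.", ""]
    for i, task in enumerate(examples, start=1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"{i}. Output: {task['output']}")
    # Leave the next slot open for the language model to complete.
    lines.append(f"{num_examples + 1}. Instruction:")
    return "\n".join(lines)

# Hypothetical stand-ins for the human-written seed set.
seed_tasks = [
    {"instruction": "Translate 'hello' into French.", "output": "bonjour"},
    {"instruction": "Give a synonym for 'happy'.", "output": "joyful"},
    {"instruction": "What is 2 + 3?", "output": "5"},
    {"instruction": "Name a primary color.", "output": "red"},
]
prompt = build_prompt(seed_tasks, num_examples=3, rng=random.Random(0))
```

The completed text returned by the language model would then be parsed into new instruction-output pairs and deduplicated before being added to the dataset.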
Equipped with this instruction-following dataset, we then fine-tuned the LLaMA models using Hugging Face’s training framework, taking advantage of techniques like Fully Sharded Data Parallel and mixed precision training. For our initial run, fine-tuning a 7B LLaMA model took 3 hours on eight 80GB A100s, which costs less than $100 on most cloud compute providers. We note that training efficiency can be improved to further reduce the cost.
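As a quick sanity check on the figures quoted in this post, the arithmetic below derives the GPU-hours of the initial run and the hourly rate implied by the stated compute bound. The variable names are illustrative, and the per-GPU-hour rate is an upper bound implied by the stated numbers, not a quoted cloud price.

```python
# Figures quoted in this post.
training_hours = 3
num_gpus = 8
compute_budget_usd = 100.0   # stated upper bound on fine-tuning cost
data_budget_usd = 500.0      # stated upper bound on data generation cost

gpu_hours = training_hours * num_gpus
# Rate per GPU-hour implied by the compute bound (an upper bound, not a price quote).
max_rate_per_gpu_hour = compute_budget_usd / gpu_hours
# Combined bound, matching the "<$600" figure in the introduction.
total_upper_bound_usd = compute_budget_usd + data_budget_usd
```

This gives 24 GPU-hours, an implied rate of roughly $4.17 per GPU-hour at most, and a combined bound of $600.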
Preliminary Evaluation
To evaluate Alpaca, we conduct human evaluation (by the 5 student authors) on the inputs from the self-instruct evaluation set. This evaluation set was collected by the self-instruct authors and covers a diverse list of user-oriented instructions including email writing, social media, and productivity tools. We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003.
We were quite surprised by this result given the small model size and the modest amount of instruction-following data. Besides leveraging this static evaluation set, we have also been testing the Alpaca model interactively and found that Alpaca often behaves similarly to text-davinci-003 on a diverse set of inputs. We acknowledge that our evaluation may be limited in scale and diversity, so we are releasing an interactive demo of Alpaca and encourage readers to evaluate Alpaca themselves and give us feedback.
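The pairwise comparison reported above (90 wins for Alpaca versus 89 for text-davinci-003) can be summarized as a win rate. The sketch below also runs a two-sided sign test as one possible way to check whether such a split differs from a coin flip. The sign test is an illustrative assumption of ours and is not part of the evaluation described in this post.

```python
from math import comb

alpaca_wins = 90
davinci_wins = 89
n = alpaca_wins + davinci_wins

win_rate = alpaca_wins / n

# Two-sided sign test under the null hypothesis that each comparison is a fair coin flip.
larger = max(alpaca_wins, davinci_wins)
upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
p_value = min(1.0, 2 * upper_tail)
```

The win rate is about 50.3%, and the test gives no evidence that either model is preferred, consistent with the description of the two models as performing very similarly on this set.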
In the rest of this section, we include several interaction examples to showcase the capabilities and limitations of Alpaca.