Model | Phi-3 Technical Report

Created: 2024-08-23 02:00:04 +0000

Last modified: 2024-09-05 20:56:50 +0900

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

url: https://arxiv.org/abs/2404.14219

pdf: https://arxiv.org/pdf/2404.14219

html: https://arxiv.org/html/2404.14219v3

abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.

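The Phi model series focuses on strengthening capability in specific domains and on optimizing training by revising the data strategy. Through each model's distinctive data strategy and parameter efficiency, the series performs well on tasks large and small, broadens the scalability and range of application of small models, and makes practical deployment feasible.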

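Phi-1: A 1.3B-parameter model focused mainly on coding and natural language processing tasks. It adopted a distinctive training approach that combines ‘textbook-quality’ web data with synthetic textbooks generated by GPT-3.5, with the aim of minimizing bias and errors.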

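Phi-1.5: A 1.3B-parameter model whose performance on natural language understanding and reasoning tasks is comparable to that of models five times larger, with notably improved results on grade-school math and basic coding problems. This model also reduces the likelihood of errors by centering its training on textbook-style data in place of web data.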

Phi-2: A 2.7B-parameter model that uses web data with educational value together with specially generated synthetic data, and matches larger models on complex benchmarks. It is available for research through Azure AI Studio.

Phi-3-mini: A 3.8B-parameter model trained on 3.3 trillion tokens. Although small enough to be deployed on a phone, it is competitive with models such as Mixtral 8x7B and GPT-3.5. Its training dataset consists of heavily filtered public web data and synthetic data, which further strengthens the model's robustness and safety.
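To make the "small enough for a phone" claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a 3.8B-parameter model would need at different storage precisions. The parameter count comes from the abstract; the precisions (16, 8, and 4 bits per weight) and the helper name `weight_memory_gb` are assumptions chosen for illustration, and the estimate covers weights only, ignoring activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # parameter count quoted in the abstract

# Hypothetical storage precisions, chosen for illustration only.
for bits in (16, 8, 4):
    gb = weight_memory_gb(PHI3_MINI_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```

Under these assumptions the weights take roughly 7.6 GB at 16 bits and about 1.9 GB at 4 bits, which is why aggressive quantization is the usual route to fitting a model of this size into phone memory.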

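Phi-3-small and Phi-3-medium: These two models have 7B and 14B parameters respectively and, trained on 4.8 trillion tokens, perform substantially better than phi-3-mini. These scaled-up versions enlarge the training dataset and aim to maximize effectiveness on more complex tasks.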
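One simple way to compare these training budgets is the number of training tokens per parameter. The sketch below uses only the parameter and token counts quoted in the abstract; the dictionary layout and the helper name `tokens_per_parameter` are illustrative, not anything defined in the report.

```python
# (parameters, training tokens) as quoted in the abstract
MODELS = {
    "phi-3-mini": (3.8e9, 3.3e12),
    "phi-3-small": (7e9, 4.8e12),
    "phi-3-medium": (14e9, 4.8e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_parameter(params, tokens)
    print(f"{name}: {ratio:.0f} tokens per parameter")
```

This gives roughly 868 tokens per parameter for phi-3-mini, 686 for phi-3-small, and 343 for phi-3-medium. Because the two larger models share the same 4.8T-token budget, phi-3-medium sees half as many tokens per parameter as phi-3-small.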

Phi-3-vision: A 4.2B-parameter model with enhanced reasoning over image and text prompts. It builds on the phi-3-mini architecture to handle a variety of multimodal tasks.

Phi index for research on data quality and domain-specific SLMs

Phi-1

Release Date: 2023.06

Phi-1 is a compact 1.3 billion parameter Transformer model tailored for coding tasks.
It was trained on a unique blend of "textbook quality" web data and synthetic exercises generated with GPT-3.5.
Despite its smaller scale, phi-1 achieves competitive coding accuracies and exhibits emergent properties.

Phi-1.5

Release Date: 2023.09

Phi-1.5 is a 1.3 billion parameter model optimized for complex reasoning tasks.
It uses textbook-based data to minimize bias and enhance performance.
The model is open-sourced to encourage further research.

Phi-2

Release Date: 2023.12

Microsoft's Phi-2 model, with 2.7 billion parameters, performs on par with much larger models in complex benchmarks.
It utilizes high-quality, strategically curated training data to excel in reasoning and understanding tasks.
Phi-2 is available on Azure AI Studio for research, promoting advancements in AI safety and interpretability.

Phi-3

Release Date: 2024.04

Phi-3-mini is a mobile-friendly 3.8 billion parameter language model.
It uses a unique mix of filtered web and synthetic data for training.
The model extends to larger versions and an image-text reasoning variant.
