Model | Phi-2 · MinWoo Park

Created: 2024-08-23 02:00:04 +0000

Last modified: 2024-09-05 20:56:50 +0900

Phi-2: The surprising power of small language models

url: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

model-catalog: https://ai.azure.com/explore/models/microsoft-phi-2/version/4/registry/azureml-msr

abstract: Over the past few months, our Machine Learning Foundations team at Microsoft Research has released a suite of small language models (SLMs) called “Phi” that achieve remarkable performance on a variety of benchmarks. Our first model, the 1.3 billion parameter Phi-1, achieved state-of-the-art performance on Python coding among existing SLMs (specifically on the HumanEval and MBPP benchmarks). We then extended our focus to common sense reasoning and language understanding and created a new 1.3 billion parameter model named Phi-1.5, with performance comparable to models 5x larger.

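The Phi model series focuses on strengthening capabilities in specific domains and on optimizing training by revising the data strategy. Through each model's distinctive data strategy and parameter efficiency, the series performs well on a wide range of tasks, broadens the models' scalability and range of application, and makes practical deployment feasible.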

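Phi-1: A 1.3B-parameter model focused mainly on coding tasks. It adopted a distinctive training approach that combines "textbook quality" web data with synthetic textbooks generated by GPT-3.5, with the aim of minimizing bias and errors.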

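Phi-1.5: A 1.3B-parameter model whose performance on natural language understanding and reasoning tasks is comparable to models 5x larger, with improved results on grade-school math and basic coding problems in particular. This model also excluded web data and trained mainly on textbook data, reducing the likelihood of errors.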

Phi-2: A 2.7B-parameter model that uses web data with educational value and specially generated synthetic data to match the performance of larger models on complex benchmarks. It is available for research through Azure AI Studio.

Phi-3-mini: A 3.8B-parameter model trained on 3.3 trillion tokens. Although it is small enough to be deployed on a phone, it is competitive with models such as Mixtral 8x7B and GPT-3.5. Its training dataset is built from heavily filtered public web data and synthetic data, which further strengthens the model's robustness and safety.

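Phi-3-small and Phi-3-medium: These two models have 7B and 14B parameters respectively and, trained on 4.8 trillion tokens, perform significantly better than phi-3-mini. These scaled-up versions increase the size of the training dataset and are aimed at more complex tasks.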

Phi-3-vision: A 4.2B-parameter model with enhanced reasoning over image and text prompts. It builds on the phi-3-mini architecture and can perform a variety of multimodal tasks.
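The parameter counts in the summaries above give a rough sense of why the smaller models are described as deployable on a phone. The following is a minimal back-of-the-envelope sketch, not a figure from the post: it estimates the size of the weights alone as parameters × bits per weight / 8, and ignores activations, KV cache, and runtime overhead. The function name `weight_gigabytes` and the 16-bit and 4-bit precisions are illustrative assumptions.

```python
# Rough weight-memory estimate: parameters * bits_per_weight / 8 bytes.
# Parameter counts are taken from the model summaries above; the
# precisions (16-bit, 4-bit) are illustrative assumptions, and a real
# deployment needs additional memory beyond the weights.

PARAMS = {
    "Phi-1": 1.3e9,
    "Phi-1.5": 1.3e9,
    "Phi-2": 2.7e9,
    "Phi-3-mini": 3.8e9,
    "Phi-3-vision": 4.2e9,
    "Phi-3-small": 7e9,
    "Phi-3-medium": 14e9,
}


def weight_gigabytes(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, n in PARAMS.items():
        fp16 = weight_gigabytes(n, 16)
        int4 = weight_gigabytes(n, 4)
        print(f"{name:13s} 16-bit ~ {fp16:5.1f} GB   4-bit ~ {int4:4.1f} GB")
```

Under these assumptions, the Phi-3-mini weights come to about 7.6 GB at 16 bits per weight and about 1.9 GB at 4 bits per weight, which is the kind of footprint that makes on-device use plausible.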

Phi index: research on data quality and domain-specific SLMs

Phi-1

Release Date: 2023.06

Phi-1 is a compact 1.3 billion parameter Transformer model tailored for coding tasks.
It was trained on a unique blend of "textbook quality" web data and synthetic exercises generated with GPT-3.5.
Despite its smaller scale, Phi-1 achieves competitive coding accuracy and exhibits emergent properties.


Phi-1.5

Release Date: 2023.09

Phi-1.5 is a 1.3 billion parameter model optimized for complex reasoning tasks.
It uses textbook-based data to minimize bias and enhance performance.
The model is open-sourced to encourage further research.


Phi-2

Release Date: 2023.12

Microsoft's Phi-2 model, with 2.7 billion parameters, performs on par with much larger models on complex benchmarks.
It utilizes high-quality, strategically curated training data to excel in reasoning and understanding tasks.
Phi-2 is available on Azure AI Studio for research, promoting advancements in AI safety and interpretability.


Phi-3

Release Date: 2024.04

Phi-3-mini is a mobile-friendly 3.8 billion parameter language model.
It uses a unique mix of filtered web and synthetic data for training.
The model extends to larger versions and an image-text reasoning variant.

