Model | Google - Gemma 2 (Gemma Scope)

Created: 2024-08-06 07:32:42 +0000

Last modified: 2024-09-05 20:56:50 +0900

Gemma Scope: helping the safety community shed light on the inner workings of language models

url: https://deepmind.google/discover/blog/gemma-scope-helping-the-safety-community-shed-light-on-the-inner-workings-of-language-models/

pdf: https://storage.googleapis.com/gemma-scope/gemma-scope-report.pdf

abstract: Sparse autoencoders (SAEs) are an unsupervised method for learning a sparse decomposition of a neural network’s latent representations into seemingly interpretable features. Despite recent excitement about their potential, research applications outside of industry are limited by the high cost of training a comprehensive suite of SAEs. In this work, we introduce Gemma Scope, an open suite of JumpReLU SAEs trained on all layers and sub-layers of Gemma 2 2B and 9B and select layers of Gemma 2 27B base models. We evaluate the quality of each SAE on standard metrics and release these results. We hope that by releasing these SAE weights, we can help push forward safety and interpretability research in the community. Weights, a tutorial and an interactive demo can be found at https://huggingface.co/google/gemma-scope.

Gemma2 - 2B IT 모델이 일부 벤치에서 GPT 3.5 이상의 성능을 보였다고 하고, SOLAR-10.7B-IT 역시 추가학습(자세한 내용은 아직 공식 포스트나 페이퍼는 못 찾음)으로 타 30B 모델보다 벤치마크 점수가 좋아졌다고 합니다. (업스테이지 포스트) 리더보드랑 비교하면서 정성적으로 확인해봐야겠습니다.

최근 메타는 405B 모델을 위주로 학습하였다고 발표했는데, 2B 성능을 70B와 비교해보고 확인해봐야겠습니다.

Inference Code of Gemma2-2b-it

Using Single or Multi GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))

Using vllm


!pip install vllm==0.5.3
!pip install flashinfer==0.0.8 -i https://flashinfer.ai/whl/cu121/torch2.3/

# Import necessary libraries
import os
import random
import torch
import vllm
from vllm import LLM, SamplingParams


print(f"vLLM version: {vllm.__version__}")  
print(f"PyTorch version: {torch.__version__}")  
print(f"CUDA version: {torch.version.cuda}") 

# Update backend variable for VLLM
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

# Initialize and test vLLM model with sampling parameters https://huggingface.co/google/gemma-2-2b-it
llm = LLM(model="gemma-2-2b-it", trust_remote_code=True)

sampling_params = SamplingParams(
    temperature=0.8,
    max_tokens=512,
    top_p=0.95,
    top_k=1,
)

prompt = "Explain Large Language Model, LLaMA, architecture."


outputs = llm.generate(
    [prompt],
    sampling_params
)

Model | Google - Gemma 2 (Gemma Scope)

Model | Google - Gemma 2 (Gemma Scope)

Model | Google - Gemma 2 (Gemma Scope)

Gemma Scope: helping the safety community shed light on the inner workings of language models

post contain ""

No matching posts found containing ""

Recent Posts

Most Likes

Most Views

Share Your Feedback 🏝️

Model | Google - Gemma 2 (Gemma Scope)

Model | Google - Gemma 2 (Gemma Scope)

Gemma Scope: helping the safety community shed light on the inner workings of language models

post contain ""

No matching posts found containing ""

Recent Posts

Most Likes

Most Views