SSM | Transformers are SSMs

Created: 2024-08-23 02:00:04 +0000

Last modified: 2024-09-05 20:56:50 +0900

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

url: https://arxiv.org/abs/2405.21060

pdf: https://arxiv.org/pdf/2405.21060

html: https://arxiv.org/html/2405.21060v1

abstract: While Transformers have been the main architecture behind deep learning’s success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba’s selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
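To make the SSM/attention connection in the abstract concrete, here is a minimal sketch (not the paper's implementation) of a selective SSM computed two ways: as a linear-time recurrence, and as multiplication by a lower-triangular matrix whose entries resemble decayed attention scores. It assumes a scalar decay `a_t` per time step and a single input channel; the function names `ssm_recurrent` and `ssm_matrix` and all sizes are hypothetical and chosen only for illustration.

```python
import random


def ssm_recurrent(a, B, C, x):
    """Linear-time form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>."""
    n = len(B[0])
    h = [0.0] * n  # hidden state of size n, starts at zero
    y = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_k + b_k * x_t for h_k, b_k in zip(h, B_t)]
        y.append(sum(c_k * h_k for c_k, h_k in zip(C_t, h)))
    return y


def ssm_matrix(a, B, C, x):
    """Quadratic, attention-like form: y = M x with
    M[i][j] = <C_i, B_j> * a_{j+1} * ... * a_i  for j <= i, and 0 otherwise."""
    T = len(x)
    y = []
    for i in range(T):
        total = 0.0
        for j in range(i + 1):  # causal: only positions j <= i contribute
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            score = sum(c * b for c, b in zip(C[i], B[j]))  # plays the role of q_i . k_j
            total += score * decay * x[j]
        y.append(total)
    return y


# Illustrative sizes and random inputs (assumptions, not values from the paper).
random.seed(0)
T, n = 6, 4
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(T)]
C = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(T)]
x = [random.gauss(0.0, 1.0) for _ in range(T)]

y_rec = ssm_recurrent(a, B, C, x)
y_mat = ssm_matrix(a, B, C, x)
max_diff = max(abs(p - q) for p, q in zip(y_rec, y_mat))
```

Both functions compute the same sequence map, so `max_diff` should be at floating-point noise level. The recurrent form costs time linear in the sequence length, while the matrix form is quadratic but exposes the masked, attention-like structure.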

[SSM key index markers]
