
MinWoo(Daniel) Park | Tech Blog


RAG | Context Embeddings RAG

  • Related Project: Private
  • Category: Paper Review
  • Date: 2024-06-12

Context Embeddings for Efficient Answer Generation in RAG

  • url: https://arxiv.org/abs/2407.09252
  • pdf: https://arxiv.org/pdf/2407.09252
  • html: https://arxiv.org/html/2407.09252v1
  • abstract: Retrieval-Augmented Generation (RAG) allows overcoming the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer, which slows down decoding time, directly translating to the time a user has to wait for an answer. We address this challenge by presenting COCOM, an effective context compression method, reducing long contexts to only a handful of Context Embeddings, speeding up the generation time by a large margin. Our method allows for different compression rates, trading off decoding time for answer quality. Compared to earlier methods, COCOM allows for handling multiple contexts more effectively, significantly reducing decoding time for long inputs. Our method demonstrates a speed-up of up to 5.69× while achieving higher performance compared to existing efficient context compression methods.
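The core idea of COCOM is to replace a long retrieved context with a small, fixed number of compressed embeddings before feeding it to the decoder. Below is a minimal sketch of that idea using simple mean-pooling over contiguous chunks as the compressor; this is an illustrative stand-in, not the paper's learned compression model, and the function name and shapes are assumptions for demonstration.

```python
import numpy as np

def compress_context(token_embeddings: np.ndarray, num_ctx_embeddings: int) -> np.ndarray:
    """Toy compressor: reduce a (seq_len, dim) matrix of token embeddings
    to (num_ctx_embeddings, dim) by mean-pooling contiguous chunks.
    COCOM uses a learned compressor; pooling here only illustrates the
    shape reduction that speeds up decoding."""
    seq_len, _ = token_embeddings.shape
    # Split token positions into roughly equal contiguous chunks,
    # one chunk per output context embedding.
    chunks = np.array_split(np.arange(seq_len), num_ctx_embeddings)
    return np.stack([token_embeddings[idx].mean(axis=0) for idx in chunks])

# A 128-token retrieved context compressed to 4 context embeddings
# corresponds to a compression rate of 128 / 4 = 32.
ctx = np.random.default_rng(0).normal(size=(128, 64))
compressed = compress_context(ctx, num_ctx_embeddings=4)
print(compressed.shape)  # (4, 64)
```

The decoder then attends over 4 vectors instead of 128, which is where the decoding speed-up comes from; the compression rate is the knob trading answer quality for latency.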
