
Retrieval Augmented Generation (RAG) for LLMs

Overview

Working with LLMs raises many challenges, such as domain knowledge gaps, factuality issues, and hallucination. Retrieval Augmented Generation (RAG) provides a solution to mitigate some of these issues by augmenting LLMs with external knowledge such as databases. RAG is particularly useful in knowledge-intensive scenarios or domain-specific applications that require continually updated knowledge.

Key Advantages

A key advantage of RAG over other approaches is that the LLM doesn't need to be retrained for task-specific applications. RAG has been popularized recently with its application in conversational agents.

Research Summary

In this summary, we highlight the main findings and practical insights from the recent survey titled Retrieval-Augmented Generation for Large Language Models: A Survey (Gao et al., 2023). In particular, we focus on the existing approaches, state-of-the-art RAG, evaluation, applications and technologies surrounding the different components that make up a RAG system (retrieval, generation, and augmentation techniques).

Introduction to RAG

RAG Framework

"RAG Framework"

RAG can be defined as follows:

RAG takes an input and retrieves a set of relevant/supporting documents from a given source (e.g., Wikipedia). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive to situations where facts evolve over time, which is very useful because an LLM's parametric knowledge is static. RAG allows language models to bypass retraining, enabling access to the latest information for generating reliable outputs via retrieval-based generation.

Key Benefits

In short, the evidence retrieved in RAG can serve as a way to enhance the accuracy, controllability, and relevancy of the LLM's response. This is why RAG can help reduce hallucination and performance issues when addressing problems in a rapidly evolving environment.

Evolution of RAG

While RAG has also involved the optimization of pre-training methods, current approaches have largely shifted to combining the strengths of RAG and powerful fine-tuned models like ChatGPT and Mixtral. The chart below shows the evolution of RAG-related research:

"RAG Framework"

Figure Source

RAG Application Workflow

Below is a typical RAG application workflow:

"RAG Framework"

Figure Source

Workflow Components

We can explain the different steps/components as follows:

1. Input

The question to which the LLM system responds is referred to as the input. If no RAG is used, the LLM is directly used to respond to the question.

2. Indexing

If RAG is used, then a series of related documents are indexed by:

  • Chunking them first
  • Generating embeddings of the chunks
  • Indexing them into a vector store

At inference, the query is also embedded in a similar way.
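
The indexing steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions: `chunk`, `embed`, and `VectorStore` are hypothetical names, chunks are fixed-size word windows, and `embed` is a toy hashed bag-of-words vector standing in for a real embedding model. A production system would call an actual embedding model and a vector database instead.

```python
import math
import re
import zlib
from collections import Counter

DIM = 64  # assumed embedding size for this toy example


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        # crc32 gives a deterministic bucket for each token across runs
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """Minimal in-memory index holding each chunk next to its vector."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[list[float]] = []

    def add_document(self, text: str, size: int = 8) -> None:
        for piece in chunk(text, size):           # 1. chunk
            self.chunks.append(piece)
            self.vectors.append(embed(piece))     # 2. embed, 3. index


store = VectorStore()
store.add_document("RAG indexes documents by chunking them, embedding each chunk, "
                   "and storing the vectors so they can be searched at query time.")
query_vector = embed("How are documents indexed?")  # the query is embedded the same way
```

Because the query goes through the same `embed` function as the chunks, query and chunk vectors live in the same space and can be compared directly.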

3. Retrieval

Relevant documents (labeled "Relevant Documents" in the figure) are obtained by comparing the query embedding against the indexed vectors.
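
Retrieval then reduces to a nearest-neighbor search over the indexed vectors. The sketch below assumes cosine similarity and an exhaustive scan over an in-memory list of `(chunk, vector)` pairs; `cosine` and `retrieve` are hypothetical helpers, and real vector stores typically use approximate nearest-neighbor indexes for large collections.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]],
             k: int = 2) -> list[tuple[float, str]]:
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    scored = ((cosine(query_vec, vec), text) for text, vec in index)
    return heapq.nlargest(k, scored)


index = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("pets", [0.7, 0.7])]
top = retrieve([1.0, 0.1], index, k=2)
```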

4. Generation

The relevant documents are combined with the original prompt as additional context. The combined prompt is then passed to the model, which generates the response that the system returns to the user as its final output.
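
A minimal sketch of this generation step, assuming a simple prompt template: `build_prompt` numbers the retrieved chunks and places them before the question, and `generate` is any callable that maps a prompt string to model output (for example, a thin wrapper around an LLM client). The template wording and names are illustrative assumptions, not a prescribed format.

```python
from typing import Callable


def build_prompt(question: str, documents: list[str], max_docs: int = 3) -> str:
    """Combine up to `max_docs` retrieved documents with the original question."""
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents[:max_docs], start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, documents: list[str], generate: Callable[[str], str]) -> str:
    """Run the generator on the augmented prompt and return its output."""
    return generate(build_prompt(question, documents))


prompt = build_prompt("Who won the 2024 match?", ["Match report: Team A won 2-1."])
```

Numbering the chunks is one design choice; it lets the prompt ask the model to cite the passage it relied on.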

Example Use Case

In the example provided, using the model directly fails to respond to the question due to a lack of knowledge of current events. On the other hand, when using RAG, the system can pull the relevant information needed for the model to answer the question appropriately.

RAG Paradigms

Over the past few years, RAG systems have evolved from Naive RAG to Advanced RAG and Modular RAG. This evolution has occurred to address certain limitations around performance, cost, and efficiency.

"RAG Framework"

Figure Source

1. Naive RAG

Naive RAG follows the traditional process of indexing, retrieval, and generation described above. In short, a user input is used to query relevant documents, which are then combined with a prompt and passed to the model to generate a final response. Conversational history can be integrated into the prompt if the application involves multi-turn dialogue interactions.

Limitations

Naive RAG has limitations such as:

  • Low Precision: Misaligned retrieved chunks
  • Low Recall: Failure to retrieve all relevant chunks
  • Outdated Information: LLM may receive outdated information
  • Hallucination Issues: Poor and inaccurate responses
  • Redundancy: Issues with repetition when using multiple retrieved passages
  • Style/Tone Challenges: Difficulty in ranking and reconciling style/tone
  • Over-dependence: Generation task may overly depend on augmented information
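
The first two limitations, low precision and low recall, can be quantified when a small labeled set of relevant chunks per query is available. The sketch below computes precision@k and recall@k for a single query; the chunk IDs and the labeled set are hypothetical.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k: share of the top-k results that are relevant.
    Recall@k: share of all relevant chunks that appear in the top k.
    If fewer than k results were returned, precision divides by the number returned."""
    top_k = retrieved[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


precision, recall = precision_recall_at_k(["c1", "c7", "c3", "c9"], {"c1", "c3", "c5"}, k=4)
```

Here two of the four retrieved chunks are relevant (precision 0.5) and two of the three relevant chunks were found (recall about 0.67). Averaging these values over a set of test queries gives a simple way to compare chunking or embedding choices.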

2. Advanced RAG

Advanced RAG addresses issues present in Naive RAG, for example by improving retrieval quality through optimizing the pre-retrieval, retrieval, and post-retrieval processes.

Pre-retrieval Process

The pre-retrieval process involves optimizing data indexing which aims to enhance the quality of the data being indexed through five stages:

  • Enhancing Data Granularity: Improving data structure
  • Optimizing Index Structures: Better indexing methods
  • Adding Metadata: Rich contextual information
  • Alignment Optimization: Better data alignment
  • Mixed Retrieval: Combining different retrieval approaches

Retrieval Stage

The retrieval stage can be further improved by optimizing the embedding model itself, which directly impacts the quality of the chunks retrieved to form the context. This can be done by:

  • Fine-tuning Embeddings: Optimizing retrieval relevance
  • Dynamic Embeddings: Better capturing contextual understanding (e.g., OpenAI's text-embedding-ada-002 model)

Post-retrieval Process

Optimizing post-retrieval focuses on avoiding context window limits and dealing with noisy or potentially distracting information. Common approaches include:

  • Re-ranking: Relocating relevant context to prompt edges or recalculating semantic similarity
  • Prompt Compression: Reducing context length while maintaining relevance
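
One simple way to relocate relevant context to the prompt edges is to alternate documents, in descending order of relevance, between the front and the back of the list, so the weakest matches end up in the middle. This is motivated by the "Lost in the Middle" finding listed in the research insights below; the function name is hypothetical, and this is a sketch of the idea rather than the exact algorithm of any particular library.

```python
def reorder_for_edges(docs_by_relevance: list[str]) -> list[str]:
    """Input: documents sorted most-relevant first.
    Output: most relevant at the start and end, least relevant in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(docs_by_relevance):
        # Even indices (ranks 1, 3, 5, ...) fill the front; odd indices fill the back.
        (front if i % 2 == 0 else back).append(doc)
    # Reverse the back half so the second most relevant document ends up last.
    return front + back[::-1]


print(reorder_for_edges(["d1", "d2", "d3", "d4", "d5"]))
# ['d1', 'd3', 'd5', 'd4', 'd2']
```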

3. Modular RAG

As the name implies, Modular RAG enhances functional modules such as incorporating a search module for similarity retrieval and applying fine-tuning in the retriever. Both Naive RAG and Advanced RAG are special cases of Modular RAG and are made up of fixed modules.

Extended RAG Modules

Extended RAG modules include:

  • Search: Information retrieval capabilities
  • Memory: Persistent knowledge storage
  • Fusion: Combining multiple information sources
  • Routing: Directing queries to appropriate modules
  • Predict: Forecasting information needs
  • Task Adapter: Adapting to specific task requirements

These modules can be rearranged to suit specific problem contexts. Therefore, Modular RAG benefits from greater diversity and flexibility in that you can add or replace modules or adjust the flow between modules based on task requirements.

Optimization Techniques

Given the increased flexibility in building RAG systems, other important techniques have been proposed to optimize RAG pipelines, including:

Hybrid Search Exploration

This approach leverages a combination of search techniques like keyword-based search and semantic search to retrieve relevant and context-rich information; this is useful when dealing with different query types and information needs.
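
A minimal sketch of hybrid search, under two assumptions: keyword search is approximated by counting shared query terms (a crude stand-in for BM25), and the semantic ranking is taken as given from a vector retriever. The two ranked lists are merged with reciprocal rank fusion (RRF), one common fusion method; weighted score interpolation is another. Function names are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank document IDs by how many query terms they contain (BM25 stand-in)."""
    query_terms = terms(query)
    return sorted(docs, key=lambda doc_id: len(query_terms & terms(docs[doc_id])), reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc as the sum of 1 / (k + rank) over the rankings it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


docs = {
    "a": "vector search with dense embeddings",
    "b": "BM25 keyword search over an inverted index",
    "c": "notes on cooking pasta",
}
keyword = keyword_rank("keyword search", docs)  # ["b", "a", "c"]
semantic = ["a", "c", "b"]                      # assumed output of a vector retriever
fused = reciprocal_rank_fusion([keyword, semantic])
```

The constant k = 60 is a commonly used default that dampens the influence of the very top ranks. Because RRF uses only ranks, it avoids having to put keyword and cosine scores on a common scale.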

Recursive Retrieval and Query Engine

Involves a recursive retrieval process that might start with small semantic chunks and subsequently retrieve larger chunks that enrich the context; this is useful to balance efficiency and context-rich information.
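
One common realization of this idea, often called small-to-big or parent-document retrieval, matches the query against small child chunks and then returns the larger parent chunks they belong to. The sketch below assumes lexical overlap as the matching score and an in-memory mapping from child to parent; the names and data are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def small_to_big(query: str, children: list[tuple[str, str]],
                 parents: dict[str, str], k: int = 2) -> list[str]:
    """children: (small_chunk_text, parent_id) pairs; parents: parent_id -> larger chunk.
    Rank the small chunks (precise matching), then return their unique parents (richer context)."""
    query_terms = terms(query)
    ranked = sorted(children, key=lambda child: len(query_terms & terms(child[0])), reverse=True)
    parent_ids: list[str] = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:  # several children may share one parent
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


parents = {
    "p1": "Indexing. RAG splits documents into chunks and embeds each chunk.",
    "p2": "Evaluation. RAG systems are scored on faithfulness and relevance.",
}
children = [
    ("RAG splits documents into chunks", "p1"),
    ("embeds each chunk", "p1"),
    ("scored on faithfulness and relevance", "p2"),
]
context = small_to_big("how is each chunk embedded", children, parents, k=2)
```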

StepBack-prompt

A prompting technique that enables LLMs to perform abstraction, producing concepts and principles that guide reasoning. When applied in a RAG framework, this leads to better-grounded responses because the LLM moves away from specific instances and can reason more broadly if needed.

Sub-Queries

There are different query strategies such as tree queries or sequential querying of chunks that can be used for different scenarios. LlamaIndex offers a sub question query engine that allows a query to be broken down into several questions that use different relevant data sources.

Hypothetical Document Embeddings (HyDE)

HyDE generates a hypothetical answer to a query, embeds it, and uses it to retrieve documents similar to the hypothetical answer as opposed to using the query directly.
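
The HyDE flow can be sketched as follows. Assumptions: `generate` is any prompt-to-text callable (an LLM in practice, a fixed fake here), and `embed` is a toy bag-of-words vector standing in for a dense encoder such as the Contriever model used in the original HyDE paper (listed in the research insights below). The key step is that the hypothetical answer, not the raw query, is embedded and used for the similarity search.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy sparse 'embedding': term counts (stand-in for a dense encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(query: str, corpus: list[str],
                  generate: Callable[[str], str], k: int = 1) -> list[str]:
    # 1. Ask the generator for a hypothetical document that answers the query.
    hypothetical = generate(f"Write a short passage that answers the question: {query}")
    # 2. Embed the hypothetical document instead of the query.
    h_vec = embed(hypothetical)
    # 3. Retrieve the real documents closest to that embedding.
    return sorted(corpus, key=lambda doc: cosine(h_vec, embed(doc)), reverse=True)[:k]


def fake_llm(prompt: str) -> str:
    """Hypothetical generator used only for this example."""
    return "The tower stands in Paris, the capital of France."


corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Pasta is usually boiled in salted water.",
]
result = hyde_retrieve("Where is the Eiffel Tower?", corpus, fake_llm)
```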

RAG Framework Components

In this section, we summarize the key developments of the components of a RAG system, which include Retrieval, Generation, and Augmentation.

Retrieval

Retrieval is the component of RAG that deals with retrieving highly relevant context from a retriever. A retriever can be enhanced in many ways, including:

Enhancing Semantic Representations

This process involves directly improving the semantic representations that power the retriever. Here are a few considerations:

Chunking

One important step is choosing the right chunking strategy, which depends on:

  • The content you are dealing with
  • The application you are generating responses for
  • The embedding model, since different models display different strengths on varying block sizes:
      • Sentence transformers perform better on single sentences
      • text-embedding-ada-002 performs better with blocks containing 256 or 512 tokens
  • The expected user question length, the application, and token limits

It is common to experiment with different chunking strategies to optimize retrieval.

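
Two of the strategies mentioned above can be compared side by side: one chunk per sentence, and fixed-size windows with overlap. In this sketch words approximate tokens (a real pipeline would count tokens with the embedding model's tokenizer), the sentence splitter is a naive regular expression, the default sizes are arbitrary, and both function names are hypothetical.

```python
import re


def sentence_chunks(text: str) -> list[str]:
    """One chunk per sentence, using a naive split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def window_chunks(text: str, size: int = 256, overlap: int = 32) -> list[str]:
    """Fixed-size windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Start positions stop before len(words) - overlap, so every window adds new words.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


print(sentence_chunks("One. Two three! Four?"))
# ['One.', 'Two three!', 'Four?']
```

Overlap keeps a sentence that straddles a window boundary partly visible in both neighboring chunks, at the cost of indexing some words twice.
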
Fine-tuned Embedding Models

Once you have determined an effective chunking strategy, it may be necessary to fine-tune the embedding model if you are working with a specialized domain. Otherwise, user queries may be misunderstood in your application. You can fine-tune on:

  • Broad Domain Knowledge: Domain knowledge fine-tuning
  • Specific Downstream Tasks: Task-specific optimization

BGE-large-EN developed by BAAI is a notable embedding model that can be fine-tuned to optimize retrieval relevance.

Aligning Queries and Documents

This process deals with aligning users' queries with documents in the semantic space. This may be needed when a user's query lacks semantic information or contains imprecise phrasing. Here are some approaches:

Query Rewriting

Focuses on rewriting queries using a variety of techniques such as:

  • Query2Doc: Document-based query rewriting
  • ITER-RETGEN: Iterative retrieval and generation
  • HyDE: Hypothetical document embeddings

Embedding Transformation

Optimizes the representation of query embeddings and aligns them to a latent space that is more closely aligned with a task.

Aligning Retriever and LLM

This process deals with aligning the retriever outputs with the preferences of the LLMs.

Fine-tuning Retrievers

Uses an LLM's feedback signals to refine the retrieval models. Examples include:

  • Augmentation Adapted Retriever (AAR)
  • REPLUG
  • UPRISE

Adapters

Incorporates external adapters to help with the alignment process. Examples include:

  • PRCA
  • RECOMP
  • PKG

Generation

The generator in a RAG system is responsible for converting retrieved information into coherent text that forms the final output of the model. Because the input data, derived from both queries and documents, can be diverse, the language model sometimes needs to be further adapted to it. This can be addressed using post-retrieval processing and fine-tuning:

Post-retrieval with Frozen LLM

Post-retrieval processing leaves the LLM untouched and instead focuses on enhancing the quality of retrieval results through operations like:

  • Information Compression: Reducing noise and addressing context length restrictions
  • Result Reranking: Reordering documents to prioritize most relevant items
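
A minimal extractive sketch of information compression, assuming lexical overlap with the query as the relevance signal and a word budget as the length limit: sentences that share no terms with the query are dropped, the highest-scoring ones are kept until the budget is used, and their original order is restored. Trained compressors such as RECOMP (listed in the research insights below) are considerably more capable; the function names here are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def compress(query: str, documents: list[str], budget: int = 50) -> str:
    """Extractive compression: keep query-related sentences within a word budget."""
    query_terms = terms(query)
    sentences = [
        s for doc in documents for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s
    ]
    scores = [len(query_terms & terms(s)) for s in sentences]
    kept: list[int] = []
    used = 0
    # Visit sentences from most to least query overlap (ties keep original order).
    for i in sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True):
        length = len(sentences[i].split())
        if scores[i] > 0 and used + length <= budget:
            kept.append(i)
            used += length
    # Restore document order so the compressed context still reads naturally.
    return " ".join(sentences[i] for i in sorted(kept))


docs = [
    "RAG retrieves documents. The weather is sunny today.",
    "Retrieved documents are added to the prompt.",
]
short = compress("How does RAG use retrieved documents?", docs, budget=50)
```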

Fine-tuning LLM for RAG

To improve the RAG system, the generator can be further optimized or fine-tuned to ensure that the generated text is natural and effectively leverages the retrieved documents.

Augmentation

Augmentation is the process of effectively integrating context from retrieved passages with the current generation task. Before discussing the augmentation process, augmentation stages, and augmentation data in more detail, here is a taxonomy of RAG's core components:

"RAG Taxonomy"

Figure Source

Retrieval augmentation can be applied in many different stages such as pre-training, fine-tuning, and inference.

Evaluation

Evaluating a RAG framework focuses on three primary quality scores and four abilities.

Quality Scores

  • Context Relevance: Measuring the precision and specificity of retrieved context
  • Answer Faithfulness: Measuring the faithfulness of answers to the retrieved context
  • Answer Relevance: Measuring the relevance of answers to posed questions
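
Answer faithfulness is often computed as a ratio: the number of statements in the answer that the retrieved context supports, divided by the total number of statements. Tools such as RAGAS use an LLM both to split the answer into statements and to judge support. The sketch below keeps that ratio structure but, as an assumption for illustration, splits on sentence boundaries and uses a lexical-overlap judge as a stand-in; any callable with the same signature (for example, an LLM-based judge) can be passed instead. All names are hypothetical.

```python
import re
from typing import Callable


def terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def lexical_judge(statement: str, context: str, threshold: float = 0.8) -> bool:
    """Crude stand-in for an LLM judge: supported if most of the statement's terms occur in the context."""
    statement_terms = terms(statement)
    if not statement_terms:
        return False
    return len(statement_terms & terms(context)) / len(statement_terms) >= threshold


def faithfulness(answer: str, context: str,
                 judge: Callable[[str, str], bool] = lexical_judge) -> float:
    """Fraction of answer statements (here: sentences) that the judge considers supported."""
    statements = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not statements:
        return 0.0
    return sum(judge(s, context) for s in statements) / len(statements)


context = "The Eiffel Tower is in Paris. It was completed in 1889."
score = faithfulness("The Eiffel Tower is in Paris. It is 500 meters tall.", context)
```

Here the second sentence introduces a height the context never mentions, so the score is 0.5. A lexical judge misses paraphrases and negations, which is one reason tools like RAGAS rely on an LLM instead.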

Four Abilities

  • Noise Robustness: Handling noisy or irrelevant information
  • Negative Rejection: Declining to answer when the retrieved documents do not contain the needed information
  • Information Integration: Effectively combining multiple information sources
  • Counterfactual Robustness: Handling contradictory or false information

Evaluation Tools

Several benchmarks like RGB and RECALL are used to evaluate RAG models. Many tools have been developed to automate the process of evaluating RAG systems:

  • RAGAS: Automated evaluation framework
  • ARES: Evaluation and ranking system
  • TruLens: Trust and evaluation framework

Some of these systems rely on LLMs to determine some of the quality scores defined above.

Challenges & Future of RAG

In this overview, we discussed several aspects of RAG research and different approaches for enhancing the retrieval, augmentation, and generation components of a RAG system. Here are several challenges emphasized by Gao et al., 2023 as we continue developing and improving RAG systems:

Key Challenges

  1. Context Length: LLMs continue to extend context window size which presents challenges to how RAG needs to be adapted to ensure highly relevant and important context is captured.

  2. Robustness: Dealing with counterfactual and adversarial information is important to measure and improve in RAG.

  3. Hybrid Approaches: There is an ongoing research effort to better understand how to best optimize the use of both RAG and fine-tuned models.

  4. Expanding LLM Roles: Increasing the role and capabilities of LLMs to further enhance RAG systems is of high interest.

  5. Scaling Laws: How LLM scaling laws apply to RAG systems is still not well understood.

  6. Production-ready RAG: Production-grade RAG systems demand engineering excellence across performance, efficiency, data security, privacy, and more.

  7. Multimodal RAG: While there have been lots of research efforts around RAG systems, they have been mostly centered around text-based tasks. There is increasing interest in extending modalities for a RAG system to support tackling problems in more domains such as image, audio and video, code, and more.

  8. Evaluation: The interest in building complex applications with RAG requires special attention to develop nuanced metrics and assessment tools that can more reliably assess different aspects such as contextual relevance, creativity, content diversity, factuality, and more. In addition, there is also a need for better interpretability research and tools for RAG.

RAG Tools

Comprehensive Tools

Some popular comprehensive tools to build RAG systems include:

  • LangChain: Comprehensive framework for building LLM applications
  • LlamaIndex: Data framework for LLM applications
  • DSPy: Framework for optimizing LLM prompts and weights

Specialized Tools

There is also a range of specialized tools that serve different purposes:

  • Flowise AI: Low-code solution for building RAG applications
  • Haystack: Open-source framework for building production-ready NLP applications
  • Meltano: DataOps platform
  • Cohere Coral: RAG-focused platform

Cloud Services

Software and cloud service providers are also including RAG-centric services:

  • Verba from Weaviate: Useful for building personal assistant applications
  • Amazon's Kendra: Intelligent enterprise search services

Conclusion

In conclusion, RAG systems have evolved rapidly including the development of more advanced paradigms that enable customization and further the performance and utility of RAG across a wide range of domains. There is a huge demand for RAG applications, which has accelerated the development of methods to improve the different components of a RAG system.

From hybrid methodologies to self-retrieval, these are some of the currently explored research areas of modern RAG models. There is also increasing demand for better evaluation tools and metrics.

The figure below provides a recap of the RAG ecosystem, techniques to enhance RAG, challenges, and other related aspects covered in this overview:

"RAG Ecosystem"

Figure Source

RAG Research Insights

Below is a collection of research papers highlighting key insights and the latest developments in RAG.

Insight | Reference | Date
--- | --- | ---
Shows how retrieval augmentation can be used to distill language model assistants by training retrieval augmented simulators. | KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants | Mar 2024
Proposes Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation in a RAG system. The core idea is to implement a self-correction component for the retriever and improve the utilization of retrieved documents for augmenting generation. The retrieval evaluator helps to assess the overall quality of retrieved documents given a query. Using web search and optimized knowledge utilization operations can improve automatic self-correction and efficient utilization of retrieved documents. | Corrective Retrieval Augmented Generation | Jan 2024
Recursively embeds, clusters, and summarizes chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, the proposed RAPTOR model retrieves from the tree, integrating information across lengthy documents at different levels of abstraction. | RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Jan 2024
A general program with multi-step interactions between LMs and retrievers to efficiently tackle multi-label classification problems. | In-Context Learning for Extreme Multi-Label Classification | Jan 2024
Extracts semantically similar prompts from high-resource languages to improve the zero-shot performance of multilingual pre-trained language models across diverse tasks. | From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL | Nov 2023
Improves the robustness of RAGs in facing noisy, irrelevant documents and in handling unknown scenarios. It generates sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating the information to prepare the final answer. | Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | Nov 2023
Eliminates tokens that might not contribute essential information to optimize the answer generation process of a reader. Reduces run-time by up to 62.2%, with only a 2% reduction in performance. | Optimizing Retrieval-augmented Reader Models via Token Elimination | Oct 2023
Instruction-tunes a small, separate LM verifier to verify both the output and the knowledge of knowledge-augmented LMs. It helps to address scenarios where the model may fail to retrieve the knowledge relevant to the given query, or where the model may not faithfully reflect the retrieved knowledge in the generated text. | Knowledge-Augmented Language Model Verification | Oct 2023
Benchmark to analyze the performance of different LLMs in 4 fundamental abilities required for RAG: noise robustness, negative rejection, information integration, and counterfactual robustness. | Benchmarking Large Language Models in Retrieval-Augmented Generation | Oct 2023
Introduces the Self-Reflective Retrieval-Augmented Generation (Self-RAG) framework that enhances an LM's quality and factuality through retrieval and self-reflection. It leverages an LM to adaptively retrieve passages, and generates and reflects on retrieved passages and its own generations using reflection tokens. | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Oct 2023
Improves zero-shot information retrieval by iteratively improving retrieval through generation-augmented retrieval (GAR) and improving rewrite through RAG. The rewrite-retrieval stages improve recall and a re-ranking stage improves precision. | GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval | Oct 2023
Pretrains a 48B retrieval model using a base 43B GPT model and retrieving from 1.2 trillion tokens. The model is further instruction tuned to demonstrate significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. | InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | Oct 2023
Retrofits an LLM with retrieval capabilities through two distinct fine-tuning steps: one updates a pre-trained LM to better use retrieved information, and the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, each stage yields performance improvements. | RA-DIT: Retrieval-Augmented Dual Instruction Tuning | Oct 2023
A method to make RAGs robust to irrelevant content. It automatically generates data to fine-tune a language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. | Making Retrieval-Augmented Language Models Robust to Irrelevant Context | Oct 2023
Finds that LLMs with a 4K context window using simple retrieval augmentation at generation time achieve performance comparable to LLMs fine-tuned to a 16K context window via positional interpolation on long-context tasks. | Retrieval meets Long Context Large Language Models | Oct 2023
Compresses retrieved documents into textual summaries prior to in-context integration, which reduces computational costs and relieves LMs of the burden of identifying relevant information in long retrieved documents. | RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | Oct 2023
An iterative retrieval-generation collaborative framework that leverages both parametric and non-parametric knowledge and helps to find the correct reasoning path through retrieval-generation interactions. Useful for tasks that require multi-step reasoning; overall, it improves the reasoning ability of LLMs. | Retrieval-Generation Synergy Augmented Large Language Models | Oct 2023
Proposes Tree of Clarifications (ToC), a framework that recursively constructs a tree of disambiguations for ambiguous questions via few-shot prompting leveraging external knowledge. Then, it uses the tree to generate a long-form answer. | Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models | Oct 2023
An approach that lets an LLM refer to the questions it has previously encountered and adaptively call for external resources when encountering new questions. | Self-Knowledge Guided Retrieval Augmentation for Large Language Models | Oct 2023
A suite of metrics which can be used to evaluate different dimensions (i.e., the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages in a faithful way, or the quality of the generation itself) without having to rely on ground truth human annotations. | RAGAS: Automated Evaluation of Retrieval Augmented Generation | Sep 2023
Proposes a generate-then-read (GenRead) method, which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. | Generate rather than Retrieve: Large Language Models are Strong Context Generators | Sep 2023
Demonstrates how rankers such as DiversityRanker and LostInTheMiddleRanker can be utilized in a RAG system to select and utilize information that optimizes LLM context window utilization. | Enhancing RAG Pipelines in Haystack: Introducing DiversityRanker and LostInTheMiddleRanker | Aug 2023
Bridges LLMs with various knowledge bases (KBs), facilitating both the retrieval and storage of knowledge. The retrieval process employs program of thought prompting, which generates search language for KBs in code format with pre-defined functions for KB operations. It also offers the capability to store knowledge in a personalized KB, catering to individual user demands. | KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases | Aug 2023
Proposes a model that combines retrieval-augmented masked language modeling and prefix language modeling. Then, it introduces Fusion-in-Context Learning to enhance few-shot performance by enabling the model to leverage more in-context examples without requiring additional training. | RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models | Aug 2023
RaLLe is an open-source framework to develop, evaluate, and optimize RAG systems for knowledge-intensive tasks. | RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models | Aug 2023
Finds that the performance of an LLM can degrade significantly when changing the position of relevant information, which indicates that LLMs do not robustly make use of information in long input contexts. | Lost in the Middle: How Language Models Use Long Contexts | Jul 2023
Synergizes retrieval and generation in an iterative manner. The model output is used to show what is needed to finish a task, providing informative context for retrieving more relevant knowledge, which in turn helps generate a better output in the next iteration. | Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy | May 2023
Provides a generalized view of active RAG, methods that actively decide when and what to retrieve across the course of the generation. Then, proposes Forward-Looking Active REtrieval augmented generation (FLARE), a method which iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then utilized as a query to retrieve relevant documents to regenerate the sentence if it contains low-confidence tokens. | Active Retrieval Augmented Generation | May 2023
Introduces a generic retrieval plug-in that utilizes a generic retriever to enhance target LMs that may be unknown in advance or are unable to be fine-tuned jointly. | Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In | May 2023
Improves dense retrieval on structured data through two pre-training strategies. First, it utilizes the natural alignment between structured and unstructured data for structure-aware pretraining. Then, it uses Masked Entity Prediction to capture structural semantics. | Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data | May 2023
Dynamically incorporates grounding information from heterogeneous sources in multiple domains to enhance the factual correctness of LLMs. Introduces an adaptive query generator to deal with queries tailored to different knowledge sources. The framework corrects rationales progressively to make sure that inaccuracies from preceding rationales do not propagate into the subsequent steps. | Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources | May 2023
A framework to generate context-relevant and knowledge-grounded dialogues with a knowledge graph (KG). It first retrieves the relevant subgraph from the KG, and then enforces consistency across facts by perturbing their word embeddings conditioned on the retrieved subgraph. Then, it utilizes contrastive learning to ensure that the generated texts have high similarity to the retrieved subgraphs. | Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation | May 2023
Adopts a small language model as a trainable rewriter to cater to a black-box LLM reader. The rewriter is trained with reinforcement learning (RL) using feedback from the LLM reader. This results in a new framework called Rewrite-Retrieve-Read, where the focus is on optimizing queries. | Query Rewriting for Retrieval-Augmented Large Language Models | May 2023
Iteratively employs a retrieval-augmented generator to create an unbounded memory pool and uses a memory selector to choose one output as memory for the subsequent generation round. This enables a model to leverage its own output, referred to as self-memory, for improved generation. | Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory | May 2023
Equips LLMs with a knowledge-guiding module to access relevant knowledge without altering their parameters. It improves the performance of "black-box" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge. | Augmented Large Language Models with Parametric Knowledge Guiding | May 2023
Equips LLMs with a general read-write memory unit, allowing them to extract, store, and recall knowledge from the text as needed for task performance. | RET-LLM: Towards a General Read-Write Memory for Large Language Models | May 2023
Adopts a task-agnostic retriever to build a shared static index and select candidate evidence efficiently. Then, designs a prompt-guided reranker to rerank the nearest evidence according to task-specific relevance for the reader. | Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks | May 2023
Proposes UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. | UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation | Mar 2023
An adaptive filter-then-rerank paradigm that combines the strengths of small language models (SLMs), which serve as filters, and LLMs, which serve as rerankers. | Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! | Mar 2023
Zero-shot instructs an instruction-following LLM to generate a hypothetical document that captures relevance patterns. Then, a Contriever encodes the document into an embedding vector, which is used to identify a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. | Precise Zero-Shot Dense Retrieval without Relevance Labels | Dec 2022
Proposes Demonstrate-Search-Predict (DSP), a framework to compose high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that can be handled more reliably. | Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | Dec 2022
An approach for multi-step QA that interleaves retrieval with steps in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. This helps to improve performance on knowledge-intensive multi-step questions. | Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions | Dec 2022
Shows that retrieval augmentation can reduce the dependence on relevant pre-training information, which makes RAG a promising approach for capturing the long tail. | Large Language Models Struggle to Learn Long-Tail Knowledge | Nov 2022
Recites one or several relevant passages from LLMs' own memory via sampling, and then produces the final answers. | Recitation-Augmented Language Models | Oct 2022
Leverages LLMs as a few-shot query generator, and creates task-specific retrievers based on the generated data. | Promptagator: Few-shot Dense Retrieval From 8 Examples | Sep 2022
Presents Atlas, a pre-trained retrieval augmented language model able to learn knowledge-intensive tasks with very few training examples. | Atlas: Few-shot Learning with Retrieval Augmented Language Models | Aug 2022
Retrieves from the training data to achieve gains on multiple NLG and NLU tasks. | Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data | Mar 2022
Approximates a datastore search by saving pointers between consecutive datastore entries, and clustering those entries into states. Results in a weighted finite automaton that, at inference time, helps save up to 83% of the nearest neighbor searches over kNN-LM without hurting perplexity. | Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval | Jan 2022
Improves an auto-regressive language model by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. It enhances the model by retrieving from a 2 trillion token database. | Improving language models by retrieving from trillions of tokens | Dec 2021
A novel approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. | Robust Retrieval Augmented Generation for Zero-shot Slot Filling | Aug 2021
Introduces RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. It compares two RAG formulations: one conditions on the same retrieved passages across the whole generated sequence, and the other can use different passages per token. | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | May 2020
Shows that retrieval can be implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. | Dense Passage Retrieval for Open-Domain Question Answering | Apr 2020

References