
Hallucination Mitigation

  • Related Project: Private
  • Category: Paper Review
  • Date: 2024-01-11

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

  • url: https://arxiv.org/abs/2401.01313
  • pdf: https://arxiv.org/pdf/2401.01313
  • abstract: As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains around their tendency to hallucinate: generating content that appears factual but is ungrounded. This issue of hallucination is arguably the biggest hindrance to safely deploying these powerful LLMs into real-world production systems that impact people’s lives. The journey toward widespread adoption of LLMs in practical settings heavily relies on addressing and mitigating hallucinations. Unlike traditional AI systems focused on limited tasks, LLMs have been exposed to vast amounts of online text data during training. While this allows them to display impressive language fluency, it also means they are capable of extrapolating information from the biases in training data, misinterpreting ambiguous prompts, or modifying the information to align superficially with the input. This becomes hugely alarming when we rely on language generation capabilities for sensitive applications, such as summarizing medical records, financial analysis reports, etc. This paper presents a comprehensive survey of over 32 techniques developed to mitigate hallucination in LLMs. Notable among these are Retrieval Augmented Generation (Lewis et al., 2021), Knowledge Retrieval (Varshney et al., 2023), CoNLI (Lei et al., 2023), and CoVe (Dhuliawala et al., 2023). Furthermore, we introduce a detailed taxonomy categorizing these methods based on various parameters, such as dataset utilization, common tasks, feedback mechanisms, and retriever types. This classification helps distinguish the diverse approaches specifically designed to tackle hallucination issues in LLMs. Additionally, we analyze the challenges and limitations inherent in these techniques, providing a solid foundation for future research in addressing hallucinations and related phenomena within the realm of LLMs.

1 Introduction

Hallucination in Large Language Models (LLMs) entails the creation of factually erroneous information spanning a multitude of subjects. Given the extensive domain coverage of LLMs, their application extends across numerous scholarly and professional areas. These include, but are not limited to, academic research, programming, creative writing, technical advisement, and the facilitation of skill acquisition. Consequently, LLMs have emerged as an indispensable component in our daily lives, playing a crucial role in dispensing accurate and reliable information. Nevertheless, a fundamental issue with LLMs is their propensity to yield erroneous or fabricated details about real-world subjects. This tendency to furnish incorrect data, commonly referred to as hallucination, poses a significant challenge for researchers in the field. It leads to scenarios where advanced models like GPT-4 and others of its ilk may generate references that are inaccurate or completely unfounded (Rawte et al., 2023). This issue arises due to the training phase’s pattern generation techniques and the absence of real-time internet updates, contributing to discrepancies in the information output (Ray, 2023).

In contemporary computational linguistics, mitigating hallucination is a critical focus. Researchers have proposed various strategies, encompassing feedback mechanisms, external information retrieval, and early refinement in language model generation, to address this challenge. This paper assumes significance by consolidating and organizing these diverse techniques into a comprehensive taxonomy. In essence, the contributions of this paper to the realm of LLM hallucination are threefold:

  1. Introduction of a systematic taxonomy designed to categorize hallucination mitigation techniques for LLMs, encompassing Vision Language Models (VLMs).
  2. Synthesis of the essential features characterizing these mitigation techniques, thereby guiding more structured future research endeavors within this domain.
  3. Deliberation on the limitations and challenges inherent in these techniques, accompanied by potential solutions and proposed directions for future research.

Figure 1: Taxonomy of hallucination mitigation techniques in LLMs, focusing on prevalent methods that involve model development and prompting techniques. Model development branches into various approaches, including new decoding strategies, knowledge graph-based optimizations, the addition of novel loss function components, and supervised fine-tuning. Meanwhile, prompt engineering can involve retrieval augmentation-based methods, feedback-based strategies, or prompt tuning.

2 Hallucination Mitigation

The detection of hallucinations has emerged as a significant concern, given the integral role of generative LLMs in critical tasks. (Qiu et al., 2023b) introduce mFACT as a method to identify hallucination in summaries, extending its applicability beyond English to other languages. Additionally, (Zhang et al., 2023b) propose a framework for hallucination detection based on contextual information. Another perspective on understanding the causes of hallucination is presented by (Mündler et al., 2023), who explore self-contradiction as a contributing factor.

2.1 Prompt Engineering

Prompt engineering is the process of experimenting with various instructions to get the best output possible from an AI text generation model (White et al., 2023). In terms of hallucination mitigation, this process can provide specific context and expected outcomes (Feldman et al., 2023). The prompt engineering mitigation techniques can be outlined as follows:

2.1.1 Retrieval Augmented Generation

Retrieval-Augmented Generation (RAG) enhances the responses of LLMs by tapping into external, authoritative knowledge bases rather than relying on potentially outdated training data or the model’s internal knowledge. This approach addresses the key challenges of accuracy and currency in LLM outputs (Kang et al., 2023). RAG effectively mitigates the issue of hallucination in LLMs by generating responses that are not only pertinent and current but also verifiable, thereby reinforcing user confidence and offering developers an economical way to enhance the fidelity and utility of LLMs across different applications. The mitigation techniques following this system can be further categorized as:

2.1.1.1 Before generation

For the following techniques, information retrieval happens before the generation of AI text:

LLM-Augmenter: (Peng et al., 2023) propose a system that augments a black-box LLM with a set of Plug-And-Play (PnP) (Li et al., 2023b) modules. The system makes the LLM generate responses grounded in external knowledge, and it iteratively revises LLM prompts to improve model responses using feedback generated by utility functions. LLM-Augmenter improves LLMs with external knowledge and automated feedback using PnP modules, which require no training and can be used instantly. Given a user query, the framework first retrieves evidence from external knowledge and performs reasoning to form evidence chains. LLM-Augmenter then queries a fixed LLM (GPT-3.5) using a prompt that contains the consolidated evidence, so the LLM generates a candidate response grounded in external knowledge (the evidence). LLM-Augmenter then verifies the candidate response, e.g., by checking whether it hallucinates evidence. If so, LLM-Augmenter generates a feedback message, which is used to revise the prompt and query GPT-3.5 again. The process iterates until a candidate response passes verification and is sent to the user.

FreshPrompt: (Vu et al., 2023) address the static nature of most LLMs, highlighting their inability to adapt to the evolving world. The authors introduce FreshQA, a dynamic QA benchmark that evaluates LLMs on questions requiring current world knowledge and on questions with false premises. Through a two-mode evaluation, correctness and hallucination are measured, revealing limitations and the need for improvement, particularly in fast-changing knowledge scenarios. To address these challenges, the authors present FreshPrompt, a few-shot prompting method that leverages a search engine to incorporate relevant and up-to-date information into prompts. FreshPrompt outperforms competing methods and commercial systems, with further analysis emphasizing the impact of the number and order of retrieved evidence on correctness. The work contributes a detailed evaluation of LLM capabilities in adapting to evolving knowledge, introducing the FreshQA dataset and an effective prompting method, FreshPrompt, to enhance dynamic question answering.
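A minimal sketch of an LLM-Augmenter-style retrieve, prompt, verify, and revise loop is shown below; the retriever, generator, and verifier callables and the prompt wording are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, List

def augmented_answer(
    query: str,
    retrieve: Callable[[str], List[str]],     # external knowledge retriever (assumed)
    generate: Callable[[str], str],            # fixed black-box LLM, e.g. GPT-3.5 (assumed)
    verify: Callable[[str, List[str]], bool],  # utility function: is the draft grounded? (assumed)
    max_rounds: int = 3,
) -> str:
    evidence = retrieve(query)
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        prompt = (
            "Answer the question using ONLY the evidence below.\n"
            "Evidence:\n" + "\n".join(f"- {e}" for e in evidence) +
            f"\nQuestion: {query}\n{feedback}"
        )
        draft = generate(prompt)
        if verify(draft, evidence):            # candidate passes the grounding check
            return draft
        # Otherwise, revise the prompt with a feedback message and try again.
        feedback = "Feedback: the previous answer contained unsupported claims; revise it."
    return draft                                # best effort after the retry budget is spent
```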

2.1.1.2 During generation

The following techniques demonstrate knowledge retrieval at a sentence-by-sentence level, where the model performs information retrieval while generating each sentence.

Knowledge Retrieval: (Varshney et al., 2023) suggest a method that entails actively detecting and reducing hallucinations as they arise. Before moving on to the creation of subsequent sentences, the approach first uses the logit output values from the model to identify possible hallucinations, validates whether they are accurate, and then mitigates any hallucinations that are found. The key realization is that handling hallucinations during the generation process is critical, because a sentence is more likely to contain hallucinations when the model has already hallucinated earlier in its output. The study investigates the use of logit output values, which are produced by models like GPT-3 and others, to identify hallucinations. However, it acknowledges that some models available solely through API calls might not expose logit output values, and it emphasizes that this information is a supplementary source rather than a necessary prerequisite for the hallucination detection approach. The method uses retrieved knowledge as support for the correction phase, instructing the model to repair the sentence by either eliminating or substituting hallucinated information.

Decompose and Query framework (D&Q): (Cao et al., 2023) address challenges faced by LLMs in question answering, focusing on hallucinations and difficulties with multi-hop relations. They propose the D&Q framework to guide models in utilizing external knowledge while constraining reasoning to reliable information, thus mitigating the risk of hallucinations. Experimental results demonstrate D&Q's effectiveness, showcasing competitive performance against GPT-3.5 on ChitChatQA and achieving a noteworthy 59.6% F1 score on HotPotQA (question-only). The framework involves a supervised fine-tuning phase without tool invocation; during the prediction phase, the model uses external tools to query a reliable question-answer base, allowing it to backtrack and initiate new searches if needed. The findings underscore D&Q's potential to enhance the robustness and performance of LLMs in question-answering tasks.

Real-time Verification and Rectification (EVER): LLMs often struggle with the challenge of producing inaccurate or hallucinated content, especially in reasoning tasks. In response to this issue, which is prevalent in both non-retrieval-based and retrieval-augmented generation approaches, (Kang et al., 2023) introduce the EVER framework. Unlike existing methods that rectify hallucinations post hoc, EVER employs a real-time, stepwise strategy during the generation process to detect and rectify hallucinations as they occur. The three-stage process involves generation, validation, and rectification, effectively identifying and correcting intrinsic and extrinsic hallucinations. EVER outperforms both retrieval-based and non-retrieval-based baselines, showcasing significant improvements in generating trustworthy and factually accurate text across diverse tasks such as short-form QA, biography generation, and multi-hop reasoning. The framework’s efficacy is empirically validated, demonstrating its ability to mitigate the “snowballing” issue of hallucination, making it a valuable contribution to enhancing the accuracy and reliability of LLMs.
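As a toy illustration of the logit-based detection step described for Knowledge Retrieval, the sketch below flags low-probability tokens as hallucination candidates to be routed to verification and repair; the token/log-probability pairs and the threshold are assumptions, not values from the paper.

```python
import math
from typing import List, Tuple

def flag_low_confidence(
    tokens_with_logprobs: List[Tuple[str, float]],  # (token, log-probability) from the model
    threshold: float = 0.3,                          # illustrative probability threshold
) -> List[str]:
    """Return tokens whose generation probability falls below the threshold."""
    return [tok for tok, lp in tokens_with_logprobs if math.exp(lp) < threshold]

# Hypothetical example: a fabricated date generated with low confidence gets flagged,
# and would then be checked against retrieved knowledge and repaired if unsupported.
output = [("The", -0.01), ("award", -0.05), ("was", -0.02), ("won", -0.04),
          ("in", -0.03), ("1987", -2.1)]
print(flag_low_confidence(output))  # ['1987']
```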

2.1.1.3 After generation

The following techniques employ the information retrieval system after the model has generated its entire output:

Retrofit Attribution using Research and Revision (RARR): (Gao et al., 2023) note that, in the realm of LLMs, notable advancements have been achieved across various tasks; however, issues persist, such as generating content without proper support or accuracy. The challenge of determining the trustworthiness of LLM outputs, due to a lack of attributability, prompted the introduction of RARR. This model-agnostic system automates the attribution process for any text generation model. Inspired by fact-checking workflows, RARR conducts research and post-editing to align content with retrieved evidence while preserving the original qualities of the text, operating seamlessly after LLM generation. The contributions encompass formalizing the Editing for Attribution task, introducing new metrics, benchmarking existing revision models, and proposing a research-and-revise model. RARR enhances attribution while preserving essential text properties, providing a practical solution to bolster the reliability of LLM outputs.

High Entropy Word Spotting and Replacement: While the technical feasibility of detecting high-entropy words may be apparent, a significant challenge arises due to the closed-source nature of many contemporary LLMs, with subscription-based APIs limiting accessibility. The solution proposed by (Rawte et al., 2023) involves utilizing open-source LLMs to identify high-entropy words, followed by their replacement using an LLM with a lower Hallucination Vulnerability Index. The results underscore the exceptional performance of albert-large-v2 (Lan et al., 2020) in detecting high-entropy words in GPT-3-generated content. Conversely, distilroberta-base (Sanh et al., 2019) exhibits superior performance in replacing high-entropy words, leading to a reduction in hallucinations. An integral aspect of this approach is the treatment of consecutive high-entropy words as a unified unit: these words are collectively masked before replacement, proving particularly effective in addressing hallucinations related to Generated Golem or Acronym Ambiguity.
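A minimal sketch of the high entropy word spotting idea follows; the entropy threshold and toy distributions are illustrative assumptions (the paper uses albert-large-v2 for detection and distilroberta-base for replacement rather than these stand-in functions).

```python
import math
from typing import Dict, List

def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def mask_high_entropy(words: List[str], dists: List[Dict[str, float]],
                      threshold: float = 1.0) -> List[str]:
    """Mask words whose generating distribution has high entropy.

    Consecutive high-entropy words are grouped into a single [MASK] unit,
    mirroring the unified-unit treatment described above.
    """
    masked, i = [], 0
    while i < len(words):
        if entropy(dists[i]) > threshold:
            while i < len(words) and entropy(dists[i]) > threshold:
                i += 1
            masked.append("[MASK]")
        else:
            masked.append(words[i])
            i += 1
    return masked
```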

2.1.1.4 End-to-End RAG

The end-to-end process of RAG proposed in the paper by (Lewis et al., 2021) involves integrating a pre-trained sequence-to-sequence (seq2seq) transformer with a dense vector index of Wikipedia, accessed through the Dense Passage Retriever (DPR). This innovative combination allows the model to condition its output generation on both the input query and latent documents provided by the DPR. In this process, the DPR acts as a neural retriever, supplying relevant documents based on the input. These documents are then used by the seq2seq model, specifically BART, to generate the final output. The model employs a top-K approximation to marginalize these latent documents, which can be done on a per-output basis (assuming one document is responsible for all tokens) or a per-token basis (allowing different documents to influence different parts of the output).

Crucially, both the generator and the retriever in this RAG setup are trained end-to-end, ensuring that they learn jointly and improve each other’s performance. This methodology contrasts with previous approaches that required architectures with non-parametric memory to be built from scratch for specific tasks. Instead, RAG uses pre-trained components, pre-loaded with extensive knowledge, allowing the model to access and integrate a vast range of information without the need for additional training. This end-to-end approach results in enhanced performance on various knowledge-intensive tasks, demonstrating the efficacy of combining parametric and non-parametric memory in generation models.
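A conceptual sketch of the per-output ("sequence") marginalization over retrieved documents described above; the retriever and generator are abstract callables standing in for a DPR-like retriever and a BART-like seq2seq model, and the softmax over retrieval scores is an assumption for illustration.

```python
import math
from typing import Callable, List, Tuple

def rag_sequence_score(
    query: str,
    answer: str,
    retrieve_topk: Callable[[str, int], List[Tuple[str, float]]],  # (document, retrieval score)
    gen_loglik: Callable[[str, str, str], float],                  # log p(answer | query, document)
    k: int = 5,
) -> float:
    """Approximate log p(answer | query) marginalized over the top-K latent documents."""
    docs = retrieve_topk(query, k)
    # Softmax over retrieval scores gives p(document | query).
    z = sum(math.exp(s) for _, s in docs)
    probs = [(d, math.exp(s) / z) for d, s in docs]
    # Per-output marginalization: one document is assumed responsible for the whole answer.
    return math.log(sum(p * math.exp(gen_loglik(answer, query, d)) for d, p in probs))
```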

2.1.2 Self-refinement through feedback and reasoning

After an LLM provides an output for a specific prompt, proper feedback about the output can help the LLM give better and more accurate outputs in subsequent iterations (Madaan et al., 2023). The following hallucination mitigation techniques follow this approach:

Prompting GPT-3 To Be Reliable: According to (Si et al., 2022), LLMs, particularly GPT-3, exhibit remarkable few-shot prompting abilities, enhancing their applications in real-world language tasks. Despite this, the issue of improving GPT-3’s reliability remains underexplored. The study decomposes reliability into four crucial facets, namely generalizability, social biases, calibration, and factuality, and introduces simple and effective prompts to enhance each aspect. The research surpasses smaller-scale supervised models on all reliability metrics, offering practical strategies for improving GPT-3’s performance. The paper outlines previous work on LLM reliability, highlighting the novelty of its comprehensive analysis and its focus on effective prompting strategies. Drawing inspiration from ML safety surveys, the reliability framework aligns with risks identified in existing conceptual frameworks. The systematic exploration of GPT-3’s reliability introduces practical prompting strategies and contributes both insights into LLMs and practical recommendations for GPT-3 users.

ChatProtect: (Mündler et al., 2023) focus on an important type of hallucination called self-contradiction, which occurs when an LLM generates two logically inconsistent sentences given the same context. They propose a three-step pipeline for reasoning about self-contradictions. Importantly, the approach is built upon prompting strategies, making it applicable to black-box LLMs without requiring external grounded knowledge. An extensive evaluation targeting four modern instruction-tuned LMs on open-domain text generation demonstrates the substantial benefits of the approach: it effectively exposes self-contradictions, accurately detects them, and appropriately mitigates their occurrence.

Self-Reflection Methodology: (Ji et al., 2023b) explore and address the phenomenon of hallucination in medical generative QA systems that use widely adopted LLMs and datasets. The focus is on identifying and understanding problematic answers, with an emphasis on hallucination. To tackle this challenge, the paper introduces an interactive self-reflection methodology that integrates knowledge acquisition and answer generation. Through this iterative feedback process, the approach systematically improves the factuality, consistency, and entailment of generated answers. Leveraging the interactivity and multitasking ability of LLMs, the method produces progressively more precise and accurate answers. Both automatic and human evaluations highlight the effectiveness of this approach in reducing hallucinations compared to baselines. The investigation of hallucinations in generation tasks, particularly in the medical domain, is crucial for AI’s accountability and trustworthiness. The proposed iterative self-reflection method, employing a generate-score-refine strategy on background knowledge and answers, is empirically shown to be effective, generalizable, and scalable in mitigating hallucinations.
Structured Comparative (SC) reasoning: In the realm of text preference prediction, where LLMs often grapple with inconsistencies in reasoning, (Yan et al., 2023) introduce the SC reasoning method. SC employs a prompting approach that predicts text preferences by generating structured intermediate comparisons. It starts by proposing aspects of comparison and then generates textual comparisons under each aspect. Using a pairwise consistency comparator, SC ensures that each aspect’s comparisons distinctly differentiate between texts, effectively reducing hallucination and enhancing consistency. The methodology is showcased across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrating that SC equips LLMs with state-of-the-art performance in text preference prediction. The structured reasoning approach of SC, along with its consistency enforcement, is validated through comprehensive evaluations and ablation studies, emphasizing its effectiveness in improving accuracy and coherence across diverse tasks. Human evaluations further underscore SC’s interpretative capabilities, assisting users in making informed decisions.

Mind’s Mirror: While chain-of-thought (CoT) distillation methods show promise for downsizing LLMs to small language models (SLMs), there is a risk of carrying over flawed reasoning and hallucinations. To address this, (Liu et al., 2023) propose a methodology with two key components. First, a novel approach distills the self-evaluation capability inherent in LLMs into SLMs, aiming to mitigate adverse effects and reduce hallucinations. Second, a comprehensive distillation process incorporates multiple distinct CoT and self-evaluation paradigms for holistic knowledge transfer into SLMs.

The methodology trains SLMs to possess self-evaluation capabilities, recognizing and correcting hallucinations and unreliable reasoning, which enhances predictive accuracy and reliability on various NLP tasks. Comprehensive experiments demonstrate the superiority of this method across reasoning tasks, offering a promising approach to responsibly downsizing LLMs.

DRESS: (Chen et al., 2023) propose using natural language feedback (NLF), specifically critique and refinement NLF, to improve alignment with human preferences and the interaction capabilities of large vision language models (LVLMs). They generalize conditional reinforcement learning to effectively incorporate non-differentiable NLF by training the model to generate corresponding responses conditioned on the NLF. Experiments show relative improvements of DRESS over prior state-of-the-art LVLMs on metrics of helpfulness, honesty, and harmlessness alignment.

MixAlign: Despite having accurate reference points, LLMs may disregard them and rely on incorrect references or biases instead. This tendency to hallucinate arises when users ask questions that do not directly align with the retrieved references, lacking detailed knowledge of the stored information. (Zhang et al., 2023b) focus on this knowledge alignment problem and introduce MixAlign, a framework that interacts with both the user and the knowledge base to clarify how the user question relates to the stored information. MixAlign uses a language model to achieve automatic knowledge alignment and, if needed, further enhances this alignment through user clarifications. MixAlign focuses on utilizing grounding knowledge for faithful decision-making. In cases of uncertainty or unclear evidence, MixAlign generates a question seeking clarification from the user, a process referred to as human-assisted knowledge alignment.

Chain-of-Verification (CoVe): (Dhuliawala et al., 2023) develop the CoVe method, where the model

  1. Drafts an initial response.
  2. Plans verification questions to fact-check its draft.
  3. Answers those questions independently so the answers are unbiased.
  4. Generates a final verified response.

Experiments show CoVe decreases hallucinations across tasks such as list-based Wikidata questions and long-form text generation. Given a user query, an LLM generates a baseline response that may contain inaccuracies such as factual hallucinations. CoVe first generates verification questions to ask, then answers them to check for agreement with the draft.
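A minimal sketch of this draft, plan, verify, and revise loop is given below; the prompt wording and the `llm` callable are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, List

def chain_of_verification(query: str, llm: Callable[[str], str]) -> str:
    # 1. Draft an initial (baseline) response.
    draft = llm(f"Answer the question:\n{query}")
    # 2. Plan verification questions that fact-check the draft.
    plan = llm(f"List short fact-checking questions for this answer:\n{draft}")
    questions: List[str] = [q.strip("- ").strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each verification question independently (without showing the draft)
    #    so the answers are not biased toward repeating the draft's mistakes.
    answers = [llm(f"Answer factually and concisely:\n{q}") for q in questions]
    checks = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    # 4. Generate the final verified response, revising anything contradicted above.
    return llm(
        f"Question: {query}\nDraft answer: {draft}\n"
        f"Verification Q&A:\n{checks}\n"
        "Write a final answer consistent with the verification answers."
    )
```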

Chain of Natural Language Inference (CoNLI): (Lei et al., 2023) address the challenge of hallucinations generated by LLMs when they are provided with background context. Despite their fluency in natural language generation, LLMs often produce ungrounded hallucinations unsupported by the given sources. The proposed hierarchical framework focuses on detecting and mitigating such hallucinations without requiring fine-tuning or domain-specific prompts. The framework uses CoNLI for state-of-the-art hallucination detection by identifying ungrounded content, and post-editing is then used to reduce hallucinations and enhance text quality without model adjustment. Extensive experiments on text-to-text datasets demonstrate effectiveness in both hallucination detection and reduction. By formulating detection as a chain of natural language inference tasks, the framework incorporates sentence- and entity-level judgments with interpretability.

The plug-and-play framework allows seamless deployment across contexts with competitive hallucination detection and reduction performance while preserving text quality.

2.1.3 Prompt Tuning

Prompt tuning is a technique that involves adjusting the instructions provided to a pre-trained LLM during the fine-tuning phase to make the model more effective at specific tasks. The LLM learns from ‘Soft Prompts’, which are not predetermined but are instead learned by the model through backpropagation during fine-tuning (Lester et al., 2021). For hallucination mitigation, the following prompt tuning techniques have been proposed so far:

Universal Prompt Retrieval for Improving Zero-Shot Evaluation (UPRISE): (Cheng et al., 2023) propose UPRISE, which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifically, they demonstrate universality in a cross-task and cross-model scenario: the retriever is tuned on a diverse set of tasks but tested on unseen task types. The retriever is trained to retrieve prompts for multiple tasks, enabling it to generalize to unseen task types during inference.

SynTra: LLMs often exhibit hallucination in abstractive summarization tasks, even when the necessary information is present. Addressing this challenge is difficult due to the intricate evaluation of hallucination during optimization. (Jones et al., 2023) introduce SynTra, a method that uses a synthetic task to efficiently reduce hallucination on downstream summarization tasks. SynTra optimizes the LLM’s system message via prefix-tuning on the synthetic task, then transfers this capability to more challenging, realistic summarization tasks. Experiments demonstrate reduced hallucination for two 13B-parameter LLMs, highlighting the effectiveness of synthetic data for mitigating undesired behaviors.
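A minimal PyTorch sketch of the soft prompt idea in the sense of Lester et al. (2021): a small matrix of virtual token embeddings is learned by backpropagation while the underlying language model stays frozen. The prompt length, embedding dimension, and training-loop comments are illustrative assumptions, not the papers' exact setups.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, embed_dim: int):
        super().__init__()
        # Trainable virtual tokens; these are the only parameters that get updated.
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the soft prompt to every sequence in the batch: (B, L, D) -> (B, P+L, D).
        batch = input_embeds.size(0)
        prefix = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)

# Training sketch: only soft_prompt.parameters() receive gradients; the frozen LM
# consumes the concatenated embeddings and returns a task loss to backpropagate.
# soft_prompt = SoftPrompt(prompt_len=20, embed_dim=4096)
# optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=3e-2)
```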

3 Developing Models

Some papers focus on developing novel models to mitigate hallucinations. This is an ongoing and evolving process requiring a combination of algorithmic advancements and data quality improvements. Rather than relying on prompting alone, the following techniques work at the level of the model itself, from its decoding strategy and training objective to supervised fine-tuning, to tackle hallucinations. These techniques can be categorized as follows:

3.1 Introducing new decoding strategy

Decoding strategies generally involve designing techniques that specifically target the generation phase of a model. In terms of hallucination, these techniques aim to reduce the occurrence of hallucinations in the generated outputs by guiding the generation phase toward authentic or context-specific generation (Lango and Dusek, 2023). The following techniques make use of the decoding strategy:

Context-Aware Decoding (CAD): (Shi et al., 2023) present CAD, which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. CAD is particularly effective at overriding a model’s prior knowledge when it contradicts the provided context, leading to substantial improvements on tasks where resolving the knowledge conflict is essential. CAD can be used with off-the-shelf pre-trained language models without any additional training. Notably, CAD is especially beneficial for knowledge-conflicting tasks, where the context contains information contradictory to the model’s prior knowledge. The results demonstrate the potential of CAD for mitigating hallucinations in text generation and overriding prior knowledge with reliable and trusted information.

Decoding by Contrasting Layers (DoLa): (Chuang et al., 2023) introduce DoLa, a simple decoding strategy designed to mitigate hallucinations in pre-trained LLMs without the need for external knowledge conditioning or additional fine-tuning. DoLa obtains the next-token distribution by contrasting logit differences between later and earlier layers projected into the vocabulary space, leveraging the observed localization of factual knowledge in specific transformer layers. Consequently, DoLa enhances the identification of factual knowledge and minimizes the generation of incorrect facts. Across various tasks, including multiple-choice and open-ended generation tasks such as TruthfulQA, DoLa consistently improves truthfulness, enhancing the performance of LLaMA family models.

Inference-Time Intervention (ITI): (Li et al., 2023a) introduce ITI, a technique designed to enhance the “truthfulness” of LLMs. ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. The technique first identifies a sparse set of attention heads with high linear probing accuracy for truthfulness; then, during inference, it shifts activations along these truth-correlated directions, repeating the intervention autoregressively until the whole answer is generated. This intervention results in a significant performance increase for LLaMA models on the TruthfulQA benchmark.
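A sketch of the CAD adjustment described above, in the contrastive form reported by Shi et al. (2023): the next-token logits computed with the context are amplified against the logits computed without it, down-weighting tokens the model would produce from its prior knowledge alone. The two logit tensors are assumed to come from two forward passes of the same model.

```python
import torch

def cad_logits(logits_with_ctx: torch.Tensor,
               logits_no_ctx: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Adjusted next-token logits: (1 + alpha) * with_context - alpha * without_context."""
    return (1 + alpha) * logits_with_ctx - alpha * logits_no_ctx

# Usage sketch (greedy step): next_token = torch.argmax(cad_logits(l_ctx, l_plain), dim=-1)
```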

3.2 Utilization of Knowledge Graph (KG)

KGs are organized collections of data that include details about entities (i.e., people, places, or objects), their characteristics, and the connections between them (Sun et al., 2023a). They arrange data so that machines can comprehend the relationships and the semantic meaning of the material. KGs offer a basis for sophisticated reasoning, data analysis, and information retrieval. Thus, several studies have used KGs in the context of hallucination mitigation (Bayat et al., 2023):

RHO: To handle the hallucination challenge in dialogue response generation, (Ji et al., 2023a) propose a framework called RHO that utilizes the representations of linked entities and relation predicates from a KG to generate more faithful responses. To improve faithfulness, they introduce local and global knowledge-grounding techniques into dialogue generation and further utilize a conversational reasoning model to re-rank the generated responses. These two knowledge groundings help the model effectively encode and inject the knowledge information from context-related subgraphs with proper attention. Their work improves the fusion and interaction between external knowledge and dialogue context via various knowledge groundings and reasoning techniques, further reducing hallucination.

FactuaL Error detection and correction with Evidence Retrieved from external Knowledge (FLEEK): (Bayat et al., 2023) introduce FLEEK, an intelligent and model-agnostic tool aimed at aiding end users, such as human graders, in fact verification and correction. FLEEK features a user-friendly interface capable of autonomously identifying potentially verifiable facts within the input text. It formulates questions for each fact and queries both curated knowledge graphs and the open web to gather evidence. The tool subsequently verifies the correctness of the facts using the acquired evidence and proposes revisions to the original text. The verification process is inherently interpretable, with extracted facts, generated questions, and retrieved evidence directly reflecting the information units contributing to the verification. For instance, FLEEK visually highlights verifiable facts with distinct colors indicating their factuality levels, allowing users to interact with clickable highlights that reveal evidence supporting or refuting each claim. Future work includes comprehensive evaluations of FLEEK, testing its compatibility with various LLMs, and subjecting it to a comprehensive benchmark.
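A toy sketch of KG-backed fact verification in the spirit of FLEEK; the triple representation, the in-memory KG, and the verdict labels are illustrative assumptions rather than the tool's actual pipeline (which also queries the open web and formulates natural-language questions).

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def verify_triples(triples: List[Triple],
                   kg: Dict[Tuple[str, str], str]) -> List[Tuple[Triple, str]]:
    """Label each extracted fact as supported, refuted, or unverifiable against the KG."""
    verdicts = []
    for subj, rel, obj in triples:
        known = kg.get((subj, rel))
        if known is None:
            verdicts.append(((subj, rel, obj), "unverifiable"))
        elif known == obj:
            verdicts.append(((subj, rel, obj), "supported"))
        else:
            verdicts.append(((subj, rel, obj), "refuted"))   # candidate for revision
    return verdicts

kg = {("Paris", "capital_of"): "France"}
print(verify_triples([("Paris", "capital_of", "Italy")], kg))  # refuted
```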

3.3 Introducing faithfulness based loss function

This line of work creates a metric to gauge how closely a model’s outputs match the input data or ground truth and incorporates it into training. In this sense, faithfulness describes the model’s capacity to accurately reflect information from the input without adding errors, omissions, or distortions (Chrysostomou and Aletras, 2021). The following methods make use of this technique:

Text Hallucination Mitigating (THAM) Framework: (Yoon et al., 2022) introduce the THAM framework for video-grounded dialogue. THAM addresses the text hallucination problem, in which the model copies input texts for answer generation without understanding the question. It mitigates feature-level hallucination effects by introducing information-theoretic regularization. The THAM framework incorporates a Text Hallucination Regularization (THR) loss derived from the mutual information between the response language model and the proposed hallucination language model, following their information-theoretic text hallucination measurement approach. Minimizing the THR loss reduces indiscriminate text copying and boosts dialogue performance.

Loss Weighting Method: (Qiu et al., 2023b) focus on low-resource language summarization and develop a novel metric, mFACT, to evaluate the faithfulness of non-English summaries; it is built from four English faithfulness metrics via translation-based cross-lingual transfer. They study hallucination in a cross-lingual transfer setting and apply mFACT to study the faithfulness of summaries produced by recent multilingual LLMs. The proposed method weights each training sample’s loss according to its faithfulness score. The experiments show that while common cross-lingual transfer methods benefit summarization performance, they amplify hallucinations compared to monolingual counterparts. To reduce these hallucinations, they adapt several monolingual methods to cross-lingual transfer and propose a new method based on weighting the loss according to the mFACT score of each training example.
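A minimal sketch of the loss-weighting idea: each training example's loss is scaled by its faithfulness score, so likely-hallucinated targets contribute less to the gradient. The faithfulness scores stand in for mFACT values, and the per-example losses are assumed to come from the summarization model; the normalization choice is an illustrative assumption.

```python
import torch

def faithfulness_weighted_loss(per_example_nll: torch.Tensor,
                               faithfulness: torch.Tensor) -> torch.Tensor:
    """per_example_nll: (batch,) negative log-likelihoods; faithfulness: (batch,) scores in [0, 1]."""
    # Normalize the scores into weights so the batch loss stays on a comparable scale.
    weights = faithfulness / faithfulness.sum().clamp_min(1e-8)
    return (weights * per_example_nll).sum()
```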

3.4 Supervised fine-tuning (SFT)

SFT serves as a vital phase in aligning LLMs for downstream tasks using labeled data. It helps the model follow human commands for specific tasks (Wang et al., 2023; Chung et al., 2022; Iyer et al., 2023; Sun et al., 2023b) and eventually increases the faithfulness of the model’s outputs. In the context of SFT, the quality of the data stands as the most pivotal concern, as it directly determines the fine-tuned model’s performance (Xu et al., 2023; Touvron et al., 2023). During supervised fine-tuning, the LLM’s weights are adjusted based on the gradients from a task-specific loss function that measures the difference between the LLM’s predictions and ground truth labels. This technique has proven particularly effective in enhancing the adaptability of LLMs, enabling them to excel at previously unseen tasks.
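A generic sketch of one supervised fine-tuning step as described above; the model interface, the -100 label-masking convention, and the optimizer are assumptions for illustration rather than any specific paper's training code.

```python
import torch
import torch.nn.functional as F

def sft_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
             input_ids: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient update on token-level cross-entropy against labeled targets."""
    logits = model(input_ids)                                  # assumed shape: (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1),
                           ignore_index=-100)                  # -100 masks prompt/padding tokens
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```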

Knowledge Injection and Teacher-Student Approaches: (Elaraby et al., 2023) focus on measuring and reducing hallucinations in weaker open-source LLMs such as BLOOM 7B (Workshop et al., 2022). They introduce HALOCHECK, a lightweight knowledge-free framework to quantify the severity of hallucination in LLMs. The authors explore techniques like knowledge injection and teacher-student approaches to alleviate hallucinations in low-parameter LLMs. The framework uses sentence-level entailment to quantitatively assess hallucination levels.


The work aims to enhance smaller LLMs’ knowledge through Knowledge Injection (KI) by fine-tuning with domain knowledge, without relying on expensive instructions from stronger models. They also investigate leveraging a more powerful LLM such as GPT-4 to guide weaker LLMs by generating detailed question answers. By assessing hallucination severity, they optimize teacher-LLM engagement to reduce the computational costs of relying extensively on large models, alleviating the need for frequent queries to the teacher model.

Hallucination Augmented Recitations (HAR): (Köksal et al., 2023) introduce the concept of attribution in LLMs to control information sources and enhance factuality. While existing methods rely on open-book question answering to improve attribution, a challenge arises when factual datasets reward models for recalling pretraining data rather than demonstrating true attribution. To address this, the authors propose HAR, a novel approach utilizing LLM hallucination to create counterfactual datasets and enhance attribution. Through a case study on open-book QA, specifically CF-TriviaQA, the results demonstrate that models fine-tuned with these counterfactual datasets significantly improve text grounding and outperform those trained on factual datasets, even with smaller training datasets and model sizes. The observed improvements are consistent across various open-book QA tasks, including multi-hop, biomedical, and adversarial questions.

Fine-tuning Language Models for Factuality: (Tian et al., 2023) address hallucination by leveraging recent NLP innovations, employing automated fact-checking methods and preference-based learning through the Direct Preference Optimization algorithm. The researchers fine-tune the Llama-2 model for factuality without human labeling, achieving notable error reductions, particularly in biographies and medical questions. Their approach involves reference-based and reference-free truthfulness evaluations, demonstrating a cost-effective way to enhance model factuality in long-form text generation. The study proposes new benchmark tasks, discusses future avenues, and highlights the potential scalability of factual reinforcement learning for larger models in safety-critical domains.

BEINFO: To mitigate this issue and increase the faithfulness of information-seeking dialogue systems, (Razumovskaia et al., 2023) introduce BEINFO, a simple yet effective method that applies ‘behavioral fine-tuning’ to increase the faithfulness of responses generated for information-seeking dialogue. The model is tuned on a large collection of dialogues in which the true knowledge source(s) are extended with randomly sampled facts from a large knowledge base.

Refusal-Aware Instruction Tuning (R-Tuning): (Zhang et al., 2023a) present a novel approach called R-Tuning for instilling refusal skills in LLMs. This approach formalizes the idea of identifying the knowledge gap between an LLM’s parametric knowledge and the instruction tuning data used to train it. Based on this knowledge gap, R-Tuning constructs refusal-aware training data to teach the LLM when to refrain from responding, specifically when a question falls outside its competence. The R-Tuning methodology involves two key steps (a sketch of the data construction step follows the list):

  1. Measuring the knowledge gap between the LLM’s parametric knowledge and the instructional tuning questions, to identify uncertain questions. By inferring on the training data once and comparing predictions to labels, the tuning data is separated into uncertain questions and certain questions.

  2. Constructing refusal-aware training data by appending refusal expressions to uncertain training examples, before fine-tuning the LLM on this data.
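A minimal sketch of the refusal-aware data construction described in step 2; the refusal phrase, the exact-match certainty check, and the `predict` callable are illustrative assumptions rather than the paper's exact procedure.

```python
from typing import Callable, Dict, List

REFUSAL = " I am not sure about the answer to this question."  # assumed refusal expression

def build_refusal_aware_data(examples: List[Dict[str, str]],
                             predict: Callable[[str], str]) -> List[Dict[str, str]]:
    """Split tuning data into certain/uncertain questions and append a refusal to the latter."""
    tuned = []
    for ex in examples:
        # Infer once on the training question and compare the prediction to the label.
        certain = predict(ex["question"]).strip() == ex["answer"].strip()
        target = ex["answer"] if certain else ex["answer"] + REFUSAL
        tuned.append({"question": ex["question"], "answer": target})
    return tuned
```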

Think While Effectively Articulating Knowledge (TWEAK): To reduce hallucinations, (Qiu et al., 2023a) propose a new decoding method called TWEAK. The method treats the generated sequences at each step and their future sequences as hypotheses. It ranks each generation candidate based on how well their corresponding hypotheses support the input facts, using a Hypothesis Verification Model (HVM).

The authors adjust only the decoding process, without retraining the generative models, which allows their approach to be easily integrated with any knowledge-to-text generator. Existing decoding methods such as beam search select candidates based only on predicted likelihood, without considering faithfulness. The authors also propose a new dataset, FATE, which aligns input facts with original and counterfactual descriptions at the word level.
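A sketch of TWEAK-style candidate reranking at one decoding step: each candidate's hypothesis is scored for how well it supports the input facts and that score is blended with the model's likelihood before selection. The candidates, their likelihoods, the blending weight, and the hypothesis verification scorer `hvm` are illustrative assumptions.

```python
from typing import Callable, List, Tuple

def rank_candidates(
    candidates: List[Tuple[str, float]],   # (hypothesis text, model log-likelihood)
    facts: str,                            # the input facts the text should support
    hvm: Callable[[str, str], float],      # hypothesis verification score in [0, 1] (assumed)
    beta: float = 1.0,                     # blending weight between likelihood and faithfulness
) -> List[Tuple[str, float]]:
    """Rerank decoding candidates by likelihood plus a faithfulness bonus."""
    scored = [(text, loglik + beta * hvm(facts, text)) for text, loglik in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```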

4 Conclusion

This survey paper delves into the critical issue of hallucination in LLMs, emphasizing the widespread impact of LLMs across various domains in our lives. The paper highlights the challenge posed by LLMs generating incorrect information and identifies it as a significant concern for researchers working on prominent LLMs like GPT-4. The paper explores recent advancements in the detection of hallucinations, with methods such as mFACT, contextual information-based frameworks, and the investigation of self-contradiction as a contributing factor. It underscores the importance of addressing hallucination in LLMs due to their integral role in critical tasks. The central contribution of the paper lies in presenting a systematic taxonomy for categorizing hallucination mitigation techniques in LLMs, extending its coverage to VLMs. By synthesizing essential features characterizing these techniques, the paper provides a foundation for more structured future research within the domain of hallucination mitigation. Additionally, the paper deliberates on the inherent limitations and challenges associated with these techniques, proposing directions for future research in this area.

In essence, this survey paper not only sheds light on the gravity of hallucination in LLMs but also consolidates and organizes diverse mitigation techniques, contributing to the advancement of knowledge in the field of computational linguistics. It serves as a valuable resource for researchers and practitioners seeking a comprehensive understanding of the current landscape of hallucination in LLMs and the strategies employed to address this pressing issue.

5 Discussion and Limitations

Hallucination mitigation in LLMs represents a multifaceted challenge addressed through a spectrum of innovative techniques. The methodologies discussed, ranging from post-generation refinement to supervised fine-tuning, underscore the gravity of the hallucination issue and the pressing need for comprehensive solutions.

In the realm of post-generation refinement, RARR stands out, automating the attribution process and aligning content with retrieved evidence. High Entropy Word Spotting and Replacement tackles hallucinations induced by high-entropy words in LLM-generated content, showcasing the significance of context-aware replacements.

Self-refinement through feedback and reasoning brings forth impactful strategies like ChatProtect, focusing on self-contradiction detection, and Self-Reflection Methodology, employing an iterative feedback process for hallucination reduction in medical generative QA systems. Structured Comparative reasoning introduces a structured approach to text preference prediction, enhancing coherence and reducing hallucination.

Prompt tuning emerges as a powerful technique, with innovations like UPRISE demonstrating the versatility of prompt-based adjustments. SynTra introduces synthetic tasks for mitigating hallucinations in abstractive summarization, offering scalability but raising questions about effectiveness compared to human feedback.

The development of novel models emphasizes decoding strategies such as CAD and DoLa, both instrumental in reducing hallucinations by guiding the generation phase. KG utilization and faithfulness-based loss functions also play crucial roles, as seen in methods like RHO and THAM Framework.

Supervised fine-tuning, a pivotal phase, is explored through various lenses, such as Knowledge Injection and Teacher-Student Approaches, where domain-specific knowledge is injected into weaker LLMs and approaches like HAR employ counterfactual datasets for improved factuality.

Future developments and improvements are anticipated in a variety of areas of hallucination mitigation for language models. One important direction is the creation of hybrid models that seamlessly integrate numerous mitigation approaches, offering a thorough defense against hallucinations. Investigating unsupervised or weakly supervised learning techniques might improve scalability and flexibility by reducing reliance on labeled data. In addition, it will be essential to examine the moral ramifications and societal effects of hallucination mitigation strategies in order to guarantee responsible implementation and promote user confidence. The evolving field of LLMs further encourages research on architectures specifically intended to reduce hallucinations, which could lead to new models with built-in safety features. Continued collaboration among researchers, industry professionals, and ethicists will be crucial for improving methods, benchmarking models, and setting standards that put user comprehension and authenticity first. As the field navigates these possibilities, its collective goal is to build language models that produce coherent and contextually relevant information while demonstrating heightened awareness and mitigation of hallucinatory outputs.

The collected works on hallucination mitigation reveal a diverse array of strategies, each contributing uniquely to addressing the nuances of hallucination in LLMs. As the field evolves, the synthesis of these approaches could pave the way for more robust and universally applicable solutions, fostering trust and reliability in language generation systems. Finally, the categorization of the surveyed mitigation techniques is summarized in Table 1 of the paper.
