Study Guide: AI Literacy: Hallucinations and factual verification
Source: https://www.fatskills.com/ai-for-work/chapter/ai-ai-literacy-hallucinations-and-factual-verification

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Hallucinations occur when an AI model generates plausible but false or unsupported information and presents it with high confidence. In professional work, this can lead to misinformation, legal risk, or flawed decisions. For example, an AI tool summarizing a contract might invent a clause that does not exist, which can cause a compliance breach if nobody checks the summary against the contract.


Key Facts & Principles

  • Hallucination: When a model outputs false or fabricated information as if it were factual. Example: An AI claims a 2023 study proves a drug’s efficacy, but no such study exists.
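  • Confidence-Accuracy Gap: Models often sound authoritative even when wrong, so a confident tone is not evidence of accuracy. Example: A chatbot states, “The merger closes on June 1,” but the actual date is July 15.
  • Source Attribution: Requiring the model to cite sources makes each claim checkable and can reduce hallucinations, but models can also fabricate citations, so confirm that each cited source exists and supports the claim. Example: “Summarize the report and include page numbers for each claim.”
  • Retrieval-Augmented Generation (RAG): Combines a model with a retrieval step that pulls relevant passages from trusted documents, so answers are grounded in those documents rather than in the model’s memory. Example: A customer service bot answers using only verified product manuals.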
  • Ground Truth: A reference dataset or human-verified facts used to check AI outputs. Example: A financial analyst cross-checks AI-generated earnings summaries against SEC filings.
  • Prompt Sensitivity: Small changes in wording can change whether the model hallucinates; vague, open-ended prompts invite more invention than prompts tied to a specific source. Example: “What are the risks of X?” (vague) vs. “List the documented risks of X in the 2023 report” (tied to a source).
  • Domain-Specific Models: Models fine-tuned on a domain tend to hallucinate less within that domain, though they can still be wrong. Example: A medical AI trained on clinical guidelines is less likely to invent treatments.
  • Human-in-the-Loop (HITL): Requiring human review for critical outputs. Example: A compliance team reviews and approves AI-generated policy summaries before distribution.
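
To make the Source Attribution and RAG entries above concrete, here is a minimal sketch in Python. It is an illustration only: the passage ids, the manual text, and the function names are hypothetical, and the keyword-overlap retrieval is a toy stand-in for the search component a real RAG system would use. No model is called; the code only builds the grounded prompt that would be sent to one.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, passages: dict[str, str], top_k: int = 2) -> list[str]:
    """Return ids of the passages that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), pid) for pid, text in passages.items()]
    # Highest overlap first; ties are broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for score, pid in scored[:top_k] if score > 0]

def build_grounded_prompt(question: str, passages: dict[str, str], top_k: int = 2) -> str:
    """Build a prompt that restricts the model to retrieved passages and asks for citations."""
    ids = retrieve(question, passages, top_k)
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    return (
        "Answer only using the passages below. "
        "Cite the passage id for each claim. "
        "If the passages do not contain the answer, say 'I don't know.'\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical excerpts from a verified product manual.
manual = {
    "manual-p3": "The battery charges fully in two hours.",
    "manual-p7": "The warranty covers defects for twelve months.",
    "manual-p9": "Clean the filter every thirty days.",
}

question = "How long does the warranty last?"
top_ids = retrieve(question, manual)
prompt = build_grounded_prompt(question, manual)
print(prompt)
```

The passage ids in the prompt give a reviewer something to check: each cited id can be compared against the manual text, which is the ground-truth step described above.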

Step-by-Step Application

  1. Define the Task’s Risk Level
    • Low-risk (e.g., brainstorming): Accept some hallucinations.
    • High-risk (e.g., legal, medical, financial): Use RAG, source citations, and HITL.

  2. Design Hallucination-Resistant Prompts
    • Add constraints: “Answer only using the attached document. If unsure, say ‘I don’t know.’”
    • Ask for sources: “Cite the exact page number for each claim.”

  3. Set Up a Verification Workflow
    • For RAG: Upload trusted documents (e.g., company policies, research papers) and query them.
    • For non-RAG: Use a second AI or human to fact-check outputs against primary sources.

  4. Evaluate Outputs Systematically
    • Check for:
      • Plausibility: Does this align with known facts? (e.g., “The sky is green” is an obvious hallucination.)
      • Consistency: Does the AI contradict itself in the same response?
      • Source Alignment: Do cited sources actually support the claim?

  5. Implement Governance Rules
    • Rule 1: Never use AI-generated content for public statements without verification.
    • Rule 2: Log and audit high-risk AI outputs (e.g., legal advice, financial reports).

  6. Iterate Based on Feedback
    • Track hallucination rates (e.g., % of outputs flagged as false) and adjust prompts or tools.
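
The risk-level and tracking steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a prescribed implementation: the control names, the ReviewedOutput record, and the sample log are all hypothetical, and the "flagged" field is assumed to be set by a human reviewer who checked the output against primary sources.

```python
from dataclasses import dataclass

# Hypothetical mapping from risk level to required controls, following step 1.
CONTROLS = {
    "low": [],
    "high": ["rag", "source_citations", "human_review"],
}

@dataclass
class ReviewedOutput:
    task: str
    risk: str            # "low" or "high"
    flagged_false: bool  # set by a human reviewer after checking primary sources

def required_controls(risk: str) -> list[str]:
    """Return the controls a task must use for its risk level."""
    if risk not in CONTROLS:
        raise ValueError(f"unknown risk level: {risk!r}")
    return list(CONTROLS[risk])

def hallucination_rate(log: list[ReviewedOutput]) -> float:
    """Percentage of reviewed outputs that were flagged as false."""
    if not log:
        return 0.0
    flagged = sum(1 for item in log if item.flagged_false)
    return 100.0 * flagged / len(log)

# Hypothetical audit log with one flagged output out of four.
audit_log = [
    ReviewedOutput("contract summary", "high", flagged_false=True),
    ReviewedOutput("earnings summary", "high", flagged_false=False),
    ReviewedOutput("policy summary", "high", flagged_false=False),
    ReviewedOutput("tagline brainstorm", "low", flagged_false=False),
]

print(required_controls("high"))
print(f"{hallucination_rate(audit_log):.1f}% of reviewed outputs flagged")
```

Recomputing the rate after each prompt or tool change shows whether the change helped, which is the feedback loop step 6 describes.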

Common Mistakes

  • Mistake: Assuming longer responses are more accurate. Correction: Longer outputs contain more claims, and each claim is another chance for a hallucination. Ask for concise answers (e.g., bullet points) so there is less to verify.

  • Mistake: Trusting AI summaries of complex documents without checking the original. Correction: Always cross-reference with the source. Example: An AI summarizes a 50-page contract in 3 bullet points—verify each against the text.

  • Mistake: Using generic models for specialized tasks (e.g., medical diagnosis). Correction: Use domain-specific models or fine-tune a general model on trusted data.

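  • Mistake: Ignoring “I don’t know” responses. Correction: Treat them as useful signals rather than failures: either the prompt is unclear or the model lacks the data. Rephrase, provide the missing context, or go to the primary source instead of pushing the model to guess.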

  • Mistake: Over-relying on AI for dynamic or rapidly changing data (e.g., stock prices, news). Correction: Use APIs or live databases for real-time data; use AI only for analysis.


Practical Tips

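  • Tip 1: For critical tasks, use two-step verification: First, ask the AI to generate a draft; second, ask it to critique its own draft for errors. Treat the critique as a filter, not as proof: self-critique can miss errors, so high-risk claims still need checking against primary sources.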
  • Tip 2: Build a “hallucination cheat sheet” for your team with common falsehoods in your domain (e.g., “AI often misstates our company’s founding year as 2015 instead of 2010”).
  • Tip 3: Log and analyze hallucinations to identify patterns (e.g., “The model hallucinates most on questions about competitors’ unreleased products”).
  • Tip 4: Train non-technical teams to spot red flags (e.g., overly confident language, missing sources, or implausible claims).
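
The two-step pattern from Tip 1 can be sketched as follows. The model is passed in as a plain callable so that nothing here depends on a particular vendor API; `fake_model` is a hypothetical stand-in used only to show the control flow, and the prompt wording is an assumption rather than a tested template.

```python
from typing import Callable

def draft_then_critique(task: str, generate: Callable[[str], str]) -> dict[str, str]:
    """Ask for a draft, then ask for a critique of that draft."""
    draft = generate(f"Write a draft for the following task:\n{task}")
    critique_prompt = (
        "Review the draft below. List every factual claim that lacks a source "
        "or may be wrong. Do not rewrite the draft.\n\n"
        f"Draft:\n{draft}"
    )
    critique = generate(critique_prompt)
    return {"draft": draft, "critique": critique}

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    if prompt.startswith("Review the draft"):
        return "Claim 'founded in 2015' has no source."
    return "Our company was founded in 2015."

result = draft_then_critique("Summarize our company history.", fake_model)
print(result["draft"])
print(result["critique"])
```

The critique only narrows down what a person needs to check; the flagged founding year would still have to be confirmed against company records.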

Quick Practice Scenario

Scenario: Your team uses an AI to draft press releases. The AI writes: “Our new product reduces energy costs by 40%, as proven by a 2024 study from MIT.” The marketing lead wants to publish this immediately.

Question: What’s your next step, and why?

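Answer: Hold publication and verify that the MIT study exists and actually supports the 40% figure, by searching MIT’s own publications or other primary sources. If you ask the AI for a link, open it and confirm the content yourself, because the model can fabricate links as well as studies. Why: The claim is specific and high-stakes, and publishing a hallucinated study could damage credibility.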
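
As a rough illustration of how the red flags in this scenario could be screened for before human review, here is a small heuristic in Python. The patterns and flag labels are assumptions chosen for this example, not an established rule set, and the check cannot tell whether a study exists; it only marks text that a person should verify.

```python
import re

def red_flags(text: str) -> list[str]:
    """Return heuristic reasons why a draft needs verification before publishing."""
    flags = []
    if re.search(r"\d+(?:\.\d+)?\s?%", text):
        flags.append("specific percentage")
    if re.search(r"\b(?:study|report|survey)\b", text, re.IGNORECASE):
        flags.append("cites a study or report")
    if re.search(r"\b(?:proven|proves|guaranteed)\b", text, re.IGNORECASE):
        flags.append("overly confident language")
    if not re.search(r"https?://|doi\.org|\bdoi:", text, re.IGNORECASE):
        flags.append("no link or identifier for the source")
    return flags

draft = (
    "Our new product reduces energy costs by 40%, "
    "as proven by a 2024 study from MIT."
)
flags = red_flags(draft)
for flag in flags:
    print("-", flag)
```

An empty result does not mean a draft is accurate; it only means none of these particular patterns matched.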


Last-Minute Cram Sheet

  1. Hallucination: AI confidently states false info. Not always obvious—check sources.
  2. RAG: AI + search = fewer hallucinations. Use for high-risk tasks.
  3. Prompt trick: “Answer only from the attached document. If unsure, say ‘I don’t know.’ Cite sources.”
  4. Confidence-accuracy gap: Models sound sure even when wrong.
  5. HITL: Human review for critical outputs (e.g., legal, medical).
  6. Ground truth: Always compare AI outputs to primary sources.
  7. Domain models: Less hallucination in their specialty (e.g., medical AI).
  8. Dynamic data: Don’t trust AI for live info (e.g., stock prices). Use APIs.
  9. Audit logs: Track hallucinations to improve prompts/tools.
  10. Red flags: Overly specific claims, missing sources, or contradictions.