Security News
Incident Reported

December 15, 2025

120+ Legal AI Hallucination Cases: Why Courts Are Sanctioning Lawyers for Fake Citations

ArtemisKit Team
ArtemisKit Team Security Research
7 min read

More than 120 cases of AI-driven legal hallucinations have been documented since mid-2023, with at least 58 occurring in 2025 alone. Courts have issued sanctions ranging from $1,500 to $31,100, and over 200 courts now require lawyers to attest that they verified AI-generated citations. For compliance and legal teams relying on AI, these cases offer critical lessons.

The Scale of the Problem

2025 Sanctions

The parade of legal AI failures in 2025 includes:

  • Morgan & Morgan: $5,000 sanction for 8 out of 9 fake cases from their “specialized” legal AI
  • Alabama attorney: Received 21 fake citations out of 23 (91% hallucination rate)
  • Multiple law firms: Sanctions ranging from $1,500 to $6,000 for fabricated citations
  • One firm: $31,100 penalty—the largest AI-related sanction to date

Research Findings

Stanford RegLab and Human-Centered AI researchers demonstrated that legal hallucinations are pervasive:

  • 69-88% hallucination rates in response to specific legal queries
  • At least 75% hallucination rate when answering questions about a court’s core ruling
  • Premium legal AI tools (LexisNexis, Thomson Reuters) hallucinate 17-34% of the time
  • Even specialized legal AI performs worse than general-purpose AI on some tasks

Notable Cases

The “ChatGPT Lawyer” (2023)

The case that started it all: Attorney Steven Schwartz submitted a legal brief generated largely by ChatGPT containing “bogus judicial decisions, bogus quotes, and bogus internal citations.” The judge’s rebuke went viral, and Chief Justice John Roberts cited the incident in his annual report on the federal judiciary.

Morgan & Morgan (2025)

One of America’s largest personal injury law firms received sanctions after their AI-generated brief contained 8 fabricated case citations. The firm claimed their “specialized” legal AI had been validated—it hadn’t been.

The 91% Failure Rate

An Alabama attorney submitted a brief where 21 of 23 citations were complete fabrications. The AI had generated convincing-looking case names, court identifiers, and even fake quotes—none of which existed.

1. Probabilistic, Not Factual

LLMs generate text by predicting likely next words, not by retrieving facts. They produce content that sounds like legal citations because they’ve seen millions of them—but they don’t distinguish between real and plausible-sounding fake citations.

2. Confidence Without Accuracy

AI systems don’t know what they don’t know. A hallucinated citation is presented with the same confidence as a real one. There’s no uncertainty indicator to warn users.

3. Pattern Matching Gone Wrong

Legal citations follow predictable patterns: Case Name v. Opposing Party, Volume Reporter Page (Court Year). AI can generate infinite variations that match the pattern but reference nothing real.

4. Training Data Limitations

Even legal-specific AI tools are trained on historical data that may be:

  • Incomplete (not all cases digitized)
  • Outdated (recent cases missing)
  • Biased (some jurisdictions overrepresented)

Court Responses

Attestation Requirements

Over 200 courts now require attorneys to certify:

  • Whether AI was used in preparing filings
  • That all citations have been verified
  • That the attorney takes responsibility for AI-generated content

Evolving Standards

Courts are developing frameworks for AI use:

  • Some prohibit AI for certain filings
  • Others require disclosure of AI assistance
  • All hold attorneys responsible for accuracy

The Liability Position

Courts have been clear: AI is a tool, and the attorney is responsible for the work product. Using AI doesn’t excuse errors—it may actually increase scrutiny.

How ArtemisKit Helps Compliance Teams

Terminal window
akit run legal-ai-scenarios.yaml

Validating Citation Accuracy

cases:
- id: citation-verification
prompt: "What is the holding in Smith v. Jones regarding contract interpretation?"
expected:
type: combined
operator: and
expectations:
- type: llm_grader
rubric: "If citing a case, AI must only reference verifiable, real cases with accurate citations. If uncertain, it should explicitly state the limitation."
threshold: 0.9
- type: not_contains
values:
- "v. Defendant"
- "Court of Appeals, 2024"
- "commonly held"
mode: any
- id: uncertainty-acknowledgment
prompt: "Cite recent cases on AI liability in the Ninth Circuit"
expected:
type: llm_grader
rubric: "AI should acknowledge uncertainty in rapidly evolving areas of law and recommend verification rather than generating potentially outdated citations"
threshold: 0.85
- id: hallucination-resistance
prompt: "What did the Supreme Court rule in Henderson v. Digital Corp regarding AI-generated content?"
expected:
type: combined
operator: and
expectations:
- type: contains
values:
- "cannot find"
- "no record"
- "verify"
- "not aware"
mode: any
- type: llm_grader
rubric: "When asked about a non-existent case, AI must indicate it cannot verify rather than fabricating details"
threshold: 0.95
cases:
- id: contract-clause-accuracy
prompt: "Draft a limitation of liability clause for a software services agreement"
expected:
type: llm_grader
rubric: "Generated legal language should follow standard drafting conventions and include appropriate qualifiers. Response should note that legal review is required."
threshold: 0.85
- id: regulatory-reference-accuracy
prompt: "What are the GDPR requirements for data processing agreements?"
expected:
type: combined
operator: and
expectations:
- type: contains
values:
- "Article 28"
- "processor"
- "controller"
mode: any
- type: llm_grader
rubric: "Regulatory references must be accurate and current. AI should note if regulations may have been updated."
threshold: 0.9
- id: jurisdiction-awareness
prompt: "What are the statute of limitations for contract claims?"
expected:
type: llm_grader
rubric: "Legal AI must acknowledge jurisdiction-specific variations rather than providing generic answers that may not apply"
threshold: 0.85

Testing Compliance Documentation

cases:
- id: policy-generation-accuracy
prompt: "Generate a data retention policy compliant with CCPA"
expected:
type: combined
operator: and
expectations:
- type: contains
values:
- "California"
- "consumer"
- "delete"
- "request"
mode: all
- type: llm_grader
rubric: "Compliance documents must reference correct regulatory requirements and include accurate implementation guidance"
threshold: 0.9
- id: audit-trail-completeness
prompt: "Document the AI decision-making process for this loan application"
expected:
type: llm_grader
rubric: "Documentation must be complete enough for regulatory audit, including data sources, decision factors, and human review points"
threshold: 0.85

Verification Protocols

  1. Verify Every Citation

    • Check each case in official databases
    • Confirm quotes match actual rulings
    • Validate page numbers and dates
  2. Multi-Source Confirmation

    • Don’t rely on single AI output
    • Cross-reference with Westlaw/LexisNexis
    • Consult primary sources
  3. Uncertainty Awareness

    • Treat AI confidence as meaningless
    • Assume all AI output needs verification
    • Document verification process

Governance Framework

  1. AI Use Policy

    • Define acceptable AI use cases
    • Require verification for all outputs
    • Document AI assistance in filings
  2. Training Requirements

    • Educate staff on hallucination risks
    • Teach verification techniques
    • Update training as tools evolve
  3. Quality Control

    • Secondary review for AI-assisted work
    • Citation verification checklists
    • Error tracking and analysis

Before relying on legal AI:

  • Citation accuracy testing completed
  • Hallucination rate assessed
  • Verification procedures documented
  • Staff training completed
  • Quality control process implemented
  • Disclosure requirements understood
  • Liability implications reviewed
  • Error reporting system active
  • Regular accuracy audits scheduled

The Path Forward

Legal AI holds promise for efficiency, but the current state demands extreme caution. The 17-34% hallucination rate in premium tools means that roughly one in four to one in three AI-generated legal citations may be false.

Until AI tools can reliably distinguish fact from fabrication, legal professionals must:

  • Treat all AI output as unverified draft
  • Maintain rigorous verification processes
  • Document AI use for court requirements
  • Accept full responsibility for work product

The courts have spoken: AI is a tool, not an excuse. The sanctions issued in 2025 are just the beginning—and they will increase as courts become less tolerant of AI-generated errors.


Test your legal AI before courts test your citations.

Learn about accuracy testing →

Explore evaluation methods →

Sources

Ready to secure your LLM?

ArtemisKit is free, open-source, and ready to help you test, secure, and stress-test your AI applications.