Almanac
dataset

CUAD

datasetactiveprovisionalcuad-d5ddf8f3·1 events·first seen 4h ago

Aliases: CUAD

Co-occurring entities

More like this (12)

Recent events (1)

6arXiv · cs.CL·4h ago·source ↗

LegalHalluLens: Typed hallucination auditing and calibrated multi-agent debate for legal AI

Researchers introduce LegalHalluLens, an auditing framework for hallucination in legal AI systems, evaluated across 510 contracts and 249,252 clause-level instances from the CUAD dataset. The framework introduces typed hallucination profiles across four claim categories (numeric, temporal, obligation/entitlement, factual) and a Risk Direction Index (RDI) that distinguishes omission from invention errors. A calibrated multi-agent debate pipeline reduces fabricated detections by 45% using a 4B-parameter model competitive with commercial APIs. The work reveals that aggregate hallucination rates (~52%) mask a 38-40 percentage-point gap between claim types and that two systems with identical aggregate rates can have opposite risk profiles.