Entity · technique

BenHalluScore

technique · active · benhalluscore-40fdfe4f · 1 event · first seen Jun 1, 2026

Aliases: BenHalluScore

Co-occurring entities

chain-of-thought prompting · Bengali · BenHalluEval · GPT-5.5

More like this (12)

BenHalluEval · ClinHallu · PsyScore · LegalHalluLens · HalluTruthQA · BERTScore · spoiler-score detector · Hallucinations Leaderboard · FactScore · BashArena · SorryBench · Vectara Hallucination Leaderboard

Recent events (1)

arXiv · cs.CL · Jun 1, 2026

BenHalluEval: Multi-Task Hallucination Evaluation Framework for Bengali LLMs

BenHalluEval introduces the first systematic hallucination benchmark for Bengali, covering four tasks (generative QA, code-mixed QA, summarization, reasoning) with 12,000 hallucinated candidates generated via GPT-5.4 across twelve hallucination types. Seven LLMs are evaluated under a dual-track protocol separating false-positive rate on ground-truth instances from hallucination detection rate on hallucinated candidates. The proposed BenHalluScore metric reveals substantial variation (7.72%–55.42%) across models and tasks, and chain-of-thought prompting is found to shift response distributions without consistently improving hallucination discrimination. The work highlights gaps in low-resource language hallucination evaluation and critiques single-track and prompting-only evaluation approaches.
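The dual-track protocol described above can be illustrated with a short sketch. The summary does not give the formula for BenHalluScore, so the code below does not attempt to compute it; it only derives the two per-track rates the summary names (false-positive rate on ground-truth instances, detection rate on hallucinated candidates). The `Judgment` record and the `dual_track_rates` function are hypothetical names introduced for illustration, not part of the BenHalluEval release.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One model verdict on one benchmark instance (hypothetical schema)."""
    is_hallucinated: bool  # gold label: True for a hallucinated candidate
    flagged: bool          # model verdict: True if it called the text hallucinated


def dual_track_rates(judgments):
    """Return (false_positive_rate, detection_rate) from separate tracks.

    Track 1: ground-truth instances, where any flag is a false positive.
    Track 2: hallucinated candidates, where a flag is a correct detection.
    """
    truth = [j for j in judgments if not j.is_hallucinated]
    halluc = [j for j in judgments if j.is_hallucinated]
    if not truth or not halluc:
        raise ValueError("both tracks need at least one instance")
    fpr = sum(j.flagged for j in truth) / len(truth)
    hdr = sum(j.flagged for j in halluc) / len(halluc)
    return fpr, hdr


# A model that flags everything looks perfect on the hallucinated track alone,
# which is why the two tracks are reported separately.
always_flag = [Judgment(False, True)] * 4 + [Judgment(True, True)] * 4
mixed = (
    [Judgment(False, True)] + [Judgment(False, False)] * 3
    + [Judgment(True, True)] * 3 + [Judgment(True, False)]
)
print(dual_track_rates(always_flag))
print(dual_track_rates(mixed))
```

Keeping the tracks apart exposes a degenerate "always flag" model: its detection rate is maximal, but so is its false-positive rate, which a single-track evaluation would hide.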

Evaluation and Benchmarking · BenHalluScore · chain-of-thought prompting · Bengali · +2 more