BenHalluScore
benhalluscore-40fdfe4f · 1 event · first seen 16d ago
Aliases: BenHalluScore
Recent events (1)
BenHalluEval: Multi-Task Hallucination Evaluation Framework for Bengali LLMs
BenHalluEval introduces the first systematic hallucination benchmark for Bengali, covering four tasks (generative QA, code-mixed QA, summarization, reasoning) with 12,000 hallucinated candidates generated via GPT-5.4 across twelve hallucination types. Seven LLMs are evaluated under a dual-track protocol separating false-positive rate on ground-truth instances from hallucination detection rate on hallucinated candidates. The proposed BenHalluScore metric reveals substantial variation (7.72%–55.42%) across models and tasks, and chain-of-thought prompting is found to shift response distributions without consistently improving hallucination discrimination. The work highlights gaps in low-resource language hallucination evaluation and critiques single-track and prompting-only evaluation approaches.
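The dual-track protocol described above can be illustrated with a minimal sketch. It assumes each model verdict is a binary "hallucinated / not hallucinated" flag. The names `Judgment` and `dual_track_rates` are hypothetical and do not come from the paper. The sketch computes only the two per-track rates. It does not compute BenHalluScore, because the summary does not give that formula.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Judgment:
    """One model verdict on one instance (hypothetical structure)."""
    is_hallucinated: bool  # gold label: True for a hallucinated candidate
    flagged: bool          # model verdict: True if the model called it hallucinated


def dual_track_rates(
    judgments: Iterable[Judgment],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (false_positive_rate, hallucination_detection_rate).

    Track 1: share of ground-truth instances wrongly flagged as hallucinated.
    Track 2: share of hallucinated candidates correctly flagged.
    A rate is None when its track has no instances.
    """
    items = list(judgments)
    ground_truth = [j for j in items if not j.is_hallucinated]
    hallucinated = [j for j in items if j.is_hallucinated]

    fpr = (
        sum(j.flagged for j in ground_truth) / len(ground_truth)
        if ground_truth else None
    )
    hdr = (
        sum(j.flagged for j in hallucinated) / len(hallucinated)
        if hallucinated else None
    )
    return fpr, hdr


# Toy data, invented purely for illustration (not results from the paper).
example = (
    [Judgment(is_hallucinated=False, flagged=f) for f in (True, False, False, False)]
    + [Judgment(is_hallucinated=True, flagged=f) for f in (True, True, True, False)]
)
fpr, hdr = dual_track_rates(example)
```

Keeping the two rates separate shows why a single-track evaluation can mislead: a model that flags everything would reach a perfect detection rate on the second track while failing the first track completely.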