Almanac
benchmark

NuclearQAv2

benchmarkactiveprovisionalnuclearqav2-58964ebc·1 events·first seen 2d ago

Aliases: NuclearQAv2

More like this (12)

Recent events (1)

3arXiv · cs.CL·2d ago·source ↗

NuclearQAv2: A benchmark for evaluating LLM competence in nuclear engineering

Researchers introduce NuclearQAv2, a ~1,240 question benchmark for assessing LLM performance on nuclear engineering knowledge across boolean, numeric, and verbal question types. The benchmark is constructed via a hybrid pipeline combining expert-authored questions, existing datasets, and LLM-assisted generation from domain-specific corpora. Evaluation of multiple LLMs reveals strong performance on factual recall but significant gaps in quantitative reasoning and conceptual understanding, highlighting the need for multi-faceted domain-specific evaluation.