NuclearQAv2: A benchmark for evaluating LLM competence in nuclear engineering
Researchers introduce NuclearQAv2, a benchmark of roughly 1,240 questions for assessing LLM knowledge of nuclear engineering across three question types: boolean, numeric, and verbal. The benchmark is built with a hybrid pipeline that combines expert-authored questions, questions drawn from existing datasets, and LLM-assisted generation from domain-specific corpora. An evaluation of several LLMs shows strong performance on factual recall but significant gaps in quantitative reasoning and conceptual understanding. These results highlight the need for multi-faceted, domain-specific evaluation.