CyberSecEval 2
benchmark · active
cyberseceval-2-c471014b · 1 event · first seen 28d ago · Aliases: CyberSecEval 2
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models
CyberSecEval 2 is a benchmark framework for evaluating both the cybersecurity risks and the cybersecurity capabilities of large language models. It extends the original CyberSecEval benchmark and appears to be featured on Hugging Face's leaderboard infrastructure. The framework assesses LLMs across multiple dimensions of security-relevant behavior, including their potential for misuse and their defensive capabilities.