Multi-source cybersecurity log dataset with ATT&CK labels and SLM fine-tuning evaluation
Researchers introduce a new multi-source cybersecurity log dataset of 870 sessions (~2.3M events) capturing system, network, and browser activity on Windows endpoints, with per-entry MITRE ATT&CK technique labels across 12 tactics and 53 techniques. The dataset addresses gaps in existing public datasets (CICIDS, UNSW-NB15, ATLAS) that lack combined multi-source coverage with fine-grained ATT&CK labeling. Three small language models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) were fine-tuned with LoRA on the dataset, achieving chunk classification accuracy of 90–97% versus ~8% for base variants, though ATT&CK technique identification remained harder at 42% exact-match accuracy.
Related guides (3)
Related events (8)
Anthropic maps 832 AI-enabled cyberattacks, finds MITRE ATT&CK framework inadequate for agentic threats
Anthropic's Frontier Red Team analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026, mapping their techniques against the MITRE ATT&CK framework. Key findings: medium-or-higher-risk actors grew from 33% to 56% across the study period; AI use is shifting from initial-access techniques toward post-compromise operations like lateral movement and privilege escalation; and traditional risk signals (technique count, platform used) no longer reliably distinguish threat levels. The report concludes that MITRE ATT&CK lacks coverage for agentic orchestration behaviors—where AI chains attack stages autonomously with minimal human input—which characterize the highest-risk actors, including a state-sponsored espionage operation disrupted in November 2025.
CWE-Trace framework reveals LLM vulnerability detection is calibration without comprehension
Researchers introduce CWE-Trace, a benchmark of 834 manually curated Linux kernel samples across 74 CWEs with strict temporal splits to prevent data contamination, used to evaluate 8 vanilla LLMs and 15 LoRA fine-tuned variants on vulnerability detection. Key findings: data contamination provides no measurable advantage (84% of nominally contaminated samples carry no usable memorization signal), and backbone directional priors dominate fine-tuning — models exhibit stable systematic failure modes that resist correction. The best binary detection score reaches only 52.1% (barely above chance) and exact CWE classification Top-1 accuracy stays below 1.3%, indicating fine-tuning shifts output distributions without instilling genuine security reasoning. The work introduces two diagnostic metrics (Directional Failure Index and Hierarchical Distance and Direction) and concludes that detection capability and security understanding are decoupled in current LLMs.
Anthropic-Cybersecurity-Skills: 754 Structured Cybersecurity Skills for AI Agents
A GitHub repository providing 754 structured cybersecurity skills designed for AI coding agents, mapped to five major frameworks including MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, and NIST AI RMF. The skills are organized across 26 security domains and conform to the agentskills.io standard. The project claims compatibility with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI, and 20+ other platforms. It has accumulated 7,330 stars with 238 added today, indicating notable community traction.
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models
CyberSecEval 2 is a benchmark framework designed to evaluate both the cybersecurity risks and capabilities of large language models. The framework appears to be hosted or featured on Hugging Face's leaderboard infrastructure, extending prior cybersecurity evaluation work. It assesses LLMs across multiple dimensions of security-relevant behavior, including potential for misuse and defensive capabilities.
Stateful Online Monitoring Catches Distributed Agent Attacks via Cross-Account Clustering
Researchers demonstrate the first known distributed agent attack, a multi-agent scaffold that splits harmful cybersecurity tasks across many user accounts to evade per-transcript safety monitors, reducing detection rates to roughly one-fifth of standard attacks. As a defense, they develop a stateful online monitor that clusters weak suspiciousness signals across many agent transcripts in real time, escalating only rarely to a full LM-based review. In large-scale simulated datacenter traffic evaluations, the monitor Pareto-dominates standard monitors by catching distributed attacks 30% earlier with negligible latency overhead for ~99% of traffic. The system also incidentally catches standard jailbreaks, since adaptive attackers tend to reuse attack variants across accounts.
CATCH-ME dataset: multilingual multi-turn counterspeech against hate speech and misinformation for RAG systems
Researchers introduce CATCH-ME, a large-scale expert-curated multilingual dataset of multi-turn dialogues addressing the intersection of hate speech and misinformation across five languages and seven marginalized groups. The dataset is anchored in verified external knowledge (fact-checking articles and NGO reports) with document- and chunk-level span annotations, making it directly usable for RAG-based counterspeech systems. It addresses a gap in existing resources, which are limited to single-turn English dialogues, and is intended to improve the factual grounding and persuasiveness of LLM-generated counterspeech.
LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models
LASH is a black-box jailbreak framework that adaptively composes outputs from multiple existing attack families into hybrid prompts using a genetic optimizer with a two-stage fitness function. Evaluated on JailbreakBench across six target models, LASH achieves 84.5% attack success rate (keyword-based) and 74.5% (LLM-judge) with only 30 mean target queries, outperforming five state-of-the-art baselines. The work demonstrates that no single jailbreak family dominates across models and harm categories, and that adaptive cross-strategy composition is a promising red-teaming direction. Results hold under three defense mechanisms.
Practitioner spends $1,500 testing LLM offensive security capabilities against a purpose-built vulnerable app
A developer built a deliberately vulnerable application and ran LLMs against it as automated penetration testers, spending $1,500 on API costs across the experiment. The post evaluates how well current LLMs can identify and exploit real vulnerabilities in a controlled setting. Results provide practical signal on the current state of LLM-assisted offensive security, a capability area with both red-team and safety implications.


