5 · arXiv cs.LG (Machine Learning) · 3d ago

Multi-source cybersecurity log dataset with ATT&CK labels and SLM fine-tuning evaluation

Researchers introduce a multi-source cybersecurity log dataset of 870 sessions (~2.3M events) capturing system, network, and browser activity on Windows endpoints, with per-entry MITRE ATT&CK technique labels across 12 tactics and 53 techniques. The dataset addresses a gap in existing public datasets (CICIDS, UNSW-NB15, ATLAS), which lack combined multi-source coverage with fine-grained ATT&CK labeling. Three small language models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) were fine-tuned with LoRA on the dataset. The fine-tuned models reached 90–97% chunk classification accuracy, versus ~8% for the base variants without fine-tuning; ATT&CK technique identification remained harder, at 42% exact-match accuracy.
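
To make the gap between the two reported metrics concrete, here is a minimal sketch of how chunk classification accuracy and exact-match technique accuracy might be computed. The scoring rules are assumptions, since the summary does not define them: a chunk counts as correct when its predicted label equals the gold label, and a technique prediction counts as an exact match only when the predicted set of ATT&CK technique IDs equals the gold set. The function names, chunk labels, and sample predictions are hypothetical and are not taken from the dataset.

```python
from typing import Sequence, Set


def chunk_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of chunks whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def exact_match_accuracy(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Fraction of chunks whose predicted technique set equals the gold set exactly.

    Assumption: a partially correct set (one technique missing or extra)
    scores zero, which is why this metric is stricter than chunk accuracy.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical chunk-level labels for four log chunks.
gold_labels = ["malicious", "benign", "malicious", "benign"]
pred_labels = ["malicious", "benign", "benign", "benign"]

# Hypothetical technique sets for three malicious chunks (real ATT&CK IDs).
gold_techniques = [{"T1059"}, {"T1059", "T1105"}, {"T1003"}]
pred_techniques = [{"T1059"}, {"T1059"}, {"T1003"}]

chunk_acc = chunk_accuracy(gold_labels, pred_labels)  # 0.75
technique_acc = exact_match_accuracy(gold_techniques, pred_techniques)
print(f"chunk accuracy: {chunk_acc:.2f}")
print(f"technique exact-match accuracy: {technique_acc:.2f}")
```

In this toy example the second technique prediction finds one of two techniques but still scores zero, which illustrates why an exact-match figure can sit well below a chunk-level accuracy.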

Evaluation and Benchmarking · AI Safety Research · Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation · CICIDS · Llama 3.2 · LoRA · Phi-4-mini · Qwen2.5-1.5B · MITRE ATT&CK · UNSW-NB15 · Atlas

Related guides (3)

LoRA · Concept

LoRA: How to Teach a Giant AI New Tricks Without Rebuilding It

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

8 · Anthropic News · 17d ago

Anthropic maps 832 AI-enabled cyberattacks, finds MITRE ATT&CK framework inadequate for agentic threats

Anthropic's Frontier Red Team analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026, mapping their techniques against the MITRE ATT&CK framework. Key findings: medium-or-higher-risk actors grew from 33% to 56% across the study period; AI use is shifting from initial-access techniques toward post-compromise operations like lateral movement and privilege escalation; and traditional risk signals (technique count, platform used) no longer reliably distinguish threat levels. The report concludes that MITRE ATT&CK lacks coverage for agentic orchestration behaviors—where AI chains attack stages autonomously with minimal human input—which characterize the highest-risk actors, including a state-sponsored espionage operation disrupted in November 2025.

AI Safety Research Regulatory Developments Anthropic Policy Frontier Red Team MITRE ATT&CK Claude Code +3 more

6 · arXiv · cs.AI · 46h ago

CWE-Trace framework reveals LLM vulnerability detection is calibration without comprehension

Researchers introduce CWE-Trace, a benchmark of 834 manually curated Linux kernel samples across 74 CWEs with strict temporal splits to prevent data contamination, used to evaluate 8 vanilla LLMs and 15 LoRA fine-tuned variants on vulnerability detection. Key findings: data contamination provides no measurable advantage (84% of nominally contaminated samples carry no usable memorization signal), and backbone directional priors dominate fine-tuning — models exhibit stable systematic failure modes that resist correction. The best binary detection score reaches only 52.1% (barely above chance) and exact CWE classification Top-1 accuracy stays below 1.3%, indicating fine-tuning shifts output distributions without instilling genuine security reasoning. The work introduces two diagnostic metrics (Directional Failure Index and Hierarchical Distance and Direction) and concludes that detection capability and security understanding are decoupled in current LLMs.
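
The strict temporal split mentioned above can be illustrated with a minimal sketch. The summary does not say how CWE-Trace implements its split, so everything here is an assumption: each sample is given a hypothetical `committed` date, and a hypothetical cutoff date sends earlier samples to training and later ones to evaluation, so that no evaluation sample predates the newest training sample.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    sample_id: str   # hypothetical identifier
    committed: date  # hypothetical date the code change landed
    cwe: str         # CWE label, e.g. "CWE-416"


def temporal_split(
    samples: Sequence[Sample], cutoff: date
) -> Tuple[List[Sample], List[Sample]]:
    """Split by date: samples before the cutoff train, the rest evaluate."""
    train = [s for s in samples if s.committed < cutoff]
    test = [s for s in samples if s.committed >= cutoff]
    return train, test


# Hypothetical samples; the dates and IDs are made up for illustration.
samples = [
    Sample("s1", date(2021, 5, 1), "CWE-416"),
    Sample("s2", date(2023, 2, 10), "CWE-476"),
    Sample("s3", date(2024, 1, 15), "CWE-787"),
]

train, test = temporal_split(samples, cutoff=date(2023, 1, 1))
print([s.sample_id for s in train])  # ['s1']
print([s.sample_id for s in test])   # ['s2', 's3']
```

A date-based split like this differs from a random split in that the evaluation set only contains samples newer than anything seen in training, which is the property a contamination-aware benchmark relies on.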

Evaluation and Benchmarking AI Safety Research CWE-Trace Hierarchical Distance and Direction DeepSeek V4 +3 more

4 · Github Trending · 28d ago

Anthropic-Cybersecurity-Skills: 754 Structured Cybersecurity Skills for AI Agents

A GitHub repository provides 754 structured cybersecurity skills designed for AI coding agents, mapped to five major frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, and NIST AI RMF. The skills are organized across 26 security domains and conform to the agentskills.io standard. The project claims compatibility with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI, and 20+ other platforms. It has accumulated 7,330 stars with 238 added today, indicating notable community traction.

AI Safety Research Agent and Tool Ecosystem NIST AI RMF agentskills.io mukul975/Anthropic-Cybersecurity-Skills +6 more

5 · Hugging Face Blog · 1mo ago

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

CyberSecEval 2 is a benchmark framework designed to evaluate both the cybersecurity risks and capabilities of large language models. The framework appears to be hosted or featured on Hugging Face's leaderboard infrastructure, extending prior cybersecurity evaluation work. It assesses LLMs across multiple dimensions of security-relevant behavior, including potential for misuse and defensive capabilities.

Evaluation and Benchmarking AI Safety Research CyberSecEval 2 LlamaGuard Hugging Face +1 more

7 · arXiv · cs.AI · 19d ago

Stateful Online Monitoring Catches Distributed Agent Attacks via Cross-Account Clustering

Researchers demonstrate the first known distributed agent attack, a multi-agent scaffold that splits harmful cybersecurity tasks across many user accounts to evade per-transcript safety monitors, reducing detection rates to roughly one-fifth of standard attacks. As a defense, they develop a stateful online monitor that clusters weak suspiciousness signals across many agent transcripts in real time, escalating only rarely to a full LM-based review. In large-scale simulated datacenter traffic evaluations, the monitor Pareto-dominates standard monitors by catching distributed attacks 30% earlier with negligible latency overhead for ~99% of traffic. The system also incidentally catches standard jailbreaks, since adaptive attackers tend to reuse attack variants across accounts.
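
The idea of accumulating weak signals across accounts can be sketched as follows. This is not the paper's monitor: the class name, thresholds, and the notion of a precomputed `cluster_key` are all assumptions made for illustration. The sketch assumes some upstream step has already assigned each transcript a cluster key (for example, a bucket of similar tasks) and a cheap suspiciousness score, and it escalates to a full review only when a cluster's accumulated score is high and the activity spans several accounts.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class StatefulMonitor:
    """Toy cross-account monitor (hypothetical, for illustration only)."""

    def __init__(self, escalate_at: float = 1.0, min_accounts: int = 3) -> None:
        self.escalate_at = escalate_at    # assumed score threshold
        self.min_accounts = min_accounts  # assumed account-spread threshold
        self._score: DefaultDict[str, float] = defaultdict(float)
        self._accounts: DefaultDict[str, Set[str]] = defaultdict(set)

    def observe(self, account_id: str, cluster_key: str, suspicion: float) -> bool:
        """Record one transcript; return True if the cluster should be escalated."""
        self._score[cluster_key] += suspicion
        self._accounts[cluster_key].add(account_id)
        return (
            self._score[cluster_key] >= self.escalate_at
            and len(self._accounts[cluster_key]) >= self.min_accounts
        )


monitor = StatefulMonitor(escalate_at=1.0, min_accounts=3)

# Each transcript alone looks only mildly suspicious (0.4 < 1.0), but three
# accounts working on the same cluster push the accumulated score over the line.
decisions = [
    monitor.observe("acct-a", "cluster-1", 0.4),
    monitor.observe("acct-b", "cluster-1", 0.4),
    monitor.observe("acct-c", "cluster-1", 0.4),
]
print(decisions)  # [False, False, True]
```

A per-transcript monitor with the same threshold would never fire on these inputs, which is the evasion the distributed attack exploits; keeping state per cluster is what lets the weak signals add up.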

Evaluation and Benchmarking Inference Economics Real-Time Clustering Stateful Online Monitor Distributed Agent Attack +5 more

4 · arXiv · cs.CL · 46h ago

CATCH-ME dataset: multilingual multi-turn counterspeech against hate speech and misinformation for RAG systems

Researchers introduce CATCH-ME, a large-scale expert-curated multilingual dataset of multi-turn dialogues addressing the intersection of hate speech and misinformation across five languages and seven marginalized groups. The dataset is anchored in verified external knowledge (fact-checking articles and NGO reports) with document- and chunk-level span annotations, making it directly usable for RAG-based counterspeech systems. It addresses a gap in existing resources, which are limited to single-turn English dialogues, and is intended to improve the factual grounding and persuasiveness of LLM-generated counterspeech.

Evaluation and Benchmarking AI Safety Research CATCH-ME

6 · arXiv · cs.CL · 1mo ago

LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models

LASH is a black-box jailbreak framework that adaptively composes outputs from multiple existing attack families into hybrid prompts using a genetic optimizer with a two-stage fitness function. Evaluated on JailbreakBench across six target models, LASH achieves 84.5% attack success rate (keyword-based) and 74.5% (LLM-judge) with only 30 mean target queries, outperforming five state-of-the-art baselines. The work demonstrates that no single jailbreak family dominates across models and harm categories, and that adaptive cross-strategy composition is a promising red-teaming direction. Results hold under three defense mechanisms.

Evaluation and Benchmarking AI Safety Research LASH black-box jailbreaking JailbreakBench +3 more

5 · Hacker News · 16d ago

Practitioner spends $1,500 testing LLM offensive security capabilities against a purpose-built vulnerable app

A developer built a deliberately vulnerable application and ran LLMs against it as automated penetration testers, spending $1,500 on API costs over the course of the experiment. The post evaluates how well current LLMs can identify and exploit real vulnerabilities in a controlled setting. The results offer a practical signal on the current state of LLM-assisted offensive security, a capability area with both red-team and safety implications.

Evaluation and Benchmarking AI Safety Research Kasra