Entity · model

Qwen2.5-1.5B

model · active · qwen2-5-1-5b-f74d5c16 · 3 events · first seen May 27, 2026

Aliases: Qwen2.5-1.5B

Co-occurring entities

Llama 3.2 · Gemma 2 · Actionable Activation Directions for Detecting and Mitigating Emergent Misalignment Across Language Model Families · Ministral 3B · difference-in-means · Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation · CICIDS · LoRA · Phi-4-mini · MITRE ATT&CK · UNSW-NB15 · Atlas · matched-control protocol · retention curve · SQuAD · Retrieval-Augmented Generation · Phi-2

More like this (12)

Qwen2.5-0.5B · Qwen3.5-0.8B · Qwen2.5-1.5B-Base · Qwen2.5-8B · Qwen-0.5B · Qwen2.5-7B · Qwen2.5-3B · Qwen2.5-14B · Qwen3-1.7B-Base · Qwen3-1.7B · Qwen3.5-2B-Base · Qwen3.5-35B-A3B

Recent events (3)

6 · arXiv · cs.CL · Jun 19, 2026

Activation-space directions for detecting and mitigating emergent misalignment across LLM families

Researchers fine-tuned four small instruction-tuned models from different families (Qwen2.5-1.5B, Gemma-2-2B, Llama-3.2-1B, Ministral-3B) on insecure code to induce emergent misalignment, then investigated whether a shared activation-space direction could detect and correct it. A difference-in-means direction achieves 99.6% separation of aligned vs. misaligned activations within each model, and causal steering by subtracting this direction reduces misaligned behavior by 21–51 points. Cross-architecture transfer via ridge regression yields large behavioral suppression but fails specificity controls, revealing a two-tier structure: within-model directions are causally specific and actionable, while cross-model directions are real but non-specific. The findings bound the utility of linear cross-architecture correction and recommend within-model probing for safety auditing.

Evaluation and Benchmarking · AI Safety Research · Llama 3.2 · Gemma 2 · Qwen2.5-1.5B · +4 more
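The detection and steering steps summarized in this event can be illustrated on synthetic data. The sketch below is a toy under stated assumptions, not the paper's code: activations are random Gaussian vectors standing in for one layer's hidden states, the function names (`diff_in_means_direction`, `misalignment_score`, `steer`, `ridge_map`) are hypothetical, the midpoint threshold and the steering coefficient `alpha` are arbitrary choices, and the cross-model map is plain closed-form ridge regression fitted on paired activations from an invented second "model". It shows the mechanics only and does not reproduce the paper's specificity controls.

```python
import numpy as np


# Hypothetical helpers illustrating difference-in-means probing and steering.
def diff_in_means_direction(aligned: np.ndarray, misaligned: np.ndarray) -> np.ndarray:
    """Unit vector from the mean aligned activation to the mean misaligned activation."""
    d = misaligned.mean(axis=0) - aligned.mean(axis=0)
    return d / np.linalg.norm(d)


def misalignment_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each activation row onto the (unit) direction."""
    return acts @ direction


def steer(acts: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Subtract a fixed multiple of the unit direction from every activation row."""
    return acts - alpha * direction


def ridge_map(src: np.ndarray, tgt: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression: W minimizing ||src @ W - tgt||^2 + lam * ||W||^2."""
    k = src.shape[1]
    return np.linalg.solve(src.T @ src + lam * np.eye(k), src.T @ tgt)


# Synthetic stand-in for one layer: misaligned examples are shifted along axis 0.
rng = np.random.default_rng(0)
dim = 64
shift = np.zeros(dim)
shift[0] = 4.0
aligned = rng.normal(size=(500, dim))
misaligned = rng.normal(size=(500, dim)) + shift

d = diff_in_means_direction(aligned, misaligned)
s_aligned = misalignment_score(aligned, d)
s_misaligned = misalignment_score(misaligned, d)

# Detection (in-sample): threshold halfway between the two class means of the projection.
threshold = (s_aligned.mean() + s_misaligned.mean()) / 2
acc = ((s_aligned < threshold).mean() + (s_misaligned >= threshold).mean()) / 2

# Steering: push misaligned activations against the direction.
steered = steer(misaligned, d, alpha=4.0)

# Toy cross-model transfer: fit a linear map from "model A" to an invented 32-dim
# "model B" on paired activations, then carry the direction across with that map.
proj = rng.normal(size=(dim, 32)) / np.sqrt(dim)
paired_a = np.vstack([aligned, misaligned])
paired_b = paired_a @ proj + 0.1 * rng.normal(size=(1000, 32))
w = ridge_map(paired_a, paired_b, lam=1.0)
d_b = d @ w
d_b /= np.linalg.norm(d_b)

print(f"balanced separation accuracy: {acc:.3f}")
print(f"mean score before / after steering: {s_misaligned.mean():.2f} / {misalignment_score(steered, d).mean():.2f}")
```

Because the direction is unit-normalized, subtracting `alpha * d` lowers every row's projection by exactly `alpha`; whether that changes behavior in a real model is the empirical question the paper tests.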

5 · arXiv · cs.LG · Jun 17, 2026

Multi-source cybersecurity log dataset with ATT&CK labels and SLM fine-tuning evaluation

Researchers introduce a new multi-source cybersecurity log dataset of 870 sessions (~2.3M events) capturing system, network, and browser activity on Windows endpoints, with per-entry MITRE ATT&CK technique labels across 12 tactics and 53 techniques. The dataset addresses gaps in existing public datasets (CICIDS, UNSW-NB15, ATLAS) that lack combined multi-source coverage with fine-grained ATT&CK labeling. Three small language models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) were fine-tuned with LoRA on the dataset, achieving chunk classification accuracy of 90–97% versus ~8% for base variants, though ATT&CK technique identification remained harder at 42% exact-match accuracy.

Evaluation and Benchmarking · AI Safety Research · Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation · CICIDS · Llama 3.2 · +6 more
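LoRA, used in this event to fine-tune the three small models, freezes a pretrained weight matrix and learns only a low-rank additive update. A minimal NumPy sketch of that parameterization for a single linear layer follows. The class name `LoRALinear`, rank `r = 8`, `alpha = 16`, and the 256×256 layer size are illustrative assumptions, not the dataset paper's settings, and no training loop is shown.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen weight w0 plus a trainable low-rank update (alpha / r) * b @ a."""

    def __init__(self, w0: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                     # frozen pretrained weight, (d_out, d_in)
        self.a = rng.normal(scale=0.01, size=(r, d_in))  # down-projection, small random init
        self.b = np.zeros((d_out, r))                    # up-projection, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the low-rank path never builds a full d_out x d_in update
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self) -> int:
        # Only a and b would be updated during fine-tuning; w0 stays frozen.
        return self.a.size + self.b.size

    def merged_weight(self) -> np.ndarray:
        # Folding the update into w0 gives an ordinary linear layer for inference.
        return self.w0 + self.scale * self.b @ self.a


rng = np.random.default_rng(1)
w0 = rng.normal(size=(256, 256))
layer = LoRALinear(w0, r=8, alpha=16.0)
x = rng.normal(size=(4, 256))

print("matches base layer at init:", np.allclose(layer(x), x @ w0.T))
print("trainable / frozen params:", layer.trainable_params(), "/", w0.size)
```

The zero-initialized `b` makes the adapted layer identical to the base layer before training, so fine-tuning starts from the pretrained behavior while touching only `r * (d_in + d_out)` parameters per layer.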

5 · arXiv · cs.CL · May 27, 2026

Separating Semantic Competition from Context Length in RAG Reading

This paper introduces a matched-control protocol to isolate whether RAG reader failures stem from context length or semantic competition among retrieved passages. By replacing hard-competitor passages with less competitive ones while holding passage count and length fixed, the authors demonstrate a measurable competition effect on SQuAD using Phi-2 and Qwen2.5-1.5B. Phi-2 recovers +6.0 EM and +7.0 answer-inclusion points; Qwen2.5-1.5B recovers +4.5 EM and +9.0 answer-inclusion points. The study also introduces retention curves and a right-censored half-life metric to track performance degradation as competitors accumulate.

Evaluation and Benchmarking · Agent and Tool Ecosystem · matched-control protocol · retention curve · Qwen2.5-1.5B · +3 more
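The retention-curve and half-life ideas in this event can be sketched as follows, assuming (hypothetically) that retention is the score at k competitors divided by the score at k = 0, and that the half-life is the first tested k at which retention falls to 0.5, reported as right-censored at the largest tested k when it never does. The function names and the numbers in the example are invented for illustration; they are not the paper's definitions or results.

```python
from dataclasses import dataclass
from typing import List, Sequence


# Hypothetical helpers; the paper's exact definitions may differ.
@dataclass
class HalfLife:
    k: float        # first competitor count with retention <= level (or the largest k tested)
    censored: bool  # True if retention never fell to the level within the tested range


def retention_curve(scores: Sequence[float]) -> List[float]:
    """Normalize a score series (e.g. EM at k = 0, 1, 2, ... competitors) by its k = 0 value."""
    if scores[0] == 0:
        raise ValueError("baseline score must be non-zero")
    return [s / scores[0] for s in scores]


def half_life(ks: Sequence[float], retention: Sequence[float], level: float = 0.5) -> HalfLife:
    """First k where retention drops to `level`; right-censored at the last k if it never does."""
    for k, r in zip(ks, retention):
        if r <= level:
            return HalfLife(k=k, censored=False)
    return HalfLife(k=ks[-1], censored=True)


def competition_effect(score_hard: float, score_control: float) -> float:
    """Matched-control effect: gain from swapping hard competitors for weaker ones
    while passage count and length are held fixed."""
    return score_control - score_hard


# Made-up illustrative numbers, not results from the paper.
ks = [0, 1, 2, 4, 8]
em = [70.0, 62.0, 51.0, 40.0, 33.0]
curve = retention_curve(em)
print([round(r, 3) for r in curve])
print(half_life(ks, curve))          # crosses 0.5 within the tested range
print(half_life(ks[:3], curve[:3]))  # never crosses: censored at the last tested k
print(competition_effect(score_hard=51.0, score_control=55.0))
```

Reporting a censored value as "at least the largest tested k" avoids inventing a half-life for readers whose performance never halves within the range that was actually measured.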