Phi-4-mini
phi-4-mini-3d9c04e6·2 events·first seen 26d agoAliases: Phi-4-mini
Co-occurring entities
More like this (12)
Recent events (2)
LamPO: Lambda-Style Policy Optimization with Pairwise Decomposed Advantage for Reasoning LMs
LamPO proposes a new RLVR training objective that replaces GRPO's scalar group-relative advantages with a Pairwise Decomposed Advantage, aggregating pairwise reward gaps within response groups and weighting comparisons by confidence-aware log-probability differences. The method retains a critic-free, clipped-update PPO-style structure and optionally adds a ROUGE-L-based dense auxiliary reward to reduce sparsity. Experiments on AIME24, AIME25, MATH-500, and GPQA-Diamond using Qwen3-1.7B, Qwen3-4B, and Phi-4-mini show consistent improvements over GRPO and other RLVR variants with more stable training dynamics.
Multi-source cybersecurity log dataset with ATT&CK labels and SLM fine-tuning evaluation
Researchers introduce a new multi-source cybersecurity log dataset of 870 sessions (~2.3M events) capturing system, network, and browser activity on Windows endpoints, with per-entry MITRE ATT&CK technique labels across 12 tactics and 53 techniques. The dataset addresses gaps in existing public datasets (CICIDS, UNSW-NB15, ATLAS) that lack combined multi-source coverage with fine-grained ATT&CK labeling. Three small language models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) were fine-tuned with LoRA on the dataset, achieving chunk classification accuracy of 90–97% versus ~8% for base variants, though ATT&CK technique identification remained harder at 42% exact-match accuracy.