Llama3-8B-Instruct
llama3-8b-instruct-4fe1960c·4 events·first seen 1mo agoAliases: Llama3-8B-Instruct, Llama-3-8B-Instruct, Llama-3.1-8B-Instruct
Co-occurring entities
More like this (12)
Recent events (4)
General Preference Reinforcement Learning (GPRL): Bridging Online RL and Preference Optimization for Open-Ended Tasks
GPRL proposes a new alignment framework that replaces scalar reward models with a General Preference Model (GPM) embedding responses into k skew-symmetric subspaces to capture multi-dimensional, intransitivity-aware preferences. The method computes per-dimension group-relative advantages, normalizes across axes, and uses a closed-loop drift monitor to detect and correct single-axis reward hacking during training. Starting from Llama-3-8B-Instruct, GPRL achieves a 56.51% length-controlled win rate on AlpacaEval 2.0 and outperforms SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench. The work directly addresses the gap between verifiable-reward online RL (strong on math/code) and preference optimization (strong on open-ended tasks).
Gravity-Weighted DPO enforces multi-level instruction hierarchies in LLMs
Researchers introduce Gravity-Weighted DPO (GW-DPO), a preference-optimization objective that scales per-sample loss offsets by the structural distance between conflicting instruction levels, addressing the problem of uniform architectural privilege across trust levels in production LLMs. The work formalizes a 5-level instruction hierarchy with ten pairwise priority relations and combines GW-DPO with hierarchy-specific delimiter tokens and Instructional Segment Embeddings (ISE). Evaluated on Llama-3.1-8B-Instruct, the bilateral GW-DPO schedule Pareto-improves over standard DPO on macro pairwise priority adherence while cutting over-refusal rates in half. The approach directly targets prompt injection vulnerabilities arising from models' inability to resolve competing instructions by privilege level.
Dual-adapter routing system improves knowledge editing precision in LLMs
A new arXiv paper introduces a route-specialized dual-adapter architecture for knowledge editing in LLMs, separating the concerns of writing edits (edit adapter) and suppressing them when irrelevant (locality adapter). A relevance router gates which adapter is applied, addressing the locality problem in memory-assisted editing. Evaluated on CounterFact, zsRE, and MQuAKE benchmarks using Llama-3.1-8B-Instruct and Qwen3-8B, the method achieves best-in-class probability-preference accuracy across all three datasets. Ablations show the gain comes from the architectural separation rather than increased parameter capacity.
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Researchers from BAIR propose two fine-tuning-based defenses against prompt injection attacks: StruQ (Structured Instruction Tuning) and SecAlign (Special Preference Optimization). Both methods use a Secure Front-End with special delimiter tokens to separate trusted prompts from untrusted data, then fine-tune LLMs to ignore injected instructions. SecAlign, which uses DPO-style preference optimization, reduces attack success rates to under 15% against strong optimization-based attacks—more than 4x better than prior SOTA—while preserving model utility on AlpacaEval2.