Entity · model

Llama3-8B-Instruct

modelactivellama3-8b-instruct-4fe1960c·6 events·first seen May 18, 2026

Aliases: Llama3-8B-Instruct, Llama-3-8B-Instruct, Llama-3.1-8B-Instruct

Co-occurring entities

More like this (12)

Llama-3.2-1B-Instruct Llama-3.1-8B Llama3-8B Llama 3.3 70B Instruct LLaMA-2-7B-32K-Instruct Llama 3.2 Llama-Krikri-8B Llama 3.1 70B Llama-3 Llama 3 Llama 3.2 90B Vision-Instruct Qwen2.5-7B-Instruct-1M

Recent events (6)

4arXiv · cs.CL·2d ago·source ↗

Linear readouts of LLM hidden states decode causal reasoning about diagnostic evidence

Researchers introduce a paired-prompt benchmark testing whether language models can correctly match diagnostic evidence to causal claims that vary by population, estimand, or identifying assumptions — a task where surface-level cues can mislead. Using linear probes on final-token hidden states from Qwen2.5-7B, Qwen3-8B, and Llama-3.1-8B, they find balanced accuracy of 0.654–0.659 on a 49-pair benchmark spanning nine diagnostic families, exceeding permutation nulls and text-only baselines. The key finding is that hidden states contain linearly decodable information about causal relevance that is not fully captured by output logits or surface features.

Evaluation and Benchmarking Qwen2.5-7B-Instruct-1M Llama3-8B-Instruct Same Evidence, Different Target: Decoding How Diagnostic Evidence Bears on Causal Questions from Language-Model States +1 more

5arXiv · cs.CL·Jul 13, 2026·source ↗

Self-Guided Test-Time Training improves long-context LLM utilization by up to 15%

Researchers propose Self-Guided Test-Time Training (S-TTT), a method that addresses the degradation in accuracy LLMs exhibit on long inputs by having the model first identify relevant evidence spans before applying test-time training only to those spans. A preliminary study on LongBench-v2 shows that TTT on randomly sampled spans hurts performance while TTT on oracle spans substantially helps, motivating the self-guided selection approach. Evaluated on LongBench-v2 and LongBench-Pro with Qwen3-4B-Thinking-2507 and Llama-3.1-8B-Instruct, S-TTT achieves up to 15% relative accuracy improvement. The method offers a practical path to better long-context utilization without the prohibitive cost of full-context adaptation.

Long Context Evolution Evaluation and Benchmarking Qwen3-4B-Thinking-2507 Self-Guided Test-Time Training Llama3-8B-Instruct +2 more

4arXiv · cs.LG·Jun 15, 2026·source ↗

Dual-adapter routing system improves knowledge editing precision in LLMs

A new arXiv paper introduces a route-specialized dual-adapter architecture for knowledge editing in LLMs, separating the concerns of writing edits (edit adapter) and suppressing them when irrelevant (locality adapter). A relevance router gates which adapter is applied, addressing the locality problem in memory-assisted editing. Evaluated on CounterFact, zsRE, and MQuAKE benchmarks using Llama-3.1-8B-Instruct and Qwen3-8B, the method achieves best-in-class probability-preference accuracy across all three datasets. Ablations show the gain comes from the architectural separation rather than increased parameter capacity.

Evaluation and Benchmarking Alignment and RLHF BGE Llama3-8B-Instruct Qwen3-4B +4 more

6arXiv · cs.CL·Jun 10, 2026·source ↗

Gravity-Weighted DPO enforces multi-level instruction hierarchies in LLMs

Researchers introduce Gravity-Weighted DPO (GW-DPO), a preference-optimization objective that scales per-sample loss offsets by the structural distance between conflicting instruction levels, addressing the problem of uniform architectural privilege across trust levels in production LLMs. The work formalizes a 5-level instruction hierarchy with ten pairwise priority relations and combines GW-DPO with hierarchy-specific delimiter tokens and Instructional Segment Embeddings (ISE). Evaluated on Llama-3.1-8B-Instruct, the bilateral GW-DPO schedule Pareto-improves over standard DPO on macro pairwise priority adherence while cutting over-refusal rates in half. The approach directly targets prompt injection vulnerabilities arising from models' inability to resolve competing instructions by privilege level.

AI Safety Research Agent and Tool Ecosystem Instructional Segment Embeddings Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization Llama3-8B-Instruct +3 more

7arXiv · cs.CL·May 19, 2026·source ↗

General Preference Reinforcement Learning (GPRL): Bridging Online RL and Preference Optimization for Open-Ended Tasks

GPRL proposes a new alignment framework that replaces scalar reward models with a General Preference Model (GPM) embedding responses into k skew-symmetric subspaces to capture multi-dimensional, intransitivity-aware preferences. The method computes per-dimension group-relative advantages, normalizes across axes, and uses a closed-loop drift monitor to detect and correct single-axis reward hacking during training. Starting from Llama-3-8B-Instruct, GPRL achieves a 56.51% length-controlled win rate on AlpacaEval 2.0 and outperforms SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench. The work directly addresses the gap between verifiable-reward online RL (strong on math/code) and preference optimization (strong on open-ended tasks).

Frontier Model Releases Evaluation and Benchmarking WildBench MT-Bench General Preference Reinforcement Learning +7 more

6Berkeley Ai Research (Bair) Blog·May 18, 2026·source ↗

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Researchers from BAIR propose two fine-tuning-based defenses against prompt injection attacks: StruQ (Structured Instruction Tuning) and SecAlign (Special Preference Optimization). Both methods use a Secure Front-End with special delimiter tokens to separate trusted prompts from untrusted data, then fine-tune LLMs to ignore injected instructions. SecAlign, which uses DPO-style preference optimization, reduces attack success rates to under 15% against strong optimization-based attacks—more than 4x better than prior SOTA—while preserving model utility on AlpacaEval2.

AI Safety Research Agent and Tool Ecosystem StruQ SecAlign Berkeley AI Research (BAIR)+7 more