4 · arXiv cs.CL (Computation and Language) · 41h ago

Debiased One-Pass Attention Sorting fails to close gap with iterative sorting for long-context LLMs

A new arXiv preprint investigates whether position bias is the primary bottleneck in long-context LLM performance, proposing Debiased One-Pass Attention Sorting as a cheaper alternative to iterative Attention Sorting. Experiments on LLaMA-2-7B-32K-Instruct and YaRN-Llama-2-7b-64k show that bias correction alone is insufficient: on one model it provides no improvement over uncalibrated single-pass sorting, and on the other it closes only 37% of the gap to iterative sorting. The findings suggest that iterative reordering provides benefits beyond position-bias correction, leaving the efficiency-accuracy tradeoff unresolved.
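
The "closes only 37% of the gap" figure is most naturally read as a gap-closure ratio: the improvement of the debiased method over single-pass sorting, divided by the improvement of iterative sorting over single-pass sorting. The sketch below shows that arithmetic under this assumed reading. The function name `gap_closed` and the three accuracy values are hypothetical and chosen only to produce a 37% ratio; they are not results from the preprint.

```python
def gap_closed(baseline: float, candidate: float, reference: float) -> float:
    """Fraction of the baseline-to-reference gap recovered by the candidate."""
    gap = reference - baseline
    if gap == 0:
        raise ValueError("baseline and reference scores are equal; no gap to close")
    return (candidate - baseline) / gap


# Hypothetical accuracies for illustration only (NOT taken from the paper).
single_pass = 50.0  # uncalibrated single-pass sorting
debiased = 53.7     # debiased one-pass sorting
iterative = 60.0    # iterative sorting

fraction = gap_closed(single_pass, debiased, iterative)
print(f"gap closed: {fraction:.0%}")
```

A ratio of 0 means the debiased variant does no better than uncalibrated single-pass sorting (the outcome reported for one of the two models), and 1 would mean it fully matches iterative sorting.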

Long Context Evolution · Evaluation and Benchmarking · Position Bias Correction is Insufficient for One-Pass Attention Sorting · Debiased One-Pass Attention Sorting · Attention Sorting · YaRN-Llama-2-7b-64k · LLaMA-2-7B-32K-Instruct

Related guides (2)

Long Context Evolution · Topic guide

Long Context Evolution: From Bigger Windows to Smarter Memory


Evaluation and Benchmarking · Topic guide

AI Evaluation and Benchmarking: From Leaderboards to the Limits of Measurement


Related events (8)

5 · arXiv · cs.CL · 41h ago

NLL-guided training-free method selects optimal full-attention layers for efficient long-context inference

Researchers propose NLL-guided layer selection, a training-free technique for hybrid attention models that identifies which layers should use full versus sliding-window attention by measuring negative log-likelihood degradation on answer tokens. On LongMemEval with Qwen3-4B, the method achieves 64.6% accuracy with full attention (FA) in only 1/4 of the layers, matching a periodic 1/2-FA baseline while halving compute, and outperforming a periodic 1/4-FA baseline by 10.4 percentage points. The calibration procedure requires approximately 15 minutes of one-time compute, making it practical for deployment. The work advances the efficiency-accuracy tradeoff for long-context LLM inference without requiring any retraining.

Long Context Evolution · Inference Economics · LongMemEval · Qwen3-4B · NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation

6 · arXiv · cs.AI · 28d ago

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

This paper identifies and analyzes 'Perceptual Judgment Bias' in multimodal LLM judges, where models anchor on response text rather than visual evidence when the two conflict. The authors introduce a Perceptually Perturbed Judgment Dataset using counterfactual responses to isolate perceptual errors, and a training framework combining GRPO-based reward modeling with batch-ranking objectives. Experiments on MLLM-as-a-Judge benchmarks show improved perceptual fidelity, ranking coherence, and alignment with human evaluation.

Evaluation and Benchmarking · Alignment and RLHF · Perceptually Perturbed Judgment Dataset · Multimodal Large Language Models · GRPO

5 · arXiv · cs.AI · 22d ago

Benchmarking study finds LLMs fail at counterintuitive probability problems despite strong standard performance

A new arXiv paper evaluates 8 state-of-the-art LLMs on discrete probability problems using two datasets: standard exercises (average accuracy 0.96) and counterintuitive exercises designed to trigger heuristic reasoning (average accuracy 0.59). The authors document token bias causing 20%+ performance drops when canonical problem formulations are disguised, and up to 34% degradation when misleading suggestions are embedded in prompts. The findings argue that current LLMs are not genuine probabilistic reasoners despite their success on advanced math benchmarks.

Evaluation and Benchmarking · AI Safety Research · How reliable are LLMs when it comes to playing dice?

7 · arXiv · cs.CL · 21d ago

One-shot GRPO training on a single biased example can break LLM alignment

A new arXiv paper demonstrates that Group Relative Policy Optimization (GRPO) training on a single biased example is sufficient to induce systematic bias in aligned LLMs, with stereotype-driven reasoning generalizing across attributes, categories, and benchmarks. The authors find that a model's susceptibility depends on its initial likelihood of producing biased outputs. The result exposes a critical vulnerability in post-training alignment: a minimal fine-tuning intervention can override safety guardrails.

AI Safety Research · Alignment and RLHF · It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO · GRPO (Group Relative Policy Optimization)

6 · arXiv · cs.LG · 7d ago

PAC-Bayes analysis establishes formal expressivity and alignment floors for prompt-conditioned LLMs

A new arXiv preprint models user-LLM interaction as a bilevel cheap-talk game and derives PAC-Bayes bounds showing two irreducible limitations: an 'expressivity floor' where language's finite channel capacity makes distinct tasks indistinguishable, and an 'objective-misalignment floor' where alignment constraints prevent reaching user-ideal outputs. The authors prove that prompt-conditioned LLMs cannot be universal problem solvers, as correct behavior on certain task families is provably unattainable even with infinite data, optimal training, or model scaling. The work suggests multimodal inputs and external memory as potential mitigations by increasing task-relevant information bandwidth.

Evaluation and Benchmarking · Alignment and RLHF · PAC-Bayes · On the Limits of Prompt-Conditioned Language Models as General-Purpose Learners

5 · arXiv · cs.CL · 7d ago

Randomized YaRN improves LLM length generalization for long-context reasoning

Researchers propose Randomized YaRN, a training method that combines YaRN-based positional extrapolation with randomized positional encodings and a length curriculum to improve LLM generalization to long contexts. Models trained on sequences under 8K tokens show consistent reasoning improvements on context lengths from 16K to 128K on BABILong and MRCR benchmarks. The key insight is that exposing models to out-of-distribution positional representations during short-context training enables better generalization at far longer inference-time lengths.
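
One common form of randomized positional encodings assigns a short training sequence a sorted random subset of position indices drawn from a much larger range, so the model sees large position values without seeing long sequences. The sketch below illustrates only that general idea. The function name `sample_positions`, the 8-token sequence, and the 128K range are hypothetical, and the preprint's actual sampling scheme, YaRN scaling, and length curriculum are not reproduced here.

```python
import random


def sample_positions(seq_len: int, max_position: int, rng: random.Random) -> list[int]:
    """Draw seq_len distinct position ids from [0, max_position), in increasing order."""
    if seq_len > max_position:
        raise ValueError("cannot draw more distinct positions than the range holds")
    # Sampling without replacement keeps ids unique; sorting preserves token order.
    return sorted(rng.sample(range(max_position), seq_len))


# Hypothetical setting: an 8-token sequence with ids drawn from a 128K range.
rng = random.Random(0)
position_ids = sample_positions(seq_len=8, max_position=128 * 1024, rng=rng)
print(position_ids)
```

Because the ids stay strictly increasing, relative order between tokens is preserved while the absolute positions and gaps vary from one training example to the next.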

Long Context Evolution · Evaluation and Benchmarking · BABILong · Multi-Round Coreference Resolution · YaRN

6 · arXiv · cs.AI · 1mo ago

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention for Long-Context LLMs

DashAttention introduces a two-stage hierarchical sparse attention mechanism that replaces the fixed top-k block selection used in methods like NSA and InfLLMv2 with an adaptive α-entmax transformation, allowing a variable number of KV blocks to be selected per query. The approach keeps the full hierarchy differentiable by using the first-stage selection as a prior for second-stage softmax attention. Experiments show comparable accuracy to full attention at 75% sparsity with a better Pareto frontier than competing methods, and a Triton GPU implementation achieves meaningful speedup over FlashAttention-3 at inference time.
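
To see why an entmax-style transformation selects a variable number of blocks where top-k always selects exactly k, consider sparsemax, which is the α = 2 special case of α-entmax. The sketch below is a plain-Python sparsemax applied to made-up block scores. It is an illustration of the general mechanism only: the value of α used by DashAttention, how it scores KV blocks, and the name `selected_blocks` are assumptions, not details from the paper.

```python
def sparsemax(scores: list[float]) -> list[float]:
    """Project scores onto the probability simplex (alpha-entmax with alpha = 2)."""
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, z in enumerate(ordered, start=1):
        cumulative += z
        # The support condition holds for a prefix of the sorted scores;
        # the last j that satisfies it fixes the threshold tau.
        if 1.0 + j * z > cumulative:
            tau = (cumulative - 1.0) / j
    return [max(s - tau, 0.0) for s in scores]


def selected_blocks(scores: list[float]) -> list[int]:
    """Indices of blocks that receive non-zero weight (hypothetical helper)."""
    return [i for i, p in enumerate(sparsemax(scores)) if p > 0.0]


# Made-up block scores for three queries: peaked, two-way tie, and flat.
for query_scores in ([3.0, 0.5, 0.4, 0.3], [2.0, 1.8, 0.1, -1.0], [1.0, 0.9, 0.8, 0.7]):
    print(query_scores, "->", selected_blocks(query_scores))
```

With these inputs the peaked query keeps one block, the second keeps two, and the flat query keeps all four, whereas a fixed top-k rule would return k blocks in every case.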

Training Infrastructure · Long Context Evolution · Triton · InfLLMv2 · FlashAttention-3

5 · arXiv · cs.CL · 25d ago

LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation

A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.

Evaluation and Benchmarking · AI Safety Research · From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation · Meta Llama-3.1-8B