Entity · technique

Natural Language Inference

technique · active · natural-language-inference-446a8e30 · 2 events · first seen May 26, 2026

Aliases: Natural Language Inference

Co-occurring entities

Cross-Annotator Preference Optimization (CAPO) · Human Label Variation (HLV) · Paraphrase Judgment · supervised fine-tuning · Qwen2.5-7B · GRPO · reward hacking · Qwen3-4B · BERTScore · signal collapse · MedNLI · Retrieval-Augmented Generation · Llama-3.1-8B

More like this (12)

Natural Language Processing · Text Generation Inference · Reasoning Language Models · Natural Language Autoencoders · Words as Difference Makers: How Large Language Models Determine Causal Structure in Text · Conditional Inference · Privacy Inference Attack · Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation? · From Data to Device: ELMOD An Efficient German-First 2.7B Language Model for Mobile Inference · Native Active Perception as Reasoning for Omni-Modal Understanding · Knowledge-Less Language Models · Revisiting the Systematicity in Negation in the Era of In-Context Learning

Recent events (2)

5 · arXiv · cs.CL · May 28, 2026 · source ↗

Cross-Annotator Preference Optimization (CAPO) for Learning Annotator-Specific Explanation Behavior

This paper investigates whether LLMs can learn and reproduce individual annotator-specific reasoning patterns, not just label choices, using two sentence-pair tasks (NLI and paraphrase judgment) with four annotators each. The authors find that annotator-specific patterns are weak at the single-annotation level but detectable after aggregation, and propose CAPO—a preference optimization method that contrasts a target annotator's response against other valid but less target-specific annotations. CAPO outperforms prompting and supervised fine-tuning baselines in capturing annotator-specific label-explanation behavior. The work suggests a path toward scalable annotation pipelines grounded in annotator histories rather than labels alone.

Evaluation and Benchmarking · Alignment and RLHF · Cross-Annotator Preference Optimization (CAPO) · Human Label Variation (HLV) · Natural Language Inference · +2 more
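
The contrast described in the CAPO summary (a target annotator's response preferred over other annotators' valid responses to the same item) can be illustrated with a small pair-building sketch. This is not the authors' implementation: the `Annotation` fields, the `build_preference_pairs` helper, and the choice to pair the target with every other annotator on the same item are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout; the paper's actual data format is not given here.
    item_id: str
    annotator: str
    label: str
    explanation: str


def build_preference_pairs(annotations, target):
    """Return (item_id, chosen, rejected) triples for one target annotator.

    chosen   = the target annotator's label and explanation for an item
    rejected = another annotator's response to the same item (still valid,
               just less specific to the target annotator)
    """
    by_item = {}
    for ann in annotations:
        by_item.setdefault(ann.item_id, []).append(ann)

    pairs = []
    for item_id, anns in by_item.items():
        chosen = [a for a in anns if a.annotator == target]
        rejected = [a for a in anns if a.annotator != target]
        # Items the target never annotated yield no pairs.
        for c in chosen:
            for r in rejected:
                pairs.append((item_id, c, r))
    return pairs


example = [
    Annotation("p1", "A", "entailment", "The hypothesis restates the premise."),
    Annotation("p1", "B", "neutral", "The premise does not mention timing."),
    Annotation("p1", "C", "entailment", "Both sentences describe the same event."),
    Annotation("p1", "D", "neutral", "The subject could differ."),
    Annotation("p2", "B", "contradiction", "The dates conflict."),
]
pairs_for_a = build_preference_pairs(example, target="A")
```

With four annotators per item, as in the summary, each item annotated by the target contributes three pairs. Triples of this shape could then be passed to any preference-optimization trainer that accepts chosen/rejected data.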

6 · arXiv · cs.CL · May 26, 2026 · source ↗

Signal Collapse and Reward Hacking in Checker-Guided RAG for Biomedical QA

This paper investigates why NLI-based claim checkers used as process rewards in RL-trained medical RAG agents succeed or fail during training. The authors find that a checker's output distribution during training—not its held-out accuracy—determines whether it provides useful gradient signal, with LLM log-probability scoring causing near-total signal collapse (97%+ neutral labels) while a calibrated MedNLI classifier avoids this. A key finding is that stronger checkers can trigger reward hacking cascades (ultra-short answers, search avoidance, language collapse), while moderate-signal local classifiers yield better final model quality (+12% BERTScore over zero-shot). The work frames these as boundary conditions for verifier-as-reward systems in RLVR pipelines.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Qwen2.5-7B · GRPO · reward hacking · +8 more
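
The summary's point that a checker's output distribution during training matters more than its held-out accuracy suggests a simple diagnostic: track how concentrated the checker's labels are on each training batch. The sketch below is one possible way to do that and is not taken from the paper. The function names are hypothetical, and the 0.97 threshold is only an illustrative value echoing the "97%+ neutral labels" figure quoted above.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Map each checker label to its share of the batch."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def normalized_entropy(dist, num_classes=3):
    """Entropy of the label distribution, scaled to [0, 1].

    num_classes=3 assumes the usual NLI label set
    (entailment / neutral / contradiction).
    """
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_classes)


def is_collapsed(labels, dominant_threshold=0.97):
    """Flag a batch whose most frequent label meets the (assumed) threshold."""
    dist = label_distribution(labels)
    return max(dist.values()) >= dominant_threshold


collapsed_batch = ["neutral"] * 98 + ["entailment"] * 2
balanced_batch = ["entailment", "neutral", "contradiction"] * 10
```

A batch dominated by one label gives nearly identical rewards to every rollout, so normalized entropy near zero is a warning sign that the checker contributes little gradient signal, whatever its accuracy on held-out data.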