Entity · organization

EIT-NLP

organization · active · eit-nlp-4a39b273 · 4 events · first seen Jun 15, 2026

Aliases: EIT-NLP

Co-occurring entities

WIDE · Post-Training Shifts Confidence: A Three-Stage Analysis of How SFT, RL, and OPD Shape Pre-, Intra-, and Post-CoT Calibration · PosConf · Unified Latent Probe · What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis · Hierarchical Relative Policy Optimization · AdaSR

More like this (12)

EMNLP 2025 · Yale NLP · Natural Language Processing · Tianjin University NLP Lab · LLM-augmented clinical NLP pipeline · OSU NLP Group · Zhejiang University NLP Lab · Clinical NLP · E-TTS · Zhejiang University NLP Group (ZJUNLP) · NeoBERT · MedNLI

Recent events (4)

5 · arXiv · cs.CL · 33h ago

WIDE: Token-level dynamic width pruning framework for efficient LLM inference

WIDE is a new end-to-end differentiable framework for token-level dynamic width pruning of LLMs, enabling each token to independently select attention-head groups and FFN-channel groups at inference time. The system introduces a pruning-kernel co-design that decomposes dynamic sparsity acceleration into mask reordering and block-level skipping, achieving near-theoretical speedups of up to 1.98x for prefill and 4.95x for decoding at 50% sparsity. At that sparsity level, WIDE reports a 55.1% performance improvement over state-of-the-art dynamic depth pruning under calibration-only settings. Code is publicly released.

Training Infrastructure · Inference Economics · EIT-NLP · WIDE
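The idea of per-token group selection with block-level skipping can be illustrated with a toy example. This is a minimal sketch, not WIDE's kernel: the function names, the tiny weight matrix, and the hand-written masks are all hypothetical, and the mask-reordering step mentioned in the summary is not modeled. It only shows that skipping whole pruned channel groups gives the same output as masking them, while doing fewer multiply-adds.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def masked_dense(h: List[float], w_out: Matrix,
                 keep_groups: List[bool], group_size: int) -> List[float]:
    """Reference path: gate pruned channels to zero, then do the full product."""
    d_out = len(w_out[0])
    y = [0.0] * d_out
    for c, h_c in enumerate(h):
        gate = 1.0 if keep_groups[c // group_size] else 0.0
        for j in range(d_out):
            y[j] += gate * h_c * w_out[c][j]
    return y


def block_skipping(h: List[float], w_out: Matrix,
                   keep_groups: List[bool], group_size: int) -> Tuple[List[float], int]:
    """Skip pruned channel groups entirely; also count multiply-adds performed."""
    d_out = len(w_out[0])
    y = [0.0] * d_out
    macs = 0
    for g, keep in enumerate(keep_groups):
        if not keep:
            continue  # the whole block of channels is never touched
        for c in range(g * group_size, (g + 1) * group_size):
            for j in range(d_out):
                y[j] += h[c] * w_out[c][j]
                macs += 1
    return y, macs


# Hypothetical data: 4 hidden channels in 2 groups, 2 output dimensions.
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
# Each token carries its own group mask (here written by hand, not learned).
tokens = [
    ([1.0, 2.0, 3.0, 4.0], [True, False]),
    ([1.0, 2.0, 3.0, 4.0], [False, True]),
]
results = [block_skipping(h, w_out, keep, group_size=2) for h, keep in tokens]
```

In this toy layer each token performs 4 multiply-adds instead of the dense 8, and the two tokens use different halves of the weights.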

5 · arXiv · cs.CL · Jul 16, 2026

PosConf: Position-aware confidence calibration across SFT, RL, and OPD post-training stages

A new arXiv paper introduces a three-stage calibration framework analyzing how supervised fine-tuning (SFT), reinforcement learning (RL), and on-policy distillation (OPD) shape model confidence before, during, and after chain-of-thought reasoning. The authors find that each post-training method produces distinct calibration profiles at different reasoning stages, and that RL confidence becomes informative only after a path-commitment phase while OPD confidence degrades later. They propose PosConf, a position-aware confidence strategy that selectively uses confidence from reliable relative-position intervals, improving RL answer aggregation by 6.1 points over majority voting and OPD early stopping by up to 4.3 points.

Evaluation and Benchmarking · Alignment and RLHF · EIT-NLP · Post-Training Shifts Confidence: A Three-Stage Analysis of How SFT, RL, and OPD Shape Pre-, Intra-, and Post-CoT Calibration · PosConf
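One way to picture position-aware answer aggregation is confidence-weighted voting that reads confidence only from a chosen relative-position interval of each reasoning trace. The sketch below is an illustration under assumptions, not the paper's procedure: the function names, the interval bounds, and the sample confidences are hypothetical, and the summary does not say how the reliable intervals are chosen.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, List[float]]  # (final answer, per-token confidences)


def interval_confidence(token_conf: List[float], start: float, end: float) -> float:
    """Mean confidence over tokens whose relative position i/n lies in [start, end)."""
    n = len(token_conf)
    picked = [c for i, c in enumerate(token_conf) if start <= i / n < end]
    return sum(picked) / len(picked) if picked else 0.0


def weighted_vote(samples: List[Sample], start: float, end: float) -> str:
    """Sum interval confidence per answer and return the highest-scoring answer."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, token_conf in samples:
        scores[answer] += interval_confidence(token_conf, start, end)
    return max(scores, key=scores.get)


# Hypothetical traces: two samples answer "A", one answers "B".
samples = [
    ("A", [0.9, 0.9, 0.2, 0.2]),
    ("A", [0.9, 0.8, 0.3, 0.1]),
    ("B", [0.3, 0.4, 0.9, 0.9]),
]

late_only = weighted_vote(samples, 0.5, 1.0)   # trust only the second half
whole_trace = weighted_vote(samples, 0.0, 1.0)  # trust every position
```

With these made-up numbers, reading confidence from the second half of each trace selects "B", while averaging over the whole trace agrees with the majority answer "A".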

5 · arXiv · cs.CL · Jun 19, 2026

Information-theoretic analysis of supervision in latent chain-of-thought reasoning

This paper analyzes Latent Chain-of-Thought (CoT) reasoning — where reasoning occurs in continuous hidden states rather than discrete text — through an information-theoretic lens, identifying a 'dual collapse' failure mode involving gradient attenuation and representational drift. The authors decompose process supervision into Trajectory Supervision and Space Supervision, and introduce the Unified Latent Probe (ULP) to quantify mutual information between latent trajectories and explicit reasoning steps. Experiments reveal an 'Information-Performance Binding' showing reasoning accuracy depends on information fidelity in the latent chain, suggesting supervision should shift from geometric imitation toward mutual information maximization.

Evaluation and Benchmarking · Alignment and RLHF · EIT-NLP · Unified Latent Probe · What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

5 · arXiv · cs.CL · Jun 15, 2026

AdaSR: Adaptive streaming reasoning framework with Hierarchical Relative Policy Optimization

Researchers introduce AdaSR, a framework enabling large reasoning models to reason incrementally during streaming input (e.g., audio/video) rather than waiting for complete context, then perform final deliberation once the stream ends. The core contribution is Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into streaming and deep reasoning phases with fine-grained per-phase advantage assignment, integrating format, accuracy, and latency-aware rewards. Experiments show AdaSR improves the tradeoff among reasoning accuracy, computational efficiency, and streaming latency over supervised fine-tuning baselines. Code is publicly released.

Inference Economics · Agent and Tool Ecosystem · Hierarchical Relative Policy Optimization · EIT-NLP · AdaSR · +1 more
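Per-phase advantage assignment can be sketched as computing a separate group-relative advantage for each phase of a rollout. The code below assumes a GRPO-style normalization (reward minus group mean, divided by group standard deviation); the summary does not confirm that HRPO uses this exact form, and the field names `stream_reward` and `final_reward` are hypothetical stand-ins for whatever combination of format, accuracy, and latency rewards each phase receives.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_relative(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one rollout group (assumed GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_phase_advantages(rollouts: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Compute one advantage per phase, so each phase is credited separately."""
    stream = group_relative([r["stream_reward"] for r in rollouts])
    final = group_relative([r["final_reward"] for r in rollouts])
    return [{"stream_adv": s, "final_adv": f} for s, f in zip(stream, final)]


# Hypothetical group of two rollouts for the same streaming input.
rollouts = [
    {"stream_reward": 1.0, "final_reward": 0.0},  # good while streaming, weak final answer
    {"stream_reward": 0.0, "final_reward": 1.0},  # weak while streaming, good final answer
]
advantages = per_phase_advantages(rollouts)
```

Here the first rollout gets a positive advantage for its streaming phase and a negative one for its final deliberation, which a single per-rollout advantage would not distinguish.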