
Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

paper · active · provisional · cognitive-episodes-in-llm-reasoning-traces-enable-interpretable-human-item-difficulty-prediction-4f4647f3 · 1 event · first seen 21h ago


Co-occurring entities

Epi2Diff

More like this (12)

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning
Long-context Reasoning Benchmarks
Towards Root Memories: Benchmarking and Enhancing Implicit Logical Memory Retrieval for Personalized LLMs
Watch, Remember, Reason: Human-View Video Understanding with MLLMs
Reasoning Language Models
Multilingual Reasoning Cascades Need More Context
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models
The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Predicting Future Behaviors in Reasoning Models Enables Better Steering

Recent events (1)

arXiv · cs.AI · 21h ago

Epi2Diff framework uses LLM reasoning traces to predict human item difficulty in educational assessment

Researchers introduce Epi2Diff (Episode to Difficulty), a framework that parses Large Reasoning Model (LRM) reasoning traces into structured cognitive episode sequences to predict how difficult test items are for humans. The approach extracts features from reasoning dynamics—effort allocation, state transitions, iteration patterns—and combines them with semantic item representations. Experiments on four real-world difficulty datasets, including SAT-derived benchmarks, show an 8.1% average relative gain over supervised LLM fine-tuning baselines. The work provides interpretable process evidence for educational measurement without requiring costly human calibration.
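To make the "reasoning dynamics" features concrete, here is a minimal sketch of how effort allocation, state transitions, and iteration patterns could be computed from an episode sequence. It is not the Epi2Diff implementation: the episode labels (`read`, `plan`, `compute`, `verify`), the `(label, n_tokens)` input format, and the function name `episode_features` are all assumptions made for illustration, and the step that combines these features with semantic item representations is left out.

```python
from collections import Counter


def episode_features(episodes):
    """Summarise a reasoning trace given as [(label, n_tokens), ...].

    The labels and the token-count proxy for effort are hypothetical;
    the source summary does not specify the actual episode taxonomy.
    """
    total = sum(n for _, n in episodes)

    # Effort allocation: share of trace tokens spent in each episode type.
    tokens_per_label = Counter()
    for label, n in episodes:
        tokens_per_label[label] += n
    effort_share = (
        {label: n / total for label, n in tokens_per_label.items()}
        if total
        else {}
    )

    # State transitions: counts of consecutive (from_label, to_label) pairs.
    transitions = Counter(
        (a[0], b[0]) for a, b in zip(episodes, episodes[1:])
    )

    # Iteration pattern: how often the trace returns to an episode type
    # it has already visited after having left it.
    seen = set()
    revisits = 0
    prev = None
    for label, _ in episodes:
        if label != prev and label in seen:
            revisits += 1
        seen.add(label)
        prev = label

    return {
        "effort_share": effort_share,
        "transitions": dict(transitions),
        "revisits": revisits,
    }


# Hypothetical trace: the model computes, verifies, then computes again.
trace = [("read", 20), ("plan", 10), ("compute", 50), ("verify", 10), ("compute", 10)]
features = episode_features(trace)
```

In a pipeline of the kind the summary describes, a dictionary like `features` would be flattened into a numeric vector and passed to a difficulty predictor alongside an item embedding; how Epi2Diff actually does this is not stated here.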

Evaluation and Benchmarking · Epi2Diff · Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction