LongBench-Write

benchmark · active · provisional · longbench-write-27bf8a37 · 1 event · first seen 8d ago

Aliases: LongBench-Write

Recent events (1)

5 · arXiv · cs.CL · 8d ago

IS-CoT framework addresses long-form generation collapse in LLMs via interleaved structural thinking

Researchers introduce IS-CoT (Interleaved Structural Chain-of-Thought), a framework that embeds a dynamic Plan-Write-Reflect cycle into LLM generation. Its goal is to overcome the severe length collapse that reasoning-enhanced models exhibit on open-ended writing tasks longer than 2,000 words. The authors construct a multi-teacher training dataset of interleaved reasoning traces and use it to train IS-Writer-8B, which achieves state-of-the-art results on LongBench-Write, outperforming DeepSeek-V3.2 by 3.08 points. The work identifies static hierarchical planning as a root cause of long-form degradation and proposes an in-model alternative to external agentic workflows.