Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering
The paper introduces the Gumbel Machine, a modular framework for generating counterfactual text that improves student writing while preserving similarity to the original. Central to the approach is β-Hindsight control, a controlled decoding algorithm that uses Gumbel noise as a tunable similarity mechanism during LLM generation. Experiments on student writing datasets show the method produces outputs that are both rubric-consistent and close to the reference text. The approach is positioned as more flexible and practically applicable than prior domain-specific counterfactual generation methods.
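The summary above does not specify the β-Hindsight control algorithm, so the sketch below is not that algorithm. It only illustrates the general Gumbel-max trick that makes Gumbel noise usable as a similarity mechanism: sampling a token as argmax(logits + Gumbel noise), and reusing the same noise when the logits change, tends to keep the sampled token the same. The logit values, vocabulary size, and function names are hypothetical.

```python
import math
import random

def gumbel_noise(rng, n):
    # Gumbel(0, 1) samples via inverse CDF: -log(-log(U)), U ~ Uniform(0, 1).
    return [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in range(n)]

def gumbel_argmax(logits, noise):
    # Gumbel-max trick: argmax(logits + noise) is a sample from softmax(logits).
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
original_logits = [2.0, 1.0, 0.5, 0.1]   # hypothetical next-token logits
revised_logits = [1.8, 1.3, 0.5, 0.1]    # hypothetical logits after a small change

trials = 2000
agree_shared = 0   # same noise reused for both distributions
agree_fresh = 0    # independent noise for the second distribution
for _ in range(trials):
    shared = gumbel_noise(rng, 4)
    fresh = gumbel_noise(rng, 4)
    token = gumbel_argmax(original_logits, shared)
    agree_shared += token == gumbel_argmax(revised_logits, shared)
    agree_fresh += token == gumbel_argmax(revised_logits, fresh)

print("agreement with shared noise:", agree_shared / trials)
print("agreement with fresh noise: ", agree_fresh / trials)
```

Under these assumptions, shared noise yields noticeably higher token agreement than fresh noise, which is the intuition behind treating the noise as a similarity knob.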
Related events (8)
QUIET: Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation
QUIET (Quality Understanding via Interlocked Evaluation Testing) is a new benchmark that evaluates LLM creative generation rather than discriminative recognition, addressing limitations of benchmarks such as Story Cloze Test and HellaSwag. It places 10-20 blanks with explicit content constraints and cascade dependencies into complete stories, requiring open-ended generation rather than multiple-choice selection. Scoring follows an automated, information-theoretic protocol that operationalizes a 'calibrated surprise' framework: score = satisfy * (1 + lambda * surprise), which combines constraint satisfaction with a surprise measure. This allows objective evaluation without human graders or the subjectivity of LLM-as-Judge.
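The scoring formula can be written out directly. This is a minimal sketch that assumes both `satisfy` and `surprise` are already normalized to [0, 1]; how the benchmark actually computes either quantity is not described here, and the default `lam` value is a placeholder.

```python
def quiet_score(satisfy: float, surprise: float, lam: float = 0.5) -> float:
    """score = satisfy * (1 + lambda * surprise).

    satisfy:  constraint-satisfaction term (assumed to lie in [0, 1]).
    surprise: surprise term (assumed to lie in [0, 1]).
    lam:      weight on surprise; 0.5 is a hypothetical default.
    """
    return satisfy * (1.0 + lam * surprise)

# A surprising answer that violates the constraints scores zero,
# because surprise only scales the satisfaction term.
print(quiet_score(satisfy=0.0, surprise=1.0))
# A satisfying answer gains a bonus proportional to its surprise.
print(quiet_score(satisfy=1.0, surprise=1.0))
```

The multiplicative form means surprise can only amplify an answer that already meets its constraints; it never rescues one that does not.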
Assisted Generation: a new direction toward low-latency text generation
Hugging Face introduces assisted generation (speculative decoding) as a practical technique for reducing LLM inference latency. The approach uses a smaller draft model to propose token candidates that a larger model then verifies in parallel, enabling multiple tokens to be accepted per forward pass. The blog post explains the mechanism and demonstrates integration into the Hugging Face Transformers library.
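The draft-then-verify loop can be sketched with toy next-token functions in place of real models. This is an illustrative greedy variant under stated assumptions, not the Transformers implementation: `target_next`, `draft_next`, and `k` are hypothetical names, and the per-position verification calls stand in for what would be one batched forward pass of the large model.

```python
def assisted_generate(target_next, draft_next, prompt, max_new_tokens, k=4):
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The target model checks all k positions (one pass in a real model).
        target_passes += 1
        accepted = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + draft))
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new_tokens], target_passes

# Toy "models": the target counts upward mod 10; the draft errs after a 5.
target = lambda seq: (seq[-1] + 1) % 10
draft = lambda seq: 0 if seq[-1] == 5 else (seq[-1] + 1) % 10

output, passes = assisted_generate(target, draft, [0], max_new_tokens=12)
print(output, passes)
```

In this greedy setting the output is identical to decoding with the target alone; the saving comes from needing fewer target passes than generated tokens whenever the draft is usually right.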
Generating Human-level Text with Contrastive Search in Transformers
Hugging Face introduces contrastive search, a decoding strategy for autoregressive language models that aims to produce more coherent and human-like text than standard methods such as beam search or nucleus sampling. The technique balances the model's confidence in its next-token prediction against a contrastive penalty that discourages repetitive or degenerate outputs. The blog post describes the integration of contrastive search into the Hugging Face Transformers library, making it accessible to practitioners.
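The trade-off between confidence and the contrastive penalty can be sketched as a per-candidate score of the form (1 - alpha) * probability - alpha * (maximum cosine similarity to earlier token representations). The snippet below is a toy illustration with hand-made vectors, not the library code; `alpha`, the candidate tuples, and the two-dimensional "hidden states" are assumptions for the example.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def contrastive_pick(candidates, context_states, alpha=0.6):
    """Pick one token from (token, probability, hidden_state) candidates.

    Confidence term: the model's probability for the candidate.
    Penalty term:    the candidate's highest similarity to any earlier token state.
    """
    def score(candidate):
        _, prob, state = candidate
        penalty = max(cosine(state, h) for h in context_states)
        return (1.0 - alpha) * prob - alpha * penalty
    return max(candidates, key=score)[0]

context = [[1.0, 0.0]]                       # representation of a previous token
candidates = [("the", 0.6, [1.0, 0.0]),      # likely, but duplicates the context
              ("cat", 0.4, [0.0, 1.0])]      # less likely, but dissimilar
print(contrastive_pick(candidates, context, alpha=0.6))
print(contrastive_pick(candidates, context, alpha=0.0))
```

With `alpha=0.0` the rule reduces to picking the most probable candidate; raising `alpha` increasingly favors tokens whose representations differ from what has already been generated.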
Improving Prompt Consistency with Structured Generations
This Hugging Face blog post examines how structured generation outputs can improve consistency in LLM evaluation pipelines. It explores techniques for constraining model outputs to specific formats, reducing variability in prompt-based assessments. The post addresses a practical challenge in evaluation workflows where inconsistent response formats degrade measurement reliability.
Finding GPT-4's Mistakes with GPT-4: CriticGPT
OpenAI has developed CriticGPT, a GPT-4-based model trained to write critiques of ChatGPT outputs, helping human trainers identify errors during RLHF. The system is designed to address a core scalable oversight challenge: human raters often miss subtle mistakes in long or complex model outputs. CriticGPT-assisted trainers outperformed unassisted trainers in catching model errors, suggesting a path toward more reliable RLHF pipelines.
Gamified writing experiment studies when humans adopt AI suggestions vs. maintain creative autonomy
An arXiv preprint introduces 'Nonslop,' a gamified writing experiment with 74 participants designed to study authentic human preferences in AI-assisted creative writing. The system deliberately inverts the helpful-assistant pattern by disincentivizing acceptance of AI suggestions, using a dystopian framing to reveal genuine user behavior rather than default compliance. The study analyzes when users choose creative autonomy over AI assistance across different task types and response characteristics. The findings bear on questions of individual voice, authenticity, and the tension between efficiency and human expression in LLM-augmented writing.
Counterfactual context revision framework for auditing LLM-based stance simulation in online discussions
Researchers introduce a counterfactual context revision framework to audit how LLMs simulate individual users' stances in online discussions. By applying controlled text-only and multimodal (meme-based) revisions to conversational contexts, they measure how readily simulated stances shift in response to semantically independent changes. The revisions produce effective and robust stance transitions across both revision types and polarization-preference mechanisms, raising the concern that LLM simulations may be highly context-sensitive artifacts rather than reflections of genuine user-specific beliefs. The work contributes an evaluation framework and highlights the risks of using LLMs to model online opinion dynamics.
LLUMI: Fine-Tuning Open-Source LLMs for Mental Health Writing Assistance Using Reddit Community Feedback
LLUMI is a two-component system (a generation model and an improvement model) designed to provide mental health writing assistance using smaller open-source LLMs hosted in privacy-preserving, on-premise environments. The system leverages Reddit community endorsement signals (upvotes/downvotes) to construct preference pairs for SFT and DPO training, then further aligns outputs via human evaluation across readability, empathy, connection, actionability, and safety dimensions. Results show LLUMI achieves performance comparable to proprietary GPT-based models on linguistic and human evaluations, suggesting community-derived preference signals can substitute for expensive expert labeling in sensitive domains.
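One way to turn endorsement signals into preference pairs is sketched below. It is an assumption-laden illustration, not LLUMI's pipeline: the `min_gap` threshold, the dictionary fields, and the prompt/chosen/rejected record layout are all hypothetical choices made for the example.

```python
from itertools import combinations

def build_preference_pairs(post, replies, min_gap=5):
    """Pair replies to the same post, preferring the higher-scored one.

    replies: list of {"text": str, "score": int}, where score is the net
             community endorsement (upvotes minus downvotes).
    min_gap: hypothetical margin; pairs with a smaller score difference
             are skipped as too ambiguous to count as a preference.
    """
    pairs = []
    for a, b in combinations(replies, 2):
        if abs(a["score"] - b["score"]) < min_gap:
            continue
        chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
        pairs.append({"prompt": post,
                      "chosen": chosen["text"],
                      "rejected": rejected["text"]})
    return pairs

replies = [
    {"text": "That sounds really hard. What has helped before?", "score": 40},
    {"text": "Just get over it.", "score": 3},
    {"text": "Same.", "score": 1},
]
pairs = build_preference_pairs("I've been struggling lately.", replies)
print(len(pairs))
```

Records in this shape could feed a preference-optimization step such as DPO, while the chosen texts alone could serve as supervised fine-tuning targets.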
