4 · arXiv cs.CL (Computation and Language) · 4d ago

Space-Efficient Language Generation in the Limit: Poly-Space Algorithms with Bounded Hallucination Gap

A new arXiv preprint introduces a resource-aware theory of language generation in the limit. It studies learners that must produce hallucination-free hypothesis languages from adversarial streams of positive examples while operating under memory constraints. Focusing on DFA-recognizable language classes, the authors give a streaming algorithm that uses poly(s,k) space and provably converges with a bounded generation gap, and they complement it with a near-matching lower bound obtained by a reduction from communication complexity. The results reveal a sharp phase transition between polynomial-space generation and exponential-space exact identification, giving memory-bounded language generation a theoretical footing.
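To make "hallucination-free" concrete, here is a minimal toy sketch. It is not the paper's poly(s,k)-space algorithm: it stores the whole stream, and the languages are hypothetical sets of integers rather than DFA-recognizable languages. It only illustrates one safe strategy for a finite collection: the true language is always among the candidates consistent with the stream, so any unseen string that lies in every consistent candidate is guaranteed to belong to it.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical toy collection: each "language" is a membership predicate
# over the integers 0..UNIVERSE-1 (an assumption made for illustration).
UNIVERSE = 1000
LANGUAGES: Dict[str, Callable[[int], bool]] = {
    "multiples_of_2": lambda n: n % 2 == 0,
    "multiples_of_3": lambda n: n % 3 == 0,
    "multiples_of_6": lambda n: n % 6 == 0,
}


def next_safe_output(stream: Iterable[int]) -> Optional[int]:
    """Return an unseen string contained in every consistent language."""
    seen: Set[int] = set(stream)
    # A language is consistent if it contains every example seen so far.
    consistent = [p for p in LANGUAGES.values() if all(p(x) for x in seen)]
    if not consistent:
        return None  # the stream matches no language in the collection
    for n in range(UNIVERSE):
        # Membership in all consistent languages implies membership in
        # the (unknown) true language, so this output cannot hallucinate.
        if n not in seen and all(p(n) for p in consistent):
            return n
    return None


if __name__ == "__main__":
    print(next_safe_output([0, 6]))
```

The sketch trades memory for simplicity: keeping `seen` in full is exactly the cost a space-bounded learner would need to avoid.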

Evaluation and Benchmarking AI Safety Research Space-Efficient Language Generation in the Limit

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · arXiv · cs.LG · 1mo ago

Language Generation in the Limit with Bounded Memory: Characterization via Sperner's Theorem

This paper studies language generation in the limit under bounded memory constraints, extending classical learning theory to the generation setting. The authors characterize when memoryless generation is possible, derive minimax density bounds using Sperner's theorem and symmetric chain decompositions, and show that adaptively chosen memory outperforms sliding-window memory. They also revisit incremental identification in the limit, finding that exact identification fails for collections of three or more languages but an approximate relaxation is achievable for all finite collections.
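For readers unfamiliar with Sperner's theorem: it states that the largest antichain of subsets of an n-element set (a family in which no set contains another) has size C(n, ⌊n/2⌋). The summary does not say how the bound enters the paper's argument, so the sketch below only checks the theorem itself by brute force for tiny n. All function names are illustrative.

```python
from itertools import combinations
from math import comb


def sperner_bound(n: int) -> int:
    """Largest antichain size for an n-element ground set, per Sperner."""
    return comb(n, n // 2)


def is_antichain(family) -> bool:
    """True if no set in the family is a proper subset of another."""
    return not any(a < b or b < a for a, b in combinations(family, 2))


def max_antichain_bruteforce(n: int) -> int:
    """Exhaustive search over all families of subsets (tiny n only)."""
    subsets = [frozenset(c)
               for r in range(n + 1)
               for c in combinations(range(n), r)]
    best = 0
    for mask in range(1 << len(subsets)):
        family = [s for i, s in enumerate(subsets) if (mask >> i) & 1]
        # Only test families that would improve on the best found so far.
        if len(family) > best and is_antichain(family):
            best = len(family)
    return best


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, sperner_bound(n), max_antichain_bruteforce(n))
```

The brute-force search enumerates 2^(2^n) families, so it is only practical up to about n = 4.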

Evaluation and Benchmarking AI Safety Research Sperner's Theorem Language Generation in the Limit Identification in the Limit +2 more

6 · arXiv · cs.LG · 7d ago

PAC-Bayes analysis establishes formal expressivity and alignment floors for prompt-conditioned LLMs

A new arXiv preprint models user-LLM interaction as a bilevel cheap-talk game and derives PAC-Bayes bounds showing two irreducible limitations: an 'expressivity floor' where language's finite channel capacity makes distinct tasks indistinguishable, and an 'objective-misalignment floor' where alignment constraints prevent reaching user-ideal outputs. The authors prove that prompt-conditioned LLMs cannot be universal problem solvers, as correct behavior on certain task families is provably unattainable even with infinite data, optimal training, or model scaling. The work suggests multimodal inputs and external memory as potential mitigations by increasing task-relevant information bandwidth.

Evaluation and Benchmarking Alignment and RLHF PAC-Bayes On the Limits of Prompt-Conditioned Language Models as General-Purpose Learners

5 · arXiv · cs.CL · 6d ago

FMLM+ introduces Posterior Refinement for fast non-autoregressive language generation

Researchers introduce FMLM+, a framework combining Flow Map Language Models with masking-style noise schedules to enable joint sequence generation with per-token global consistency scoring. The key contribution is Posterior Refinement, an inference-time self-correction strategy that matches discrete baseline performance with 32x fewer neural function evaluations (NFEs). The approach improves the speed-quality tradeoff over both Masked Diffusion Models and standard FMLMs across multiple benchmarks, addressing longstanding factorization error problems in non-autoregressive generation.

Frontier Model Releases Inference Economics Posterior Refinement Flow Map Language Models FMLM+ +2 more

4 · arXiv · cs.CL · 20h ago

Scaling limit theory of the Random Language Model reveals condensation transition and language statistics

A new arXiv preprint develops a quantitative theory of the Random Language Model (RLM), an ensemble of stochastic context-free grammars, in a scaling limit where grammar size and temperature are jointly tuned. The authors identify a condensation phase transition at a critical parameter value and derive explicit scaling laws for entropy, rule diversity, and related observables across distinct regimes. The work claims to resolve prior ambiguities about thermodynamic transitions in language models and offers a unified framework connecting generative grammar statistics to universal properties of natural language and LLM behavior.

Evaluation and Benchmarking Random Energy Model Random Language Model Scaling limit of the Random Language Model

5 · Hugging Face Blog · 1mo ago

Assisted Generation: a new direction toward low-latency text generation

Hugging Face introduces assisted generation (speculative decoding) as a practical technique for reducing LLM inference latency. The approach uses a smaller draft model to propose token candidates that a larger model then verifies in parallel, enabling multiple tokens to be accepted per forward pass. The blog post explains the mechanism and demonstrates integration into the Hugging Face Transformers library.
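The draft-then-verify loop described above can be sketched with toy models. This is a minimal greedy variant under stated assumptions, not the Transformers implementation: `draft_model` and `target_model` are hypothetical stand-ins that map a token prefix to one next token, and the verification loop calls the target once per position where a real system would score all drafted positions in a single batched forward pass.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token prefix to the next token


def target_model(prefix: List[int]) -> int:
    """Toy 'large' model (hypothetical): counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    """Toy 'small' model (hypothetical): agrees except after token 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


def assisted_step(prefix: List[int], draft: Model, target: Model,
                  k: int) -> List[int]:
    """One draft-then-verify step; returns the tokens accepted."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    # 2. The target model checks each proposed position in order.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # 3. Every draft token matched, so the target adds one more token.
    accepted.append(target(prefix + accepted))
    return accepted


def assisted_generate(prompt: List[int], draft: Model, target: Model,
                      k: int, max_new: int) -> List[int]:
    """Repeat assisted steps until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(assisted_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


if __name__ == "__main__":
    print(assisted_generate([0], draft_model, target_model, k=3, max_new=8))
```

With greedy verification the output is identical to what the target model alone would produce; the draft model only affects how many tokens each step accepts. In the Transformers library, assisted generation is enabled by passing an `assistant_model` argument to `generate`.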

Inference Economics Agent and Tool Ecosystem speculative decoding Assisted Generation Hugging Face Transformers +1 more

5 · OpenAI Blog · 1mo ago

Why Language Models Hallucinate

OpenAI published research explaining the mechanisms behind language model hallucination. The work connects improved evaluation methods to enhanced AI reliability, honesty, and safety. The body is sparse on technical detail, but the framing positions this as foundational research relevant to alignment and deployment trust.

Evaluation and Benchmarking AI Safety Research hallucination (LLM) OpenAI +1 more

4 · arXiv · cs.CL · 6d ago

Study finds lower bitrate discrete speech representations sufficient for generative spoken language modeling

Researchers investigate how segmentation width and cluster size affect speech resynthesis and continuation quality in Generative Spoken Language Models (GSLM), which train language models on discrete speech units without text. They find that intelligible, natural speech can be synthesized at lower bitrates than the standard baseline, and that continuation quality remains stable at reduced bitrates, suggesting conventional GSLM settings may be over-specified. The paper also notes that LLM-based evaluation metrics correlate better with human judgments than conventional metrics, but correlation remains low, pointing to a gap in automatic evaluation for speech generation.

Evaluation and Benchmarking Multimodal Progress generative language modeling On the Effect of Segmentation Width and Cluster Size on Speech Resynthesis and Continuation in Generative Spoken Language Models K-means

6 · arXiv · cs.CL · 28d ago

Trajectory Analysis of Masked Diffusion LMs for Graph-to-Text Generation with Lambda-Scaled Structural Decoding

This paper presents the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation, analyzing the order in which tokens are unmasked during iterative decoding. The authors find that MDLMs naturally unmask entities first, then relational and function words, then structural tokens. Supervised fine-tuning disrupts this pattern by prematurely anchoring structural tokens, which causes hallucination or omission. They propose lambda-scaled structural decoding, a training-free inference-time fix that recovers 9.4 BLEU-4 points, and introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process. Cross-dataset evaluation on the LAGRANGE benchmark shows that prior baselines overfit to dataset-specific patterns while MDLM-based approaches generalize better.

Frontier Model Releases Evaluation and Benchmarking BLEU-4 Graph Transformer Diffusion Language Models +5 more