4 · arXiv cs.CL (Computation and Language) · 11d ago

Corpus-Grounded Feature Diffusion pipeline for automated IEP generation in Traditional Chinese

Researchers propose a low-resource fine-tuning pipeline called Corpus-Grounded Feature Diffusion (CGFD) to automate Individualized Education Program (IEP) drafting from Traditional Chinese parent-teacher interview transcripts. The approach fine-tunes Breeze-7B with QLoRA on 582 synthetically diffused samples and uses schema-constrained decoding at inference time. The authors report that Grammar-Constrained Decoding is counterproductive under Traditional Chinese token budgets. On a small formal hold-out (n=10), the system achieves a BERTScore F1 of 0.779, outperforming zero-shot GPT-5.4, DeepSeek-V3.2, Gemini-3-Flash-Preview, and Llama-4-Maverick baselines while enabling fully local, air-gapped inference. The work addresses a gap in Traditional Chinese special-education NLP and demonstrates a privacy-preserving deployment pattern for sensitive document generation.
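For readers unfamiliar with the metric, the following is a minimal sketch of how a BERTScore-style F1 is computed from token embeddings by greedy cosine matching. The 2-dimensional vectors are toy stand-ins for contextual embeddings, the function names are illustrative, and the sketch omits IDF weighting and baseline rescaling. The paper's exact BERTScore configuration is not stated in this summary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bertscore_f1(cand_vecs, ref_vecs):
    """Greedy-matching F1 over token embeddings (no IDF, no rescaling)."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy embeddings (assumption): two tokens per text, two dimensions each.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 1.0]]
score = bertscore_f1(candidate, reference)
print(round(score, 4))
```

In practice the embeddings come from a pretrained encoder, so the score depends on which encoder and layer are used.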

Evaluation and Benchmarking · Enterprise Deployment Patterns · DeepSeek V4 · Corpus-Grounded Feature Diffusion · Grammar-Constrained Decoding · BERTScore · Gemini 3.1 Flash Live Preview · QLoRA · Llama-4-Maverick · Breeze-7B · GPT-5.5

Related guides (4)

GPT-5.5

GPT-5.5: OpenAI's Most Capable Model — and Its Most Complicated

DeepSeek V4

DeepSeek V4: The Open-Weights Giant Reshaping AI Economics

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From LLM Demo to Production Reality

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: The Shifting Yardstick of AI Capability

Related events (8)

6 · arXiv · cs.CL · 3d ago

ZPPO: Teacher-in-prompt training method outperforms distillation and GRPO for small vision-language models

Researchers introduce Zone of Proximal Policy Optimization (ZPPO), a training method inspired by Vygotsky's zone of proximal development that embeds teacher guidance in prompts rather than policy gradients or logit imitation. On hard questions where student rollouts fail, ZPPO constructs Binary Candidate-included Questions (BCQ) and Negative Candidate-included Questions (NCQ) to help the student discriminate correct from incorrect responses, with a replay buffer that recirculates hard questions until mastered. Evaluated on the Qwen3 family (0.8B–9B) with a 27B teacher across a 31-benchmark suite covering VLM, LLM, and video tasks, ZPPO outperforms both distillation and GRPO baselines, with the largest gains at the smallest model scale. The method addresses a known failure mode of RL training where zero-reward rollouts produce no gradient signal.
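The failure mode mentioned in the last sentence can be illustrated with a small sketch of group-relative advantages as commonly used in GRPO-style training. It assumes the usual normalization (reward minus group mean, divided by group standard deviation plus a small epsilon). The function name and the epsilon value are illustrative and not taken from the ZPPO paper.

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same question."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# A question where some rollouts succeed: advantages are non-zero.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

# A hard question where every rollout fails: every advantage is exactly zero,
# so the policy-gradient term for this question contributes no update.
all_failed = group_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(all_failed)
```

When every rollout in a group earns the same reward, the normalized advantages vanish, which is why hard questions need some other source of signal.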

Open Weights Progress · Alignment and RLHF · GRPO · Proximal Policy Optimization · Qwen3

4 · arXiv · cs.CL · 17d ago

Synthetic linguistic reasoning traces improve low-resource machine translation via in-context learning

Researchers propose a pipeline that generates step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks to assist LLMs in translating extremely low-resource languages. Evaluated on Xibe and Chintang across ICL, SFT, and RFT settings, the traces prove most effective as inference-time guidance rather than training data. Models can leverage reliable grammatical analyses at inference time but struggle to learn to generate accurate traces themselves, identifying trace generation quality as the key bottleneck.

Evaluation and Benchmarking · Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation? · Universal Dependencies

4 · Qwen Research · 1mo ago

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

Alibaba's Qwen team released Chinese CLIP, a language-specific vision-language contrastive pretraining model targeting Chinese multimodal representation learning. The project addresses a gap in open-source Chinese CLIP models, particularly for cross-modal retrieval tasks. It follows the CLIP framework but is adapted for Chinese language and cultural context.

Open Weights Progress · Multimodal Progress · contrastive vision-language pretraining · Chinese CLIP · CLIP

5 · arXiv · cs.CL · 24d ago

DIVE: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation

DIVE is a frozen-backbone distillation framework that addresses a fundamental limitation in token-level in-context vector distillation: uniform cross-entropy supervision treats all output tokens equally, but long-form outputs like medical reports are dominated by low-information template tokens while diagnostically critical tokens receive insufficient gradient signal. The method introduces decisive-token supervision (upweighting pathology-related tokens and EOS events) and state-conditioned dynamic steering (hidden-state-dependent adapters replacing fixed residuals) to correct supervision imbalance and autoregressive drift. Evaluated on MIMIC-CXR and CheXpert Plus with two medical VLM backbones, DIVE achieves best BLEU-4, ROUGE-L, and RadGraph F1 across all dataset-backbone combinations while remaining competitive on CheXbert F1.
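The supervision imbalance described above can be made concrete with a weighted cross-entropy sketch. This is not DIVE's actual loss, which the summary does not specify. The weight of 5.0 on the "decisive" token, the normalization by total weight, and the per-token probabilities are all assumptions chosen for illustration.

```python
import math

def weighted_cross_entropy(token_probs, weights):
    """Weighted mean of per-token negative log-likelihoods.

    token_probs: probability the model assigned to each gold token.
    weights: per-token supervision weights (hypothetical values).
    """
    total = sum(w * -math.log(p) for p, w in zip(token_probs, weights))
    return total / sum(weights)

# Three well-predicted template tokens and one poorly predicted
# pathology token at index 2 (toy numbers).
probs = [0.9, 0.9, 0.2, 0.9]

uniform = weighted_cross_entropy(probs, [1.0, 1.0, 1.0, 1.0])
upweighted = weighted_cross_entropy(probs, [1.0, 1.0, 5.0, 1.0])

print(round(uniform, 4), round(upweighted, 4))
```

Under uniform weights the easy template tokens dilute the loss. Upweighting the hard token makes its error dominate, so more of the gradient flows to it.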

Inference Economics · Multimodal Progress · State-Conditioned Dynamic Steering · RadGraph F1 · CheXbert F1

4 · arXiv · cs.CL · 3d ago

Cross-lingual in-context learning source language selection challenges fine-tuning assumptions

A new arXiv paper conducts a broad empirical study of cross-lingual transfer in few-shot in-context learning (ICL), spanning seven tasks, six models, and a typologically diverse set of languages. The study finds that conventional heuristics from supervised fine-tuning — such as relying on linguistic similarity or data availability — do not consistently transfer to the ICL regime. The authors also analyze language confusion as a key obstacle in generative cross-lingual ICL and propose alternative heuristics for source language selection.

Evaluation and Benchmarking · When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

5 · arXiv · cs.CL · 9d ago

AGDO: Attention-guided denoising and optimization framework improves diffusion language model reasoning

Researchers propose AGDO, a framework that replaces random masking in diffusion large language models (dLLMs) with attention-guided denoising order and token weighting during fine-tuning and reinforcement learning. The work is motivated by an empirical finding that tokens with stronger attention to unmasked context are more stable and critical for reasoning. Experiments on math and coding benchmarks show AGDO outperforms existing post-training methods for dLLMs, advancing the case for attention-aware training in parallel-decoding language models.

Alignment and RLHF · AGDO · Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models

4 · arXiv · cs.CL · 25d ago

Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech

This paper presents the first NLP-based dementia detection study for Filipino speech, constructing a parallel bilingual dataset of 4,000 DementiaBank-derived transcripts with manual Filipino translations. Five model families are evaluated across monolingual, zero-shot cross-lingual, and bilingual fine-tuning settings. English-trained BERT degrades sharply on Filipino (Macro-F1 = 0.455), but bilingual fine-tuning recovers performance to Macro-F1 = 0.969–0.973 across all transformer models. The key finding is that multilingual clinical NLP performance is driven by linguistic coverage during training rather than model scale or architecture.
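Macro-F1, the metric quoted above, is the unweighted mean of per-class F1 scores, so a minority class counts as much as the majority class. The sketch below computes it on four invented labels. The label names and predictions are illustrative only and are not drawn from the paper's dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example (assumption): one dementia transcript is misclassified as control.
y_true = ["dementia", "dementia", "control", "control"]
y_pred = ["dementia", "control", "control", "control"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```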

Evaluation and Benchmarking · TF-IDF + Logistic Regression · NeoBERT · DementiaBank

5 · arXiv · cs.AI · 46h ago

FlowEdit: Lifelong pronunciation adaptation for flow-matching TTS via associative memory

FlowEdit is a new framework enabling lifelong pronunciation correction in frozen flow-matching text-to-speech systems without retraining model weights. Corrections are stored as token-level perturbations in text embedding space within a Modern Hopfield Network, retrieved at inference via soft attention with fuzzy morphological matching. On a curated benchmark of 312 multilingual proper nouns across 18 language families, the method reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline, with each correction completing in ~15 seconds on a single GPU.
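Retrieval from a Modern Hopfield Network via soft attention can be sketched as a softmax over scaled query-key similarities followed by a weighted sum of stored values. In the sketch below, the stored values stand in for embedding-space perturbations. The 2-dimensional vectors, the inverse temperature `beta=8.0`, and the function names are assumptions for illustration, and FlowEdit's fuzzy morphological matching is not modeled.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(query, keys, values, beta=8.0):
    """One-step retrieval: softmax(beta * K q) used to mix the stored values."""
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    attn = softmax(scores)
    dim = len(values[0])
    return [sum(a * v[i] for a, v in zip(attn, values)) for i in range(dim)]

# Two stored corrections (toy data): keys identify tokens, values are perturbations.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[0.5, -0.5], [-0.2, 0.3]]

# A query close to the first key retrieves almost exactly the first perturbation.
retrieved = hopfield_retrieve([0.9, 0.1], keys, values)
print([round(x, 3) for x in retrieved])
```

A larger `beta` makes retrieval sharper, approaching a hard lookup of the nearest stored pattern. A smaller `beta` blends stored corrections.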

Inference Economics · Enterprise Deployment Patterns · Modern Hopfield Network · FlowEdit · FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS