4 · arXiv cs.CL (Computation and Language) · 1mo ago

SymbolicLight V1: Spike-Gated Dual-Path Language Model with High Activation Sparsity

SymbolicLight V1 is a 194M-parameter spiking language model that combines binary Leaky Integrate-and-Fire (LIF) spike dynamics with a continuous residual stream. It replaces dense self-attention with a dual-path module that uses exponential-decay aggregation and spike-gated local attention. Trained from scratch on a 3B-token Chinese-English corpus, it reaches a validation perplexity of 8.88–8.93 at over 89% per-element activation sparsity, trailing a 201M-parameter GPT-2 baseline by 7.7% in perplexity. Ablations indicate that temporal integration via LIF dynamics contributes more to performance than sparsity alone, and a 0.8B-parameter scale-up trained on 48.8B tokens demonstrates optimization stability. On current dense hardware, inference is slower than GPT-2; the authors frame neuromorphic deployment as a future opportunity.

Training Infrastructure · Inference Economics · GPT-2 · Dual-Path SparseTCAM · Spiking Neural Networks · SymbolicLight V1 · Leaky Integrate-and-Fire (LIF)
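To make the summary's terms concrete, here is a minimal sketch, assuming a single LIF neuron with a hard reset and scalar inputs. The leak `beta`, `threshold`, decay `lam`, and the `spike_gate` helper are hypothetical illustrations, not SymbolicLight's actual module or hyperparameters. The sketch shows how binary spikes arise from leaky integration, how per-element activation sparsity is measured as the fraction of zero outputs, and what a causal exponential-decay aggregation computes.

```python
from typing import List


def lif_spikes(inputs: List[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Binary spike train from one leaky integrate-and-fire (LIF) neuron.

    beta (leak factor) and threshold are illustrative values, not the paper's.
    The membrane potential is hard-reset to 0 after each spike.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x  # leak the previous potential, then integrate the new input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def activation_sparsity(spike_rows: List[List[int]]) -> float:
    """Per-element sparsity: fraction of all spike entries that are zero."""
    total = sum(len(row) for row in spike_rows)
    zeros = sum(row.count(0) for row in spike_rows)
    return zeros / total


def exp_decay_aggregate(xs: List[float], lam: float = 0.5) -> List[float]:
    """Causal exponential-decay aggregation: h_t = lam * h_{t-1} + x_t."""
    h = 0.0
    out = []
    for x in xs:
        h = lam * h + x  # older inputs are down-weighted by a factor of lam per step
        out.append(h)
    return out


def spike_gate(values: List[float], spikes: List[int]) -> List[float]:
    """Hypothetical gate: pass a value through only at steps where the neuron spiked."""
    return [v if s else 0.0 for v, s in zip(values, spikes)]
```

With a constant input of 0.3, the neuron fires every fourth step, so `lif_spikes([0.3] * 10)` returns `[0, 0, 0, 1, 0, 0, 0, 1, 0, 0]` and its sparsity is 0.8. Because most outputs are exactly zero, sparse or event-driven execution could skip that work, whereas dense hardware still computes it.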

Related guides (2)

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI


Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production


Related events (8)

5 · arXiv · cs.CL · 11d ago

Predictor-gated bank-wise sparsity recipe for dense-to-sparse LLM upcycling from Qwen2.5-8B

A new arXiv preprint introduces a continual training recipe to convert dense LLMs into channel-sparse models without post-hoc pruning. Starting from a Qwen2.5-8B checkpoint, the method uses a low-rank predictor to gate FFN channel routing, achieving 4x sparsity in FFN intermediate activations via a bank-wise top-k rule at 32K context. The routing module is trained on the main language modeling path, making the resulting sparsity hardware-oriented rather than approximate. The authors also identify and patch a layer-local long-context failure mode on the RULER-CWE benchmark.

Training Infrastructure · Inference Economics · Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs · SwiGLU · RULER-CWE
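The following is a minimal sketch of predictor-gated, bank-wise top-k channel selection. It assumes a low-rank predictor that scores every FFN intermediate channel from the token's hidden state, with a bank size of 4 and one channel kept per bank (4x sparsity). The function names, shapes, and bank size are hypothetical, and the preprint's exact predictor and routing rule may differ.

```python
from typing import List


def low_rank_scores(x: List[float], V: List[List[float]], U: List[List[float]]) -> List[float]:
    """Hypothetical low-rank predictor: scores = U @ (V @ x).

    V has shape (r, d_model) and U has shape (d_ffn, r), with rank r << d_ffn,
    so the predictor is much cheaper than the FFN it gates.
    """
    z = [sum(v * xi for v, xi in zip(row, x)) for row in V]     # project down to rank r
    return [sum(u * zr for u, zr in zip(row, z)) for row in U]  # one score per FFN channel


def bankwise_topk_mask(scores: List[float], bank_size: int, keep_per_bank: int) -> List[int]:
    """Keep the `keep_per_bank` highest-scoring channels inside each contiguous bank."""
    if len(scores) % bank_size != 0:
        raise ValueError("number of channels must be a multiple of bank_size")
    mask = [0] * len(scores)
    for start in range(0, len(scores), bank_size):
        bank = range(start, start + bank_size)
        # rank channels within this bank only, not across the whole layer
        for i in sorted(bank, key=lambda j: scores[j], reverse=True)[:keep_per_bank]:
            mask[i] = 1
    return mask


def gate_activations(acts: List[float], mask: List[int]) -> List[float]:
    """Zero out FFN intermediate activations for channels the mask drops."""
    return [a if m else 0.0 for a, m in zip(acts, mask)]
```

With eight channels, `bank_size=4`, and `keep_per_bank=1`, exactly two channels survive, one per bank, matching the 4x activation sparsity in the summary. Selecting a fixed count inside each bank rather than a global top-k gives every bank the same amount of work, which is presumably part of what makes the sparsity pattern hardware-oriented.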

5 · arXiv · cs.AI · 11d ago

CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup

Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer for accelerating LLM inference via multi-token prediction without quality degradation. The key insight is 'Backbone-as-Architect': the backbone LM head always generates the first token while MTP heads handle only subsequent tokens, eliminating head-backbone competition that causes repetitive outputs in prior methods. CLP uses a single linear layer (~4.6K–7.7K parameters) versus 1M-parameter gate networks in prior work, achieving 1.14x–1.29x speedup on Qwen2.5 models with near-zero repetition ratio. The paper also establishes that shorter prediction horizons improve MTP head accuracy on larger models, offering a scaling-aware design principle.

Inference Economics · Qwen2.5 · Alibaba · CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference
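The span-level decision described in the CLP summary can be sketched as follows. This sketch assumes the linear layer maps the backbone's final hidden state to logits over span lengths 1..K. The names `predict_span_length` and `emit_span`, the weights, and K are hypothetical, not the paper's code. The property it illustrates is "Backbone-as-Architect": the first emitted token always comes from the backbone LM head, and MTP-head tokens only extend it.

```python
from typing import List, Sequence


def predict_span_length(hidden: Sequence[float], weights: List[List[float]], bias: List[float]) -> int:
    """Hypothetical CLP-style head: a single linear layer over the hidden state.

    weights has one row per span-length class (K rows of length d_model);
    class k means "emit k + 1 tokens at this step".
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]
    best = max(range(len(logits)), key=lambda k: logits[k])
    return best + 1


def emit_span(backbone_token: int, mtp_tokens: List[int], span_len: int) -> List[int]:
    """Backbone-as-Architect: the LM head always supplies the first token;
    MTP heads only contribute the following span_len - 1 tokens."""
    return [backbone_token] + mtp_tokens[: span_len - 1]
```

A linear layer from hidden size d to K length classes has d·K + K parameters. That is why a predictor of this shape stays in the thousands of parameters for typical hidden sizes, far below a million-parameter gate network.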

5 · Hugging Face Blog · 1mo ago

SmolVLM - Small Yet Mighty Vision Language Model

Hugging Face introduces SmolVLM, a compact vision-language model designed to deliver strong multimodal performance at small parameter counts. The model targets edge and resource-constrained deployment scenarios while maintaining competitive capabilities relative to its size. The announcement highlights efficiency improvements in both training and inference for small-scale VLMs.

Open Weights Progress · Inference Economics · SmolVLM · Hugging Face

6 · arXiv · cs.AI · 18d ago

SimSD: Speculative Decoding Adapted for Diffusion Language Models

SimSD introduces a training-free speculative decoding algorithm for diffusion large language models (dLLMs), which previously could not use standard token-level speculative decoding due to their bidirectional attention and masked language modeling formulation. The method uses a plug-and-play masking strategy that introduces reference tokens from a draft model and a custom attention mask, enabling valid logit computation for drafted tokens in a single forward pass. Evaluated on SDAR-family dLLMs across four benchmarks, SimSD achieves up to 7.46x decoding throughput improvement while maintaining or improving generation quality. The approach is compatible with other acceleration techniques such as KV cache and blockwise decoding.

Frontier Model Releases · Inference Economics · KV Cache · speculative decoding · SDAR
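According to the summary, SimSD's masking lets a dLLM compute valid logits for drafted tokens in a single forward pass. For context, the sketch below shows the generic greedy draft-and-verify acceptance rule from token-level speculative decoding, which consumes such per-position predictions. It assumes the target model's greedy token for every draft position, plus one bonus position, is already available. This is not SimSD's algorithm, and `greedy_verify` is a hypothetical name.

```python
from typing import List


def greedy_verify(draft: List[int], target_greedy: List[int]) -> List[int]:
    """Generic greedy verification for speculative decoding (not SimSD-specific).

    target_greedy[i] is the target model's greedy token at draft position i, all
    computed in one forward pass; it has one extra entry (a "bonus" token) for
    the position after the last draft token.
    """
    if len(target_greedy) != len(draft) + 1:
        raise ValueError("target_greedy must have len(draft) + 1 entries")
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        if tok != target_greedy[i]:
            accepted.append(target_greedy[i])  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)  # target agrees with this draft token
    accepted.append(target_greedy[-1])  # every draft token accepted: take the bonus token
    return accepted
```

Each verification step emits at least one token the target model would have produced greedily, and up to `len(draft) + 1` tokens when the whole draft is accepted. Throughput gains therefore depend on how often long drafts survive verification.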

6 · arXiv · cs.CL · 4d ago

LOGOS: A unified autoregressive foundation model for natural science tasks across domains

Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.

Frontier Model Releases · Open Weights Progress · Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences · LOGOS

6 · The Batch · 19d ago

Kimi K2.6: Moonshot AI's 1T-Parameter Vision-Language Model Matches Open-Weights Peers, Trails Top Closed Models

Moonshot AI released Kimi K2.6, a 1 trillion-parameter mixture-of-experts vision-language model with 32B active parameters, designed for long-horizon autonomous coding sessions lasting multiple days and multi-agent orchestration scaling to 300 parallel subagents executing up to 4,000 steps. The model matches Qwen3.6 Max Preview and DeepSeek-V4-Pro on the Artificial Analysis Intelligence Index (scoring 54 vs. their 52) while trailing closed models like GPT-5.5 and Claude Opus 4.7. Weights are freely downloadable from Hugging Face under a modified MIT license permitting commercial use, with API access priced at $0.95/$0.16/$4.00 per million input/cached/output tokens. Notable features include a 256K token context window, native INT4 quantization, a 'preserve thinking' mode for multi-turn reasoning continuity, and a research preview 'claw groups' feature enabling cross-developer agent collaboration.

Frontier Model Releases · Evaluation and Benchmarking · Artificial Analysis Intelligence Index · Claude Opus 4.6 · Qwen3.6 Max Preview
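Given the Kimi K2.6 API prices quoted above, estimating the cost of a single request is straightforward arithmetic. This sketch assumes cached tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate. Check the provider's actual billing rules before relying on it. `request_cost_usd` is a hypothetical helper.

```python
def request_cost_usd(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_price: float = 0.95,   # USD per million uncached input tokens
    cached_price: float = 0.16,  # USD per million cached input tokens
    output_price: float = 4.00,  # USD per million output tokens
) -> float:
    """Estimate one request's cost from per-million-token prices.

    Assumption: cached_tokens is a subset of input_tokens and is billed at the
    cached rate instead of the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * input_price + cached_tokens * cached_price
            + output_tokens * output_price) / 1_000_000
```

For example, suppose a request has 200,000 input tokens (150,000 of them cached) and 10,000 output tokens. It costs about $0.1115: $0.0475 for uncached input, $0.024 for cached input, and $0.04 for output.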

6 · The Batch · 19d ago

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. Z.ai claims top open-weights performance on the Artificial Analysis Intelligence Index and SWE-Bench Pro; the model is available on Hugging Face under the MIT license. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different categories of software work (frontend most, then backend, infrastructure, and research least), with practical implications for team organization. A secondary item covers data-center opposition and failure modes in LLM helpfulness.

Frontier Model Releases · Evaluation and Benchmarking · DeepLearning.AI · Artificial Analysis Intelligence Index · SWE-bench

6 · Hugging Face Blog · 1mo ago

SigLIP 2: A better multilingual vision language encoder

Google releases SigLIP 2, an improved multilingual vision-language encoder, announced on the Hugging Face blog. The update targets better multilingual understanding and vision-language alignment than the original SigLIP. The post appears to cover architectural improvements and benchmark results for the encoder, which is commonly used as a backbone in multimodal systems.

Open Weights Progress · Multimodal Progress · Google · SigLIP 2 · Hugging Face
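For background on the alignment objective, the original SigLIP replaced CLIP's softmax contrastive loss with a pairwise sigmoid loss. It treats every image-text pair in a batch as an independent binary classification: positive on the diagonal, negative elsewhere. Below is a minimal pure-Python sketch that assumes L2-normalized embeddings and a fixed temperature `t` and bias `b` (learnable in practice; the defaults here are illustrative). It is not SigLIP 2's full training recipe.

```python
import math
from typing import List


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid_pair_loss(img_emb: List[List[float]], txt_emb: List[List[float]],
                      t: float = 10.0, b: float = -10.0) -> float:
    """SigLIP-style pairwise sigmoid loss for a batch of aligned image/text pairs.

    Pair (i, j) gets label z = +1 if i == j (matching pair) and -1 otherwise.
    t (temperature) and b (bias) are learnable in practice; fixed here for illustration.
    """
    imgs = [_normalize(v) for v in img_emb]
    txts = [_normalize(v) for v in txt_emb]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(imgs[i], txts[j])) + b
            z = 1.0 if i == j else -1.0
            total += _softplus(-z * logit)  # equals -log(sigmoid(z * logit))
    return total / n  # normalized by batch size
```

Because no softmax is taken over the batch, each pair's term is computed independently. This is the property usually cited for why the sigmoid loss scales more easily to large batches than a softmax contrastive loss.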