Entity · model

StarCoder2

model · active · starcoder2-a8d1a90c · 8 events · first seen May 19, 2026

Aliases: StarCoder2, StarCoder, StarCoder2-3B

Co-occurring entities

Hugging Face · BigCode · The Stack v2 · SafeCoder · GitHub Copilot · Generalized LR Parsing · Tree-sitter · Weave of Formal Thought · Reweighted Wake-Sleep · ServiceNow AI · StarChat-Alpha · speculative decoding · Intel Xeon · INT4 Quantization · Intel · Optimum-Intel · StarCoder2-Instruct · Self-Instruct

More like this (12)

StarCoder2-Instruct · SafeCoder · AlphaStar · StarChat-Alpha · Qwen2.5-Coder · SciCode · CodeParrot · DeepSeek-Coder-V2-0724 · Devstral 2 · deepseek-coder · OlympicCoder · BigCode

Recent events (8)

6 · arXiv · cs.CL · Jun 25, 2026

Weave of Formal Thought: Sound-and-complete constrained decoding with learned latent syntax for code LLMs

The paper introduces Weave of Formal Thought (WoFT), a framework combining a formally sound-and-complete constrained decoder for code generation with a latent-variable fine-tuning method that teaches LLMs to interleave grammar non-terminals during generation. The constrained decoder extends generalized LR (GLR) parsing with speculative lexing to handle context-sensitive lexing and maximal-munch tokenization, addressing gaps in prior constrained-decoding work. A reweighted wake-sleep (RWS) fine-tuning objective on StarCoder2-3B achieves a 14.3% relative reduction in per-token cross-entropy over a text-only SFT baseline on Python, suggesting that explicit structural scaffolding recovers information lost in flat autoregressive training.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Generalized LR Parsing · Tree-sitter · Weave of Formal Thought · +2 more
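
To illustrate why maximal-munch tokenization matters for the constrained decoder described in the WoFT summary above, here is a minimal sketch of a longest-match lexer. It is not the paper's decoder: the toy token set and the names `TOKEN_PATTERNS` and `lex_maximal_munch` are assumptions made for illustration only.

```python
import re

# Hypothetical toy token set (an assumption); a real Python lexer is far richer.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("ASSIGN", re.compile(r"=")),
    ("EQ", re.compile(r"==")),
    ("SKIP", re.compile(r"\s+")),
]


def lex_maximal_munch(text):
    """Split text into (kind, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            # Keep the candidate that consumes the most characters;
            # on a tie, the earlier pattern in the list wins.
            if match and match.end() > best_end:
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise ValueError(f"no token matches at position {pos}")
        if best_kind != "SKIP":  # whitespace is consumed but not emitted
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

With this toy lexer, `x = 1` lexes to NAME, ASSIGN, NUMBER, while `x == 1` lexes to NAME, EQ, NUMBER. A decoder that has emitted only `x =` cannot yet tell which token the `=` belongs to, because the next character may extend it to `==`. Ambiguity of this kind over partial output is what a grammar-constrained decoder has to account for.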

6 · Hugging Face Blog · May 19, 2026

StarCoder: A State-of-the-Art LLM for Code

Hugging Face and ServiceNow released StarCoder, a large language model for code trained on permissively licensed data from The Stack dataset. The model targets code generation, completion, and understanding tasks and is positioned as an open-weights alternative to proprietary code models. The release includes model weights, training details, and an associated technical report.

Open Weights Progress · Agent and Tool Ecosystem · ServiceNow AI · BigCode · The Stack v2 · +2 more

5 · Hugging Face Blog · May 19, 2026

Creating a Coding Assistant with StarCoder

This Hugging Face blog post describes the process of building StarChat-Alpha, a conversational coding assistant fine-tuned from the StarCoder large language model. The post covers the instruction-tuning methodology used to adapt StarCoder for chat-style interactions, including dataset preparation and training details. It represents an early example of open-weights coding LLMs being adapted into assistant-style deployments.

Open Weights Progress · Agent and Tool Ecosystem · BigCode · Hugging Face · StarCoder2 · +2 more

5 · Hugging Face Blog · May 19, 2026

Introducing SafeCoder

Hugging Face announced SafeCoder, an enterprise-focused code assistant product designed to run on-premises or in private cloud environments. The offering targets organizations that require data privacy and security guarantees, positioning it as an alternative to cloud-based coding assistants like GitHub Copilot. SafeCoder is built on top of open-weight code models and is sold as a managed solution for enterprise deployment.

Open Weights Progress · Enterprise Deployment Patterns · SafeCoder · Hugging Face · StarCoder2 · +2 more

4 · Hugging Face Blog · May 19, 2026

SafeCoder vs. Closed-source Code Assistants

Hugging Face published a comparison of its SafeCoder enterprise code assistant against closed-source alternatives such as GitHub Copilot. The post positions SafeCoder as a privacy-preserving, on-premises deployment option for enterprises that need code generation without sending proprietary code to external APIs. It names data privacy, customization, and deployment control as the key differentiators.

Open Weights Progress · Enterprise Deployment Patterns · SafeCoder · Hugging Face · StarCoder2 · +2 more

4 · Hugging Face Blog · May 19, 2026

Accelerate StarCoder with Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Hugging Face and Intel demonstrate quantization (INT8/INT4) and speculative decoding techniques applied to StarCoder on Intel Xeon CPUs using the Optimum Intel library. The post covers practical inference acceleration workflows targeting CPU deployment of code generation models. This represents a concrete inference-economics use case for open-weight code models on commodity server hardware.

Open Weights Progress · Inference Economics · speculative decoding · Intel Xeon · INT4 Quantization · +4 more
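
The speculative decoding mentioned in the Optimum Intel summary above can be sketched in a few lines. This is a toy, greedy-only illustration and not the Optimum Intel API or the blog post's implementation: the "models" are plain Python functions, and the names `target_model`, `draft_model`, and `speculative_greedy_decode` are hypothetical. A production implementation would verify all drafted positions in one batched forward pass and usually also keeps a bonus token when every draft is accepted; both are omitted here.

```python
def greedy_decode(next_token, prompt, max_new_tokens):
    """Reference: plain greedy decoding, one model call per new token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_greedy_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding: a cheap draft proposes, the target verifies."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    verify_rounds = 0
    while len(tokens) < limit:
        # 1. The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. The target checks each proposal (one round stands in for one
        #    batched forward pass of the large model).
        verify_rounds += 1
        accepted = []
        for guess in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # equals guess whenever the draft was right
            if guess != expected:
                break  # first mismatch: keep the target's token, drop the rest
        tokens.extend(accepted)
    return tokens[:limit], verify_rounds


# Toy stand-ins for the large and small models (assumptions, not real LLMs).
def target_model(tokens):
    return (tokens[-1] + 1) % 10


def draft_model(tokens):
    # Agrees with the target except right after a 4, where it guesses wrong.
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10
```

In this sketch the output always matches plain greedy decoding with the target model, because a drafted token is kept only when the target would have chosen it anyway. The saving comes from needing fewer verification rounds than generated tokens whenever the draft model guesses well.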

7 · Hugging Face Blog · May 19, 2026

StarCoder2 and The Stack v2

Hugging Face and BigCode released StarCoder2, a new family of open code language models trained on The Stack v2, a significantly expanded, permissively licensed code dataset. The release includes multiple model sizes and represents a major update to the BigCode open-weights code model lineage.

Training Infrastructure · Open Weights Progress · BigCode · The Stack v2 · Hugging Face · +2 more

5 · Hugging Face Blog · May 19, 2026

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Hugging Face introduces StarCoder2-Instruct, a code generation model fine-tuned via a self-alignment approach that requires no human-annotated instruction data. The method uses the base model itself to generate synthetic instruction-response pairs, which are then filtered and used for supervised fine-tuning. The model and all training data, pipelines, and evaluation code are released under permissive licenses, making it one of the more transparent instruction-tuned code models available.

Open Weights Progress · Agent and Tool Ecosystem · BigCode · StarCoder2-Instruct · Self-Instruct · +3 more
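
The filtering step in the StarCoder2-Instruct summary above can be sketched as follows. The summary does not say how pairs are filtered, so this sketch assumes an execution-based check: each candidate carries its own tests, and a pair is kept only if the response runs and the tests pass. The record layout (`instruction`, `response`, `tests`) and the function names are hypothetical, and a real pipeline would run model-generated code in a sandbox, not with a bare `exec`.

```python
def passes_self_check(response_code, test_code):
    """Return True if the response defines code that passes its own tests.

    Assumption: validation is done by execution. Running untrusted code with
    exec is unsafe outside a sandbox; it is used here only to keep the sketch short.
    """
    namespace = {}
    try:
        exec(response_code, namespace)  # define the candidate solution
        exec(test_code, namespace)      # run the accompanying assertions
    except Exception:
        return False  # syntax errors, runtime errors, and failed asserts all reject
    return True


def filter_pairs(candidates):
    """Keep only instruction-response pairs whose response passes its tests."""
    return [
        pair for pair in candidates
        if passes_self_check(pair["response"], pair["tests"])
    ]
```

The surviving pairs would then serve as the supervised fine-tuning set. For example, a candidate whose response implements `add(a, b)` as `a - b` fails the test `assert add(2, 3) == 5` and is dropped, while a correct response is kept.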