Entity · organization

BigCode

organization · active · bigcode-9f5405e1 · 6 events · first seen May 19, 2026

Aliases: BigCode

Co-occurring entities

Hugging Face · StarCoder2 · The Stack v2 · ServiceNow AI · StarChat-Alpha · near-deduplication · StarCoder2-Instruct · Self-Instruct · BigCodeArena

More like this (12)

BigCodeArena · BigCodeBench · CodeAgents · CapCode · Kilo Code · RedCode · MirrorCode · CodePath · CodeGemma · SciCode · CodeParrot · unclecode

Recent events (6)

6 · Hugging Face Blog · May 19, 2026

StarCoder: A State-of-the-Art LLM for Code

Hugging Face and ServiceNow released StarCoder, a large language model for code trained on permissively licensed data from The Stack dataset. The model targets code generation, completion, and understanding tasks and is positioned as an open-weights alternative to proprietary code models. The release includes model weights, training details, and an associated technical report.

Open Weights Progress · Agent and Tool Ecosystem · ServiceNow AI · BigCode · The Stack v2 · +2 more

5 · Hugging Face Blog · May 19, 2026

Creating a Coding Assistant with StarCoder

This Hugging Face blog post describes the process of building StarChat-Alpha, a conversational coding assistant fine-tuned from the StarCoder large language model. The post covers the instruction-tuning methodology used to adapt StarCoder for chat-style interactions, including dataset preparation and training details. It represents an early example of open-weights coding LLMs being adapted into assistant-style deployments.

Open Weights Progress · Agent and Tool Ecosystem · BigCode · Hugging Face · StarCoder2 · +2 more

4 · Hugging Face Blog · May 19, 2026

Large-scale Near-deduplication Behind BigCode

This Hugging Face blog post details the near-deduplication pipeline developed for the BigCode project, which processes large-scale source code datasets used to train code language models. The post covers the technical methodology for identifying and removing near-duplicate documents at scale, including hashing techniques and distributed processing approaches. Deduplication is a critical preprocessing step that affects training data quality and model generalization.

Training Infrastructure · Open Weights Progress · BigCode · near-deduplication · Hugging Face
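The summary above mentions hashing techniques without naming one. As a minimal sketch, assuming MinHash over word 3-grams (a common choice for near-duplicate detection, not confirmed by the summary), the idea can be illustrated as follows. The function names, the 64 hash seeds, and the 0.7 threshold are all illustrative assumptions.

```python
import hashlib
import re

NUM_HASHES = 64   # assumed signature length; more hashes give a tighter estimate
NGRAM_SIZE = 3    # assumed shingle size, in tokens


def shingles(text, n=NGRAM_SIZE):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        # Short documents become a single shingle (possibly the empty string).
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _seeded_hash(seed, shingle):
    """64-bit hash of a shingle under a given seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each seed, keep the minimum hash over all shingles."""
    doc_shingles = shingles(text)
    return tuple(
        min(_seeded_hash(seed, s) for s in doc_shingles)
        for seed in range(num_hashes)
    )


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def near_dedup(docs, threshold=0.7):
    """Return indices of documents kept after dropping near-duplicates.

    This compares every document against all kept ones (quadratic cost),
    which is only workable for small inputs.
    """
    kept_indices, kept_signatures = [], []
    for index, doc in enumerate(docs):
        signature = minhash_signature(doc)
        if any(estimated_jaccard(signature, k) >= threshold for k in kept_signatures):
            continue  # too similar to a document we already kept
        kept_indices.append(index)
        kept_signatures.append(signature)
    return kept_indices
```

At real dataset scale the pairwise loop would be replaced by something like locality-sensitive hashing plus distributed processing, which the summary alludes to but this sketch does not attempt.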

7 · Hugging Face Blog · May 19, 2026

StarCoder2 and The Stack v2

Hugging Face and BigCode released StarCoder2, a new family of open code language models in multiple sizes, trained on The Stack v2, a new and significantly expanded dataset of permissively licensed code. The release is a major update to the BigCode open-weights code model lineage.

Training Infrastructure · Open Weights Progress · BigCode · The Stack v2 · Hugging Face · +2 more

5 · Hugging Face Blog · May 19, 2026

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Hugging Face introduces StarCoder2-Instruct, a code generation model fine-tuned via a self-alignment approach that requires no human-annotated instruction data. The method uses the base model itself to generate synthetic instruction-response pairs, which are then filtered and used for supervised fine-tuning. The model and all training data, pipelines, and evaluation code are released under permissive licenses, making it one of the more transparent instruction-tuned code models available.

Open Weights Progress · Agent and Tool Ecosystem · BigCode · StarCoder2-Instruct · Self-Instruct · +3 more
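The generate, filter, fine-tune loop described above can be sketched in a few lines. The summary does not say how candidates are filtered, so this sketch assumes one plausible rule: keep a response only if its code runs and passes its own assert-based checks. `fake_generate` is a hypothetical stand-in for sampling from the base model, and `Pair`, `passes_self_check`, and `build_sft_dataset` are illustrative names, not part of any released pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pair:
    instruction: str
    response: str  # Python source that includes its own assert-based checks


def fake_generate(seed_snippet: str) -> List[Pair]:
    """Hypothetical stand-in for sampling candidates from the base model."""
    instruction = "Write add(a, b) that returns the sum of its arguments."
    return [
        # Correct candidate: its self-check passes.
        Pair(instruction, "def add(a, b):\n    return a + b\nassert add(2, 3) == 5\n"),
        # Wrong logic: its self-check raises AssertionError.
        Pair(instruction, "def add(a, b):\n    return a - b\nassert add(2, 3) == 5\n"),
        # Malformed code: compilation raises SyntaxError.
        Pair(instruction, "def add(a, b) return a + b\n"),
    ]


def passes_self_check(source: str) -> bool:
    """Assumed filter: the candidate must compile and run without raising.

    exec() is NOT a sandbox and has no timeout; a real pipeline would need
    isolated execution. It is used here only to keep the sketch short.
    """
    try:
        exec(compile(source, "<candidate>", "exec"), {})
    except Exception:
        return False
    return True


def build_sft_dataset(seed_snippets: List[str]) -> List[Dict[str, str]]:
    """Generate candidates per seed and keep only those that pass the filter."""
    dataset = []
    for seed in seed_snippets:
        for pair in fake_generate(seed):
            if passes_self_check(pair.response):
                dataset.append({"prompt": pair.instruction, "completion": pair.response})
    return dataset
```

The resulting list of prompt and completion records is the kind of input a supervised fine-tuning step would consume; the fine-tuning itself is out of scope here.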

5 · Hugging Face Blog · May 19, 2026

BigCodeArena: Judging code generations end to end with code executions

BigCodeArena is a new evaluation framework for code generation models that uses end-to-end code execution to judge outputs rather than relying on static metrics or human preference ratings alone. The approach aims to provide more reliable and objective assessments of coding model capabilities by running generated code and evaluating actual execution results. This addresses known limitations of LLM-as-judge and human annotation methods for code evaluation benchmarks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · BigCode · BigCodeArena · Hugging Face
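To make the idea of execution-based judging concrete, here is a generic sketch that runs a generated program in a subprocess and compares its output with an expected value. It does not reflect BigCodeArena's actual interface, which the summary does not describe. `ExecResult`, `run_candidate`, and `judge` are hypothetical names, and exact stdout matching is an assumed pass criterion.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecResult:
    status: str  # "ok", "error", or "timeout"
    stdout: str


def run_candidate(code: str, timeout: float = 5.0) -> ExecResult:
    """Run generated Python code in a separate interpreter process.

    A subprocess gives process isolation and a timeout, but it is not a
    security sandbox; untrusted code needs stronger isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ExecResult("timeout", "")
    status = "ok" if proc.returncode == 0 else "error"
    return ExecResult(status, proc.stdout)


def judge(code: str, expected_stdout: str, timeout: float = 5.0) -> bool:
    """Pass only if the code exits cleanly and prints the expected output."""
    result = run_candidate(code, timeout)
    return result.status == "ok" and result.stdout.strip() == expected_stdout.strip()
```

For example, `judge("print(sum(range(5)))", "10")` passes, while a candidate that raises an exception or exceeds the timeout fails regardless of how plausible its source looks.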