
The Stack v2

Dataset · active · the-stack-v2-df797e54 · 2 events · first seen 28d ago

Aliases: The Stack v2, The Stack

Recent events (2)

7 · Hugging Face Blog · 28d ago

StarCoder2 and The Stack v2

Hugging Face and BigCode released StarCoder2, a new family of open code language models available in 3B, 7B, and 15B parameter sizes. The models are trained on The Stack v2, a new large-scale dataset of permissively licensed code that is significantly larger than the original Stack. The release is a major update to BigCode's lineage of open-weights code models.

6 · Hugging Face Blog · 28d ago

StarCoder: A State-of-the-Art LLM for Code

Hugging Face and ServiceNow released StarCoder, a large language model for code trained on permissively licensed data from The Stack dataset. The model targets code generation, completion, and understanding tasks and is positioned as an open-weights alternative to proprietary code models. The release includes model weights, training details, and an associated technical report.