Toto 2.0: Open-Weights Time Series Foundation Models Demonstrate Scaling Laws from 4M to 2.5B Parameters
Datadog releases Toto 2.0, a family of five open-weights time series forecasting models ranging from 4M to 2.5B parameters, demonstrating consistent forecast quality improvements with scale. The models achieve state-of-the-art results on three benchmarks: BOOM (observability), GIFT-Eval (general-purpose), and TIME (contamination-resistant). The release includes architectural details, a u-muP hyperparameter transfer pipeline, and all base checkpoints under Apache 2.0 license.
Related guides (3)
Related events (8)
OpenAI Releases gpt-oss-120b and gpt-oss-20b Open-Weight Models Under Apache 2.0
OpenAI is releasing two open-weight language models, gpt-oss-120b and gpt-oss-20b, under the Apache 2.0 license. The models are claimed to outperform similarly sized open models on reasoning tasks and feature strong tool use capabilities. They are optimized for efficient deployment on consumer hardware, positioning them as cost-effective alternatives in the open-weights ecosystem.
OpenAI Releases gpt-oss-120b and gpt-oss-20b Open-Weight Reasoning Models
OpenAI has published model cards for gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models released under the Apache 2.0 license alongside a dedicated gpt-oss usage policy. This marks a significant move by OpenAI into the open-weights space, offering both a large 120B parameter model and a smaller 20B variant. The release signals a strategic shift for OpenAI, which has historically kept its frontier models proprietary.
Kimi K2.6: Moonshot AI's 1T-Parameter Vision-Language Model Matches Open-Weights Peers, Trails Top Closed Models
Moonshot AI released Kimi K2.6, a 1 trillion-parameter mixture-of-experts vision-language model with 32B active parameters, designed for long-horizon autonomous coding sessions lasting multiple days and multi-agent orchestration scaling to 300 parallel subagents executing up to 4,000 steps. The model matches Qwen3.6 Max Preview and DeepSeek-V4-Pro on the Artificial Analysis Intelligence Index (scoring 54 vs. their 52) while trailing closed models like GPT-5.5 and Claude Opus 4.7. Weights are freely downloadable from Hugging Face under a modified MIT license permitting commercial use, with API access priced at $0.95/$0.16/$4.00 per million input/cached/output tokens. Notable features include a 256K token context window, native INT4 quantization, a 'preserve thinking' mode for multi-turn reasoning continuity, and a research preview 'claw groups' feature enabling cross-developer agent collaboration.
Falcon 180B Released: New Open-Weights Frontier Model
Technology Innovation Institute (TII) has released Falcon 180B, a 180-billion parameter open-weights language model announced via Hugging Face. At the time of release, it was positioned as the largest publicly available open-weights model, trained on 3.5 trillion tokens. The model is available on Hugging Face Hub for research and commercial use under a custom license.
GPT-2: 6-Month Follow-Up — 774M Parameter Model Released
OpenAI released the 774 million parameter version of GPT-2 as part of its staged release strategy, following the 124M model in February and 355M model in May 2019. The release is accompanied by an open-source legal agreement to facilitate model-sharing partnerships between organizations. OpenAI also published a technical report on coordinating with the AI research community around publication norms and staged disclosure practices.
Qwen2.5-LLM: Alibaba releases open-weight language models from 0.5B to 72B
Alibaba's Qwen team releases the Qwen2.5 series of decoder-only dense language models, open-sourcing seven variants spanning 0.5B to 72B parameters. The release targets production use cases in the 10-30B range and mobile deployments at 3B scale. This represents a significant expansion of the open-weights frontier from a Tier 1 Chinese AI lab.
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
This paper reframes parameter-efficient fine-tuning (PEFT) not merely as a cheaper alternative to full fine-tuning, but as a substrate for persistent, instance-specific personal models layered atop shared foundation models. The authors analyze three scaling axes: Scale Up (stronger base models amplifying adapter utility), Scale Down (minimum viable adapter size), and Scale Out (managing millions of concurrent adapted instances). They introduce MinT as an infrastructure reference for adapter identity, versioning, provenance, evaluation, and serving at scale.
Nvidia releases Nemotron 3 Super 120B-A12B open-weights model with hybrid Mamba-2/MoE architecture
Nvidia released Nemotron 3 Super 120B-A12B, an open-weights LLM with a hybrid Mamba-2/transformer/MoE architecture that activates only 12B parameters per token and supports up to 1 million token context. The model claims the fastest inference speed in its size class at 442 tokens/second and leads open-weights models on PinchBench agentic task evaluation, outperforming larger models including Kimi K2.5 (1T parameters). Nvidia is releasing weights, training data, and recipes under a permissive commercial license, and plans a $26B five-year investment in open-weights models — framed partly as a strategic response to Chinese labs building capable open-weights models on non-Nvidia hardware.


