Almanac
model

Mini-R1

modelactivemini-r1-98b17325·1 events·first seen 28d ago

Aliases: Mini-R1

Co-occurring entities

More like this (12)

Recent events (1)

5Hugging Face Blog·28d ago·source ↗

Mini-R1: Reproducing DeepSeek R1 'Aha Moment' — An RL Tutorial

A Hugging Face blog post demonstrates how to reproduce DeepSeek R1's emergent 'aha moment' reasoning behavior using reinforcement learning on a countdown game task. The tutorial walks through training a smaller model with RL to exhibit chain-of-thought self-correction, similar to the behavior observed in DeepSeek R1. This serves as a practical open-source replication effort aimed at demystifying R1's training dynamics.