o1-preview
o1-preview-e6117946 · 2 events · first seen 1mo ago
Aliases: o1-preview
Recent events (2)
DeepSeek-R1-Lite-Preview Launched with o1-Level Reasoning Performance
DeepSeek has released DeepSeek-R1-Lite-Preview, a reasoning-focused model that the company says reaches o1-preview-level performance on the AIME and MATH benchmarks. The model exposes its chain of thought in real time and shows inference-time scaling: longer reasoning chains yield better results. DeepSeek has indicated that open-source model weights and a full API are forthcoming. The model is currently accessible via chat.deepseek.com.
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
OpenAI introduces MLE-bench, a benchmark designed to measure AI agent performance on machine learning engineering tasks. The benchmark draws from Kaggle competitions to evaluate agents on realistic ML engineering workflows. Initial results show that current agents, including those powered by o1-preview, achieve competitive performance on a subset of tasks but fall well short of top human competitors. The benchmark is intended to track progress in agentic ML capabilities over time.