Humanoid-GPT: GPT-style Transformer trained on 2B-frame motion corpus for zero-shot humanoid control
Researchers introduce Humanoid-GPT, a causal Transformer for whole-body humanoid control, pre-trained on a 2-billion-frame retargeted motion corpus that unifies major mocap datasets with large-scale in-house recordings. The model generalizes zero-shot to unseen motions and control tasks, overcoming the agility-generalization trade-off seen in prior MLP-based trackers. Scaling analyses demonstrate a new performance frontier for dynamic motion tracking without task-specific fine-tuning.
Related events (8)
Image GPT: Transformer Models Applied to Pixel Sequences for Image Generation and Classification
OpenAI demonstrates that a large transformer model trained autoregressively on pixel sequences can generate coherent image completions and samples, analogous to text generation. The work establishes a correlation between generative sample quality and downstream image classification accuracy. The best generative model achieves features competitive with top convolutional networks in the unsupervised setting, suggesting shared representational principles across modalities.
PEVA: Whole-Body Conditioned Egocentric Video Prediction for Embodied World Models
Researchers from BAIR introduce PEVA (Predicting Ego-centric Video from human Actions), a model that generates first-person video frames conditioned on 48-dimensional whole-body kinematic pose trajectories. The model uses an autoregressive conditional diffusion transformer trained on the Nymeria dataset, which pairs real-world egocentric video with body pose capture. PEVA can generate atomic action videos, simulate counterfactuals, and support long video generation, representing a step toward world models grounded in physically embodied human agents.
GPT-2: 6-Month Follow-Up — 774M Parameter Model Released
OpenAI released the 774 million parameter version of GPT-2 as part of its staged release strategy, following the 124M model in February 2019 and the 355M model in May 2019. The release is accompanied by an open-source legal agreement to facilitate model-sharing partnerships between organizations. OpenAI also published a technical report on coordinating with the AI research community around publication norms and staged disclosure practices.
Improving Language Understanding with Unsupervised Learning (GPT-1)
OpenAI published the GPT-1 paper in June 2018, demonstrating state-of-the-art results across diverse language tasks by combining transformer architectures with unsupervised pre-training followed by supervised fine-tuning. The approach is task-agnostic and scalable, showing that pre-training on large unlabeled text corpora and then fine-tuning on specific tasks yields strong generalization. This work established the foundational paradigm that would evolve into GPT-2, GPT-3, and subsequent large language models.
GPT-2 1.5B Full Release Completes OpenAI's Staged Release Experiment
OpenAI released the full 1.5B parameter GPT-2 model along with code and weights, completing its staged release process that began earlier in 2019. The release also includes tooling to help detect GPT-2 outputs. OpenAI frames this as a test case for responsible staged release practices for future powerful models, acknowledging that larger models had already been released by others in the interim.
Introducing GPT-5.2
OpenAI has released GPT-5.2, described as its most advanced frontier model for professional use, featuring state-of-the-art reasoning, long-context understanding, coding, and vision capabilities. The model is available through ChatGPT and the OpenAI API. It is positioned to support faster and more reliable agentic workflows.
Better language models and their implications
OpenAI announced GPT-2, a large-scale unsupervised language model capable of generating coherent multi-paragraph text and achieving state-of-the-art performance on language modeling benchmarks. The model demonstrated zero-shot capability across reading comprehension, machine translation, question answering, and summarization without task-specific fine-tuning. OpenAI notably withheld the full model, citing misuse concerns, which made this an early high-profile instance of a staged, responsible release policy.
Making LLMs lighter with AutoGPTQ and transformers
Hugging Face announces native integration of AutoGPTQ into the transformers library, enabling 4-bit quantized inference for large language models. The integration allows users to load and run GPTQ-quantized models directly through the standard transformers API with minimal code changes. This lowers the hardware barrier for deploying LLMs by significantly reducing VRAM requirements while maintaining competitive performance.
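To make the VRAM claim in the AutoGPTQ item concrete, here is a rough back-of-the-envelope sketch of weight storage at different bit widths. The helper name `weight_memory_gib` and the 7-billion-parameter model size are assumptions chosen for illustration, not figures from the announcement. The estimate covers weights only; it ignores activations, the KV cache, and the per-group scales and zero-points that GPTQ checkpoints also store, so real footprints will be somewhat higher.

```python
def weight_memory_gib(n_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB.

    Hypothetical helper for illustration: ignores activations, KV cache,
    and quantization metadata (scales / zero-points).
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed model size for illustration only (not taken from the source).
N_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # 16-bit floating-point weights
int4_gib = weight_memory_gib(N_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16_gib:.2f} GiB")
print(f"4-bit weights: {int4_gib:.2f} GiB")
print(f"reduction:     {fp16_gib / int4_gib:.1f}x")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is the main source of the lower hardware barrier the summary describes.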
