5 · arXiv cs.LG (Machine Learning) · 17h ago

DreamForge-World 0.1 Preview: real-time controllable world model on consumer GPUs

Researchers present DreamForge-World 0.1 Preview, a world model for real-time interactive simulation that runs on a single RTX 4090 at 14-15 FPS at 480p resolution. The system adapts the LongLive 1 autoregressive video stack (derived from Wan2.1-T2V-1.3B) with a residual action pathway from the Matrix-Game family, supporting keyboard/mouse control, multimodal initialization, mid-stream reprompting, and dual-view operation. The work targets a low-compute niche distinct from frontier-scale world simulators, demonstrating a cost-efficient route to consumer-GPU-deployable interactive world models.
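
To put the reported frame rate in per-frame terms, the short sketch below converts 14 and 15 FPS into a latency budget. It is plain arithmetic on the figures quoted above, not code from the paper, and `frame_budget_ms` is a hypothetical helper name.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available to produce one frame at a given frame rate."""
    return 1000.0 / fps


# Frame rates quoted in the summary for a single RTX 4090 at 480p.
for fps in (14, 15):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
# prints:
# 14 FPS -> 71.4 ms per frame
# 15 FPS -> 66.7 ms per frame
```

In other words, each generated frame, including action conditioning and decoding, has to fit in roughly 67 to 71 ms to sustain the stated rate.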

Inference Economics · Multimodal Progress · DreamForge-World 0.1 Preview · Wan2.1-T2V-1.3B · LongLive 1 · Matrix-Game

Related guides (2)

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Related events (8)

5 · Hugging Face Blog · 1mo ago

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Hugging Face published a blog post introducing Waypoint-1.5, a model or system for generating higher-fidelity interactive world simulations on consumer-grade GPUs. The post appears to describe advances in interactive world modeling or simulation quality relative to the earlier Waypoint-1 release. Because this is a tier-2 source with no body text available, specific technical details about architecture, benchmarks, or training methodology cannot be assessed.

Frontier Model Releases · Inference Economics · Hugging Face · Waypoint-1.5

6 · Hugging Face Blog · 1mo ago

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Overworld has released Waypoint-1, a real-time interactive video diffusion model announced via the Hugging Face blog. The model appears to target interactive video generation applications, potentially including game-like or simulation environments. This represents a capability demonstration in the emerging space of real-time controllable video synthesis.

Inference Economics · Agent and Tool Ecosystem · Overworld · Hugging Face · Waypoint-1.5

8 · Google DeepMind Blog · 1mo ago

Genie 3: A new frontier for world models

DeepMind has announced Genie 3, a world model capable of generating interactive, navigable 3D environments in real time at 24 fps and 720p resolution. The system maintains consistency for several minutes, representing a significant step up from prior Genie iterations. This positions Genie 3 as a frontier capability demonstration in generative world modeling for interactive applications.

Frontier Model Releases · Agent and Tool Ecosystem · Genie 3 · Google DeepMind

5 · Hugging Face Blog · 1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.

Inference Economics · Agent and Tool Ecosystem · DoRA · LoRA · NVIDIA

6 · arXiv cs.AI · 13d ago

Looped World Models introduce iterative latent depth as a new scaling axis for world simulation

A new arXiv preprint introduces Looped World Models (LoopWM), a parameter-shared transformer architecture that iteratively refines latent environment states to achieve up to 100x parameter efficiency over conventional world models. The approach uses adaptive computation to scale depth dynamically per prediction step, addressing the tension between long-horizon simulation fidelity and deployment cost. The authors position iterative latent depth as a new scaling axis orthogonal to model size and training data.

Training Infrastructure · Frontier Model Releases · Looped World Models · LoopWM

9 · OpenAI Blog · 1mo ago

Video generation models as world simulators

OpenAI introduces Sora, a large-scale text-conditional video diffusion model built on a transformer architecture that operates on spacetime patches of video and image latent codes. The model is trained jointly on videos and images of variable durations, resolutions, and aspect ratios. Sora can generate up to one minute of high-fidelity video, and OpenAI frames scaling video generation models as a path toward general-purpose simulators of the physical world.

Training Infrastructure · Frontier Model Releases · Linear Diffusion Transformer · spacetime patch · OpenAI

6 · Latent Space · 28d ago

NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

A Latent Space AI news digest covers three NVIDIA announcements: Cosmos 3 (a world model/simulation platform), Nemotron 3 Ultra (a large language model), and RTX Spark (likely a new hardware or inference product). The piece frames these as a significant win for Jensen Huang and NVIDIA's AI portfolio. Coverage is commentary-tier aggregation rather than primary technical reporting.

Training Infrastructure · Frontier Model Releases · NVIDIA Cosmos · NVIDIA RTX Spark · NVIDIA

6 · Hugging Face Blog · 1mo ago

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Hugging Face demonstrates a method for running RLHF fine-tuning on 20-billion-parameter language models using a single 24GB consumer GPU by combining TRL and PEFT (parameter-efficient fine-tuning). The approach uses techniques like LoRA and quantization to dramatically reduce memory requirements. This lowers the hardware barrier for RLHF experimentation from multi-GPU server setups to consumer-grade hardware.
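
A rough weights-only estimate shows why quantization matters for fitting a model this size on a 24GB card. The sketch below is illustrative arithmetic, not the blog post's code: `weight_memory_gb` is a hypothetical helper, gigabytes are decimal (1e9 bytes), and the numbers ignore activations, LoRA adapter weights, optimizer state, and framework overhead. The summary does not state which bit width the post uses, so several are shown.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 20e9       # 20-billion-parameter model, as in the headline
GPU_MEMORY_GB = 24    # consumer GPU capacity, as in the headline

for bits in (16, 8, 4):
    need = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if need < GPU_MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {need:.0f} GB ({verdict} in {GPU_MEMORY_GB} GB)")
# prints:
# 16-bit weights: 40 GB (does not fit in 24 GB)
#  8-bit weights: 20 GB (fits in 24 GB)
#  4-bit weights: 10 GB (fits in 24 GB)
```

Under these assumptions, 16-bit weights alone exceed the card's memory, while quantized weights leave headroom for the small set of trainable LoRA parameters and their optimizer state.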

Open Weights Progress · Inference Economics · PEFT · Reinforcement Learning from Human Feedback · LoRA