Almanac
5·arXiv cs.LG (Machine Learning)·17h ago

DreamForge-World 0.1 Preview: real-time controllable world model on consumer GPUs

Researchers present DreamForge-World 0.1 Preview, a world model for real-time interactive simulation that runs on a single RTX 4090 at 14-15 FPS at 480p resolution. The system adapts the LongLive 1 autoregressive video stack (derived from Wan2.1-T2V-1.3B) with a residual action pathway from the Matrix-Game family, supporting keyboard/mouse control, multimodal initialization, mid-stream reprompting, and dual-view operation. The work targets a low-compute niche distinct from frontier-scale world simulators, demonstrating a cost-efficient route to consumer-GPU-deployable interactive world models.

Related events (8)

5·Hugging Face Blog·1mo ago

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Hugging Face published a blog post introducing Waypoint-1.5, a model or system for generating higher-fidelity interactive world simulations designed to run on consumer-grade GPUs. The post appears to describe advances in interactive world modeling or simulation quality relative to a prior Waypoint-1 release. Because this is a tier-2 source with no body text available, specific technical details about architecture, benchmarks, or training methodology cannot be assessed.

6·Hugging Face Blog·1mo ago

Introducing Waypoint-1: Real-time interactive video diffusion from Overworld

Overworld has released Waypoint-1, a real-time interactive video diffusion model announced via the Hugging Face blog. The model appears to target interactive video generation applications, potentially including game-like or simulation environments. This represents a capability demonstration in the emerging space of real-time controllable video synthesis.

8·Google DeepMind Blog·1mo ago

Genie 3: A new frontier for world models

DeepMind has announced Genie 3, a world model capable of generating interactive, navigable 3D environments in real time at 24 fps and 720p resolution. The system maintains consistency for several minutes, representing a significant step up from prior Genie iterations. This positions Genie 3 as a frontier capability demonstration in generative world modeling for interactive applications.

5·Hugging Face Blog·1mo ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This Hugging Face blog post details a workflow for fine-tuning NVIDIA's Cosmos Predict 2.5 world model using LoRA and DoRA parameter-efficient techniques for robot video generation tasks. The post covers practical implementation steps for adapting the foundation video model to robotics-specific domains. This represents a concrete application of world models to embodied AI, where synthetic video generation can support robot training data pipelines.

6·arXiv cs.AI·13d ago

Looped World Models introduce iterative latent depth as a new scaling axis for world simulation

A new arXiv preprint introduces Looped World Models (LoopWM), a parameter-shared transformer architecture that iteratively refines latent environment states to achieve up to 100x parameter efficiency over conventional world models. The approach uses adaptive computation to scale depth dynamically per prediction step, addressing the tension between long-horizon simulation fidelity and deployment cost. The authors position iterative latent depth as a new scaling axis orthogonal to model size and training data.

9·OpenAI Blog·1mo ago

Video generation models as world simulators

OpenAI introduces Sora, a large-scale text-conditional video diffusion model built on a transformer architecture that operates on spacetime patches of video and image latent codes. The model is trained jointly on videos and images of variable durations, resolutions, and aspect ratios. Sora can generate up to one minute of high-fidelity video, and OpenAI frames scaling video generation as a path toward general-purpose physical world simulators.

6·Latent Space·28d ago

NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

A Latent Space AI news digest covers three NVIDIA announcements: Cosmos 3 (a world model/simulation platform), Nemotron 3 Ultra (a large language model), and RTX Spark (likely a new hardware or inference product). The piece frames these as a significant win for Jensen Huang and NVIDIA's AI portfolio. Coverage is commentary-tier aggregation rather than primary technical reporting.

6·Hugging Face Blog·1mo ago

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Hugging Face demonstrates a method for running RLHF fine-tuning on 20-billion-parameter language models using a single 24GB consumer GPU by combining TRL and PEFT (parameter-efficient fine-tuning). The approach uses techniques like LoRA and quantization to dramatically reduce memory requirements. This lowers the hardware barrier for RLHF experimentation from multi-GPU server setups to consumer-grade hardware.
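
As a rough illustration of why quantization matters for the 24GB case above, the sketch below estimates the memory taken by model weights alone at several precisions. It is back-of-the-envelope arithmetic, not the blog post's method: the function name `weight_memory_gb` and the precision table are hypothetical, and the estimate ignores activations, LoRA adapter weights, optimizer state, and framework overhead.

```python
# Back-of-the-envelope weight-memory estimate (illustrative assumption, not
# taken from the summarized post). Counts only the base model weights.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    gpu_gb = 24  # consumer GPU capacity named in the summary
    for dtype in BYTES_PER_PARAM:
        gb = weight_memory_gb(20e9, dtype)
        verdict = "fits" if gb < gpu_gb else "does not fit"
        print(f"{dtype}: {gb:.0f} GB of weights, {verdict} in {gpu_gb} GB")
```

Under these assumptions, the weights of a 20B-parameter model exceed 24GB at 16-bit precision and only fit once they are stored at 8 bits or fewer, which leaves the remaining headroom for the small set of trainable LoRA parameters.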