Magenta RealTime 2: open-weights live music generation model
Magenta RealTime 2 is an open-weights model for real-time live music generation, released as a GitHub repository under the Magenta project. The repository has accumulated 1,440 stars, 86 of them added in a single day, which indicates notable community interest. The release falls in the multimodal generative AI category, within the audio and music domain.
Related events (8)
MuseNet: OpenAI's Transformer-Based Multi-Instrument Music Generation System
OpenAI released MuseNet, a deep neural network capable of generating 4-minute musical compositions across 10 instruments and multiple styles. The system uses the same large-scale transformer architecture as GPT-2, trained on hundreds of thousands of MIDI files to predict the next token in a sequence. MuseNet discovered patterns of harmony, rhythm, and style without explicit musical programming, demonstrating the generality of the GPT-2 unsupervised approach beyond text.
Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation
Z.ai released GLM-5.1, a 754B-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks. It can cycle through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves the top open-weights score on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, and it leads SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under the MIT license, with API pricing roughly 40 percent higher than its predecessor but still below comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.
Lightricks releases LTX-2 official Python inference and LoRA trainer package for audio-video generation
Lightricks has published the official Python package for LTX-2, an audio-video generative model, including both inference and LoRA fine-tuning capabilities. The repository has accumulated 7,474 stars with ongoing community traction. This represents a notable open-source multimodal generative model release combining audio and video synthesis.
OpenMOSS/MOSS-TTS: Open-Source Speech and Sound Generation Model Family
MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It targets high-fidelity, expressive synthesis across stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS. The repository has accumulated 2,192 stars, 53 of them added in a single day, indicating active community interest.
Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model
Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in coming months with wider release later in 2026.
Meta Pivots to Closed Weights with Muse Spark; The Batch Issue 349 Roundup
Meta introduced Muse Spark, its first AI model in roughly a year and the first product from its Superintelligence Labs, marking a pivot away from its open-weights strategy toward a closed model. Muse Spark is a natively multimodal reasoning model supporting tool use and multi-agent orchestration, with three reasoning modes and a novel 'thought compression' post-training technique using RL to penalize excessive reasoning tokens. The model ranks fourth on the Artificial Analysis Intelligence Index and matches Llama 4 Maverick's capabilities with over an order of magnitude less training compute, though it trails in coding and agentic benchmarks. The issue also covers broader industry themes including AI-native software engineering team structures, big pharma AI adoption, and regulatory developments.
OpenAI Jukebox: Neural Music Generation with Singing as Raw Audio
OpenAI introduced Jukebox, a neural network capable of generating music including rudimentary singing as raw audio across various genres and artist styles. The model operates directly on raw audio rather than symbolic representations like MIDI. OpenAI released model weights, code, and a sample exploration tool alongside the announcement.
Pixtral Large: Mistral AI's 124B Open-Weights Multimodal Model
Mistral AI released Pixtral Large, a 124B open-weights multimodal model built on Mistral Large 2, featuring a 1B-parameter vision encoder and a 128K context window that supports at least 30 high-resolution images. Mistral claims state-of-the-art results on MathVista, DocVQA, and ChartQA, outperforming GPT-4o and Gemini-1.5 Pro on several benchmarks, and a lead of ~50 Elo points on the LMSys Vision Leaderboard among open-weights models. Simultaneously, Mistral updated its text model to Mistral Large 24.11 with improvements in long-context understanding, function calling, and RAG/agentic workflows. Note: the model has since been deprecated and replaced by newer Mistral vision models.


