5 · GitHub Trending (AI/LLM filtered) · 12d ago

Magenta RealTime 2: open-weights live music generation model

Magenta RealTime 2 is an open-weights model for real-time live music generation, released as a GitHub repository under the Magenta project. The repository has 1,440 stars, 86 of them added in a single day. It is a multimodal generative AI release in the audio and music domain.

Open Weights Progress · Multimodal Progress · Magenta RealTime 2 · Magenta · Google

Related guides (3)

Google

Google: The AI Lab That Builds Everything from DNA Models to Your Phone's Assistant

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Related events (8)

5 · OpenAI Blog · 1mo ago

MuseNet: OpenAI's Transformer-Based Multi-Instrument Music Generation System

OpenAI released MuseNet, a deep neural network capable of generating 4-minute musical compositions across 10 instruments and multiple styles. The system uses the same large-scale transformer architecture as GPT-2, trained on hundreds of thousands of MIDI files to predict the next token in a sequence. MuseNet discovered patterns of harmony, rhythm, and style without explicit musical programming, demonstrating the generality of the GPT-2 unsupervised approach beyond text.

Frontier Model Releases · Multimodal Progress · GPT-2 · MIDI · MuseNet

7 · The Batch · 19d ago

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves the top open-weights score on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, and it leads SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under the MIT license, with API pricing roughly 40 percent higher than its predecessor's but still below that of comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6

5 · GitHub Trending · 2d ago

Lightricks releases LTX-2 official Python inference and LoRA trainer package for audio-video generation

Lightricks has published the official Python package for LTX-2, an audio-video generative model, including both inference and LoRA fine-tuning capabilities. The repository has 7,474 stars. It is an open-source multimodal generative model release that combines audio and video synthesis.

Open Weights Progress · Multimodal Progress · LoRA · Lightricks · LTX-2

4 · GitHub Trending · 23d ago

OpenMOSS/MOSS-TTS: Open-Source Speech and Sound Generation Model Family

MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It targets high-fidelity, expressive synthesis across stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS. The repository has accumulated 2,192 stars with 53 added today, indicating active community interest.

Open Weights Progress · Multimodal Progress · OpenMOSS · MOSS-TTS · MOSI.AI

7 · The Batch · 28d ago

Thinking Machines Lab Reveals TML-Interaction-Small: Real-Time Multimodal Interaction Model

Thinking Machines Lab (founded by Mira Murati) has announced TML-Interaction-Small, a 276B-parameter mixture-of-experts multimodal model that processes audio, video, and text concurrently using 200ms 'micro-turns' rather than waiting for conversational turns to complete. The architecture uses encoder-free early fusion, pairing a fast foreground interaction model with an asynchronous background reasoning model that shares context. On interactivity benchmarks (FD-bench V1/V1.5), it outperforms GPT-Realtime-2 and Gemini-3.1-flash-live-preview, though it trails GPT-Realtime-2 on intelligence benchmarks. A closed research preview is expected in the coming months, with a wider release later in 2026.

Frontier Model Releases · Inference Economics · encoder-free early fusion · Thinking Machines · GPT-Realtime-2

7 · The Batch · 19d ago

Meta Pivots to Closed Weights with Muse Spark; The Batch Issue 349 Roundup

Meta introduced Muse Spark, its first AI model in roughly a year and the first product from its Superintelligence Labs, marking a pivot away from its open-weights strategy toward a closed model. Muse Spark is a natively multimodal reasoning model supporting tool use and multi-agent orchestration, with three reasoning modes and a novel 'thought compression' post-training technique using RL to penalize excessive reasoning tokens. The model ranks fourth on the Artificial Analysis Intelligence Index and matches Llama 4 Maverick's capabilities with over an order of magnitude less training compute, though it trails in coding and agentic benchmarks. The issue also covers broader industry themes including AI-native software engineering team structures, big pharma AI adoption, and regulatory developments.

Frontier Model Releases · Open Weights Progress · DeepLearning.AI · Artificial Analysis Intelligence Index · Meta Superintelligence Labs

6 · OpenAI Blog · 1mo ago

OpenAI Jukebox: Neural Music Generation with Singing as Raw Audio

OpenAI introduced Jukebox, a neural network capable of generating music including rudimentary singing as raw audio across various genres and artist styles. The model operates directly on raw audio rather than symbolic representations like MIDI. OpenAI released model weights, code, and a sample exploration tool alongside the announcement.

Open Weights Progress · Multimodal Progress · Jukebox · OpenAI

7 · Mistral AI News · 1mo ago

Pixtral Large: Mistral AI's 124B Open-Weights Multimodal Model

Mistral AI released Pixtral Large, a 124B open-weights multimodal model built on Mistral Large 2, featuring a 1B-parameter vision encoder and a 128K context window that fits at least 30 high-resolution images. The model claims state-of-the-art results on MathVista, DocVQA, and ChartQA, outperforming GPT-4o and Gemini-1.5 Pro on several benchmarks, and leads the LMSys Vision Leaderboard among open-weights models by ~50 Elo points. At the same time, Mistral updated its text model to Mistral Large 24.11 with improvements in long-context understanding, function calling, and RAG/agentic workflows. Note: the model has since been deprecated and replaced by newer Mistral vision models.

Frontier Model Releases · Evaluation and Benchmarking · Google Cloud · Mistral AI · MT-Bench