FLEURS

benchmark · active · fleurs-b9045418 · 5 events · first seen May 18, 2026

Aliases: FLEURS

More like this (12)

Flower FLORES BLOOM FLUX Petals FROG BLOOMZ Flamingo Flax Diffusers FLUX-2 Flow

Recent events (5)

7 · The Batch · Jun 4, 2026

Microsoft Build: Seven in-house AI models, GitHub Copilot desktop agent manager, and Web IQ search API for agents

Microsoft announced seven new AI models trained from scratch (not distilled from OpenAI), including the flagship MAI-Thinking-1 reasoning model and MAI-Transcribe-1.5, plus a 'Frontier Tuning' reinforcement learning approach for enterprise workflow training. GitHub released a desktop Copilot app designed to manage multiple parallel AI agents with isolated git worktrees and bidirectional canvases. Microsoft also launched Web IQ, an agent-native Bing-powered grounding API already powering search in Copilot and ChatGPT, running 2.5x faster than alternatives with lower token costs. The roundup also covers Nous Research's Hermes Desktop cross-platform agent app, Alibaba's Qwen3.7-Plus multimodal model, and OpenAI's role-specific Codex plugins.

Frontier Model Releases · Inference Economics · MAI-Thinking-1 · FLEURS · Frontier Tuning · +15 more

4 · arXiv · cs.CL · Jun 2, 2026

SN-WER: Script-Normalized Word Error Rate for Multi-Script Indic ASR Evaluation

Researchers propose Script-Normalized WER (SN-WER), a training-free evaluation metric that transliterates ASR reference and hypothesis text into a canonical script before computing WER, addressing overestimation of errors caused by script mismatches in multilingual settings. Evaluated across 5 Indic languages, 2 datasets, and 3 ASR models, SN-WER reduces inflated model performance gaps by up to 12% on curated FLEURS data and attenuates romanization-induced WER inflation by 67% in controlled tests. The metric maintains near-identical sensitivity to genuine semantic errors (ΔSN-WER/ΔWER ≈ 1.09) and shows robustness to transliterator choice with token-collision rates below 0.1%. The authors recommend SN-WER as a companion metric to WER and CER, particularly for pipelines feeding downstream search, indexing, or multilingual LLM applications.

Evaluation and Benchmarking · Multimodal Progress · FLEURS · Common Voice · Character Error Rate · +2 more
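The idea summarized above, normalizing both transcripts to one script before scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_TABLE` and `toy_transliterate` are hypothetical stand-ins for a real transliterator, and the function names are chosen here for illustration only.

```python
from typing import Callable, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Standard WER: word edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical word-level lookup standing in for a real transliterator.
TOY_TABLE = {"नमस्ते": "namaste", "दुनिया": "duniya"}


def toy_transliterate(word: str) -> str:
    """Map a word to the canonical (here: Latin) script; pass through if unknown."""
    return TOY_TABLE.get(word, word)


def sn_wer(reference: str, hypothesis: str,
           transliterate: Callable[[str], str] = toy_transliterate) -> float:
    """Transliterate both sides into one script, then compute ordinary WER."""
    def normalize(text: str) -> str:
        return " ".join(transliterate(w) for w in text.split())
    return wer(normalize(reference), normalize(hypothesis))


if __name__ == "__main__":
    reference = "नमस्ते दुनिया"
    romanized = "namaste duniya"   # same words, different script
    print(wer(reference, romanized))          # script mismatch counts as errors
    print(sn_wer(reference, romanized))       # mismatch removed by normalization
    print(sn_wer(reference, "namaste india")) # a genuine word error still counts
```

In this toy case the romanized hypothesis is penalized on every word by plain WER but not after normalization, while a genuinely wrong word is still counted, which is the behavior the summary attributes to the metric.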

6 · The Batch · Jun 1, 2026

Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models

This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.

Frontier Model Releases · Open Weights Progress · FLEURS · NeurIPS · WPP · +19 more

8 · Mistral AI News · Jun 1, 2026

Mistral AI Releases Voxtral: Open-Weight Speech Understanding Models in 24B and 3B Sizes

Mistral AI has released Voxtral, a family of two open-weight speech understanding models (Voxtral Small at 24B and Voxtral Mini at 3B) under the Apache 2.0 license. Both models support long-form audio up to 30-40 minutes, native multilingual transcription, built-in Q&A and summarization, and function-calling directly from voice, built on the Mistral Small 3.1 language model backbone. Benchmarks show Voxtral outperforms Whisper large-v3 across all tasks and is competitive with GPT-4o mini and Gemini 2.5 Flash on audio understanding, while pricing starts at $0.001/minute via API. Models are available on Hugging Face and through Mistral's API, with a transcription-optimized variant (Voxtral Mini Transcribe) also offered.

Frontier Model Releases · Open Weights Progress · Mistral AI · FLEURS · Mistral Small 4 · +14 more

7 · Mistral AI News · May 18, 2026

Mistral Releases Voxtral Transcribe 2: State-of-the-Art Speech-to-Text with Sub-200ms Realtime Model

Mistral AI has released Voxtral Transcribe 2, a family of two speech-to-text models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime features a novel streaming architecture with configurable latency down to sub-200ms, a 4B parameter footprint suitable for edge deployment, and is released as open weights under Apache 2.0. Voxtral Mini Transcribe V2 claims state-of-the-art word error rate on FLEURS at $0.003/min, outperforming GPT-4o mini Transcribe, Gemini 2.5 Flash, AssemblyAI, and Deepgram Nova on accuracy benchmarks. Both models support 13 languages with speaker diarization, word-level timestamps, and context biasing.

Open Weights Progress · Inference Economics · Mistral AI · FLEURS · Apache 2.0 · +11 more