model

Whisper

model · active · whisper-a6748690 · 14 events · first seen 29d ago

Aliases: Whisper

More like this (12)

faster-whisper · whisper.cpp · GPT-Realtime-Whisper · CATT-Whisper · Whisper large-v3 · WhatsApp · Whisk · ShadowHand · Snowflake · Voice Engine · Qwen Chat · Speech-to-Speech

Recent events (14)

8 · OpenAI Blog · 28d ago

Introducing Whisper

OpenAI introduced Whisper, an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model demonstrates strong robustness to accents, background noise, and technical language, approaching human-level accuracy in English transcription. Whisper supports transcription in multiple languages as well as translation to English, and the weights and inference code were released publicly.

Open Weights Progress · Agent and Tool Ecosystem · OpenAI · Whisper · +1 more

6 · arXiv · cs.AI · 9d ago

Sparse AutoEncoder steering reduces Whisper hallucination rate by ~5x without fine-tuning

Researchers investigate hallucination detection and mitigation in OpenAI's Whisper ASR model by probing internal encoder representations. They find that both raw activations and Sparse AutoEncoder (SAE) latents encode linearly separable hallucination signals concentrated in deeper layers. SAE-based activation steering reduces hallucination rates from 72.6% to 14.1% (Whisper small) and 86.9% to 27.3% (Whisper large-v3) on non-speech audio, with minimal WER degradation, approaching fine-tuning-level performance without weight updates.

Evaluation and Benchmarking · AI Safety Research · Sparse Autoencoder · OpenAI · Whisper
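The "~5x" in the headline can be checked against the rates quoted in the summary above. The short calculation below uses only those four reported percentages; the helper name `reduction_factor` is illustrative and not taken from the paper.

```python
# Hallucination rates on non-speech audio as quoted in the summary:
# (before steering, after steering), in percent.
reported_rates = {
    "Whisper small": (72.6, 14.1),
    "Whisper large-v3": (86.9, 27.3),
}


def reduction_factor(before: float, after: float) -> float:
    """How many times lower the rate is after steering."""
    return before / after


for model_name, (before, after) in reported_rates.items():
    factor = reduction_factor(before, after)
    points = before - after
    print(f"{model_name}: {factor:.1f}x lower ({points:.1f} percentage points)")
```

On these figures the factor is about 5.1x for Whisper small and about 3.2x for Whisper large-v3, so the headline number matches the smaller model.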

4 · Hugging Face Blog · 29d ago

Blazingly Fast Whisper Transcriptions with Inference Endpoints

Hugging Face published a blog post detailing optimized Whisper speech-to-text transcription deployments via their Inference Endpoints service. The post covers performance improvements using faster-whisper or similar optimized backends to achieve significantly reduced transcription latency. This is positioned as a practical deployment guide for production speech recognition workloads.

Inference Economics · Enterprise Deployment Patterns · Hugging Face Inference Endpoints · Hugging Face · faster-whisper · +1 more

5 · Hugging Face Blog · 29d ago

Speculative Decoding for 2x Faster Whisper Inference

Hugging Face demonstrates applying speculative decoding to OpenAI's Whisper speech recognition model, achieving approximately 2x inference speedup. The technique uses a smaller draft model to propose token sequences that the larger target model then verifies, reducing the number of full forward passes required. This post covers implementation details using the Hugging Face Transformers library and benchmarks the approach across different hardware configurations.

Inference Economics · Agent and Tool Ecosystem · speculative decoding · Hugging Face Transformers · Hugging Face · +2 more
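The draft-then-verify loop described in the entry above can be sketched with toy models. This is a minimal greedy variant under stated assumptions, not the blog post's implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a token prefix to the next token, and each verification step is counted as one target pass because a real model scores all proposed positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: token prefix -> greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks every proposed position (counted as one pass).
        target_passes += 1
        for i, proposed in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if proposed != expected:
                # 3. First mismatch: keep the accepted prefix, then the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposed token matched. (Real implementations also take one
            # extra "bonus" token from the target here; omitted for simplicity.)
            tokens.extend(proposal)
    return tokens, target_passes


def target_model(prefix: List[int]) -> int:
    return (prefix[-1] * 2 + 1) % 10


def draft_model(prefix: List[int]) -> int:
    guess = (prefix[-1] * 2 + 1) % 10
    # The toy draft is deliberately wrong whenever the prefix length is a multiple of 5.
    return guess if len(prefix) % 5 else (guess + 1) % 10


baseline = greedy_decode(target_model, [1], max_new_tokens=12)
speculative, passes = speculative_decode(target_model, draft_model, [1], max_new_tokens=12)
print(speculative == baseline, passes)
```

Because every accepted token is one the target would have chosen anyway, the output matches plain greedy decoding while using fewer target passes. In Transformers, this kind of assisted generation is exposed through the `assistant_model` argument of `generate`.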

4 · Hugging Face Blog · 29d ago

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

This Hugging Face blog post provides a practical guide for fine-tuning OpenAI's Whisper model for multilingual automatic speech recognition using the Transformers library. It covers dataset preparation, training configuration, and evaluation using the Word Error Rate metric. The post targets practitioners seeking to adapt Whisper to low-resource or domain-specific languages.

Open Weights Progress · Agent and Tool Ecosystem · Hugging Face Transformers · Hugging Face · Word Error Rate · +2 more
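Word Error Rate, the evaluation metric named in the entry above, is the word-level edit distance between hypothesis and reference divided by the number of reference words. The function below is a self-contained sketch of that definition. It does no text normalization (casing, punctuation), which evaluation pipelines usually apply first, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, which is one reason hallucinated output on silent audio is penalized so heavily.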

2 · GitHub Trending · 27d ago

OpenAI Whisper GitHub Repository Trending

The OpenAI Whisper repository, implementing robust speech recognition via large-scale weak supervision, is trending on GitHub with approximately 100k total stars and 84 new stars today. Whisper is an open-weights automatic speech recognition model trained on large-scale weakly supervised audio data. The continued community interest reflects ongoing adoption of Whisper as a foundational ASR component in downstream applications and pipelines.

Open Weights Progress · OpenAI · Whisper

7 · Latent Space · 29d ago

GPT-Realtime-2, GPT-Translate, and new Whisper: OpenAI's new SOTA realtime voice APIs

OpenAI has released a suite of new real-time voice and audio APIs including GPT-Realtime-2, a GPT-Translate model, and an updated Whisper, all positioned as state-of-the-art for real-time voice applications. The releases appear to be part of a broader push to deploy GPT-5 capabilities across multiple product surfaces. Coverage comes from the Latent Space AI News digest, which aggregates and contextualizes the announcements.

Frontier Model Releases · Agent and Tool Ecosystem · GPT-Realtime-2 · OpenAI · Whisper · +3 more

7 · OpenAI Blog · 28d ago

Introducing ChatGPT and Whisper APIs

OpenAI announced the release of dedicated APIs for ChatGPT (gpt-3.5-turbo) and Whisper, enabling developers to integrate conversational AI and speech-to-text capabilities into their applications. The ChatGPT API offered significant cost reductions compared to existing GPT-3.5 endpoints. This marked a major step in OpenAI's platform strategy, opening programmatic access to its most widely used consumer models.

Inference Economics · Enterprise Deployment Patterns · GPT-3.5 Turbo · ChatGPT · OpenAI · +2 more

4 · Hugging Face Blog · 29d ago

Powerful ASR + Diarization + Speculative Decoding with Hugging Face Inference Endpoints

Hugging Face published a blog post describing a pipeline that combines automatic speech recognition (ASR), speaker diarization, and speculative decoding on their Inference Endpoints platform. The post demonstrates how these three techniques can be integrated to produce faster, speaker-attributed transcriptions. Speculative decoding is highlighted as a key inference optimization that reduces latency for ASR workloads.

Inference Economics · Agent and Tool Ecosystem · Hugging Face Inference Endpoints · speculative decoding · Hugging Face · +2 more

3 · arXiv · cs.CL · 22d ago

Thaka Wins KSAA-2026 Arabic Speech Diacritization Task with Regularized Fine-Tuning of CATT-Whisper

The Thaka team describes their winning system for Task 2 of the KSAA-2026 Shared Task on Arabic Speech Dictation with Automatic Diacritization, which requires producing fully diacritized Arabic text from speech audio and undiacritized transcripts. Their approach fine-tunes CATT-Whisper, a multimodal model combining a CATT text encoder with a frozen Whisper speech encoder, under severe data constraints (2,327 training samples, no external data). Key techniques include R-Drop consistency regularization, Optuna-optimized hyperparameters with high weight decay, Focal Loss, and Monte Carlo Dropout inference averaging over 200 stochastic forward passes across four checkpoints. The system achieves 23.26% WER on the primary metric, placing first among all participants.

Multimodal Progress · Optuna · Focal Loss · CATT · +6 more
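Monte Carlo Dropout averaging, one of the techniques listed in the entry above, keeps dropout active at inference and averages the predicted distributions over many stochastic passes. The sketch below illustrates the idea with toy checkpoints. The dropout-linear-softmax model, its weights, and the split of 50 passes per checkpoint (200 in total across four) are assumptions for illustration; the summary does not say how the passes were divided or what the Thaka system's architecture looks like at this level.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a checkpoint maps (features, rng) to class probabilities.
Checkpoint = Callable[[Sequence[float], random.Random], List[float]]


def make_toy_checkpoint(weights: List[List[float]], drop_prob: float = 0.1) -> Checkpoint:
    """Toy stand-in for a fine-tuned checkpoint: dropout -> linear -> softmax."""
    def forward(features: Sequence[float], rng: random.Random) -> List[float]:
        # Dropout stays ON at inference; surviving inputs are rescaled.
        kept = [0.0 if rng.random() < drop_prob else x / (1.0 - drop_prob)
                for x in features]
        logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        norm = sum(exps)
        return [e / norm for e in exps]
    return forward


def mc_dropout_average(checkpoints: Sequence[Checkpoint], features: Sequence[float],
                       passes_per_checkpoint: int, seed: int = 0) -> List[float]:
    """Average class probabilities over all stochastic passes of all checkpoints."""
    if not checkpoints or passes_per_checkpoint < 1:
        raise ValueError("need at least one checkpoint and one pass")
    rng = random.Random(seed)
    totals: List[float] = []
    count = 0
    for forward in checkpoints:
        for _ in range(passes_per_checkpoint):
            probs = forward(features, rng)
            totals = list(probs) if not totals else [t + p for t, p in zip(totals, probs)]
            count += 1
    return [t / count for t in totals]


# Four toy checkpoints with slightly different weights, three output classes.
checkpoints = [
    make_toy_checkpoint([[0.5 + 0.1 * c, -0.2], [-0.3, 0.4], [0.1, 0.1]])
    for c in range(4)
]
averaged = mc_dropout_average(checkpoints, features=[1.0, 2.0], passes_per_checkpoint=50)
prediction = max(range(len(averaged)), key=averaged.__getitem__)
print(averaged, prediction)
```

Averaging probabilities rather than taking a majority vote keeps the ensemble's uncertainty in the final distribution, which tends to matter when training data is as small as the 2,327 samples reported here.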

4 · arXiv · cs.AI · 2d ago

LEAF-X: Entropy-guided explainability framework for transformer-based ASR models

Researchers introduce LEAF-X (Listening with Entropy-guided Attention for Faithful explainability), a model-intrinsic XAI framework for transformer-based automatic speech recognition systems like Whisper. The method combines entropy-guided attention weighting, multi-layer attention rollout, and optional causal ablations to produce sparse token-to-frame attributions. Evaluations show 32% improved faithfulness and 35-39% stronger locality/sparsity compared to perturbation-based explainers and raw attention maps, enabling more auditable ASR.

AI Safety Research · Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models · LEAF-X · Whisper
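Two of the ingredients named in the entry above, entropy-guided attention weighting and multi-layer attention rollout, can be illustrated generically. The sketch below is not the LEAF-X algorithm: the head-weighting rule `1 / (1 + mean entropy)` is an assumption chosen so that sharper heads count more, and the rollout follows the common formulation that mixes each layer's attention with the identity before multiplying layers together.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-stochastic attention map: rows sum to 1


def row_entropy(row: List[float]) -> float:
    """Shannon entropy of one attention row (natural log)."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def entropy_weighted_heads(heads: List[Matrix]) -> Matrix:
    """Combine heads, giving low-entropy (sharper) heads more weight.

    The 1 / (1 + mean entropy) weighting is an illustrative assumption.
    """
    weights = []
    for head in heads:
        mean_entropy = sum(row_entropy(row) for row in head) / len(head)
        weights.append(1.0 / (1.0 + mean_entropy))
    total = sum(weights)
    size = len(heads[0])
    return [[sum(w * head[i][j] for w, head in zip(weights, heads)) / total
             for j in range(size)] for i in range(size)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def attention_rollout(layers: List[List[Matrix]]) -> Matrix:
    """Multiply per-layer attention (mixed with identity for the residual path)."""
    size = len(layers[0][0])
    rollout = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for heads in layers:
        attention = entropy_weighted_heads(heads)
        mixed = [[0.5 * attention[i][j] + (0.5 if i == j else 0.0)
                  for j in range(size)] for i in range(size)]
        rollout = matmul(mixed, rollout)
    return rollout


sharp = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
diffuse = [[1 / 3, 1 / 3, 1 / 3]] * 3
attribution = attention_rollout([[sharp, diffuse], [diffuse, sharp]])
for row in attribution:
    print([round(value, 3) for value in row])
```

Each row of the result still sums to 1, so it can be read as a distribution over input positions for that output position. An attribution for an encoder-decoder model like Whisper would additionally need the decoder's cross-attention, which this sketch leaves out.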

3 · Hugging Face Blog · 29d ago

AI Speech Recognition in Unity

A Hugging Face blog post describes integrating AI-based automatic speech recognition (ASR) into Unity game/application environments. The post likely covers using transformer-based ASR models within the Unity engine, bridging ML inference with real-time interactive applications. This represents a practical deployment pattern for on-device or embedded ASR in non-traditional runtime environments.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Unity · Hugging Face · Whisper

7 · OpenAI Blog · 28d ago

GPT-4 API General Availability and Completions API Deprecation Plan

OpenAI has announced general availability of the GPT-4 API, alongside GPT-3.5 Turbo, DALL·E, and Whisper APIs. Concurrently, OpenAI is releasing a deprecation plan for older models in the Completions API, which are set to retire at the beginning of 2024. This marks a significant milestone in OpenAI's API product lifecycle, transitioning GPT-4 from limited access to broad developer availability.

Frontier Model Releases · Inference Economics · GPT-3.5 Turbo · DALL·E 3 · OpenAI · +4 more

6 · The Batch · 28d ago

Data Points: Cursor Composer 2.5, Gemini 3.5 Flash, Antigravity 2.0, Omni Flash, AI Search, and Corti Symphony

This edition covers several notable AI product and model releases: Cursor shipped Composer 2.5 (built on Kimi K2.5) scoring 79.8% on SWE-Bench Multilingual at significantly lower cost than frontier competitors; Google released Gemini 3.5 Flash with claimed 4x speed advantage and launched Antigravity 2.0 as an agent-first desktop app replacing its IDE; Google also introduced Gemini Omni Flash for multimodal video generation and overhauled its search interface with Gemini 3.5. Additionally, Copenhagen-based Corti launched Symphony for Speech-to-Text achieving 1.4% word error rate on medical terminology versus 17-19% for generalist models.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.5 Pro · Gemini Spark · Cursor Composer 2.5 · +23 more