Hugging Face Blog · 3h ago

PP-OCRv6 released on Hugging Face: 50-language OCR system from 1.5M to 34.5M parameters

PaddlePaddle has released PP-OCRv6 on Hugging Face, an OCR system supporting 50 languages with model sizes ranging from 1.5M to 34.5M parameters. That roughly 23x spread in size covers a wide efficiency-accuracy trade-off. The smaller variants suit edge devices and the larger ones suit server deployment. It is a practical open-weights OCR release with broad multilingual coverage.
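The parameter range maps directly onto storage cost. The sketch below is rough arithmetic, not a description of how PP-OCRv6 is packaged. It assumes dense weights stored at a single uniform precision. It ignores activations, runtime buffers, and any split into separate detection and recognition models.

```python
# Rough weight-storage estimate for the PP-OCRv6 size range (1.5M to 34.5M params).
# Assumption: dense weights at one uniform precision; activations and runtime
# overhead are ignored, so real memory use will be higher.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str = "fp32") -> float:
    """Approximate weight size in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for label, params in [("smallest", 1_500_000), ("largest", 34_500_000)]:
        sizes = {dt: round(weight_megabytes(params, dt), 1) for dt in BYTES_PER_PARAM}
        print(label, sizes)
```

At fp32 this works out to about 6 MB of weights at the 1.5M-parameter end and about 138 MB at the 34.5M-parameter end. That spread fits the edge-versus-server framing of the release.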

Open Weights Progress · Multimodal Progress · PaddlePaddle · PP-OCRv6 · Hugging Face

Related guides (3)

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Hugging Face

Hugging Face: The Home of Open-Source AI

Related events (8)

Hugging Face Blog · 1mo ago

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddleOCR 3.5 can run its OCR and document parsing pipelines on a Hugging Face Transformers backend, which connects them to the broader Transformers ecosystem. Users can apply transformer-based models to optical character recognition and structured document understanding. The release brings the PaddlePaddle framework and the Transformers library closer together for document AI workloads.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · PaddlePaddle · PaddleOCR · Hugging Face Transformers

GitHub Trending · 23d ago

PaddleOCR: OCR Toolkit Bridging Documents and LLMs

PaddleOCR is an open-source OCR toolkit built on PaddlePaddle. It converts PDFs and images into structured data suitable for LLM pipelines. It supports 100+ languages and is positioned as a bridge between documents and AI. The repository has accumulated nearly 79,000 GitHub stars, with 148 new stars on the day of the listing, indicating sustained community interest.
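To make "structured data suitable for LLM pipelines" concrete, here is a minimal sketch of one last step such a pipeline might perform: flattening line-level OCR results into plain text for a prompt. The `TextLine` record and the `to_llm_text` helper are hypothetical. They do not reflect PaddleOCR's actual output schema or API. The reading order is a naive top-to-bottom, left-to-right sort, which will mis-order multi-column layouts.

```python
from dataclasses import dataclass


# Hypothetical line-level OCR record; NOT PaddleOCR's real output schema.
@dataclass
class TextLine:
    text: str
    x: float      # left edge of the line's bounding box (pixels)
    y: float      # top edge of the line's bounding box (pixels)
    score: float  # recognition confidence in [0, 1]


def to_llm_text(lines: list[TextLine], min_score: float = 0.5) -> str:
    """Hypothetical helper: drop low-confidence lines, then join the rest
    in naive reading order (sort by top edge, then by left edge)."""
    kept = [ln for ln in lines if ln.score >= min_score]
    kept.sort(key=lambda ln: (ln.y, ln.x))
    return "\n".join(ln.text for ln in kept)


# Toy input standing in for one OCR'd page.
page = [
    TextLine("World", x=120, y=10, score=0.97),
    TextLine("Hello", x=10, y=10, score=0.99),
    TextLine("smudge", x=50, y=40, score=0.20),  # filtered out (low confidence)
    TextLine("Second line", x=10, y=40, score=0.95),
]
print(to_llm_text(page))  # Hello / World / Second line, one per line
```

A real document parser would also keep layout structure such as tables and headings. Plain joined text is only the simplest form an LLM can consume.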

Enterprise Deployment Patterns · Agent and Tool Ecosystem · PaddlePaddle · Python · PaddleOCR

DeepSeek · 12d ago

DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.

Open Weights Progress · Multimodal Progress · DeepSeek-OCR-2 · DeepSeek V4

DeepSeek · 12d ago

DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.

Open Weights Progress · Multimodal Progress · DeepSeek-OCR-2 · DeepSeek V4 · Hugging Face

GitHub Trending · 25d ago

GLM-OCR: Fast and Accurate OCR System from zai-org

GLM-OCR is an open-source OCR project from zai-org built on the GLM model family, positioning itself as accurate, fast, and comprehensive. The repository has accumulated 6,787 GitHub stars, with 82 added on the day of the listing, indicating notable community traction. It applies large language and vision models to document understanding and text recognition.

Open Weights Progress · Multimodal Progress · zai-org · GLM-OCR · GLM

Hugging Face Blog · 1mo ago

Welcome PaddlePaddle to the Hugging Face Hub

Hugging Face announced the integration of PaddlePaddle, Baidu's open-source deep learning framework, into the Hugging Face Hub. This expands the Hub's ecosystem to support PaddlePaddle models alongside existing frameworks like PyTorch and TensorFlow. The move broadens access to Chinese-developed AI models and tooling within the broader ML community.

Open Weights Progress · Agent and Tool Ecosystem · PaddlePaddle · Baidu · Hugging Face

Hugging Face Blog · 1mo ago

Visual Document Retrieval Goes Multilingual

Hugging Face introduces VDR-2B-Multilingual, a 2-billion parameter vision-language model designed for visual document retrieval across multiple languages. The model enables retrieval of document images without OCR by embedding visual page representations directly. This extends prior visual document retrieval work to multilingual settings, broadening applicability for enterprise document search use cases.
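Once pages are embedded, retrieval itself is ordinary dense-vector search. The minimal sketch below assumes that page images and text queries have already been embedded into a shared vector space by a model such as VDR-2B-Multilingual. The embedding step is omitted, and the 3-dimensional vectors are made-up toy values, not real model outputs. Pages are ranked by cosine similarity to the query.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_pages(query: list[float], pages: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (page_id, score) pairs, highest cosine similarity first."""
    scored = [(page_id, cosine(query, vec)) for page_id, vec in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors (hypothetical); real page embeddings are far larger.
page_embeddings = {
    "invoice_p1": [0.9, 0.1, 0.0],
    "contract_p3": [0.1, 0.8, 0.3],
    "manual_p7": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]
for page_id, score in rank_pages(query_embedding, page_embeddings):
    print(f"{page_id}: {score:.3f}")
```

Scores come from comparing embeddings alone, so no text is ever extracted from the page images. That is the sense in which this kind of retrieval is OCR-free. A large collection would use a vector index rather than this linear scan.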

Enterprise Deployment Patterns · Multimodal Progress · OCR-free document embedding · visual document retrieval · Hugging Face

Hugging Face Blog · 1mo ago

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

This Hugging Face blog post provides a technical guide for fine-tuning Microsoft's Florence-2 vision-language models. Florence-2 is a compact yet capable multimodal model supporting tasks like captioning, object detection, and OCR. The post covers practical implementation details for adapting the model to custom datasets using the Hugging Face ecosystem.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Microsoft · Hugging Face · Florence-2