PP-OCRv6 released on Hugging Face: 50-language OCR system from 1.5M to 34.5M parameters
PaddlePaddle has released PP-OCRv6 on Hugging Face, an open-weights OCR system supporting 50 languages with model sizes ranging from 1.5M to 34.5M parameters. The size range covers a broad efficiency-accuracy tradeoff, making the release relevant for both edge and server deployment scenarios.
Related events (8)
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
PaddleOCR 3.5 introduces support for running OCR and document parsing pipelines using a Hugging Face Transformers backend, enabling integration with the broader Transformers ecosystem. The update allows users to leverage transformer-based models for optical character recognition and structured document understanding tasks. This represents a convergence between the PaddlePaddle framework and the Transformers library for document AI workloads.
PaddleOCR: OCR Toolkit Bridging Documents and LLMs
PaddleOCR is an open-source OCR toolkit built on PaddlePaddle that converts PDFs and images into structured data suitable for LLM pipelines. It supports 100+ languages and is positioned as a document-to-AI bridge. The repository has accumulated nearly 79,000 GitHub stars, with 148 new stars today, indicating sustained community interest.
DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face
DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.
DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face
DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.
GLM-OCR: Fast and Accurate OCR System from zai-org
GLM-OCR is an open-source OCR project from zai-org built on the GLM model family, positioning itself as accurate, fast, and comprehensive. The repository has accumulated 6,787 GitHub stars with 82 added today, indicating notable community traction. It represents an application of large language/vision models to document understanding and text recognition tasks.
Welcome PaddlePaddle to the Hugging Face Hub
Hugging Face announced the integration of PaddlePaddle, Baidu's open-source deep learning framework, into the Hugging Face Hub. This expands the Hub's ecosystem to support PaddlePaddle models alongside existing frameworks like PyTorch and TensorFlow. The move broadens access to Chinese-developed AI models and tooling within the broader ML community.
Visual Document Retrieval Goes Multilingual
Hugging Face introduces VDR-2B-Multilingual, a 2-billion parameter vision-language model designed for visual document retrieval across multiple languages. The model enables retrieval of document images without OCR by embedding visual page representations directly. This extends prior visual document retrieval work to multilingual settings, broadening applicability for enterprise document search use cases.
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models
This Hugging Face blog post provides a technical guide for fine-tuning Microsoft's Florence-2 vision-language models. Florence-2 is a compact yet capable multimodal model supporting tasks like captioning, object detection, and OCR. The post covers practical implementation details for adapting the model to custom datasets using the Hugging Face ecosystem.


