Almanac
4·GitHub Trending (AI/LLM filtered)·24d ago

GLM-OCR: Fast and Accurate OCR System from zai-org

GLM-OCR is an open-source OCR project from zai-org built on the GLM model family, billed as accurate, fast, and comprehensive. The repository has 6,787 GitHub stars, 82 of them added today, which points to active community interest. It applies large language and vision models to document understanding and text recognition.

Related events (8)

4·GitHub Trending·22d ago

PaddleOCR: OCR Toolkit Bridging Documents and LLMs

PaddleOCR is an open-source OCR toolkit built on PaddlePaddle that converts PDFs and images into structured data suitable for LLM pipelines. It supports 100+ languages and is positioned as a document-to-AI bridge. The repository has accumulated nearly 79,000 GitHub stars, with 148 new stars today, indicating sustained community interest.

4·Hugging Face Blog·1mo ago

Finetuning olmOCR to be a faithful OCR-Engine

TNG Technology Consulting describes a fine-tuning approach applied to olmOCR, a vision-language model designed for document OCR tasks, to improve its faithfulness and reduce hallucinations. The post covers dataset construction, training methodology, and evaluation results showing improved accuracy on document extraction benchmarks. This represents a practical community contribution to the open-weights document-understanding ecosystem.

5·Hugging Face Blog·3d ago

GLM-5.2 announced as model built for long-horizon tasks

Z.ai (zai-org) published a blog post on Hugging Face announcing GLM-5.2, a model positioned for long-horizon tasks. The post appears to be a release announcement for the latest model in the GLM (General Language Model) lineage. Little of the post body is available, but the framing suggests capabilities aimed at extended reasoning or agentic workflows.

5·Latent Space·46h ago

GLM-5.2 passes community vibe checks; Z.ai forecasts Open Fable by December

GLM-5.2, a new open model, is reportedly passing community vibe checks and drawing comparisons to GPT-class frontier models. Z.ai has forecast the release of Open Fable by December. The item signals a potential shift in the open-weights landscape toward genuine frontier-level capability.

6·DeepSeek·11d ago

DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.

7·Mistral AI News·19d ago

Mistral OCR: New Document Understanding API with State-of-the-Art Benchmark Performance

Mistral AI has released Mistral OCR, an Optical Character Recognition API designed for deep document understanding. It handles text, tables, equations, images, and complex layouts in PDFs and images. Mistral claims top benchmark scores across math, multilingual, scanned, and table categories, outperforming Google Document AI, Azure OCR, Gemini 1.5/2.0, and GPT-4o on an internal test set. The API is priced at 1,000 pages per dollar (batch inference doubles that), is available today via la Plateforme, and is already deployed as the default document understanding model in Le Chat. Self-hosting is offered on a selective basis for organizations with sensitive data requirements.
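To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes that "doubling" means 2,000 pages per dollar under batch inference and that billing is linear in page count; `estimate_cost_usd` is a hypothetical helper for illustration, not part of any Mistral SDK.

```python
# Rates as quoted in the summary above. Treating batch inference as a flat
# 2x multiplier on pages-per-dollar is an assumption about the pricing model.
PAGES_PER_DOLLAR = 1000
BATCH_MULTIPLIER = 2


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Hypothetical helper: estimated cost in USD for OCR-ing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = PAGES_PER_DOLLAR * (BATCH_MULTIPLIER if batch else 1)
    return pages / rate


if __name__ == "__main__":
    print(estimate_cost_usd(250_000))              # standard rate
    print(estimate_cost_usd(250_000, batch=True))  # batch rate
```

Under these assumptions, a 250,000-page archive would come to about $250 at the standard rate and about $125 with batch inference.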

6·DeepSeek·11d ago

DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.

6·arXiv · cs.CL·4d ago

LOGOS: A unified autoregressive foundation model for natural science tasks across domains

Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.