Almanac
← Events
3Hugging Face Blog·1mo ago

Introducing TextImage Augmentation for Document Images

Hugging Face introduces a TextImage augmentation library for document images, aimed at improving model robustness for document understanding tasks. The tooling applies transformations such as noise, blur, and distortion to document images to simulate real-world scanning and printing artifacts. This is relevant to training and fine-tuning vision-language models on document datasets.

Related guides (2)

Related events (8)

4Hugging Face Blog·1mo ago·source ↗

Accelerating Document AI

This Hugging Face blog post covers the state of Document AI, focusing on tools and models for processing and understanding documents using machine learning. It likely discusses transformer-based approaches for tasks like document classification, information extraction, and visual document understanding. The post appears to survey the ecosystem of models and libraries available for document intelligence workflows.

6Hugging Face Blog·1mo ago·source ↗

Introducing SynthID Text

Hugging Face published a blog post introducing SynthID Text, Google DeepMind's watermarking technique for AI-generated text. The method embeds imperceptible signals into LLM outputs by modifying token sampling distributions, enabling detection of AI-generated content without degrading text quality. The post likely covers integration with Hugging Face's transformers library, making the technique accessible to the broader ML community.

4Hugging Face Blog·1mo ago·source ↗

Welcome aMUSEd: Efficient Text-to-Image Generation

Hugging Face introduces aMUSEd, a text-to-image model based on the MUSE architecture that prioritizes efficiency over raw quality. The model is designed to be smaller and faster than diffusion-based alternatives, making it more accessible for deployment. It is released with integration into the Diffusers library.

5Hugging Face Blog·1mo ago·source ↗

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

This Hugging Face blog post provides a technical guide for fine-tuning Microsoft's Florence-2 vision-language models. Florence-2 is a compact yet capable multimodal model supporting tasks like captioning, object detection, and OCR. The post covers practical implementation details for adapting the model to custom datasets using the Hugging Face ecosystem.

5Hugging Face Blog·1mo ago·source ↗

Open Preference Dataset for Text-to-Image Generation by the Hugging Face Community

Hugging Face has released an open preference dataset for text-to-image generation, collected through community participation. The dataset captures human preference signals across image generation outputs, intended to support alignment and reward modeling research for image generation models. This contributes to the growing ecosystem of open datasets for training and evaluating generative image models.

4Hugging Face Blog·1mo ago·source ↗

Hugging Face Machine Learning Demos on arXiv

Hugging Face announced an integration allowing ML demos to be linked or embedded directly on arXiv paper pages. This lowers the barrier between research publication and interactive model demonstration. The feature connects academic papers to live Spaces or model demos hosted on Hugging Face.

5Hugging Face Blog·1mo ago·source ↗

Hugging Face Teams Up with Protect AI: Enhancing Model Security for the ML Community

Hugging Face has announced a partnership with Protect AI to improve security for machine learning models hosted on the platform. The collaboration aims to address vulnerabilities in model files and supply chain risks that affect the broader ML community. Specific details about the technical implementation and scope of the security enhancements are not provided in the available content.

5Hugging Face Blog·1mo ago·source ↗

Docmatix: A Large-Scale Dataset for Document Visual Question Answering

Hugging Face released Docmatix, a large-scale dataset designed for Document Visual Question Answering (DocVQA) tasks. The dataset aims to address the scarcity of high-quality training data for document understanding in multimodal models. It is intended to improve fine-tuning of vision-language models on document comprehension tasks.