4 · arXiv cs.CL (Computation and Language) · 5d ago

HIPE-2026 evaluation campaign: person-place relation extraction from multilingual historical texts

HIPE-2026 is the third edition of a shared-task evaluation series, shifting focus from named entity recognition to temporally grounded relation extraction across French, German, and English historical documents. Seventeen teams submitted over 40 runs, with systems ranging from large language models to lightweight classifiers, evaluated on predictive accuracy, computational efficiency, and cross-domain generalization. The campaign surfaces trade-offs between accuracy and robustness when processing noisy OCR text from 19th–20th century newspapers and early modern literary sources. The results provide a benchmark snapshot of the current state of historical relation extraction for cultural heritage applications.

Evaluation and Benchmarking HIPE-2022 HIPE-2020 HIPE-2026

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

3 · arXiv · cs.CL · 37h ago

Temporal fusion strategies for NER in historical texts favor late fusion mechanisms

A new arXiv preprint systematically evaluates how temporal metadata can be embedded into transformer-based NER models for historical texts, comparing absolute vs. relative temporal representations and early vs. late fusion mechanisms including cross-attention, adapters, and concatenation. Experiments on French and German historical datasets show that late fusion strategies yield more robust and temporally generalizable performance, especially on early and noisy text periods. The work addresses a narrow but underexplored challenge of diachronic NLP where entity surface forms drift across time.

Evaluation and Benchmarking A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

4 · arXiv · cs.CL · 1mo ago

Fifth Shared Task on Multilingual Coreference Resolution: Long-Range Entities and LLM Participation

The fifth CODI-CRAC shared task on multilingual coreference resolution expanded its scope with five new datasets and two additional languages, building on CorefUD 1.4, which covers 27 datasets across 19 languages. The 2026 edition emphasized long-range coreference chains spanning many words and sentences. Ten systems participated, including four LLM-based approaches; traditional systems still led, but LLMs showed notable potential, suggesting competitive parity may be near.

Long Context Evolution Evaluation and Benchmarking CODI-CRAC 2026 CorefUD Multilingual Coreference Resolution Shared Task

3 · arXiv · cs.CL · 11d ago

IHUBERT: Persian RoBERTa-base model trained on 45GB semantically deduplicated corpus

Researchers introduce IHUBERT, a 125M-parameter monolingual Persian pretrained language model trained from scratch using the RoBERTa-base architecture on a 45GB curated subset of the Sepahr-Danesh collection (~7-8B tokens). The work features a multi-stage preprocessing pipeline including vector-database-based semantic deduplication for domain-balanced pretraining, and a 139k-vocabulary BPE tokenizer optimized for Persian morphology. IHUBERT is evaluated across seven Persian NLU benchmarks, achieving state-of-the-art results on extractive QA (PQuAD F1 88.35) and NLI (FarsTail Macro-F1 0.835). The paper contributes both a new model and a semantic deduplication methodology applicable to low-resource language pretraining.

Evaluation and Benchmarking Sepahr-Danesh RoBERTa PQuAD

4 · Hugging Face Blog · 8d ago

PP-OCRv6 released on Hugging Face: 50-language OCR system from 1.5M to 34.5M parameters

PaddlePaddle has released PP-OCRv6 on Hugging Face, an OCR system supporting 50 languages with model sizes ranging from 1.5M to 34.5M parameters. The size range covers a wide span of efficiency-accuracy trade-offs, making the release relevant for both edge and server deployment. It is a practical open-weights OCR tooling release with multilingual coverage.

Open Weights Progress Multimodal Progress PaddlePaddle PP-OCRv6 Hugging Face

5 · Hugging Face Blog · 1mo ago

Introducing HELMET: Holistically Evaluating Long-context Language Models

HELMET is a new benchmark designed to holistically evaluate long-context language models across diverse real-world tasks rather than synthetic needle-in-a-haystack tests. The benchmark covers multiple task categories including retrieval, reasoning, summarization, and code, aiming to provide more reliable and comprehensive assessment of long-context capabilities. It is introduced via the Hugging Face blog, suggesting an open release with associated tooling for the community.

Long Context Evolution Evaluation and Benchmarking HELMET Hugging Face

3 · arXiv · cs.CL · 4d ago

ReaORE: Reasoning-guided progressive framework for open relation extraction using large reasoning models

ReaORE is a new framework for Open Relation Extraction (OpenRE) that uses a coarse-to-fine reasoning pipeline to identify unseen relation types between entities in unstructured text. The approach combines a relation filtering stage (using multi-aspect reasoning and embedding-based similarity) with a fine-grained comparative reasoning stage for relation prediction. The authors report that ReaORE outperforms existing baselines on two standard OpenRE benchmarks, addressing limitations of both clustering-based and direct LLM generation approaches.

Evaluation and Benchmarking ReaORE

6 · DeepSeek · 20d ago

DeepSeek releases DeepSeek-OCR-2 vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR-2, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture and tagged for OCR and vision-language tasks. The model has accumulated over 1.8 million downloads and 980 likes, indicating substantial community uptake. It extends DeepSeek's multimodal model lineup with a specialized document/OCR capability.

Open Weights Progress Multimodal Progress DeepSeek-OCR-2 DeepSeek V4 Hugging Face

6 · DeepSeek · 20d ago

DeepSeek releases DeepSeek-OCR vision-language model on Hugging Face

DeepSeek has released DeepSeek-OCR, a multilingual image-text-to-text model on Hugging Face, built on the DeepSeek-VL-v2 architecture. The model targets OCR and image feature extraction tasks and has accumulated over 2.4 million downloads and 3,275 likes, indicating significant community uptake. This represents an open-weights multimodal release from a major Chinese AI lab.

Open Weights Progress Multimodal Progress DeepSeek-OCR-2 DeepSeek V4