benchmark

Biographical

benchmark · active · provisional · biographical-0f8da633 · 1 event · first seen 3h ago

Aliases: Biographical

Co-occurring entities

RoBERTa Claude Sonnet 4 Qwen2.5-0.5B GPT-5.5

More like this (12)

Backstory SecureBio Biomni biology AgentScope LifeSciBench BIOSSES BioLORD StylisticBias Temporal biological risk evaluation BioBERT

Recent events (1)

5 · arXiv · cs.CL · 3h ago

Sub-billion parameter SLMs outperform zero-shot GPT-5.4 and Claude Sonnet 4.6 on relation extraction benchmarks

A new arXiv paper shows that small language models (360M–3B parameters) fine-tuned on task-specific data can substantially outperform zero-shot frontier LLMs on relation extraction. The best sub-billion model, Qwen2.5-0.5B fine-tuned on pooled general-domain data, reaches a micro-F1 of 0.83, compared with 0.69 for GPT-5.4 and 0.66 for Claude Sonnet 4.6 in zero-shot settings. The authors attribute the gains to task adaptation rather than to model architecture, noting that a discriminative RoBERTa baseline also exceeds the frontier models. They further show that 4-bit quantized models, deployable on consumer GPUs, can match or beat proprietary API-based systems on this narrow task. The work suggests that for well-defined NLP tasks with available training data, compact adapted models are a practical, private, and hardware-efficient alternative to frontier APIs.
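The headline comparison is stated in micro-F1. As a rough illustration of what that metric pools, here is a minimal sketch that scores relation extraction as exact-match (head, relation, tail) triples, summing true positives, false positives, and false negatives across all documents before computing precision and recall. The exact-match triple scheme, the `micro_f1` function, and the toy biographical triples are assumptions for illustration only; the paper's actual scoring protocol (entity matching rules, label sets) is not described here.

```python
# Hypothetical sketch: micro-F1 over exact-match relation triples.
# Assumes each relation is a (head, relation, tail) string triple; the
# paper's real evaluation protocol may differ.

Triple = tuple[str, str, str]  # (head entity, relation label, tail entity)


def micro_f1(gold: list[set[Triple]], pred: list[set[Triple]]) -> float:
    """Micro-averaged F1: counts are pooled over all documents first,
    so documents with more relations carry more weight."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # predicted and correct
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # gold relations the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy (invented) biographical data: two documents.
GOLD = [
    {("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "spouse", "Pierre Curie")},
    {("Alan Turing", "educated_at", "King's College")},
]
PRED = [
    {("Marie Curie", "born_in", "Warsaw")},
    {("Alan Turing", "educated_at", "King's College"), ("Alan Turing", "born_in", "London")},
]

print(round(micro_f1(GOLD, PRED), 3))  # → 0.667 (TP=2, FP=1, FN=1)
```

Pooling counts (micro-averaging) rather than averaging per-document or per-relation scores (macro-averaging) means frequent relation types dominate the score, which is worth keeping in mind when comparing numbers such as 0.83 and 0.69.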

Evaluation and Benchmarking Open Weights Progress RoBERTa Claude Sonnet 4 Biographical