
Text Analytics Evaluation Framework

benchmark · active · text-analytics-evaluation-framework-65e089f1 · 1 event · first seen 26d ago


Recent events (1)

arXiv · cs.CL · 26d ago

Text Analytics Evaluation Framework: Benchmarking LLMs on Social Media NLP Tasks

Researchers introduce a 470-question evaluation framework for assessing LLM performance on aggregated social media text, and apply it to Twitter datasets covering sentiment analysis, hate speech detection, and emotion recognition. Performance degrades substantially once the input exceeds 500 instances, particularly for open-weights models on numerical tasks. Multi-label and target-dependent scenarios also show notable drops, and accuracy falls progressively as tasks move from basic semantic identification to comparison and counting. The findings point to architectural bottlenecks in current LLMs for rigorous quantitative analysis over large text collections.