Golden Set

dataset · active · provisional · golden-set-067cd326 · 1 event · first seen 24h ago

Aliases: Golden Set

Recent events (1)

3 · arXiv · cs.CL · 24h ago

AI-PAVE-Br: LLM-based product attribute extraction system and Portuguese benchmark dataset for Brazilian e-commerce

Researchers introduce AI-PAVE-Br, an LLM-based system for Product Attribute Value Extraction (PAVE) tailored to Brazilian e-commerce catalogs in Portuguese. The paper also releases the Golden Set, a manually annotated benchmark dataset for PAVE in Portuguese, structured with entity, category, and subcategory annotations. Experiments show AI-PAVE-Br with prompt engineering substantially outperforms conventional NER baselines. The work addresses a gap in non-English NLP resources for structured e-commerce data extraction.