L3Cube-MahaPOS

dataset · active · provisional · l3cube-mahapos-db6ad2c7 · 1 event · first seen 26h ago

Aliases: L3Cube-MahaPOS

Recent events (1)

arXiv · cs.CL · 26h ago

L3Cube-MahaPOS: Gold-standard POS tagging dataset and BERT models for Marathi

Researchers introduce L3Cube-MahaPOS, a manually annotated part-of-speech tagging dataset for Marathi comprising 32,354 sentences drawn from news text, using a 16-tag Universal Dependencies-aligned scheme. The work benchmarks six model families including HMM, CRF, BiLSTM variants, MuRIL, and the Marathi-specific transformer MahaBERT-v2, with the best system achieving 88.67% token-level accuracy and 81.67% macro-F1. The dataset, annotation guidelines, and model checkpoints are released publicly to support further research in a severely under-resourced language spoken by over 83 million people.
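The two reported metrics can diverge noticeably, because token-level accuracy is dominated by frequent tags while macro-F1 weights every tag equally. The sketch below illustrates how the two are typically computed for POS tagging. It is not the authors' evaluation script: the function names are hypothetical, the toy sentences and tags are made up, and averaging over the union of gold and predicted tags is an assumption that may differ from the convention used in the paper.

```python
from collections import Counter

def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = correct = 0
    for g_sent, p_sent in zip(gold, pred):
        for g, p in zip(g_sent, p_sent):
            total += 1
            correct += int(g == p)
    return correct / total if total else 0.0

def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over all tags seen in gold or pred
    (assumption: the paper may average over a fixed tag set instead)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g_sent, p_sent in zip(gold, pred):
        for g, p in zip(g_sent, p_sent):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1  # predicted tag p was wrong
                fn[g] += 1  # gold tag g was missed
    tags = set(tp) | set(fp) | set(fn)
    scores = []
    for t in tags:
        denom = 2 * tp[t] + fp[t] + fn[t]
        scores.append(2 * tp[t] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Toy example (invented data, not drawn from L3Cube-MahaPOS).
gold = [["NOUN", "VERB", "PUNCT"], ["PRON", "NOUN", "VERB", "PUNCT"]]
pred = [["NOUN", "VERB", "PUNCT"], ["NOUN", "NOUN", "VERB", "PUNCT"]]

acc = token_accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

In this toy case a single error on the rare PRON tag costs one token of accuracy but drives that tag's F1 to zero, which pulls the macro average down much further. The same effect could plausibly explain part of the gap between the reported accuracy and macro-F1.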