model
mBERT
model · active · provisional
mbert-bbc22570 · 1 event · first seen 2d ago
Aliases: mBERT
Recent events (1)
ROMEVA: Geometry-preserving vocabulary expansion for Roman Urdu language models
Researchers propose ROMEVA, a method combining sub-word-average initialization and PCA-guided anchor loss to stabilize embeddings when expanding mBERT's vocabulary for Roman Urdu, a morphologically inconsistent low-resource language with high sub-word fragmentation. The method is evaluated on a 36,130-comment corpus with 500 new tokens added to mBERT. A notable finding is that while ROMEVA best preserves the pretrained embedding geometry, naive fine-tuning outperforms it on downstream sentiment classification, revealing a disconnect between embedding stability and task performance.
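Sub-word-average initialization, one of the two components named in the summary, can be illustrated with a small NumPy sketch. This is not the ROMEVA code: the function name `subword_average_init`, the toy embedding matrix, and the fragmentation in `pieces` are all hypothetical, and the PCA-guided anchor loss is left out because the summary does not specify it. The idea shown is only that each newly added token starts at the mean of the embeddings of the sub-word pieces it used to be split into.

```python
import numpy as np


def subword_average_init(embeddings, subword_ids_per_new_token):
    """Return a copy of `embeddings` with one extra row per new token.

    Each new row is the mean of the existing rows for the sub-word pieces
    that the original tokenizer produced for that token.
    """
    new_rows = [embeddings[ids].mean(axis=0) for ids in subword_ids_per_new_token]
    return np.vstack([embeddings, np.stack(new_rows)])


# Toy stand-in for a pretrained embedding matrix: 6 tokens, 4 dimensions.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(6, 4))

# Hypothetical fragmentation: the first new token was previously split into
# pieces 1 and 3, the second into pieces 2, 4 and 5.
pieces = [[1, 3], [2, 4, 5]]

expanded = subword_average_init(pretrained, pieces)
print(expanded.shape)  # (8, 4)
```

With a real model, the piece ids would come from running the original tokenizer on each new token's surface form, and the new rows would be written into the model's resized input embedding table. The pretrained rows are left untouched, so the new vectors begin inside the region of the space the model already uses.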