organization
abjadai
organization · active · provisional
abjadai-cb118f96 · 1 event · first seen 23h ago · Aliases: abjadai
Recent events (1)
CANDLE: Lightweight CTC-based Arabic character deduplication for social media text normalization
CANDLE is a lightweight Arabic text normalization system that uses Connectionist Temporal Classification (CTC) to deduplicate informal character elongation (letters repeated for emphasis) without handcrafted rules or morphological analyzers. Evaluated on three benchmarks, including social media text, the CTC model achieves a 5.37% Sentence Error Rate (SER), and it is distilled from 6 layers to 2 with minimal loss in performance. A key downstream benefit is a reduction of up to 12.8% in tokenizer fertility across Arabic LLM tokenizers, which lowers inference costs and improves context-window utilization. Code and models are publicly released.
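The summary does not describe CANDLE's decoder, so the sketch below only illustrates two standard ideas it relies on: the usual CTC greedy collapse rule (merge consecutive repeated labels, then drop blanks), which is what makes CTC a natural fit for removing elongation, and the common definition of tokenizer fertility (average subword tokens per word) with a relative-reduction calculation. The blank symbol, the frame labels, and the two-character `toy_tokenize` are hypothetical stand-ins, not CANDLE's model or any real Arabic LLM tokenizer.

```python
from typing import Callable, List, Sequence

BLANK = "<blank>"  # hypothetical blank label; real CTC models reserve a vocabulary index for it


def ctc_greedy_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> str:
    """Standard CTC collapse: merge consecutive repeated labels, then drop blanks."""
    out: List[str] = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


def fertility(words: Sequence[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of subword tokens per word."""
    return sum(len(tokenize(w)) for w in words) / len(words)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.25 means 25% lower)."""
    return (before - after) / before


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical tokenizer: fixed 2-character chunks, not a real LLM tokenizer."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


# Elongated "جمييييل": repeated frames collapse to a single letter.
print(ctc_greedy_collapse(list("جمييييل")))  # جميل
# A genuine double letter survives only when a blank separates the repeats.
print(ctc_greedy_collapse(["ت", "ت", BLANK, "ت", "ب", "ع", "ع"]))  # تتبع

raw_words = ["جمييييل", "جداااا"]
normalized_words = ["جميل", "جدا"]  # given directly here, not produced by CANDLE
raw_f = fertility(raw_words, toy_tokenize)
norm_f = fertility(normalized_words, toy_tokenize)
print(raw_f, norm_f, relative_reduction(raw_f, norm_f))
```

The design point is that repeats collapse unless a blank separates them, so a CTC-trained model can in principle learn which letter runs are elongation and which doubled letters are legitimate, rather than applying a fixed "squash all repeats" rule. The fertility numbers from the toy tokenizer only demonstrate the arithmetic and say nothing about the reported 12.8% figure.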