
Connectionist Temporal Classification

technique · active · provisional · connectionist-temporal-classification-a611a712 · 1 event · first seen 24h ago

Aliases: Connectionist Temporal Classification

Recent events (1)

3 · arXiv · cs.CL · 24h ago

CANDLE: Lightweight CTC-based Arabic character deduplication for social media text normalization

CANDLE is a lightweight Arabic text normalization system that uses Connectionist Temporal Classification (CTC) to remove informal character elongation (repeated characters) without handcrafted rules or morphological analyzers. Evaluated on three benchmarks, including social media text, the CTC model achieves a 5.37% Sentence Error Rate, and distilling it from 6 layers to 2 causes minimal performance loss. A key downstream benefit is a reduction of up to 12.8% in tokenizer fertility (tokens produced per word) across Arabic LLM tokenizers, which lowers inference costs and improves context window utilization. Code and models are publicly released.
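For readers unfamiliar with CTC, the following is a minimal sketch of the standard greedy CTC decoding rule: take the best label per frame, merge consecutive repeats, then drop the blank symbol. It is not CANDLE's released code. The function name, the three-symbol vocabulary, the blank index of 0, and the frame scores are all made up for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Best-scoring label index for each time frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical vocabulary and per-frame scores (illustrative values only).
VOCAB = ["<blank>", "a", "b"]
FRAMES = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a" runs)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, merged)
]

if __name__ == "__main__":
    labels = ctc_greedy_decode(FRAMES)
    print("".join(VOCAB[i] for i in labels))
```

Note that the blank between the two runs of "a" is what lets a genuinely doubled character survive the merge step, while an uninterrupted run collapses to a single character.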