4 · arXiv · cs.LG (Machine Learning) · 12d ago

FBCC: Forward-Backward Knowledge Distillation for Unsupervised Continual Clustering

A new arXiv preprint introduces Unsupervised Continual Clustering (UCC) as a problem formulation and proposes FBCC, a method that pairs a continual teacher network with task-specific student networks to learn sequential clustering tasks without labels or stored past data. The approach uses a dual-phase forward-backward distillation process to preserve previously discovered cluster structures while learning new ones. Experiments on four benchmark datasets show that FBCC outperforms continual learning baselines in clustering accuracy while reducing catastrophic forgetting.
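The summary does not state FBCC's actual loss functions, so the following is only a generic illustration of the teacher-student idea: a temperature-softened KL distillation loss over cluster-assignment logits. The function names, the temperature value, and the example logits are all assumptions made for this sketch and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened cluster-assignment distributions.

    Generic knowledge-distillation loss, NOT the loss used by FBCC.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over three clusters for a single sample.
teacher = [2.0, 0.5, -1.0]
agreeing_student = [2.0, 0.5, -1.0]
drifted_student = [-1.0, 0.5, 2.0]

loss_same = distillation_loss(teacher, agreeing_student)   # zero: assignments match
loss_drift = distillation_loss(teacher, drifted_student)   # positive: assignments diverge
```

A loss of this kind penalizes a network whose cluster assignments drift away from a reference network's, which is the general mechanism by which distillation can counter forgetting.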

Evaluation and Benchmarking Unsupervised Continual Clustering via Forward-Backward Knowledge Distillation

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

4 · arXiv · cs.LG · 17d ago

FlashbackCL extends federated learning to mitigate temporal distribution shift and forgetting

FlashbackCL is a proposed extension to the Flashback federated learning method that addresses temporal forgetting — the degradation caused by client data distributions drifting over time, a scenario existing FL methods do not handle. The approach introduces temporally-decayed label counts, a device-aware replay buffer with Class-Balanced Reservoir Sampling, and server-side coreset curation. On CIFAR-10 with 50 clients, FlashbackCL achieves 6.9–10.0% relative improvement over Flashback while reducing temporal forgetting by up to 68%, with CBRS replay identified as the critical component.
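Class-Balanced Reservoir Sampling is a published replay-buffer policy for imbalanced streams. The sketch below is a simplified single-buffer version written from the general description of that algorithm; it is not FlashbackCL's device-aware buffer, and the class name, method names, and the toy stream are assumptions made for illustration.

```python
import random
from collections import defaultdict

class ClassBalancedReservoir:
    """Simplified class-balanced reservoir buffer (illustrative sketch)."""

    def __init__(self, capacity, seed=0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.stored = defaultdict(list)  # label -> items currently in the buffer
        self.seen = defaultdict(int)     # label -> items observed in the stream
        self.full_classes = set()        # labels that have ever been the largest class

    def __len__(self):
        return sum(len(items) for items in self.stored.values())

    def class_sizes(self):
        return {label: len(items) for label, items in self.stored.items()}

    def add(self, item, label):
        self.seen[label] += 1
        if len(self) < self.capacity:
            self.stored[label].append(item)
            return
        # Every class that is currently the largest is marked full for good.
        top = max(len(items) for items in self.stored.values())
        largest = [c for c, items in self.stored.items() if len(items) == top]
        self.full_classes.update(largest)
        if label not in self.full_classes:
            # Make room by evicting a random item from one of the largest classes.
            victims = self.stored[self.rng.choice(largest)]
            victims.pop(self.rng.randrange(len(victims)))
            self.stored[label].append(item)
        else:
            # Plain reservoir sampling restricted to this class.
            kept, total = len(self.stored[label]), self.seen[label]
            if kept > 0 and self.rng.random() < kept / total:
                self.stored[label][self.rng.randrange(kept)] = item

# Toy imbalanced stream: 900 items of class "a", then 100 of "b", then 20 of "c".
stream = ([("a", i) for i in range(900)]
          + [("b", i) for i in range(100)]
          + [("c", i) for i in range(20)])
buffer = ClassBalancedReservoir(capacity=30)
for label, item in stream:
    buffer.add(item, label)
sizes = buffer.class_sizes()
```

On this toy stream the buffer ends with ten items per class despite the 45:5:1 imbalance in the input, which is the property that makes this kind of sampling attractive for replay.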

Evaluation and Benchmarking CIFAR-100 CIFAR-10 FlashbackCL +3 more

6 · arXiv · cs.CL · 22d ago

Canonical-Context On-Policy Distillation (CCOPD) for Multi-Turn LLM Consistency

This paper identifies 'self-anchored drift' as a key failure mode in multi-turn LLMs: when information is revealed incrementally across turns, models produce unsupported assumptions that distort final answers, even when the total evidence is identical to a single-prompt setting. The authors propose Canonical-Context On-Policy Distillation (CCOPD), which trains a student model on incremental multi-turn conversations to match the output distribution of a frozen teacher conditioned on the full clean prompt. Trained only on math conversations, CCOPD achieves a 32% average relative improvement on multi-turn (RAW-SHARDED) tasks and generalizes zero-shot to five out-of-domain task families while preserving single-prompt performance.

Evaluation and Benchmarking Agent and Tool Ecosystem on-policy distillation multi-turn language models self-anchored drift +2 more

4 · arXiv · cs.LG · 12d ago

SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs

Researchers introduce SETA (Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning), a framework that addresses catastrophic forgetting in LLMs by adaptively decomposing sparse subspaces into task-specific and shared expert modules. The approach uses adaptive elastic anchoring and routing-aware regularization to protect shared knowledge at both the weight and routing levels. Experiments on LLaMA-2 7B and Qwen3-4B show competitive or superior performance versus continual learning baselines, with strong retention of early-task knowledge.

Evaluation and Benchmarking Open Weights Progress LLaMA-7B Qwen3-4B Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning +1 more

4 · arXiv · cs.AI · 16d ago

BabyCL: Continual multimodal learning from egocentric child video in a single chronological pass

Researchers introduce BabyCL, a continual learning framework that processes the SAYCam egocentric child video dataset in a single chronological pass rather than shuffled multi-epoch training, more closely mimicking how children actually encounter their environment. The system combines streaming visual representation learning with image-text contrastive objectives, a multi-stage temporal segmentation, and a dual replay buffer managing visual and multimodal histories. BabyCL outperforms streaming baselines on the SAYCam Labeled-S 4AFC benchmark under matched compute budgets, substantially closing the gap to offline training upper bounds. The work advances understanding of whether neural networks can acquire word-referent mappings under biologically plausible training conditions.

Evaluation and Benchmarking Multimodal Progress SAYCam BabyCL SAYCam Labeled-S 4AFC

5 · arXiv · cs.AI · 17d ago

FFR extends Forward-Forward algorithm to regression tasks with 73% memory reduction

A new arXiv preprint introduces FFR (Forward-Forward for Regression), the first framework to adapt Hinton's Forward-Forward algorithm—a biologically plausible, backpropagation-free training method—to regression problems. FFR introduces an ordinal competitive goodness function, a stratified ladder architecture, and hierarchical prediction with uncertainty estimation to handle continuous target spaces. Across five real-world regression benchmarks, FFR recovers 98.6% of backpropagation accuracy while reducing peak training memory to 27% of BP's at depth 8 and 8% at depth 32, with per-iteration time around 72% of BP's.
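The headline figure and the body figures describe the same measurement from two directions. As a small worked check, the snippet below converts the "fraction of BP" numbers quoted in the summary into reductions; the helper name is hypothetical, and the only inputs are the percentages stated above.

```python
def reduction_percent(fraction_of_baseline):
    """Convert 'X is this fraction of the baseline' into a percentage reduction."""
    return round((1.0 - fraction_of_baseline) * 100, 1)

# Figures quoted in the summary, expressed as fractions of backpropagation (BP).
memory_fraction_by_depth = {8: 0.27, 32: 0.08}
time_fraction = 0.72

memory_reduction = {depth: reduction_percent(f)
                    for depth, f in memory_fraction_by_depth.items()}
time_reduction = reduction_percent(time_fraction)

for depth, saved in memory_reduction.items():
    print(f"depth {depth}: peak memory reduced by {saved}% relative to BP")
print(f"per-iteration time reduced by {time_reduction}% relative to BP")
```

The 73% in the headline therefore corresponds to the depth-8 configuration; at depth 32 the quoted figure implies a larger reduction.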

Training Infrastructure Evaluation and Benchmarking Forward-Forward Algorithm FFR: Forward-Forward Learning for Regression

5 · arXiv · cs.CL · 15d ago

ATWU: Token-level importance learning improves LLM unlearning via retain-conflict criterion

This paper introduces Alternating Token-Weighted Unlearning (ATWU), a framework that learns which tokens in a forget sample are most relevant to unlearning by characterizing their conflict with the retain objective. Rather than relying on auxiliary models or heuristics, ATWU jointly learns token forget-specificity and model parameters using a lightweight linear scorer over hidden states. Evaluated on TOFU and RWKU benchmarks, ATWU achieves state-of-the-art forget-retain trade-offs and produces token-level scores that align with ground-truth forget-specific spans.

Evaluation and Benchmarking AI Safety Research RWKU Alternating Token-Weighted Unlearning TOFU

5 · arXiv · cs.LG · 8d ago

Stable Recovery Manifold hypothesis: catastrophic forgetting as accessibility problem, not information destruction

A new arXiv preprint investigates the geometric structure of recoverability in continual learning using Split CIFAR-100 and a sequentially trained ResNet-18. The authors introduce Recovery Subspace Dimensionality (k_t) and find that recovery dimensionality remains stable across tasks (mean k_t = 8.0) despite substantial representational drift, with principal-angle drift strongly predicting recoverability (r = -0.862). The findings support the Stable Recovery Manifold hypothesis: forgotten knowledge remains compactly decodable, reframing catastrophic forgetting as a manifold-alignment and accessibility problem rather than true information loss.

Evaluation and Benchmarking Split CIFAR-100 Recovery Subspace Dimensionality The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning +1 more

3 · arXiv · cs.LG · 5d ago

HumP-KD: Uncertainty-aware multi-stage knowledge distillation for efficient fire classification

Researchers propose HumP-KD, a knowledge distillation framework that compresses two heterogeneous transformer teachers (Swin-Tiny and ViT-Base) into a lightweight MobileViT-S student for real-time fire classification. The student model achieves 0.9876 mean F1 on a 31K-image dataset while retaining only 4.94M parameters—a 5.7× reduction over Swin-Tiny—and runs at 37.72 CPU FPS. The framework combines hierarchical feature alignment, spatial attention masking, and progressive multi-stage distillation to maintain accuracy under degraded visual conditions.

Inference Economics FlameVision HumP-KD Swin-Tiny +2 more