FedLAB: Traceable semantic codebooks for federated multimodal graph foundation learning
FedLAB is a new federated learning framework for multimodal graph foundation models that organizes knowledge into typed hierarchical codebooks covering modality evidence, node semantics, and topology context. The system enables semantic traceability under strict data isolation, addressing a gap where existing methods exchange knowledge through parameters or embeddings without exposing how evidence jointly supports predictions. Evaluated on 10 benchmarks and 6 downstream tasks, FedLAB improves over state-of-the-art baselines by up to 7.53% while keeping raw data local.
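To make the idea of typed codebooks concrete, here is a minimal sketch of nearest-code lookup, assuming flat codebooks with made-up 2-D prototypes. It is not FedLAB's actual method: the code ids, the vectors, and the `nearest_code` and `trace` helpers are all hypothetical, and the hierarchy and federated training described in the summary are left out. It only shows how mapping an embedding to a named code per type yields a record that can be inspected without exposing raw data.

```python
from math import dist

# Hypothetical typed codebooks: each type maps code ids to prototype vectors.
# The three types mirror the summary; ids and values are invented for illustration.
CODEBOOKS = {
    "modality": {"m0": (0.0, 0.0), "m1": (1.0, 1.0)},
    "semantics": {"s0": (0.0, 1.0), "s1": (1.0, 0.0)},
    "topology": {"t0": (0.5, 0.5), "t1": (2.0, 2.0)},
}


def nearest_code(codebook, vector):
    """Return the id of the prototype closest to `vector` (Euclidean distance)."""
    return min(codebook, key=lambda code_id: dist(codebook[code_id], vector))


def trace(embeddings):
    """Map each typed embedding to the id of its nearest code."""
    return {
        kind: nearest_code(CODEBOOKS[kind], vector)
        for kind, vector in embeddings.items()
    }


# One node's (hypothetical) embeddings, one per codebook type.
example = {
    "modality": (0.9, 1.2),
    "semantics": (0.1, 0.8),
    "topology": (0.4, 0.6),
}
codes = trace(example)
print(codes)
```

The resulting dictionary names one code per type, which is the kind of discrete, shareable record that a client could exchange in place of raw features under the assumptions above.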
Related events (8)
FedReLa: Re-labeling approach for imbalanced federated learning under data heterogeneity
Researchers propose FedReLa, a data-level method for federated learning that addresses the coexistence of global class imbalance and cross-client data heterogeneity. The approach uses a feature-dependent label re-allocator to correct biased global decision boundaries without requiring knowledge of the global class distribution. FedReLa is model-agnostic and modular, integrating with existing algorithmic methods without additional communication overhead, and claims state-of-the-art results on stepwise-imbalanced and long-tailed datasets.
Federated Learning using Hugging Face and Flower
This Hugging Face blog post describes how to combine the Hugging Face ecosystem with the Flower federated learning framework to train models across distributed, privacy-preserving data silos. It provides a practical walkthrough of integrating the Transformers and Datasets libraries with Flower's federated training loop. The post targets practitioners who want to apply federated learning to NLP and other ML tasks without centralizing sensitive data.
LOGOS: A unified autoregressive foundation model for natural science tasks across domains
Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.
Latent World Recovery: multimodal learning framework for missing modalities in bioscience
A new arXiv preprint introduces Latent World Recovery (LWR), a framework for multimodal learning when some modalities are unavailable at training or inference time. LWR aligns modality-specific embeddings in a shared latent space and fuses only available modalities, avoiding explicit reconstruction of missing ones. The approach is evaluated on incomplete multi-omics benchmarks for cancer phenotype classification and survival prediction, demonstrating robustness under partial observation.
TREAD: VLM-based re-labelling framework improves robot policy generalization via dataset augmentation
TREAD (Task Robustness via Re-Labelling Vision-Action Robot Data) is a scalable framework that uses pretrained Vision-Language Models to augment existing robotics datasets without new data collection. The approach decomposes demonstrations into sub-tasks, segments videos accordingly, and generates linguistically diverse instruction labels, enriching language-action pair diversity. Evaluations on the LIBERO benchmark show improved generalization to novel tasks and goals, addressing a key limitation of current robot learning policies.
UniCAD: Unified benchmark and multimodal LLM for multi-task CAD learning
Researchers introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning covering point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering. Alongside the benchmark, they present UniCAD-MLLM, a single end-to-end multimodal large language model that ingests text, images, sketches, and point clouds to perform all these tasks. The system achieves state-of-the-art results on both UniCAD and Fusion360 benchmarks, outperforming task-specific and multi-task baselines. Dataset, code, and pretrained models are to be released.
LLMs outperform traditional methods on single and multi-truth data fusion tasks
A new arXiv preprint investigates using LLMs for data fusion (truth discovery) over tabular data, covering both single-truth and multi-truth scenarios. The authors evaluate domain-dependent, domain-independent, zero-shot, and one-shot prompting strategies across three benchmark datasets. LLM-based approaches outperform traditional unsupervised methods including DART and LTM on all datasets, with code released publicly.
ATLAS: Unified Agentic and Latent Visual Reasoning via Functional Tokens
ATLAS proposes a framework where a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks with maintained interpretability.
