4 · arXiv cs.LG (Machine Learning) · 1h ago

FedLAB: Traceable semantic codebooks for federated multimodal graph foundation learning

FedLAB is a new federated learning framework for multimodal graph foundation models that organizes knowledge into typed hierarchical codebooks covering modality evidence, node semantics, and topology context. The system enables semantic traceability under strict data isolation, addressing a gap where existing methods exchange knowledge through parameters or embeddings without exposing how evidence jointly supports predictions. Evaluated on 10 benchmarks and 6 downstream tasks, FedLAB improves over state-of-the-art baselines by up to 7.53% while keeping raw data local.

AI Safety Research FedLAB

Related guides (1)

AI Safety Research · Topic guide

AI Safety Research: From Lab Principles to Real-World Flashpoints

Related events (8)

4 · arXiv · cs.LG · 5d ago

FedReLa: Re-labeling approach for imbalanced federated learning under data heterogeneity

Researchers propose FedReLa, a data-level method for federated learning that addresses the coexistence of global class imbalance and cross-client data heterogeneity. The approach uses a feature-dependent label re-allocator to correct biased global decision boundaries without requiring knowledge of the global class distribution. FedReLa is model-agnostic and modular, integrating with existing algorithmic methods without additional communication overhead, and claims state-of-the-art results on stepwise-imbalanced and long-tailed datasets.

FedReLa

4 · Hugging Face Blog · 1mo ago

Federated Learning using Hugging Face and Flower

This Hugging Face blog post describes how to combine the Hugging Face ecosystem with the Flower federated learning framework to train models across distributed, privacy-preserving data silos. It walks through integrating the Transformers and Datasets libraries with Flower's federated training loop. The post targets practitioners who want to apply federated learning to NLP and other ML tasks without centralizing sensitive data.

Training Infrastructure Enterprise Deployment Patterns Federated Learning Hugging Face Datasets Hugging Face Transformers +2 more

6 · arXiv · cs.CL · 14d ago

LOGOS: A unified autoregressive foundation model for natural science tasks across domains

Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.

Frontier Model Releases Open Weights Progress Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences LOGOS

4 · arXiv · cs.LG · 19d ago

Latent World Recovery: multimodal learning framework for missing modalities in bioscience

A new arXiv preprint introduces Latent World Recovery (LWR), a framework for multimodal learning when some modalities are unavailable at training or inference time. LWR aligns modality-specific embeddings in a shared latent space and fuses only available modalities, avoiding explicit reconstruction of missing ones. The approach is evaluated on incomplete multi-omics benchmarks for cancer phenotype classification and survival prediction, demonstrating robustness under partial observation.

Multimodal Progress Latent World Recovery for Multimodal Learning with Missing Modalities Latent World Recovery

5 · arXiv · cs.LG · 21d ago

TREAD: VLM-based re-labelling framework improves robot policy generalization via dataset augmentation

TREAD (Task Robustness via Re-Labelling Vision-Action Robot Data) is a scalable framework that uses pretrained Vision-Language Models to augment existing robotics datasets without new data collection. The approach decomposes demonstrations into sub-tasks, segments videos accordingly, and generates linguistically diverse instruction labels, enriching language-action pair diversity. Evaluations on the LIBERO benchmark show improved generalization to novel tasks and goals, addressing a key limitation of current robot learning policies.

Agent and Tool Ecosystem Multimodal Progress TREAD LIBERO

5 · arXiv · cs.AI · 26d ago

UniCAD: Unified benchmark and multimodal LLM for multi-task CAD learning

Researchers introduce UniCAD, a comprehensive benchmark for multimodal CAD learning covering point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering. Alongside the benchmark, they present UniCAD-MLLM, a single end-to-end multimodal large language model that ingests text, images, sketches, and point clouds to perform all of these tasks. The system achieves state-of-the-art results on both the UniCAD and Fusion360 benchmarks, outperforming task-specific and multi-task baselines. The dataset, code, and pretrained models are to be released.

Evaluation and Benchmarking Multimodal Progress Fusion360 UniCAD-MLLM UniCAD

4 · arXiv · cs.CL · 43h ago

LLMs outperform traditional methods on single and multi-truth data fusion tasks

A new arXiv preprint investigates using LLMs for data fusion (truth discovery) over tabular data, covering both single-truth and multi-truth scenarios. The authors evaluate domain-dependent, domain-independent, zero-shot, and one-shot prompting strategies across three benchmark datasets. LLM-based approaches outperform traditional unsupervised methods including DART and LTM on all datasets, with code released publicly.

Evaluation and Benchmarking Enterprise Deployment Patterns DART LTM Single and Multi Truth Data Fusion using Large Language Models

6 · arXiv · cs.CL · 1mo ago

ATLAS: Unified Agentic and Latent Visual Reasoning via Functional Tokens

ATLAS proposes a framework in which a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and it introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks while maintaining interpretability.

Evaluation and Benchmarking Agent and Tool Ecosystem functional token GRPO Latent-Anchored GRPO +4 more