5 · arXiv cs.LG (Machine Learning) · 20d ago

TxFM: Masked Autoencoding Foundation Model for RNA-seq Gene Expression Representation

The paper introduces TxFM, a self-supervised masked autoencoder for learning representations of transcriptomic (RNA-seq) data, trained on a curated 1.4M-sample corpus called DiverseRNA-1.4M. TxFM outperforms existing transcriptomic foundation models trained on datasets more than 100x larger, addressing the known problem of deep models underperforming linear baselines on gene expression data. The work includes ablation studies that identify the critical architecture choices, and argues that careful data curation combined with inductive self-supervised learning is sufficient for strong transfer performance in transcriptomics.

Evaluation and Benchmarking · DiverseRNA-1.4M · masked autoencoding · RNA sequencing · TxFM · transcriptomic foundation models

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

5 · Hugging Face Blog · 1mo ago

Training mRNA Language Models Across 25 Species for $165

A Hugging Face blog post describes training mRNA language models spanning 25 biological species at a total compute cost of $165. The work demonstrates that biological sequence language models can be trained at extremely low cost, potentially democratizing genomic/transcriptomic AI research. The post likely covers model architecture, training data, and cross-species generalization results.

Training Infrastructure · Open Weights Progress · mRNA Language Model · OpenMed · Hugging Face

5 · arXiv · cs.AI · 1mo ago

Ensembling Tabular Foundation Models: A Diversity Ceiling and a Calibration Trap

This paper benchmarks six ensemble strategies across six tabular foundation models (TFMs) on 153 OpenML classification tasks, finding that ensembling provides minimal gains over the best single TFM. The best ensemble strategy (two-level cascade stacking) achieves only +0.18% accuracy improvement at 253× the compute cost. A key finding is that logistic-regression meta-learner stacking improves accuracy while severely degrading calibration (log-loss), because sharpening class boundaries destroys probability estimates. The authors recommend greedy ensemble selection as the practical default.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Q-statistic · Greedy Ensemble Selection · Friedman-Nemenyi Test

5 · GitHub Trending · 29d ago

TimesFM: Google Research Pretrained Time-Series Foundation Model

TimesFM is a pretrained foundation model developed by Google Research specifically for time-series forecasting tasks. The repository has accumulated over 20,000 GitHub stars with 99 new stars today, indicating sustained community interest. It represents Google Research's effort to apply the foundation model paradigm to time-series data rather than language or vision.

Frontier Model Releases · Enterprise Deployment Patterns · Google Research · TimesFM

5 · arXiv · cs.AI · 1mo ago

Distilling Tabular Foundation Models for Structured Health Data

This paper investigates knowledge distillation from tabular foundation models (TFMs) to lightweight student models for healthcare applications. The authors address context leakage in in-context TFMs via stratified out-of-fold teacher labeling, evaluating across 19 healthcare datasets, 6 TFM teachers, and 4 student families. Distilled students retain at least 90% of teacher AUC while running 26× faster on CPU, with preserved calibration and fairness properties. Multi-teacher ensembles do not consistently outperform the best single teacher.

Evaluation and Benchmarking · Inference Economics · knowledge distillation · Stratified Out-of-Fold Teacher Labeling · AUC

5 · arXiv · cs.AI · 17d ago

FINO: Label-free adaptation of vision foundation models using metadata in scientific domains

Researchers propose FINO, a self-supervised method for adapting vision foundation models to specialized scientific domains without task labels, using metadata as a guidance signal instead. The approach combines a standard self-supervised objective with flexible handling of both discrete and continuous metadata to preserve informative factors while suppressing spurious ones. Evaluated across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO outperforms both unsupervised domain adaptation and fully supervised fine-tuning, including domain-specific state-of-the-art models.

Evaluation and Benchmarking · FINO · Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

5 · Hugging Face Blog · 1mo ago

Training and Finetuning Sparse Embedding Models with Sentence Transformers

Hugging Face published a tutorial on training and fine-tuning sparse embedding models using the Sentence Transformers library. Sparse embeddings offer an alternative to dense vector representations for retrieval tasks, potentially improving interpretability and efficiency. The post covers the tooling and workflows available in Sentence Transformers for producing sparse encoders suitable for search and RAG pipelines.

Inference Economics · Agent and Tool Ecosystem · Sparse Embedding Models · Hugging Face · Sentence Transformers

5 · Hugging Face Blog · 1mo ago

Can Foundation Models Label Data Like Humans?

This Hugging Face blog post examines whether foundation models can serve as substitutes for human annotators in RLHF data labeling pipelines. It investigates the reliability and quality of model-generated preference labels compared to human-generated ones, with implications for scalable oversight and alignment research. The analysis is framed around the Open LLM Leaderboard and RLHF methodology.

Evaluation and Benchmarking · Alignment and RLHF · Reinforcement Learning from Human Feedback · Open LLM Leaderboard · Hugging Face

3 · arXiv · cs.LG · 6d ago

Probing bioacoustic embeddings for speech-like acoustic features reveals no-free-lunch pattern

A new arXiv preprint investigates which acoustic features are encoded in pretrained bioacoustic audio embeddings using 88 eGeMAPS speech features across six taxonomic groups. Linear and nonlinear regression probes reveal that no single model captures the full acoustic feature space, with loudness best recovered (R²=0.76) and fundamental frequency hardest (R²=0.33). A concatenated embedding approach achieves highest overall performance, suggesting complementary coverage across models. The work provides data-driven model selection guidance for bioacoustics tasks involving rare species or low-resource domains.

Evaluation and Benchmarking · eGeMAPS · Beyond task performance: Decoding bioacoustic embeddings with speech features