dataset
DiverseRNA-1.4M
dataset · active · provisional
diverserna-1-4m-e1710e7c · 1 event · first seen 16d ago
Aliases: DiverseRNA-1.4M
Recent events (1)
TxFM: Masked Autoencoding Foundation Model for RNA-seq Gene Expression Representation
The paper introduces TxFM, a self-supervised masked autoencoder for learning representations of transcriptomic (RNA-seq) data, trained on a curated 1.4M-sample corpus called DiverseRNA-1.4M. The authors report that TxFM outperforms existing transcriptomic foundation models trained on datasets more than 100x larger, addressing the known problem of deep models underperforming linear baselines on gene expression data. The work includes ablation studies that identify the critical architecture choices, and it argues that careful data curation combined with inductive self-supervised learning is sufficient for strong transfer performance in transcriptomics.
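For readers unfamiliar with masked autoencoding on expression data, the general objective can be sketched as follows. This is a minimal illustration of the generic technique, not TxFM's implementation: the function names, the 50% mask ratio, the log1p transform, and the mean-of-visible-genes stand-in for a learned model are all assumptions made for the example, and the values are toy numbers rather than DiverseRNA-1.4M data.

```python
import math
import random


def mask_profile(profile, mask_ratio, rng):
    """Hide a random subset of genes in one expression profile.

    Returns the masked copy (hidden entries set to 0.0) and the set of
    hidden gene indices. Assumes 0 < mask_ratio < 1.
    """
    n_hidden = max(1, round(mask_ratio * len(profile)))
    hidden = set(rng.sample(range(len(profile)), n_hidden))
    masked = [0.0 if i in hidden else v for i, v in enumerate(profile)]
    return masked, hidden


def masked_mse(profile, predicted, hidden):
    """Mean squared reconstruction error over the hidden genes only."""
    return sum((profile[i] - predicted[i]) ** 2 for i in hidden) / len(hidden)


rng = random.Random(0)
counts = [rng.randint(0, 50) for _ in range(12)]   # toy counts, not real data
profile = [math.log1p(c) for c in counts]          # assumed normalisation
masked, hidden = mask_profile(profile, mask_ratio=0.5, rng=rng)

# Stand-in for a learned encoder/decoder: predict every hidden gene as the
# mean of the visible genes. A real model would be trained to minimise `loss`.
visible = [v for i, v in enumerate(profile) if i not in hidden]
baseline = sum(visible) / len(visible)
predicted = [baseline if i in hidden else v for i, v in enumerate(profile)]
loss = masked_mse(profile, predicted, hidden)
```

Scoring only the hidden positions is what makes the task self-supervised: the model gets no credit for copying values it was shown, so it has to infer missing genes from the visible ones.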