ESMFold2 matches AlphaFold 3 performance with open, LLM-inspired architecture for molecular structure prediction
Biohub and EvolutionaryScale released ESMFold2, a 6.2-billion-parameter open-weights model that predicts the 3D structures of proteins, DNA, RNA, and small molecules by treating molecular sequences as language. Unlike AlphaFold 3, ESMFold2 can run without multiple sequence alignments (MSAs): it relies on a transformer-based embedding model (ESMC) trained on 2.8 billion sequences. It outperforms Chai-1 in MSA-free settings and matches AlphaFold 3 when MSAs are provided. The model weights are freely available on Hugging Face and through a Biohub API, making frontier-level structure prediction accessible without proprietary infrastructure. The release matters for drug discovery involving novel or synthetic molecules, for which homologous sequences, and therefore MSAs, may be sparse.
Related events (8)
ESMFold2: The Bitter Lesson is Coming for Proteins — Alex Rives, BioHub
A Latent Space interview and commentary piece in which Alex Rives of Biohub discusses ESMFold2 and how the 'bitter lesson' (scale and general methods beating hand-crafted inductive bias) applies to protein structure prediction and biology. The piece covers the tension between dataset scale and domain-specific inductive bias in biological ML, and touches on world models and programmable biology. It offers the perspective of a leading protein language model researcher on the next generation of biological foundation models.
PLAID: Repurposing Protein Folding Models for Multimodal Protein Generation with Latent Diffusion
PLAID is a generative model that simultaneously produces protein 1D sequences and 3D all-atom structures by learning a diffusion model over the latent space of ESMFold, a protein folding model. It requires only sequence data for training, which lets it draw on databases 2 to 4 orders of magnitude larger than structure databases, and it decodes structure at inference time using frozen folding model weights. The approach supports compositional prompting by function and organism, addressing practical drug-design constraints such as humanization and solubility. A companion compression model, CHEAP, reduces the high dimensionality of transformer latent spaces to make diffusion training tractable.
EvoStruct: Bridging Evolutionary and Structural Priors for Antibody CDR Design via Protein Language Model Adaptation
EvoStruct addresses vocabulary collapse in GNN-based antibody CDR design by combining a frozen protein language model with an E(3)-equivariant GNN through a cross-attention adapter. The method introduces progressive PLM unfreezing and R-Drop consistency regularization to recover functionally important amino acid diversity. On CHIMERA-Bench, EvoStruct improves sequence recovery by 16%, reduces perplexity by 43%, and achieves 2.3x greater amino acid diversity compared to the best GNN baselines.
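For readers unfamiliar with R-Drop, the sketch below shows the general idea of the regularizer: the same input is passed through the model twice with different dropout masks, and the loss adds a symmetric KL term that penalizes disagreement between the two predicted distributions. This is a minimal illustration of the generic R-Drop objective in plain Python, not EvoStruct's implementation; the function names, the `alpha` weight, and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(p, target_index, eps=1e-12):
    """Negative log-likelihood of the target class under distribution p."""
    return -math.log(p[target_index] + eps)


def r_drop_loss(logits_a, logits_b, target_index, alpha=1.0):
    """Generic R-Drop loss for one example.

    logits_a and logits_b stand in for two forward passes of the same
    input under different dropout masks (hypothetical inputs here).
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    # Supervised term: both passes should predict the target.
    nll = cross_entropy(p_a, target_index) + cross_entropy(p_b, target_index)
    # Consistency term: the two passes should agree with each other.
    sym_kl = 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return nll + alpha * sym_kl


if __name__ == "__main__":
    # Toy logits over three amino-acid classes; values are made up.
    pass_one = [2.0, 0.5, -1.0]
    pass_two = [1.5, 1.0, -0.5]
    print(r_drop_loss(pass_one, pass_two, target_index=0, alpha=1.0))
```

When the two passes produce identical distributions the consistency term is zero and the loss reduces to the ordinary supervised term, so `alpha` only controls how strongly disagreement between passes is penalized.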
Deep Learning with Proteins
A Hugging Face blog post on applying deep learning to protein science, likely covering protein language models, structure prediction, and related tooling. Published in late 2022, it sits in the context of AlphaFold2's impact and the emerging ecosystem of protein ML models. The post likely surveys the models, datasets, and frameworks available for computational biology on the Hugging Face platform.
AlphaFold Reveals Structure of Key Heart Disease Protein
DeepMind has used AlphaFold to determine the structure of a key protein implicated in heart disease. The announcement highlights a new scientific application of AlphaFold's protein structure prediction capabilities to cardiovascular research. This represents a continued expansion of AlphaFold's impact on biomedical discovery beyond its initial structural biology applications.
AlphaFold: Five Years of Impact
DeepMind published a retrospective on AlphaFold's five-year impact on biological research and scientific discovery. The post surveys how the protein structure prediction system has accelerated science globally since its initial release. As an anniversary piece from a primary source, it likely highlights cumulative usage statistics, downstream research enabled, and future directions.
AlphaGenome: DeepMind's Unified DNA Sequence Model for Regulatory Variant-Effect Prediction
DeepMind has introduced AlphaGenome, a new unified DNA sequence model designed to advance regulatory variant-effect prediction and improve understanding of genome function. The model is now available via API, making it accessible to researchers. AlphaGenome represents a significant step in applying large-scale AI to genomics, particularly for interpreting non-coding regulatory regions of the genome.
Google's AlphaGenome Interprets Non-Coding DNA That Regulates Genetic Expression
Google has released AlphaGenome, an open-weights model that interprets the roughly 98% of the human and mouse genomes that does not code for proteins, including the regions that regulate gene expression. The model takes up to 1 million DNA base pairs as input and outputs roughly 6,000 human and 1,000 mouse gene properties, using a CNN-transformer-CNN architecture trained via ensemble distillation from 64 pretrained models. Across 50 evaluations, AlphaGenome matched or exceeded prior models in 47 cases, and it correctly predicted expression changes associated with T-cell acute lymphoblastic leukemia. Weights, API, and inference code are freely available for noncommercial use.
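Ensemble distillation, in its simplest form, trains a single student model to reproduce the averaged predictions of several teacher models. The sketch below illustrates that idea with plain Python lists and a mean-squared-error objective. It is a generic illustration under stated assumptions, not AlphaGenome's training code: the function names, the choice of averaging, the MSE loss, and the toy numbers are all hypothetical.

```python
def ensemble_targets(teacher_predictions):
    """Average the teachers' predictions position by position.

    teacher_predictions is a list of equal-length lists, one per teacher.
    """
    n_teachers = len(teacher_predictions)
    return [sum(values) / n_teachers for values in zip(*teacher_predictions)]


def distillation_loss(student_prediction, teacher_predictions):
    """Mean squared error between the student and the ensemble average."""
    targets = ensemble_targets(teacher_predictions)
    squared_errors = [(s - t) ** 2 for s, t in zip(student_prediction, targets)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Two toy teachers predicting two output tracks; values are made up.
    teachers = [[1.0, 2.0], [3.0, 4.0]]
    student = [3.0, 3.0]
    print(ensemble_targets(teachers))
    print(distillation_loss(student, teachers))
```

In practice the student would be updated by gradient descent on this loss, which lets one model approximate the ensemble at a fraction of the inference cost.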


