5 · Hugging Face Blog · 1mo ago

Training mRNA Language Models Across 25 Species for $165

A Hugging Face blog post describes training mRNA language models spanning 25 biological species at a total compute cost of $165. The work demonstrates that biological sequence language models can be trained at extremely low cost, potentially democratizing genomic/transcriptomic AI research. The post likely covers model architecture, training data, and cross-species generalization results.

Training Infrastructure · Open Weights Progress · mRNA Language Model · OpenMed · Hugging Face

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI


Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier


Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI


Related events (8)

4 · Hugging Face Blog · 1mo ago

Deep Learning over the Internet: Training Language Models Collaboratively

This Hugging Face blog post describes a framework for training large language models collaboratively across volunteer compute contributed over the internet. The approach addresses the challenge of enabling distributed participants with heterogeneous hardware to jointly train models without centralized infrastructure. It represents an early exploration of decentralized training as an alternative to large-scale private compute clusters.

Training Infrastructure · Open Weights Progress · collaborative distributed training · Hugging Face · volunteer compute

4 · Hugging Face Blog · 1mo ago

Deep Learning with Proteins

A Hugging Face blog post on applying deep learning to protein science, likely covering protein language models, structure prediction, and related tooling. Published in late 2022, it sits in the context of AlphaFold2's impact and the emerging ecosystem of protein ML models. The post likely surveys the models, datasets, and frameworks available for computational biology on the Hugging Face platform.

Open Weights Progress · Agent and Tool Ecosystem · protein language models · AlphaFold2 · ESM (Evolutionary Scale Modeling)

8 · OpenAI Blog · 1mo ago

GPT-5 lowers the cost of cell-free protein synthesis

An autonomous laboratory system integrating OpenAI's GPT-5 with Ginkgo Bioworks' cloud automation platform achieved a 40% reduction in cell-free protein synthesis costs. The system operates via closed-loop experimentation, where the AI model iteratively designs, executes, and refines biological experiments without human intervention. This represents a concrete application of frontier LLMs to wet-lab automation and cost optimization in synthetic biology.

Frontier Model Releases · Enterprise Deployment Patterns · cell-free protein synthesis · closed-loop experimentation · Ginkgo Bioworks

7 · Google DeepMind Blog · 1mo ago

AlphaGenome: DeepMind's Unified DNA Sequence Model for Regulatory Variant-Effect Prediction

DeepMind has introduced AlphaGenome, a new unified DNA sequence model designed to advance regulatory variant-effect prediction and improve understanding of genome function. The model is now available via API, making it accessible to researchers. AlphaGenome represents a significant step in applying large-scale AI to genomics, particularly for interpreting non-coding regulatory regions of the genome.

Frontier Model Releases · Multimodal Progress · regulatory variant-effect prediction · AlphaGenome · Google DeepMind

7 · The Batch · 19d ago

Google's AlphaGenome Interprets Non-Coding DNA That Regulates Genetic Expression

Google has released AlphaGenome, an open-weights model that interprets the ~98% of human and mouse genomes that regulate gene expression rather than coding for proteins. The model takes up to 1 million DNA base pairs as input and outputs roughly 6,000 human and 1,000 mouse gene properties, using a CNN-transformer-CNN architecture trained via ensemble distillation from 64 pretrained models. Across 50 evaluations, AlphaGenome matched or exceeded prior models in 47 cases, and correctly predicted expression changes associated with T-cell acute lymphoblastic leukemia. Weights, API, and inference code are freely available for noncommercial use.
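The summary mentions ensemble distillation, in which one student model is trained to reproduce the averaged predictions of many pretrained teachers. The sketch below illustrates only that general idea on a toy problem. The linear teachers, the `distill` helper, and all hyperparameters are assumptions made for illustration and do not describe AlphaGenome's actual training procedure.

```python
import random


def make_teacher(weight, bias):
    """Return a toy 'pretrained' teacher: a fixed 1-D linear predictor."""
    return lambda x: weight * x + bias


def ensemble_mean(teachers, x):
    """Average the teachers' predictions; this is the distillation target."""
    return sum(t(x) for t in teachers) / len(teachers)


def distill(teachers, inputs, steps=2000, lr=0.05):
    """Fit one linear student (w, b) to the ensemble mean via gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(inputs)
    targets = [ensemble_mean(teachers, x) for x in inputs]
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(inputs, targets)]
        grad_w = sum(2 * e * x for e, x in zip(errors, inputs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical setup: 64 teachers that disagree slightly around w=2, b=1.
rng = random.Random(0)
teachers = [
    make_teacher(2.0 + rng.uniform(-0.5, 0.5), 1.0 + rng.uniform(-0.5, 0.5))
    for _ in range(64)
]
inputs = [i / 10 for i in range(-10, 11)]
student_w, student_b = distill(teachers, inputs)
```

In this toy setting the student ends up matching the ensemble's average weight and bias, so a single model stands in for all 64 teachers at inference time.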

Open Weights Progress · Multimodal Progress · Transformers · ensemble distillation · Google

4 · Hugging Face Blog · 1mo ago

Train AI Models with Unsloth and Hugging Face Jobs for Free

Hugging Face has published a blog post describing how to use Unsloth in combination with Hugging Face Jobs to fine-tune AI models at no cost. The post targets practitioners looking for accessible, low-cost training workflows. It highlights the integration between Unsloth's memory-efficient training optimizations and Hugging Face's job execution infrastructure.

Open Weights Progress · Inference Economics · Unsloth · Hugging Face Jobs · Hugging Face

8 · Hugging Face Blog · 1mo ago

Introducing BLOOM: The World's Largest Open Multilingual Language Model

Hugging Face and the BigScience workshop released BLOOM, a 176-billion-parameter open-access multilingual language model trained on 46 natural languages and 13 programming languages. The model was developed collaboratively by over 1,000 researchers and represents a significant milestone in open-weights large language model development. BLOOM was designed to be freely accessible to researchers and practitioners, in contrast to proprietary models of similar scale.

Frontier Model Releases · Open Weights Progress · BLOOM · Hugging Face · BigScience

6 · Hugging Face Blog · 1mo ago

The Technology Behind BLOOM Training

This Hugging Face blog post details the infrastructure and training methodology used to train BLOOM, a 176-billion-parameter open-access multilingual language model. It covers the use of Megatron-DeepSpeed for distributed training across hundreds of GPUs, including tensor parallelism, pipeline parallelism, and data parallelism strategies. The post also discusses hardware setup, memory optimization techniques, and lessons learned during the large-scale training run.
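When tensor, pipeline, and data parallelism are combined, the total GPU count is the product of the three degrees, and each GPU rank occupies one position along each axis. The sketch below shows that bookkeeping. The degrees (4, 12, 8) and the rank ordering, with tensor-parallel ranks adjacent, are illustrative assumptions and are not taken from the summary or from Megatron-DeepSpeed's actual layout.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    """Position of one GPU along the three parallelism axes."""
    data: int      # which model replica
    pipeline: int  # which pipeline stage (group of layers)
    tensor: int    # which shard of each layer


def rank_to_coord(rank: int, tp: int, pp: int, dp: int) -> Coord:
    """Map a global rank to (data, pipeline, tensor), tensor axis varying fastest."""
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world size {world_size}")
    return Coord(data=rank // (tp * pp), pipeline=(rank // tp) % pp, tensor=rank % tp)


def coord_to_rank(coord: Coord, tp: int, pp: int) -> int:
    """Inverse of rank_to_coord under the same ordering."""
    return (coord.data * pp + coord.pipeline) * tp + coord.tensor


# Hypothetical degrees: 4-way tensor, 12-way pipeline, 8-way data parallelism.
TP, PP, DP = 4, 12, 8
WORLD_SIZE = TP * PP * DP  # 384 GPUs in this illustrative configuration
last = rank_to_coord(WORLD_SIZE - 1, TP, PP, DP)
```

Placing tensor-parallel ranks next to each other is a common choice because tensor parallelism communicates most often, so those ranks benefit from sharing a node's fast interconnect.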

Training Infrastructure · Open Weights Progress · BLOOM · DeepSpeed · Hugging Face