Atlas H&E-TME: AI system matches expert pathologist accuracy for scalable tumor microenvironment profiling
Researchers present Atlas H&E-TME, an AI system built on the Atlas family of pathology foundation models that generates over 4,500 quantitative readouts per whole-slide H&E image at cell-level resolution across multiple cancer types. The system is validated using a novel dual framework: an IHC-informed multi-pathologist consensus protocol for depth, and, for breadth, benchmarking against more than 200,000 annotations across more than 1,500 cases from over 25 sources spanning eight cancer types. Atlas H&E-TME matches or exceeds the performance of pathologists working from H&E alone, demonstrating that standard histopathology slides can serve as a scalable quantitative window into the tumor microenvironment. The work advances computational pathology by enabling tissue-based biomarker discovery without requiring specialized staining modalities.
Related events (8)
Two Studies Test Google's Breast Cancer Detection Models in Real-World Clinics
Two studies evaluated Google's mammography AI system—introduced in 2020 but not yet deployed for live patient care—against real-world UK NHS clinical workflows. In retrospective testing on 116,000 scans, the system achieved higher sensitivity (0.541 vs 0.437) than the first human reader while identifying 25% of cancers initially missed by doctors. A live integration test across 12 clinics showed the system processed scans in under 18 minutes versus over two days for human readers, with comparable accuracy, though some clinicians reported distrust of the system's outputs.
Orakl Oncology uses Meta's DINOv2 to accelerate cancer organoid analysis and drug response prediction
Orakl Oncology, a spinoff from the Gustave Roussy Institute, has deployed Meta's open-source DINOv2 vision model to analyze cancer organoid images and predict patient drug responses in clinical trials. In collaboration with CentraleSupelec and the Jaulin Lab under the RHU ORGANOMIC initiative, the team found that DINOv2 outperformed prior specialized models by 26.8% in accuracy. The model enabled quantitative extraction of imaging data from organoid videos, replacing labor-intensive frame-by-frame analysis and significantly accelerating development of the company's biomedical platform.
LLM-guided MAP-Elites evolution improves medical decision pipelines at inference time
Researchers propose using LLM-guided MAP-Elites evolutionary search as an inference-time alternative to fine-tuning for adapting LLMs to clinical workflows, formulating triage, consultation, and image classification as evolutionary searches over executable artifacts. Across three medical settings, evolved programs substantially outperform manually designed baselines: triage accuracy improves from 77.3% to 87.1% and emergency recall from 0.60 to 0.97, with gains also shown on MIMIC-ESI, iCRAFTMD, and PneumoniaMNIST. The approach works across Llama-3, Qwen-3.5, and Gemma-4 backbones and produces interpretable program-level mechanisms rather than superficial prompt changes.
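For readers unfamiliar with MAP-Elites, the core loop can be sketched in a few lines. This is a minimal illustration on a toy numeric problem, not the authors' system: the paper evolves executable artifacts with an LLM proposing changes, whereas here the mutation is plain Gaussian noise, and every name (`map_elites`, `toy_fitness`, `toy_descriptor`, and so on) is hypothetical.

```python
import random


def map_elites(fitness, descriptor, mutate, init,
               n_bins=10, n_init=20, iterations=500, seed=0):
    """Return an archive mapping each behavior bin to its best (fitness, solution)."""
    rng = random.Random(seed)
    archive = {}

    def try_insert(solution):
        # Assumption: descriptor() returns a value in [0, 1].
        bin_index = min(int(descriptor(solution) * n_bins), n_bins - 1)
        score = fitness(solution)
        # Keep the newcomer only if its bin is empty or it beats the current elite.
        if bin_index not in archive or score > archive[bin_index][0]:
            archive[bin_index] = (score, solution)

    # Seed the archive with random solutions.
    for _ in range(n_init):
        try_insert(init(rng))

    # Repeatedly pick an elite from a random occupied bin and mutate it.
    # In an LLM-guided variant, mutate() would be a model call that edits a program.
    for _ in range(iterations):
        _, parent = archive[rng.choice(sorted(archive))]
        try_insert(mutate(parent, rng))
    return archive


# Toy problem: maximize closeness to (0.5, 0.5); the first coordinate is the behavior.
def toy_fitness(s):
    return -((s[0] - 0.5) ** 2 + (s[1] - 0.5) ** 2)


def toy_descriptor(s):
    return s[0]


def toy_init(rng):
    return (rng.random(), rng.random())


def toy_mutate(s, rng):
    return tuple(min(1.0, max(0.0, v + rng.gauss(0, 0.1))) for v in s)


archive = map_elites(toy_fitness, toy_descriptor, toy_mutate, toy_init)
```

The archive keeps one elite per behavior bin, so the search returns a set of diverse, locally best candidates instead of a single optimum.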
ATLAS: Active learning framework for automated discovery of interpretable behavioral models in cognitive science
ATLAS (Active Theory Learning for Automated Science) is a new active learning framework that iterates between generating mechanistic hypotheses as sparse neural network ensembles and designing maximally informative experiments to distinguish between them. The system is tested on recovering reinforcement learning agents from behavioral data in bandit tasks, achieving 5-10x sample efficiency improvements over random experimentation and matching expert-designed experiments from the literature. The work targets automated scientific discovery in cognitive science, with potential generalization to other domains requiring mechanistic modeling.
Color Health's Cancer Copilot Uses GPT-4o for Oncology Workup Planning
Color Health has partnered with OpenAI to deploy GPT-4o in a clinical application called Cancer Copilot, designed to identify missing diagnostics and generate tailored cancer workup plans. The system aims to accelerate patient access to cancer screening and treatment by supporting evidence-based clinical decision-making. This represents a concrete enterprise deployment of GPT-4o in a high-stakes medical context.
Measuring AI's capability to accelerate biological research
OpenAI introduces a real-world evaluation framework designed to measure how AI systems can accelerate biological research in wet lab settings. The work uses GPT-5 to optimize a molecular cloning protocol as a concrete demonstration case. The framework explicitly addresses both the potential benefits and biosecurity risks of AI-assisted experimentation, positioning this as a dual-use capability assessment.
MetaSyn benchmark reveals critical screening bottleneck in LLM-based meta-analysis pipelines
Researchers introduce MetaSyn, a dataset of 442 expert-curated meta-analyses from Nature Portfolio journals, paired with a 140k-article PubMed retrieval corpus, PI/ECO criteria, verified positives, and hard negatives. Benchmarking twelve pipeline configurations, including nine RAG variants and a protocol-driven agent, shows that despite 90.9% retrieval recall at K=200, no system recovers more than 52.7% of ground-truth included studies. The core failure is LLMs' inability to reliably distinguish eligible studies from topically similar but criteria-failing distractors. The paper argues that end-to-end scores obscure where pipelines break down and proposes stage-attributed metrics.
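The gap between retrieval recall and end-to-end recovery is easier to see when recall is split by stage. The summary does not give the paper's exact metric definitions, so the decomposition below is one plausible sketch; the function name `stage_metrics` and the toy study identifiers are assumptions made for illustration.

```python
def stage_metrics(relevant, retrieved_ranked, included, k):
    """Split end-to-end recall into a retrieval stage and a screening stage.

    relevant:         ground-truth included studies
    retrieved_ranked: retrieval output, best first
    included:         studies the screening step finally kept
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must be non-empty")
    top_k = set(retrieved_ranked[:k])
    retrieved_hits = relevant & top_k             # survived retrieval
    final_hits = retrieved_hits & set(included)   # also survived screening
    return {
        "retrieval_recall_at_k": len(retrieved_hits) / len(relevant),
        "screening_recall": (len(final_hits) / len(retrieved_hits)
                             if retrieved_hits else 0.0),
        "end_to_end_recall": len(final_hits) / len(relevant),
    }


# Toy data: retrieval finds 3 of 4 relevant studies, screening keeps 1 of those 3.
metrics = stage_metrics(
    relevant=["a", "b", "c", "d"],
    retrieved_ranked=["a", "x", "b", "y", "c", "z", "d"],
    included=["a", "x"],
    k=5,
)
```

Under this decomposition the end-to-end recall is the product of the two stage recalls, so a high retrieval figure can coexist with a low final score when screening discards eligible studies.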
AutoForest: End-to-End LLM System for Automated Forest Plot Generation from Biomedical Studies
AutoForest is presented as the first end-to-end system that generates publication-ready forest plots directly from biomedical papers using large language models. The system automatically suggests ICO (Intervention, Comparator, Outcome) elements, extracts outcome data, performs statistical synthesis, and renders forest plots without manual intervention. A user study with clinicians demonstrates its effectiveness on real-world examples, aiming to accelerate systematic review and meta-analysis workflows.
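The summary does not say which statistical model AutoForest uses for its synthesis step. As an illustration of what such a step involves, here is a minimal sketch of fixed-effect inverse-variance pooling, a standard way to compute the summary estimate shown at the bottom of a forest plot. The function name `pool_fixed_effect` and the input values are assumptions, not part of the system described.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Pool per-study effects with inverse-variance weights (fixed-effect model).

    Returns the pooled effect, its standard error, and a confidence interval
    (95% for the default z of 1.96).
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("effects and std_errors must be non-empty and equal length")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Two hypothetical studies with equal precision.
pooled, pooled_se, ci = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```

More precise studies receive larger weights, which is why they are drawn with larger markers in a forest plot; a random-effects model would add a between-study variance term to each weight.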
