PubMed
pubmed-f6437bd2 · 4 events · first seen 1mo ago · Aliases: PubMed
Recent events (4)
Anthropic Launches Claude for Life Sciences with New Connectors, Agent Skills, and Benchmark Improvements
Anthropic has announced a dedicated life sciences offering for Claude that targets the full drug discovery and commercialization pipeline rather than individual tasks. Claude Sonnet 4.5 scores 0.83 on the Protocol QA benchmark (above the human baseline of 0.79) and shows improvements on the BixBench bioinformatics evaluation. The launch includes new connectors to platforms such as Benchling, BioRender, PubMed, Synapse.org, and 10x Genomics, plus a new Agent Skills framework, starting with a single-cell RNA QC skill. Anthropic is partnering with major consultancies (Deloitte, Accenture, KPMG, PwC) and cloud providers (AWS, Google Cloud), with Sanofi cited as a flagship enterprise customer.
Anthropic Launches Claude for Healthcare and Expands Life Sciences Capabilities
Anthropic is expanding its healthcare and life sciences offerings with Claude for Healthcare, a HIPAA-ready product suite for providers, payers, and health tech companies, alongside new connectors to CMS databases, ICD-10, NPI Registry, and FHIR development tools. The announcement also highlights Claude Opus 4.5's improved performance on medical benchmarks including MedCalc and MedAgentBench, with extended thinking (64k tokens) and native tool use. New life sciences capabilities include connections to additional scientific platforms and support for clinical trial management and regulatory operations. The release positions Claude as an agentic research and administrative partner across healthcare workflows including prior authorization, claims appeals, and patient care coordination.
ChronoMedKG: Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning
ChronoMedKG is a new biomedical knowledge graph containing 460,497 evidence-linked triples across 13,431 diseases, each annotated with temporal components such as onset window and progression stage. It is constructed via a multi-agent pipeline using multiple frontier LLMs extracting from PubMed/PMC, with multi-model consensus and credibility filtering. The accompanying ChronoTQA benchmark (3,341 questions) reveals frontier LLMs lose ~30 points on temporal vs. static clinical questions, while ChronoMedKG-based retrieval recovers 47–65% of long-tail failures compared to 17–29% for HPOA-RAG. The work addresses a significant gap in existing KGs (PrimeKG, Hetionet, iKraph) that treat disease associations as static facts.
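To make "temporally grounded" concrete, here is a minimal sketch of what a triple annotated with an onset window and a progression stage might look like, and how a retriever could use those fields to answer a time-dependent question instead of returning every static association. The field names, the age-in-years unit, and the `matching` helper are hypothetical illustrations, not ChronoMedKG's actual schema or retrieval method; the data are toy placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemporalTriple:
    """One evidence-linked triple with temporal annotations (hypothetical schema)."""
    head: str                                           # e.g. a disease
    relation: str                                       # e.g. "has_phenotype"
    tail: str                                           # e.g. a phenotype
    evidence: Tuple[str, ...]                           # supporting source identifiers
    onset_window: Optional[Tuple[float, float]] = None  # assumed (min, max) age in years
    progression_stage: Optional[str] = None             # assumed labels such as "early", "late"


def matching(triples: List[TemporalTriple], disease: str, age: float,
             stage: Optional[str] = None) -> List[TemporalTriple]:
    """Return triples for `disease` whose onset window contains `age`
    and, if `stage` is given, whose progression stage equals it."""
    out = []
    for t in triples:
        if t.head != disease or t.onset_window is None:
            continue  # wrong disease, or no temporal annotation to check
        lo, hi = t.onset_window
        if not (lo <= age <= hi):
            continue  # association is not expected at this age
        if stage is not None and t.progression_stage != stage:
            continue
        out.append(t)
    return out


# Toy graph: the same disease links to different phenotypes at different times.
kg = [
    TemporalTriple("disease_A", "has_phenotype", "phenotype_X", ("source-1",), (0.0, 5.0), "early"),
    TemporalTriple("disease_A", "has_phenotype", "phenotype_Y", ("source-2",), (40.0, 60.0), "late"),
]

print([t.tail for t in matching(kg, "disease_A", age=3)])  # ['phenotype_X']
```

A static graph would return both phenotypes for `disease_A` regardless of age; filtering on the temporal fields is what lets the retriever distinguish them.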
MetaSyn benchmark reveals critical screening bottleneck in LLM-based meta-analysis pipelines
Researchers introduce MetaSyn, a dataset of 442 expert-curated meta-analyses from Nature Portfolio journals, paired with a 140k-article PubMed retrieval corpus, PI/ECO criteria, verified positives, and hard negatives. Benchmarking twelve pipeline configurations — nine RAG variants and a protocol-driven agent — shows that despite 90.9% retrieval recall at K=200, no system recovers more than 52.7% of ground-truth included studies. The core failure is LLMs' inability to reliably distinguish eligible studies from topically similar but criteria-failing distractors. The paper argues that end-to-end scores obscure where pipelines break down and proposes stage-attributed metrics.
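The gap between high retrieval recall and low end-to-end recovery is easier to see once recall is split by stage. The sketch below is a generic decomposition, assuming a two-stage pipeline (retrieve, then screen); it is not necessarily how MetaSyn defines its stage-attributed metrics, and the function name and dictionary keys are hypothetical. End-to-end recall factors into retrieval recall times the fraction of retrieved positives that screening keeps, while `distractors_kept` counts topically similar studies wrongly included.

```python
from typing import Dict, Iterable


def stage_recalls(gold: Iterable, retrieved: Iterable, included: Iterable) -> Dict[str, float]:
    """Attribute recall losses to the retrieval and screening stages.

    gold:      IDs of studies the expert meta-analysis actually included
    retrieved: IDs returned by the retrieval stage (e.g. the top-K)
    included:  IDs the screening stage marked as eligible
    """
    gold, retrieved, included = set(gold), set(retrieved), set(included)
    if not gold:
        raise ValueError("gold set must be non-empty")
    found = gold & retrieved  # true positives that survived retrieval
    kept = found & included   # ...and also survived screening
    return {
        "retrieval_recall": len(found) / len(gold),
        # share of retrieved positives that screening kept
        "screening_recall": len(kept) / len(found) if found else 0.0,
        "end_to_end_recall": len(kept) / len(gold),
        # non-gold studies the screener accepted (false inclusions)
        "distractors_kept": len(included - gold),
    }


# Toy example: retrieval finds 9 of 10 positives, screening keeps only 5 of them
# and also admits 2 distractors.
gold = set(range(1, 11))
retrieved = set(range(1, 10)) | set(range(20, 30))
included = {1, 2, 3, 4, 5, 20, 21}
print(stage_recalls(gold, retrieved, included))
```

Here retrieval recall is 0.9 but end-to-end recall is only 0.5, and the per-stage numbers show the loss happens at screening, which is the kind of attribution a single end-to-end score hides.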