Entity · product

PubMed

product · active · pubmed-f6437bd2 · 6 events · first seen May 18, 2026

Aliases: PubMed

More like this (12)

PubMed Central · PubMedQA · Wikipedia · ArXiv · MedCalc · Reddit · Google · MuseNet · OpenMed · Meta · ImageNet · Wikidata

Recent events (6)

4 · arXiv · cs.CL · Jul 16, 2026

Cost-Pragmatic Retrieval Gating and Multi-Model Fusion for BioASQ 2026 Biomedical QA

A BioASQ Task 14B 2026 system paper describes two core design decisions: a cost-pragmatic re-retrieval policy using a BGE cross-encoder quality gate, and a decomposition of multi-model ensemble lift into selection and fusion components. The retrieval pipeline unions a hybrid retriever (BGE dense retrieval plus BM25, combined with reciprocal rank fusion, RRF) with agent-driven PubMed/Europe PMC/iCite pipelines, achieving R@200 = 99.3% on the BioASQ-13b archive. The team places first on the combined-exact aggregate on three of eight leaderboard tracks and first on Phase B b3 ideal. A solo GPT-5.5 run retains the list-F1 lead over a synonym-union resolver because of precision trade-offs.
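For readers unfamiliar with RRF, the fusion step named in the summary can be sketched as below. This is a minimal illustration of reciprocal rank fusion in general, not the paper's implementation: the function name, the document IDs, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a dense retriever and from BM25.
dense_ranking = ["d1", "d2", "d3"]
bm25_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Documents ranked well by both retrievers rise to the top, while documents seen by only one retriever are still kept, which is why RRF suits a recall-oriented first stage.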

Evaluation and Benchmarking · Agent and Tool Ecosystem · PubMed · BGE · BM25 · +4 more

4 · arXiv · cs.AI · Jun 30, 2026

PromptGNN-sim: Bidirectional GNN-LLM fusion framework for text-attributed graph learning

Researchers introduce PromptGNN-sim, a bidirectional structure-semantic fusion framework that jointly trains a Graph Attention Network and an LLM for text-attributed graph learning. The system uses GAT-based neighborhood selection to generate structure-aware prompts for the LLM, with cross-modal contrastive learning and cross-attention aligning both components during training. Evaluated on six datasets including Cora, PubMed, and WikiCS, it outperforms classical GNNs, standalone LLMs, and prior GNN-LLM fusion methods on cross-task transfer, cross-dataset generalization, and sparse perturbation settings.

Multimodal Progress · CORA · PubMed · WikiCS · +2 more

5 · arXiv · cs.CL · Jun 16, 2026

MetaSyn benchmark reveals critical screening bottleneck in LLM-based meta-analysis pipelines

Researchers introduce MetaSyn, a dataset of 442 expert-curated meta-analyses from Nature Portfolio journals, paired with a 140k-article PubMed retrieval corpus, PI/ECO criteria, verified positives, and hard negatives. Benchmarking twelve pipeline configurations — nine RAG variants and a protocol-driven agent — shows that despite 90.9% retrieval recall at K=200, no system recovers more than 52.7% of ground-truth included studies. The core failure is LLMs' inability to reliably distinguish eligible studies from topically similar but criteria-failing distractors. The paper argues that end-to-end scores obscure where pipelines break down and proposes stage-attributed metrics.
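To illustrate why stage-attributed metrics matter, the sketch below separates retrieval recall from screening recall for a single meta-analysis. It is a hypothetical decomposition, not MetaSyn's actual metric definitions: the function name, the metric names, and the toy study IDs are assumptions.

```python
def stage_metrics(retrieved, included, ground_truth, k=200):
    """Attribute end-to-end recall to the retrieval and screening stages.

    retrieved: ranked list of study IDs from the retriever.
    included: study IDs the screening step decided to keep.
    ground_truth: study IDs included in the expert meta-analysis.
    All three names are illustrative assumptions.
    """
    truth = set(ground_truth)
    if not truth:
        raise ValueError("ground_truth must not be empty")
    # Ground-truth studies that survive retrieval at cutoff k.
    reachable = set(retrieved[:k]) & truth
    # Of those, the studies the screening step actually kept.
    kept = set(included) & reachable
    return {
        "retrieval_recall": len(reachable) / len(truth),
        "screening_recall": len(kept) / len(reachable) if reachable else 0.0,
        "end_to_end_recall": len(kept) / len(truth),
    }


# Toy example: retrieval finds 3 of 4 true studies, screening keeps 1 of them.
metrics = stage_metrics(
    retrieved=["a", "x", "b", "c", "y"],
    included=["a", "x"],
    ground_truth=["a", "b", "c", "d"],
    k=5,
)
print(metrics)
```

In this decomposition the end-to-end recall is the product of the two stage recalls, so a low final score paired with high retrieval recall points at screening as the bottleneck, the pattern the summary describes.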

Evaluation and Benchmarking · Agent and Tool Ecosystem · PubMed · Nature Portfolio · MetaSyn

7 · Anthropic News · Jun 1, 2026

Anthropic Launches Claude for Life Sciences with New Connectors, Agent Skills, and Benchmark Improvements

Anthropic has announced a dedicated life sciences offering for Claude, targeting the full drug discovery and commercialization pipeline rather than individual tasks. Claude Sonnet 4.5 achieves 0.83 on the Protocol QA benchmark (above the human baseline of 0.79) and shows improvements on the BixBench bioinformatics evaluation. The launch includes new connectors to platforms such as Benchling, BioRender, PubMed, Synapse.org, and 10x Genomics, plus a new Agent Skills framework starting with a single-cell RNA QC skill. Anthropic is partnering with major consultancies (Deloitte, Accenture, KPMG, PwC) and cloud providers (AWS, Google Cloud), with Sanofi cited as a flagship enterprise customer.

Frontier Model Releases · Evaluation and Benchmarking · Google Cloud · AWS · PubMed · +15 more

6 · arXiv · cs.CL · May 22, 2026

ChronoMedKG: Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning

ChronoMedKG is a new biomedical knowledge graph containing 460,497 evidence-linked triples across 13,431 diseases, each annotated with temporal components such as onset window and progression stage. It is constructed via a multi-agent pipeline using multiple frontier LLMs extracting from PubMed/PMC, with multi-model consensus and credibility filtering. The accompanying ChronoTQA benchmark (3,341 questions) reveals frontier LLMs lose ~30 points on temporal vs. static clinical questions, while ChronoMedKG-based retrieval recovers 47–65% of long-tail failures compared to 17–29% for HPOA-RAG. The work addresses a significant gap in existing KGs (PrimeKG, Hetionet, iKraph) that treat disease associations as static facts.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Phenopackets · PubMed · ChronoTQA · +8 more

7 · Anthropic News · May 18, 2026

Anthropic Launches Claude for Healthcare and Expands Life Sciences Capabilities

Anthropic is expanding its healthcare and life sciences offerings with Claude for Healthcare, a HIPAA-ready product suite for providers, payers, and health tech companies, alongside new connectors to CMS databases, ICD-10, NPI Registry, and FHIR development tools. The announcement also highlights Claude Opus 4.5's improved performance on medical benchmarks including MedCalc and MedAgentBench, with extended thinking (64k tokens) and native tool use. New life sciences capabilities include connections to additional scientific platforms and support for clinical trial management and regulatory operations. The release positions Claude as an agentic research and administrative partner across healthcare workflows including prior authorization, claims appeals, and patient care coordination.

Frontier Model Releases · Evaluation and Benchmarking · PubMed · MedAgentBench · Claude Opus 4.6 · +12 more