product

samyama-graph

product · active · provisional · samyama-graph-0f1bdd82 · 1 event · first seen 47h ago

Aliases: samyama-graph

Co-occurring entities

HealthBench · MedQA · OpenAI · PrimeKG · Knowledge-Graph Grounding Helps LLMs Only for Out-of-Training Knowledge: A Controlled Study on Clinical Question Answering · GPT-5.5

More like this (12)

cognitive-graph · LangGraph · SAM 3D · CodeGraph · graphanything CLI · UltraSAM · memory evolution graph · GraphCast · code-review-graph · dual-graph framework · ASAM · Chartographer

Recent events (1)

6 · arXiv · cs.CL · 47h ago

KG grounding helps LLMs only for out-of-training knowledge: controlled clinical QA study

A new arXiv paper investigates when knowledge-graph (KG) grounding improves LLM performance on clinical question answering, finding that structured KG retrieval over the public biomedical graph PrimeKG provides no meaningful improvement on MedQA (all deltas ≤3.4) because the relevant facts are already in the model's training data. On synthetic counterfactual and hybrid benchmarks containing genuinely novel facts, the same pipeline lifts out-of-training accuracy from chance to ~100%. The paper also reproduces and partially corrects a recent Nature Medicine study on frontier LLMs vs. clinical RAG tools, flagging a score-deflating grader bug and clarifying that the reported ~88 HealthBench score reflects the Consensus variant, not full HealthBench (~46-47). The core finding — that RAG/KG grounding pays off only when the decisive fact is outside the model's training distribution — has direct implications for when retrieval augmentation is worth deploying.

Evaluation and Benchmarking · Enterprise Deployment Patterns · HealthBench · samyama-graph · MedQA