Entity · dataset

MedMCQA

dataset · active · medmcqa-c38ab92c · 2 events · first seen May 19, 2026

Aliases: MedMCQA

Co-occurring entities

BioLlama3, BioBERT, ClinicalBERT, GPT-3.5, Llama 3, PubMedQA, Open Medical-LLM Leaderboard, MedQA, Hugging Face

More like this (12)

MedQA, PubMedQA, MedCalc, MedQADE, QIMMA, MMAC, MMMC-Code, MATH-MCQA, MedCPT, MMMU, ThReadMed-QA, GPQA

Recent events (2)

5 · arXiv · cs.CL · Jun 8, 2026 · source ↗

Systematic evaluation of LLM prompt sensitivity in healthcare settings reveals safety risks

Researchers conduct a sensitivity analysis of both general-purpose and medical-specific LLMs using the MedMCQA benchmark, testing robustness to lexical and syntactic prompt perturbations. The study finds that even minor phrasing changes can alter clinical advice, and adversarial prompts can produce dangerous outputs such as incorrect dosages or omitted critical findings. Both general-purpose models (GPT-3.5, Llama 3) and domain-specific models (ClinicalBERT, BioLlama3, BioBERT) exhibit this fragility, with syntactic reordering and misleading contextual cues proving more destabilizing than simple paraphrasing.

Evaluation and Benchmarking · AI Safety Research · BioLlama3 · BioBERT · MedMCQA · +3 more
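
As an illustration of the kind of sensitivity analysis summarized above, the following minimal Python sketch measures how often a model's chosen option changes when a multiple-choice question is perturbed. Everything in it is an assumption made for illustration: the `MCQItem` structure, the two toy perturbation functions, the `flip_rate` metric, and the `toy_model` stand-in are hypothetical and are not taken from the study or from the MedMCQA release.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MCQItem:
    """Hypothetical multiple-choice item: a question and its answer options."""
    question: str
    options: List[str]


# A "model" here is any callable mapping (question, options) -> option index.
Model = Callable[[str, List[str]], int]
Perturbation = Callable[[str], str]


def paraphrase(question: str) -> str:
    """Toy lexical perturbation: replace one stock phrase with a near-synonym."""
    return question.replace("Which of the following", "Which one of these")


def reorder_clauses(question: str) -> str:
    """Toy syntactic perturbation: move the leading clause to the end."""
    head, sep, tail = question.partition(", ")
    tail = tail.rstrip("?")
    if not sep or not head or not tail:
        return question  # nothing to reorder
    return f"{tail[0].upper()}{tail[1:]}, {head[0].lower()}{head[1:]}?"


def flip_rate(model: Model, items: List[MCQItem], perturb: Perturbation) -> float:
    """Fraction of items whose predicted option changes after perturbation."""
    if not items:
        return 0.0
    flips = 0
    for item in items:
        before = model(item.question, item.options)
        after = model(perturb(item.question), item.options)
        flips += int(before != after)
    return flips / len(items)


def toy_model(question: str, options: List[str]) -> int:
    """Deliberately brittle stand-in: the answer depends on question length."""
    return len(question) % len(options)


if __name__ == "__main__":
    items = [
        MCQItem(
            "In a patient with fever, which drug is first-line?",
            ["A", "B", "C", "D"],
        ),
        MCQItem(
            "Which of the following is a loop diuretic?",
            ["A", "B", "C", "D"],
        ),
    ]
    for name, fn in [("paraphrase", paraphrase), ("reorder", reorder_clauses)]:
        print(name, flip_rate(toy_model, items, fn))
```

In practice `toy_model` would be replaced by a call to the LLM under test, and the perturbation functions by whatever lexical or syntactic transformations the evaluation defines. A flip rate near zero would suggest the model's answers are stable under that perturbation.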

5 · Hugging Face Blog · May 19, 2026 · source ↗

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Hugging Face has launched the Open Medical-LLM Leaderboard, a public benchmark for evaluating large language models on healthcare and medical tasks. The leaderboard aggregates performance across multiple medical question-answering datasets to enable standardized comparison of open-weight models in clinical and biomedical domains. This initiative aims to accelerate progress in medical AI by providing transparent, reproducible evaluation infrastructure.

Evaluation and Benchmarking · Open Weights Progress · PubMedQA · Open Medical-LLM Leaderboard · MedMCQA · +3 more