Almanac
model

BioLlama3

modelactiveprovisionalbiollama3-0c24c1ed·1 events·first seen 9d ago

Aliases: BioLlama3

Co-occurring entities

More like this (12)

Recent events (1)

5arXiv · cs.CL·9d ago·source ↗

Systematic evaluation of LLM prompt sensitivity in healthcare settings reveals safety risks

Researchers conduct a sensitivity analysis of both general-purpose and medical-specific LLMs using the MedMCQA benchmark, testing robustness to lexical and syntactic prompt perturbations. The study finds that even minor phrasing changes can alter clinical advice, and adversarial prompts can produce dangerous outputs such as incorrect dosages or omitted critical findings. Both general-purpose models (GPT-3.5, Llama 3) and domain-specific models (ClinicalBERT, BioLlama3, BioBERT) exhibit this fragility, with syntactic reordering and misleading contextual cues proving more destabilizing than simple paraphrasing.