5 · AI Snake Oil · 1mo ago

Does the UK's liver transplant matching algorithm systematically exclude younger patients?

This commentary examines whether the UK's liver transplant matching algorithm contains technical design choices that systematically disadvantage younger patients. The piece argues that seemingly minor algorithmic decisions can have life-or-death consequences in high-stakes medical AI systems. It falls within the broader discourse on algorithmic fairness and unintended bias in deployed AI/ML systems.

AI Safety Research · Enterprise Deployment Patterns · UK liver transplant matching algorithm · NHS · AI Snake Oil

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality

Related events (8)

7 · arXiv · cs.AI · 24d ago

Algorithmic Monocultures in Hiring: Racial Disparities and Homogeneous Rejection Patterns

A study of 3 million applicants and 4 million applications screened by algorithms from the same vendor finds significant racial disparities: 14.74% of Asian applicants and 25.87% of Black applicants apply to positions where the algorithm adversely impacts their group under U.S. employment discrimination standards. The paper also documents individual-level homogeneity: 4% of applicants who apply to 10 positions receive rejection recommendations from all of them, a rate above chance. Because the hiring algorithms are deterministically replicable, the authors can simulate counterfactual outcomes, which show that applicants would need to apply very broadly to receive human review.

Evaluation and Benchmarking · AI Safety Research · hiring screening algorithms · algorithmic monoculture · Algorithmic Monocultures in Hiring · +3 more

6 · arXiv · cs.CL · 12d ago

LLM-guided MAP-Elites evolution improves medical decision pipelines at inference time

Researchers propose using LLM-guided MAP-Elites evolutionary search as an inference-time alternative to fine-tuning for adapting LLMs to clinical workflows, formulating triage, consultation, and image classification as evolutionary searches over executable artifacts. Across three medical settings, evolved programs substantially outperform manually designed baselines: triage accuracy improves from 77.3% to 87.1% and emergency recall from 0.60 to 0.97, with gains also shown on MIMIC-ESI, iCRAFTMD, and PneumoniaMNIST. The approach works across Llama-3, Qwen-3.5, and Gemma-4 backbones and produces interpretable program-level mechanisms rather than superficial prompt changes.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Gemma-4 E4B-it · MIMIC-ESI · iCRAFTMD · +6 more

5 · Google DeepMind Blog · 1mo ago

Accelerating discovery of liver disease mechanisms with Co-Scientist

DeepMind's Co-Scientist AI system is being used by researcher Filippo Menolascina to identify new treatment mechanisms for liver disease and explain differential drug response across patients. The application demonstrates Co-Scientist's utility in biomedical hypothesis generation and drug discovery workflows. This represents a concrete scientific use case for AI-assisted research in a clinical domain.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Filippo Menolascina · Co-Scientist · liver disease · +1 more

5 · Google DeepMind Blog · 1mo ago

Uncovering repurposed medicines to fight liver fibrosis using Co-Scientist

A Stanford geneticist used Google DeepMind's Co-Scientist AI system to identify potential drug repurposing candidates for chronic liver disease and liver fibrosis. The work represents a real-world application of AI-assisted scientific discovery in a clinical domain. Co-Scientist is DeepMind's AI research assistant designed to accelerate hypothesis generation and experimental planning for scientists.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · drug repurposing · liver fibrosis · Co-Scientist · +2 more

6 · arXiv · cs.CL · 5d ago

Computational audit finds ClinicalBERT amplifies demographic bias beyond training data distributions

Researchers present a systematic audit of representational bias in ClinicalBERT, a BERT-based model pretrained on MIMIC-III clinical discharge summaries, using two probing methodologies: Log Probability Bias Analysis and Masked Language Model probing across 98 clinical sentence templates and eight intersectional race-gender combinations. Of 32 statistically significant findings, 65.6% contradict observed corpus distributions, rising to 80% for Black patients and 87.5% for agency attribution under MLM probing. The key finding is that bias in ClinicalBERT operates predominantly through model-internal amplification rather than simple inheritance from training data, which has direct implications for clinical AI safety and deployment. This challenges the assumption that auditing training corpora is sufficient to characterize model bias.

Evaluation and Benchmarking · AI Safety Research · A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions · MIMIC-III · ClinicalBERT · +1 more

6 · The Batch · 1mo ago

Two Studies Test Google's Breast Cancer Detection Models in Real-World Clinics

Two studies evaluated Google's mammography AI system, introduced in 2020 but not yet deployed for live patient care, against real-world UK NHS clinical workflows. In retrospective testing on 116,000 scans, the system achieved higher sensitivity than the first human reader (0.541 vs 0.437) and identified 25% of the cancers initially missed by doctors. In a live integration test across 12 clinics, the system processed scans in under 18 minutes, versus more than two days for human readers, with comparable accuracy, though some clinicians reported distrust of its outputs.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Google · iCAD · Christopher J. Kelly · +5 more

6 · OpenAI Blog · 2d ago

OpenAI reasoning model helps diagnose 18 previously unsolved rare childhood genetic diseases

Researchers used an OpenAI reasoning model to assist physicians in diagnosing rare genetic diseases in children, identifying 18 new diagnoses in cases that had previously gone unsolved. The announcement comes from OpenAI's official blog, positioning the work as a demonstration of reasoning model utility in high-stakes clinical settings. The result is notable as a concrete real-world application of frontier reasoning capabilities in medicine.

Frontier Model Releases · Enterprise Deployment Patterns · OpenAI Reasoning Models · OpenAI

4 · The Batch · 19d ago

Blind Users Can Use AI Models As Virtual Mirrors, But Don't Always Like What They See

Blind and visually impaired users are increasingly relying on vision-language models (notably GPT-4 Vision via Be My Eyes) to assess their own appearance, gaining independence but also encountering AI outputs that reflect conventional beauty standards and may be factually inaccurate. A BBC article by blind journalist Milagros Costabel documents cases where AI feedback was psychologically harmful, including unsolicited critical commentary on facial features. Psychologists warn that blind users are especially vulnerable because they cannot independently verify AI visual judgments. The piece raises broader questions about accuracy, trust calibration, and empathy in AI products designed for accessibility.

AI Safety Research · Enterprise Deployment Patterns · Envision AI · University of Bristol · GPT-4 Vision · +9 more