5 · AI Snake Oil · 1mo ago

Does the UK's liver transplant matching algorithm systematically exclude younger patients?

This commentary examines whether the UK's liver transplant matching algorithm contains technical design choices that systematically disadvantage younger patients. The piece argues that seemingly minor algorithmic decisions can have life-or-death consequences in high-stakes medical AI systems. It falls within the broader discourse on algorithmic fairness and unintended bias in deployed AI/ML systems.

Related events (8)

7 · arXiv · cs.AI · 24d ago

Algorithmic Monocultures in Hiring: Racial Disparities and Homogeneous Rejection Patterns

A study of 3 million applicants and 4 million applications screened by algorithms from the same vendor finds significant racial disparities: 14.74% of Asian applicants and 25.87% of Black applicants apply to positions where the algorithm adversely impacts their group under U.S. employment discrimination standards. The paper also documents individual-level homogeneity: 4% of applicants who apply to 10 positions receive rejection recommendations from all of them, a rate above chance. Because the hiring algorithms are deterministic and replicable, the authors can simulate counterfactual outcomes, showing that applicants would need to apply very broadly to receive human review.
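To make "adversely impacts their group" concrete, here is a minimal sketch of the four-fifths rule, a common heuristic for adverse impact in U.S. employment practice. Whether the paper applies exactly this test is an assumption, and the group names and counts below are hypothetical illustration data, not figures from the study.

```python
from typing import Dict, List


def selection_rates(selected: Dict[str, int], applicants: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each group's applicants that the screen advances."""
    return {group: selected[group] / applicants[group] for group in applicants}


def adverse_impact_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}


def adversely_impacted(rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose ratio falls below the four-fifths (0.8) threshold."""
    ratios = adverse_impact_ratios(rates)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts for one position (not taken from the paper).
applicants = {"group_a": 200, "group_b": 150, "group_c": 100}
advanced = {"group_a": 60, "group_b": 30, "group_c": 28}

rates = selection_rates(advanced, applicants)
flagged = adversely_impacted(rates)
print(flagged)
```

In this made-up example only `group_b` is flagged: its selection rate (0.20) is two-thirds of the highest rate (0.30), which is below the 0.8 threshold.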

6 · arXiv · cs.CL · 12d ago

LLM-guided MAP-Elites evolution improves medical decision pipelines at inference time

Researchers propose using LLM-guided MAP-Elites evolutionary search as an inference-time alternative to fine-tuning for adapting LLMs to clinical workflows, formulating triage, consultation, and image classification as evolutionary searches over executable artifacts. Across three medical settings, evolved programs substantially outperform manually designed baselines: triage accuracy improves from 77.3% to 87.1% and emergency recall from 0.60 to 0.97, with gains also shown on MIMIC-ESI, iCRAFTMD, and PneumoniaMNIST. The approach works across Llama-3, Qwen-3.5, and Gemma-4 backbones and produces interpretable program-level mechanisms rather than superficial prompt changes.
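For readers unfamiliar with MAP-Elites, the sketch below shows the core loop on a toy numeric problem. It is not the paper's method: the paper searches over executable artifacts with an LLM proposing variations, whereas this example swaps in plain Gaussian mutation over two-number solutions. The function names, the fitness function, and the descriptor are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Solution = List[float]


def map_elites(
    fitness: Callable[[Solution], float],
    descriptor: Callable[[Solution], float],
    n_bins: int = 5,
    iterations: int = 2000,
    n_random_init: int = 50,
    seed: int = 0,
) -> Dict[int, Tuple[float, Solution]]:
    """Keep the best solution found in each descriptor bin (the archive)."""
    rng = random.Random(seed)
    archive: Dict[int, Tuple[float, Solution]] = {}
    for i in range(iterations):
        if archive and i >= n_random_init:
            # Pick a random elite and mutate it (an LLM would propose edits here).
            _, parent = rng.choice(list(archive.values()))
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in parent]
        else:
            # Bootstrap the archive with random solutions in [0, 1)^2.
            child = [rng.random(), rng.random()]
        # Map the descriptor value (assumed to lie in [0, 1]) to a bin index.
        bin_index = min(n_bins - 1, int(descriptor(child) * n_bins))
        score = fitness(child)
        # Replace the bin's elite only if the child is strictly better.
        if bin_index not in archive or score > archive[bin_index][0]:
            archive[bin_index] = (score, child)
    return archive


def toy_fitness(s: Solution) -> float:
    """Higher is better; the optimum is at (0.5, 0.5)."""
    return -((s[0] - 0.5) ** 2) - ((s[1] - 0.5) ** 2)


def toy_descriptor(s: Solution) -> float:
    """Behaviour descriptor: the first coordinate."""
    return s[0]


archive = map_elites(toy_fitness, toy_descriptor)
for bin_index in sorted(archive):
    score, solution = archive[bin_index]
    print(bin_index, round(score, 4))
```

The point of the archive is diversity: rather than returning one best solution, the search keeps a strong candidate for each region of the descriptor space.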

5 · Google DeepMind Blog · 1mo ago

Accelerating discovery of liver disease mechanisms with Co-Scientist

DeepMind's Co-Scientist AI system is being used by researcher Filippo Menolascina to identify new treatment mechanisms for liver disease and explain differential drug response across patients. The application demonstrates Co-Scientist's utility in biomedical hypothesis generation and drug discovery workflows. This represents a concrete scientific use case for AI-assisted research in a clinical domain.

5 · Google DeepMind Blog · 1mo ago

Uncovering repurposed medicines to fight liver fibrosis using Co-Scientist

A Stanford geneticist used Google DeepMind's Co-Scientist AI system to identify potential drug repurposing candidates for chronic liver disease and liver fibrosis. The work represents a real-world application of AI-assisted scientific discovery in a clinical domain. Co-Scientist is DeepMind's AI research assistant designed to accelerate hypothesis generation and experimental planning for scientists.

6 · arXiv · cs.CL · 5d ago

Computational audit finds ClinicalBERT amplifies demographic bias beyond training data distributions

Researchers present a systematic audit of representational bias in ClinicalBERT, a BERT-based model pretrained on MIMIC-III clinical discharge summaries, using two probing methodologies: Log Probability Bias Analysis and Masked Language Model probing across 98 clinical sentence templates and eight intersectional race-gender combinations. Of 32 statistically significant findings, 65.6% contradict observed corpus distributions, rising to 80% for Black patients and 87.5% for agency attribution under MLM probing. The key finding is that bias in ClinicalBERT operates predominantly through model-internal amplification rather than simple inheritance from training data, which has direct implications for clinical AI safety and deployment. This challenges the assumption that auditing training corpora is sufficient to characterize model bias.

6 · The Batch · 1mo ago

Two Studies Test Google's Breast Cancer Detection Models in Real-World Clinics

Two studies evaluated Google's mammography AI system—introduced in 2020 but not yet deployed for live patient care—against real-world UK NHS clinical workflows. In retrospective testing on 116,000 scans, the system achieved higher sensitivity (0.541 vs 0.437) than the first human reader while identifying 25% of cancers initially missed by doctors. A live integration test across 12 clinics showed the system processed scans in under 18 minutes versus over two days for human readers, with comparable accuracy, though some clinicians reported distrust of the system's outputs.

6 · OpenAI Blog · 2d ago

OpenAI reasoning model helps diagnose 18 previously unsolved rare childhood genetic diseases

Researchers used an OpenAI reasoning model to assist physicians in diagnosing rare genetic diseases in children, identifying 18 new diagnoses in cases that had previously gone unsolved. The announcement comes from OpenAI's official blog, positioning the work as a demonstration of reasoning model utility in high-stakes clinical settings. The result is notable as a concrete real-world application of frontier reasoning capabilities in medicine.

4 · The Batch · 19d ago

Blind Users Can Use AI Models As Virtual Mirrors, But Don't Always Like What They See

Blind and visually impaired users are increasingly relying on vision-language models (notably GPT-4 Vision via Be My Eyes) to assess their own appearance, gaining independence but also encountering AI outputs that reflect conventional beauty standards and may be factually inaccurate. A BBC article by blind journalist Milagros Costabel documents cases where AI feedback was psychologically harmful, including unsolicited critical commentary on facial features. Psychologists warn that blind users are especially vulnerable because they cannot independently verify AI visual judgments. The piece raises broader questions about accuracy, trust calibration, and empathy in AI products designed for accessibility.