Large Language Models (frontier)
large-language-models-frontier--57cd7202 · 2 events · first seen 29d ago
Aliases: Large Language Models (frontier), frontier language models
Recent events (2)
Auditing Value Pluralism in Clinical Ethics of Large Language Models
Researchers present a framework for auditing ethical value pluralism in medical AI, comprising a benchmark of clinician-verified dilemmas and an attribution method that recovers value priorities from model decisions. Taken in aggregate, frontier LLMs span a range of value priorities comparable to that of physicians, and they discuss competing values in their reasoning. However, each individual model's decisions are near-deterministic and fail to reproduce the distributional pluralism of physician panels. Some models systematically underweight patient autonomy. The authors warn that deploying a single LLM at scale risks replacing clinical pluralism with a 'deployment monoculture.'
DeepWeb-Bench: A Hard Deep Research Benchmark Requiring Cross-Source Evidence and Long-Horizon Derivation
DeepWeb-Bench is a new benchmark designed to stress-test frontier language models on deep research tasks (open-web search, evidence collection, and multi-step derivation), where existing benchmarks have become saturated. The benchmark evaluates nine frontier models across four capability families (Retrieval, Derivation, Reasoning, Calibration) and finds that retrieval is not the primary bottleneck: derivation and calibration failures account for over 70% of errors. Strong models fail via incomplete derivation, while weak models fail via hallucinated precision. Models also show genuine domain specialization, with low cross-model agreement (rho = 0.61). The benchmark, rubrics, and evaluation code are publicly released.