Almanac
benchmark

Spearman's rho

benchmark · active · provisional · spearman-s-rho-186d85dc · 2 events · first seen 22d ago

Aliases: Spearman's rho

Recent events (2)

5 · arXiv · cs.CL · 21d ago

GraphReview: Scientific Paper Evaluation via LLM-Based Graph Message Passing

GraphReview is a graph-based LLM framework that models scientific paper evaluation as review-signal message passing over a semantic paper graph, capturing both intrinsic quality and relational context (synchronic and diachronic links). LLMs estimate node-level quality priors and generate edge-level comparative evidence via pairwise comparisons, while Personalized PageRank integrates these signals for ranking, decision prediction, and review generation. The system trains its LLM backbones with reward-induced maximum likelihood objectives and achieves an average improvement of 29.7% over the strongest baseline on decision and ranking metrics, including a 23.7% gain in accuracy and a 57.6% gain in Spearman's ρ.
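
For context on the ranking metric reported in this summary: Spearman's ρ is the Pearson correlation computed on the ranks of two score lists, so it measures how well one ordering agrees with another. The sketch below is a minimal, dependency-free illustration of that definition. The function names and the sample scores are hypothetical and are not taken from the paper, whose evaluation code is not described here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: model-predicted paper scores vs. reference scores.
predicted = [0.9, 0.4, 0.7, 0.1, 0.5]
reference = [8, 5, 6, 3, 4]
rho = spearman_rho(predicted, reference)
print(f"Spearman's rho = {rho:.3f}")  # prints 0.900 for this toy data
```

Because only ranks enter the calculation, any monotonic rescaling of either score list leaves ρ unchanged, which is why the metric suits comparisons between model scores and human ratings on different scales.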

5 · arXiv · cs.CL · 22d ago

Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

This paper investigates multi-objective prompt optimization for LLM-as-judge systems, testing five decomposition modes of textual gradient optimizers across varying levels of cross-task information sharing. In 6 of 10 configurations, optimization fails to improve over the initial prompt, with gradient specificity dropping 59% when multiple criteria are processed jointly. The authors identify two separable failure modes: gradient dilution at optimization time and instruction interference at inference time. These findings constrain the design space for customizing LLM judges via textual feedback across multiple evaluation criteria simultaneously.