Entity · benchmark

AUROC

benchmark · active · auroc-439da753 · 4 events · first seen May 19, 2026

Aliases: AUROC

Co-occurring entities

AUPRC · When Does Synthetic Data Augmentation Improve Score-Based Imbalanced Classification? · Safety Detection Classifier · HHH (Helpful, Harmless, Honest) · Activation Steering · Synthetic Data Generator · MAHI-Group · multilayer perceptron (MLP) · sparse-exception binomial test · BIRDNet · Boolean Implication Relationships (BIRs) · Max-Pooling · Chain-of-Thought Reasoning · Probe Trajectories · Large Reasoning Models

More like this (12)

AUC · AuRA · AUPRC · UAR (Unforeseen Attack Robustness) · A2C · ARC-AGI · ACROS · ROCm · AI Reproducibility Benchmark · OFA · A3C · AB-UPT

Recent events (4)

4 · arXiv · cs.LG · Jun 25, 2026

Theoretical framework for when synthetic data augmentation improves imbalanced classification metrics

A new arXiv preprint develops a theoretical framework characterizing when synthetic minority-class augmentation improves score-based metrics (AUROC, AUPRC, balanced accuracy, F1) under class imbalance. The authors show that under well-specified score models, augmentation provides no fundamental population-level improvement and may introduce bias, while under model misspecification it can correct ranking errors by shifting effective class balance. Minimax lower bounds confirm the raw estimator is already optimal in the well-specified regime, and simulation studies corroborate the theory.

Evaluation and Benchmarking · AUPRC · AUROC · When Does Synthetic Data Augmentation Improve Score-Based Imbalanced Classification?
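As background for the summary above, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below is a minimal illustration of that standard definition, not code from the preprint; the function name `auroc` and the toy scores are assumptions made for the example. It also shows a well-known property relevant to class imbalance: replicating one class leaves the empirical AUROC unchanged, because only the ranking of positives against negatives matters.

```python
def auroc(scores, labels):
    """Empirical AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, three negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
base = auroc(scores, labels)

# Duplicate every negative to make the data more imbalanced.
imbalanced = auroc(scores + [0.6, 0.2, 0.1], labels + [0, 0, 0])

print(base, imbalanced)  # the two values agree
```

The pairwise loop is quadratic in the number of examples; it is written this way for readability, and a rank-based computation would be preferable on large datasets.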

6 · arXiv · cs.CL · May 28, 2026

Activation Steering for Synthetic Safety Data Generation: Diversity as a Critical Quality Axis

This paper investigates whether activation steering (AS) can generate high-quality synthetic training data for downstream safety detection classifiers, filling a gap in the literature. Across 4 safety concepts × 2 models × 4 steering methods, the authors find that AS-generated data outperforms prompt-generated data on 3 of 4 concepts, but only 41 of 136 configurations succeed, indicating a narrow effective regime. The study introduces sample- and set-level diversity as a previously absent quality axis, finding that higher steering strength reduces diversity and that the harmonic mean of success, coherence, and diversity correlates more reliably with downstream AUROC than prior metrics alone. The results provide a practical heuristic for practitioners tuning AS hyperparameters for safety data generation.

Evaluation and Benchmarking · AI Safety Research · Safety Detection Classifier · HHH (Helpful, Harmless, Honest) · Activation Steering · +3 more
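The combined quality score mentioned in the summary above, a harmonic mean of success, coherence, and diversity, can be sketched as follows. This is only an illustration: the summary does not say how the paper scales each component, so the code assumes all three are already normalized to [0, 1], and the function name and example values are hypothetical.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative scores; zero if any score is zero."""
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical configuration: strong steering gives high success and
# coherence but low diversity.
success, coherence, diversity = 0.9, 0.8, 0.2

combined = harmonic_mean([success, coherence, diversity])
arithmetic = (success + coherence + diversity) / 3

print(combined, arithmetic)
```

The harmonic mean is dominated by the weakest component, so a configuration with low diversity scores poorly even when success and coherence are high. An arithmetic mean would hide that weakness.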

5 · arXiv · cs.AI · May 28, 2026

BIRDNet: Interpretable Neural Networks via Boolean Implication Knowledge Graphs for Tabular Data

BIRDNet is a neurosymbolic architecture that mines Boolean implication relationships (BIRs) from tabular data using a sparse-exception binomial test, then encodes the resulting directed graph as the connectivity structure of a layered neural network. Each hidden unit corresponds to exactly one mined rule and binds only to its two features, yielding up to 96× parameter reduction versus a matched dense MLP. Evaluated on six transcriptomic and proteomic benchmarks, BIRDNet stays within 0.02 AUROC of dense baselines while recovering known biological signatures such as canonical amplicons and immune-infiltration markers. Unlike most neurosymbolic approaches, BIRDNet derives its structural prior from data rather than an external rule base.

Evaluation and Benchmarking · AI Safety Research · MAHI-Group · multilayer perceptron (MLP) · sparse-exception binomial test · +3 more
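To make the parameter saving described in the summary above concrete, here is a rough sketch of a layer in which each hidden unit is tied to exactly two input features. It is not BIRDNet's implementation: the rule-mining step and the sparse-exception binomial test are not reproduced, the rule list is hypothetical, and the ReLU activation and bias terms are assumptions made for the example.

```python
def sparse_rule_layer(x, rules, weights, biases):
    """One hidden unit per rule; unit k reads only the two features in rules[k]."""
    out = []
    for (i, j), (w_i, w_j), b in zip(rules, weights, biases):
        z = w_i * x[i] + w_j * x[j] + b
        out.append(max(0.0, z))  # ReLU (assumed activation)
    return out


def param_counts(n_features, n_rules):
    """Weights plus biases for a dense layer vs. a two-inputs-per-unit layer."""
    dense = n_features * n_rules + n_rules
    sparse = 2 * n_rules + n_rules
    return dense, sparse


# Hypothetical input with four features and two mined rules.
x = [1.0, 0.0, 2.0, -1.0]
rules = [(0, 2), (1, 3)]          # feature index pairs, one per hidden unit
weights = [(0.5, 0.25), (1.0, 1.0)]
biases = [0.0, 0.5]

hidden = sparse_rule_layer(x, rules, weights, biases)
dense_params, sparse_params = param_counts(n_features=4, n_rules=2)
print(hidden, dense_params, sparse_params)
```

In this simplified setting, the dense layer's parameter count grows with the number of input features, while the two-input layer's count depends only on the number of rules. The gap therefore widens as the feature count increases.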

6 · arXiv · cs.CL · May 19, 2026

Probe Trajectories Reveal Reasoning Dynamics in Large Reasoning Models

This paper investigates whether hidden representations of Large Reasoning Models (LRMs) can predict future model behavior by analyzing probe trajectories, that is, the continuous evolution of concept probabilities across Chain-of-Thought reasoning tokens. The authors find that temporal trajectory features (volatility, trend, steady-state) significantly outperform single static probes, with max-pooling achieving up to 95% AUROC across safety and mathematics domains. Two methodological insights are offered: template-based training data matches dynamically generated responses in quality, and pooling strategy is critical to probe performance. The work positions probe trajectories as a complementary safety monitoring framework for LRMs where CoT faithfulness cannot be assumed.

Frontier Model Releases · Evaluation and Benchmarking · Max-Pooling · Chain-of-Thought Reasoning · Probe Trajectories · +4 more
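The trajectory features named in the summary above can be illustrated with a small sketch that turns a sequence of per-token probe probabilities into summary statistics. The summary does not give the paper's exact definitions, so the ones used here are assumptions: volatility is the standard deviation of token-to-token changes, trend is the least-squares slope against token index, and steady-state is the mean over the last few tokens. The function name and the `tail` parameter are hypothetical.

```python
def trajectory_features(probs, tail=3):
    """Summarize a probe trajectory (one probability per reasoning token)."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two tokens")

    # Volatility (assumed definition): std. dev. of consecutive differences.
    diffs = [b - a for a, b in zip(probs, probs[1:])]
    mean_diff = sum(diffs) / len(diffs)
    volatility = (sum((d - mean_diff) ** 2 for d in diffs) / len(diffs)) ** 0.5

    # Trend (assumed definition): least-squares slope over token index.
    t_mean = (n - 1) / 2
    p_mean = sum(probs) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(probs))
    den = sum((t - t_mean) ** 2 for t in range(n))
    trend = num / den

    # Steady state (assumed definition): mean of the final `tail` tokens.
    last = probs[-tail:]
    steady_state = sum(last) / len(last)

    return {
        "max": max(probs),  # max-pooling over the trajectory
        "volatility": volatility,
        "trend": trend,
        "steady_state": steady_state,
    }


# Hypothetical trajectory that rises steadily across five tokens.
features = trajectory_features([0.1, 0.2, 0.3, 0.4, 0.5])
print(features)
```

A feature vector like this could be fed to a downstream classifier in place of a single static probe reading. Max-pooling keeps the strongest probe response anywhere in the trajectory, while the other features describe how the response changes over time.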