Almanac
benchmark

AUROC

benchmark · active · auroc-439da753 · 3 events · first seen 28d ago

Aliases: AUROC

Recent events (3)

6 · arXiv · cs.CL · 28d ago

Probe Trajectories Reveal Reasoning Dynamics in Large Reasoning Models

This paper investigates whether hidden representations of Large Reasoning Models (LRMs) can predict future model behavior by analyzing probe trajectories—the continuous evolution of concept probabilities across Chain-of-Thought reasoning tokens. The authors find that temporal trajectory features (volatility, trend, steady-state) significantly outperform single static probes, with max-pooling achieving up to 95% AUROC across safety and mathematics domains. Two methodological insights are offered: template-based training data matches dynamically generated responses in quality, and pooling strategy is critical to probe performance. The work positions probe trajectories as a complementary safety monitoring framework for LRMs where CoT faithfulness cannot be assumed.
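The effect of pooling strategy on AUROC can be illustrated with a small sketch. It assumes each example is a list of per-token probe probabilities and that the trajectory is reduced to one score before ranking. The `pool` and `auroc` helpers and the toy trajectories are hypothetical, not taken from the paper, and AUROC is computed here as the fraction of positive/negative pairs ranked correctly (ties count half).

```python
from statistics import mean


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pool(trajectory, how):
    """Reduce a per-token probability trajectory to a single score."""
    if how == "max":
        return max(trajectory)
    if how == "mean":
        return mean(trajectory)
    if how == "last":
        return trajectory[-1]
    raise ValueError(f"unknown pooling strategy: {how}")


# Hypothetical toy data: positives show a brief spike, negatives stay flat.
trajectories = [
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.4],
]
labels = [1, 1, 0, 0]

for how in ("max", "mean", "last"):
    scores = [pool(t, how) for t in trajectories]
    print(how, auroc(scores, labels))
```

On this toy data, max-pooling separates the classes because it keeps the transient spike, while last-token pooling discards it. Real trajectories may behave differently.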

6 · arXiv · cs.CL · 20d ago

Activation Steering for Synthetic Safety Data Generation: Diversity as a Critical Quality Axis

This paper investigates whether activation steering (AS) can generate high-quality synthetic training data for downstream safety detection classifiers, filling a gap in the literature. Across 4 safety concepts × 2 models × 4 steering methods, the authors find that AS-generated data outperforms prompt-generated data on 3 of 4 concepts, but only 41 of 136 configurations succeed, indicating a narrow effective regime. The study introduces sample- and set-level diversity as a previously absent quality axis, finding that higher steering strength reduces diversity and that the harmonic mean of success, coherence, and diversity correlates more reliably with downstream AUROC than prior metrics alone. The results provide a practical heuristic for practitioners tuning AS hyperparameters for safety data generation.
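The harmonic mean mentioned above can be sketched as follows, assuming the three quality scores are each normalized to the range 0 to 1. The function name and the example values are hypothetical; how the paper measures each score is not specified here.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative scores; zero if any score is zero."""
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical configuration: high success and coherence, low diversity.
success, coherence, diversity = 0.9, 0.9, 0.1

combined = harmonic_mean([success, coherence, diversity])
arithmetic = (success + coherence + diversity) / 3
print(f"harmonic={combined:.3f} arithmetic={arithmetic:.3f}")
```

The harmonic mean is dominated by the weakest component, so a configuration with collapsed diversity scores poorly even when success and coherence are high. An arithmetic mean would mask that weakness.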

5 · arXiv · cs.AI · 20d ago

BIRDNet: Interpretable Neural Networks via Boolean Implication Knowledge Graphs for Tabular Data

BIRDNet is a neurosymbolic architecture that mines Boolean implication relationships (BIRs) from tabular data using a sparse-exception binomial test, then encodes the resulting directed graph as the connectivity structure of a layered neural network. Each hidden unit corresponds to exactly one mined rule and binds only to its two features, yielding up to 96× parameter reduction versus a matched dense MLP. Evaluated on six transcriptomic and proteomic benchmarks, BIRDNet stays within 0.02 AUROC of dense baselines while recovering known biological signatures such as canonical amplicons and immune-infiltration markers. Unlike most neurosymbolic approaches, BIRDNet derives its structural prior from data rather than an external rule base.
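To see why binding each hidden unit to two features shrinks the model, a rough parameter count for a single layer helps. This is a simplified sketch, assuming one weight per bound feature plus one bias per unit. The layer sizes are hypothetical and are not the configurations behind the reported 96× figure.

```python
def dense_layer_params(n_features, n_hidden):
    """Fully connected layer: one weight per (feature, unit) pair plus one bias per unit."""
    return n_features * n_hidden + n_hidden


def rule_layer_params(n_rules):
    """Rule layer (assumed form): each unit has two weights, one per bound feature, plus a bias."""
    return 2 * n_rules + n_rules


# Hypothetical sizes: 200 input features and 500 mined rules (one hidden unit each).
n_features, n_rules = 200, 500

dense = dense_layer_params(n_features, n_rules)
sparse = rule_layer_params(n_rules)
print(dense, sparse, dense / sparse)
```

Under these assumptions the reduction for one layer grows roughly in proportion to the number of input features, since the dense layer pays for every feature per unit while the rule layer pays for only two.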