4 · arXiv cs.AI (Artificial Intelligence) · 2d ago

Neural Classification Trees (NCT) discover latent subgroups for robust classification without group supervision

A new arXiv preprint introduces Neural Classification Trees (NCT), a framework that encodes subgroup structure in a tree-shaped architecture to address spurious correlations in ML classifiers. By routing samples to 'easy' or 'hard' nodes based on prediction correctness and reusing routes as pseudo-labels iteratively, NCT disentangles conflicting subgroups without requiring subgroup annotations. The method is evaluated on five benchmarks covering binary and multi-class spurious correlations, achieving competitive robustness while providing interpretable mappings between model architecture and latent data structure.
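The routing step described above can be illustrated with a minimal sketch. It assumes hard 0/1 routing on prediction correctness for a single round; the function names `route_samples` and `split_by_route` are hypothetical and are not taken from the preprint, whose actual tree construction and iteration schedule are not specified in this summary.

```python
import numpy as np

def route_samples(preds, labels):
    """Assign each sample a route: 0 ('easy') if the current prediction
    is correct, 1 ('hard') otherwise. Illustrative assumption only."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return (preds != labels).astype(int)

def split_by_route(routes):
    """Return index arrays for the 'easy' and 'hard' child nodes."""
    routes = np.asarray(routes)
    easy_idx = np.flatnonzero(routes == 0)
    hard_idx = np.flatnonzero(routes == 1)
    return easy_idx, hard_idx

# One routing round on toy predictions. In an iterative scheme, the
# routes would serve as pseudo-labels for the next round of training.
routes = route_samples(preds=[1, 0, 1, 1], labels=[1, 1, 1, 0])
easy_idx, hard_idx = split_by_route(routes)
```

In this sketch the routes play the role of pseudo-labels: a later round could train a router to predict them, so that no subgroup annotation is needed.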

Evaluation and Benchmarking · AI Safety Research · Neural Classification Trees

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

3 · arXiv · cs.AI · 16d ago

Label Context Classifier (LCC) improves GNN node classification on heterophilous graphs

A new arXiv preprint proposes the Label Context Classifier (LCC), a method for improving node classification in graph neural networks on heterophilous graphs where connected nodes tend to have different class labels. LCC generates label context embeddings via four types of directed walks to capture higher-order class label connectivity, and can be integrated with any existing GNN architecture. Experiments show GNNs augmented with LCC outperform state-of-the-art methods on heterophilous directed graphs.

Graph Neural Network leveraging Higher-order Class Label Connectivity for Heterophilous Graphs · Label Context Classifier

6 · arXiv · cs.CL · 19d ago

NF-CoT: Latent reasoning with normalizing flows preserves autoregressive LLM advantages

Researchers propose NF-CoT, a latent reasoning framework that replaces discrete chain-of-thought token streams with continuous intermediate states modeled by normalizing flows embedded inside an LLM backbone. The approach uses a TARFlow-style normalizing flow head alongside the standard language model head, enabling exact likelihoods, KV-cache-compatible left-to-right decoding, and policy-gradient optimization in latent space. On code-generation benchmarks, NF-CoT improves pass rates over both explicit CoT and prior latent-reasoning baselines while reducing intermediate reasoning cost. The work addresses a key limitation of existing latent reasoning methods, which typically sacrifice probabilistic tractability or autoregressive compatibility.

Inference Economics Alignment and RLHF TARFlow NF-CoT Latent Reasoning with Normalizing Flows

7 · Anthropic News · 22d ago

Anthropic and NNSA Co-Develop Nuclear Safeguards Classifier for Claude Traffic

Anthropic, in partnership with the U.S. Department of Energy's National Nuclear Security Administration (NNSA) and DOE national laboratories, has co-developed an AI classifier that distinguishes between concerning and benign nuclear-related conversations with 96% accuracy in preliminary testing. The classifier has already been deployed on live Claude traffic as part of Anthropic's misuse-detection infrastructure. Anthropic plans to share the approach with the Frontier Model Forum as a replicable blueprint for other AI developers. This represents the first public-private partnership of this kind for nuclear proliferation risk monitoring in frontier AI systems.

Evaluation and Benchmarking AI Safety Research Nuclear Proliferation Risk Classifier Claude Anthropic Policy Frontier Red Team +5 more

5 · arXiv · cs.AI · 27d ago

BIRDNet: Interpretable Neural Networks via Boolean Implication Knowledge Graphs for Tabular Data

BIRDNet is a neurosymbolic architecture that mines Boolean implication relationships (BIRs) from tabular data using a sparse-exception binomial test, then encodes the resulting directed graph as the connectivity structure of a layered neural network. Each hidden unit corresponds to exactly one mined rule and binds only to its two features, yielding up to 96× parameter reduction versus a matched dense MLP. Evaluated on six transcriptomic and proteomic benchmarks, BIRDNet stays within 0.02 AUROC of dense baselines while recovering known biological signatures such as canonical amplicons and immune-infiltration markers. Unlike most neurosymbolic approaches, BIRDNet derives its structural prior from data rather than an external rule base.

Evaluation and Benchmarking · AI Safety Research · MAHI-Group · multilayer perceptron (MLP) · sparse-exception binomial test · +3 more

3 · arXiv · cs.LG · 16d ago

Bradley-Terry model proposed for fairer ranking of recommendation algorithms across dataset types

A new arXiv preprint introduces a Bradley-Terry (BT) model-based methodology for ranking recommendation algorithms in a way that accounts for dataset characteristics such as sparsity, sequential structure, and scale. The authors argue that naive metric aggregation (e.g., averaging NDCG) produces misleading rankings and propose BT trees and covariate-extended BT models as alternatives. The framework also enables ranking predictions on unseen datasets without running the models, and includes a new metric for ranking consistency.
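For readers unfamiliar with the plain Bradley-Terry model, the sketch below fits strengths from pairwise win counts using the standard minorization-maximization update. It assumes that `wins[i, j]` counts the datasets on which algorithm `i` outperformed algorithm `j`; that encoding, the toy counts, and the name `fit_bradley_terry` are illustrative assumptions. The preprint's BT trees and covariate extensions are not shown.

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths with the classic MM update.

    wins[i, j] = number of comparisons in which item i beat item j.
    Assumes every item has at least one win and one loss along a
    connected chain of comparisons, so the estimate is finite.
    """
    wins = np.asarray(wins, dtype=float).copy()
    np.fill_diagonal(wins, 0.0)          # ignore self-comparisons
    games = wins + wins.T                # n_ij: total comparisons per pair
    total_wins = wins.sum(axis=1)        # W_i
    n = wins.shape[0]
    p = np.full(n, 1.0 / n)              # uniform starting strengths
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()             # fix the scale: strengths sum to 1
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p

# Toy win counts for three hypothetical algorithms A, B, C.
wins = [[0, 3, 4],
        [1, 0, 3],
        [0, 1, 0]]
strengths = fit_bradley_terry(wins)
ranking = np.argsort(-strengths)         # indices from strongest to weakest
```

Under the fitted model, the probability that item `i` beats item `j` is `p[i] / (p[i] + p[j])`, which is what distinguishes this approach from simply averaging a metric such as NDCG across datasets.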

Evaluation and Benchmarking · Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies · NDCG

5 · arXiv · cs.AI · 8d ago

Internal Oppenheim-Lim test reveals phase/sign identity codes shared across image classifier architectures

A new arXiv preprint applies a causal intervention inspired by Oppenheim and Lim (1981) to probe whether trained image classifiers encode identity in Fourier phase rather than magnitude within their hidden layers. By transplanting phase or sign components between images at chosen layers in PRISM2D, GFNet, ViT-B/16, and ResNet-50, the authors find that predictions follow the phase/sign donor across all tested architectures, with image-specific magnitude largely dispensable. ResNet-50 requires a pre-ReLU intervention to reveal a latent sign code, exposing how rectification and readout geometry shape the basis in which the code is expressed. The findings offer a mechanistic account of the texture–shape gap between CNNs and attention-based models.
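The classic input-space version of the Oppenheim and Lim experiment, which the summary says inspired the intervention, can be sketched as follows. It combines the Fourier magnitude of one image with the Fourier phase of another. The preprint applies its transplant to hidden-layer activations, which this sketch does not attempt; the function name `swap_phase` and the random test arrays are illustrative assumptions.

```python
import numpy as np

def swap_phase(magnitude_src, phase_src):
    """Build a hybrid image from the 2D Fourier magnitude of
    `magnitude_src` and the 2D Fourier phase of `phase_src`.
    Both inputs are real 2D arrays of the same shape."""
    f_mag = np.fft.fft2(magnitude_src)
    f_phase = np.fft.fft2(phase_src)
    hybrid_spectrum = np.abs(f_mag) * np.exp(1j * np.angle(f_phase))
    # For real inputs the hybrid spectrum is conjugate-symmetric, so the
    # inverse transform is real up to floating-point error.
    return np.real(np.fft.ifft2(hybrid_spectrum))

rng = np.random.default_rng(0)
image_a = rng.random((8, 8))   # stand-in for the magnitude donor
image_b = rng.random((8, 8))   # stand-in for the phase donor
hybrid = swap_phase(image_a, image_b)
```

On natural images the hybrid typically resembles the phase donor, which is the observation the preprint tests inside trained classifiers.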

Evaluation and Benchmarking · ViT-B/16 · GFNet · PRISM2D · +2 more

5 · arXiv · cs.CL · 14d ago

TRACE: Tree-structured rollout budget allocation for efficient agentic RL training

TRACE (Tree Rollout Allocation for Contrastive Exploration) is a new framework for improving reinforcement learning with verifiable rewards (RLVR) in multi-turn agentic LLM settings. The method models each ReAct-style thought-action-observation turn as a distinct node, enabling budget allocation across both prompt-level and turn-level prefixes in a tree structure, rather than only at the prompt level. A shared predictor estimates conditional success probability at each anchor to guide allocation, enriching reward contrast within a fixed sampling budget. Empirically, TRACE improves Qwen3-14B multi-hop QA accuracy by 2.8 points over baselines at equal sampling cost.

Evaluation and Benchmarking · Agent and Tool Ecosystem · RLVR · TRACE · ReAct · +2 more

5 · arXiv · cs.LG · 13d ago

ATLAS: Active learning framework for automated discovery of interpretable behavioral models in cognitive science

ATLAS (Active Theory Learning for Automated Science) is a new active learning framework that iterates between generating mechanistic hypotheses as sparse neural network ensembles and designing maximally informative experiments to distinguish between them. The system is tested on recovering reinforcement learning agents from behavioral data in bandit tasks, achieving 5-10x sample efficiency improvements over random experimentation and matching expert-designed experiments from the literature. The work targets automated scientific discovery in cognitive science, with potential generalization to other domains requiring mechanistic modeling.

Evaluation and Benchmarking · ATLAS: Active Theory Learning for Automated Science · Disentangled RNNs · Atlas