5 · arXiv cs.AI (Artificial Intelligence) · 23d ago

BIRDNet: Interpretable Neural Networks via Boolean Implication Knowledge Graphs for Tabular Data

BIRDNet is a neurosymbolic architecture that mines Boolean implication relationships (BIRs) from tabular data using a sparse-exception binomial test, then encodes the resulting directed graph as the connectivity structure of a layered neural network. Each hidden unit corresponds to exactly one mined rule and binds only to its two features, yielding up to 96× parameter reduction versus a matched dense MLP. Evaluated on six transcriptomic and proteomic benchmarks, BIRDNet stays within 0.02 AUROC of dense baselines while recovering known biological signatures such as canonical amplicons and immune-infiltration markers. Unlike most neurosymbolic approaches, BIRDNet derives its structural prior from data rather than an external rule base.
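The two steps in the summary (mine implications, then wire one hidden unit per rule) can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the exact statistic, so the code assumes that a "high implies high" rule is accepted when the exception cell (first feature high, second feature low) holds significantly fewer samples than independence would predict, using a one-sided binomial lower tail. The function names, the `alpha` threshold, the pre-binarized toy data, and the parameter-counting convention (hidden layer only, two weights plus one bias per rule unit) are all illustrative assumptions.

```python
from itertools import permutations
from math import comb


def binom_lower_tail(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def mine_high_high_rules(rows, alpha=0.01):
    """Return ordered pairs (a, b) where 'a high => b high' looks supported.

    Assumption: a rule is kept when the exception cell (a high, b low) is
    significantly emptier than independence of a and b would predict.
    """
    n, d = len(rows), len(rows[0])
    rules = []
    for a, b in permutations(range(d), 2):
        p_a_high = sum(r[a] for r in rows) / n
        p_b_low = sum(1 - r[b] for r in rows) / n
        p_exception = p_a_high * p_b_low  # cell probability under independence
        if p_exception == 0.0:
            continue  # a is never high or b is never low: nothing to test
        exceptions = sum(1 for r in rows if r[a] == 1 and r[b] == 0)
        if binom_lower_tail(exceptions, n, p_exception) < alpha:
            rules.append((a, b))
    return rules


def connectivity_mask(rules, n_features):
    """One row per hidden unit; 1 marks the two features that unit may read."""
    return [[1 if i in rule else 0 for i in range(n_features)] for rule in rules]


def sparse_param_count(rules):
    """Hidden-layer parameters: two weights and one bias per rule unit."""
    return 3 * len(rules)


def dense_param_count(n_features, n_hidden):
    """Hidden-layer parameters of a fully connected layer of the same width."""
    return n_features * n_hidden + n_hidden


# Hypothetical binarized data (1 = high, 0 = low) with three features.
# Feature 0 high always co-occurs with feature 1 high; feature 2 is unrelated.
toy_rows = (
    [(1, 1, 0)] * 10 + [(1, 1, 1)] * 10
    + [(0, 1, 0)] * 5 + [(0, 1, 1)] * 5
    + [(0, 0, 0)] * 5 + [(0, 0, 1)] * 5
)

rules = mine_high_high_rules(toy_rows)          # [(0, 1)] on this toy data
mask = connectivity_mask(rules, n_features=3)   # [[1, 1, 0]]
print(rules, mask, sparse_param_count(rules), dense_param_count(3, len(rules)))
```

On three features the saving is negligible (3 versus 4 hidden-layer parameters). The gap widens with feature count, because a dense unit reads every feature while a rule unit always reads two. The 96× figure in the summary comes from the paper's own benchmarks and is not reproduced here.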

Evaluation and Benchmarking · AI Safety Research · MAHI-Group · multilayer perceptron (MLP) · sparse-exception binomial test · BIRDNet · Boolean Implication Relationships (BIRs) · AUROC

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · arXiv · cs.CL · 11d ago

BODHI: Contrastive embedding training for causal discovery in Large Behavioural Models

Researchers identify a critical failure mode in biomedical language model embeddings: off-the-shelf encoders (BioBERT, PubMedBERT, BioM-ELECTRA) assign high cosine similarity (0.76–0.92) to causally unrelated cross-domain pairs, achieving 0% accuracy on cross-domain discrimination. The paper introduces BODHI, a contrastive training approach using hard negatives mined from a biomedical knowledge graph, which improves within-vs-across-domain separation from 1.05x to 2.30x and raises the discrimination gap by 0.392. The work targets Large Behavioural Models (LBMs), foundation models that reason over personal life graphs, where false embedding proximity directly produces false causal edges. Additional contributions include an OpenVINO inference optimization achieving a 133x latency reduction (1367ms to 10ms) on Intel AMX hardware, plus a counterintuitive finding that FP16 outperforms INT8 on this silicon.

Evaluation and Benchmarking · Inference Economics · BIOSSES · BioBERT · PubMedBERT

6 · OpenAI Blog · 1mo ago

Understanding Neural Networks Through Sparse Circuits

OpenAI has published work on mechanistic interpretability using a sparse model approach aimed at understanding how neural networks reason internally. The research seeks to make AI systems more transparent by identifying sparse circuits within neural networks. This work is positioned as supporting safer and more reliable AI behavior through improved interpretability.

Evaluation and Benchmarking · AI Safety Research · Sparse Circuits · mechanistic interpretability · OpenAI

5 · arXiv · cs.AI · 47h ago

DeepSWIP: Counterfactual reasoning for neural probabilistic logic programs via quotient-WMC

DeepSWIP introduces a single-world counterfactual semantics for DeepProbLog, enabling causal inference over neurosymbolic programs that combine neural perception with probabilistic logic. The approach uses neural materialization to reduce neural predicates to standard ProbLog choices, then applies Single World Intervention Programs (SWIPs) and weighted model counting to compute exact counterfactuals from a single transformed program. Experiments on MPI3D validate the method against a DeepTwin construction across 12,000 queries and show a 2.14× inference speedup, while a SUMO HOV experiment demonstrates that neural calibration degradation biases plug-in causal estimates and that a correctly scoped AIPW estimator removes most first-order bias.

Evaluation and Benchmarking · AI Safety Research · DeepSWIP · MPI3D · DeepProbLog

5 · arXiv · cs.AI · 1mo ago

Neurosymbolic Learning for Inference-Time Argumentation in Claim Verification

This paper introduces Inference-Time Argumentation (ITA), a trainable neurosymbolic framework for ternary claim verification (true/false/uncertain) that integrates formal argumentation semantics with LLM training. The framework uses argumentation semantics both to guide LLM training for argument generation and scoring, and to compute final predictions deterministically from explicit argumentative structures. Unlike conventional reasoning models that rely on potentially unfaithful post-hoc explanations, ITA produces verdicts that are faithful by construction to the underlying arguments. Experiments on two ternary claim verification datasets show ITA outperforms argumentative baselines and competes with non-argumentative direct-prediction approaches.

Evaluation and Benchmarking · AI Safety Research · large language models · Inference-Time Argumentation (ITA) · ternary claim verification

5 · OpenAI Blog · 1mo ago

Multimodal neurons in artificial neural networks

OpenAI researchers discovered neurons in CLIP that respond to the same concept across literal, symbolic, and conceptual representations. This finding parallels multimodal neurons previously observed in biological brains and helps explain CLIP's ability to classify unusual visual renditions of concepts. The work is presented as a step toward understanding the associations and biases learned by CLIP and similar vision-language models.

AI Safety Research · Multimodal Progress · OpenAI · multimodal neurons · CLIP

4 · arXiv · cs.CL · 4d ago

RDS Fusion: Hybrid neuro-symbolic gating with compressed CoT for zero-shot irony detection

Researchers introduce the Robust Dual-Signal (RDS) Fusion framework, a hybrid neuro-symbolic architecture that compresses Chain-of-Thought reasoning without supervised fine-tuning for irony and sarcasm detection in social media text. Evaluated on TweetEval (N=734) and iSarcasm, the zero-shot system matches fine-tuned BERTweet performance and outperforms supervised SemEval transformer ensembles on the imbalanced iSarcasm dataset. A statistical ablation shows that only the full concurrent fusion of all three signals yields a validated improvement, with individual components providing no significant standalone gain.

Evaluation and Benchmarking · TweetEval · BERTweet · Robust Dual-Signal Fusion

5 · arXiv · cs.LG · 17d ago

Information-theoretic formalization of the binding problem in Vision Transformers

Researchers introduce a formal information-theoretic framework for the binding problem — the challenge of associating features (color, shape) with the correct objects in multi-object scenes. They develop a probing method to measure binding information in model representations and apply it to several pre-trained Vision Transformers, examining components like the [CLS] token and spatial tokens across datasets with feature sharing, occlusion, and natural features. Results position binding information as a key factor in visual recognition and reasoning quality, and suggest current ViT architectures have limited binding capability, consistent with known failure modes.

Evaluation and Benchmarking · Multimodal Progress · ViT (Vision Transformer) · Formalizing the Binding Problem

6 · arXiv · cs.LG · 23d ago

Label-Free Bias Identification in Vision Models via Gradient Probes on Concept Decompositions

This paper introduces a post-hoc, label-free method for identifying spurious correlations in frozen vision classifiers without requiring bias annotations, group labels, or retraining. The approach applies non-negative matrix factorization to intermediate activations to extract interpretable concept vectors, then ranks them using a gradient-based bias estimator derived from misclassified examples. On Colored MNIST, Waterbirds, and CelebA benchmarks, the method recovers known spurious cues and improves worst-group accuracy by up to 17.9 percentage points on Waterbirds by suppressing top-ranked concepts at inference time. Notably, the method surfaces decision-relevant directions that do not always coincide with annotated attributes, offering both an auditing tool and a debiasing handle for deployed models.

Evaluation and Benchmarking · AI Safety Research · Colored MNIST · Waterbirds · CelebA