Almanac
3 · arXiv cs.LG (Machine Learning) · 12d ago

LLM-augmented XAI framework with mutual feature interactions for network operations

A new arXiv paper proposes a framework that combines LLMs with SHAP-based explainability, augmented by mutual feature interaction data, to generate natural language explanations for AI/ML models used in network operations. The approach is validated with human evaluators on an optical quality-of-transmission estimation task, who reported improvements of 12.2% in explanation usefulness and 6.2% in scope over a SHAP-only baseline, with 97.5% correctness. The work targets the gap between technical XAI outputs and actionable insights for non-specialist network operators.
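
As a rough illustration of how attributions and interaction data might be handed to an LLM, the sketch below computes exact Shapley values and the pairwise Shapley interaction index for a toy model, then formats both into a prompt. The feature names, the toy model, the baseline, and the prompt wording are all assumptions made for this example; the paper's actual pipeline and its definition of mutual feature interactions may differ.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names and toy model; not taken from the paper.
FEATURES = ["path_length_km", "num_spans", "launch_power_dbm"]


def toy_qot_model(x):
    """Toy stand-in for a QoT estimator with an interaction between features 0 and 1."""
    return 30.0 - 0.01 * x[0] - 0.5 * x[1] - 0.002 * x[0] * x[1] + 0.8 * x[2]


def value(subset, x, baseline):
    """Model output when features in `subset` keep their values and the rest use the baseline."""
    z = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return toy_qot_model(z)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every coalition (feasible only for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}, x, baseline) - value(s, x, baseline))
    return phi


def shapley_interaction(i, j, x, baseline):
    """Pairwise Shapley interaction index for features i and j."""
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for k in range(len(others) + 1):
        weight = factorial(k) * factorial(n - k - 2) / factorial(n - 1)
        for combo in combinations(others, k):
            s = set(combo)
            total += weight * (
                value(s | {i, j}, x, baseline)
                - value(s | {i}, x, baseline)
                - value(s | {j}, x, baseline)
                + value(s, x, baseline)
            )
    return total


def build_prompt(x, baseline):
    """Format attributions and interactions as plain text for an LLM (hypothetical wording)."""
    phi = shapley_values(x, baseline)
    lines = ["Explain this prediction to a network operator in plain language.",
             "Feature attributions:"]
    lines += [f"- {name}: {p:+.2f}" for name, p in zip(FEATURES, phi)]
    lines.append("Pairwise interactions:")
    for i, j in combinations(range(len(x)), 2):
        inter = shapley_interaction(i, j, x, baseline)
        lines.append(f"- {FEATURES[i]} x {FEATURES[j]}: {inter:+.2f}")
    return "\n".join(lines)


x = [800.0, 10.0, 1.0]          # instance to explain (made-up values)
baseline = [400.0, 5.0, 0.0]    # reference input (made-up values)
prompt = build_prompt(x, baseline)
```

Exhaustive enumeration is only practical for a handful of features; a real deployment would presumably rely on an approximation such as those in the SHAP library.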

Related events (8)

6 · Berkeley AI Research (BAIR) Blog · 1mo ago

SPEX and ProxySPEX: Scalable Interaction Discovery for LLM Interpretability

Researchers from BAIR introduce SPEX (Spectral Explainer) and ProxySPEX, algorithms for identifying influential interactions among features, data points, and model components in LLMs at scale. The approach exploits sparsity, low-degree structure, and hierarchy to reframe interaction discovery as a sparse recovery problem, using tools from signal processing and coding theory. ProxySPEX achieves performance comparable to SPEX with roughly 10x fewer ablations by leveraging hierarchical structure. The methods are evaluated on feature attribution (sentiment analysis), data attribution, and mechanistic interpretability tasks, outperforming marginal methods like LIME at long context lengths.
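
The sparsity premise can be illustrated on a toy function over feature masks. The sketch below is not SPEX: it computes every Fourier (Walsh-Hadamard) coefficient by brute force, which costs 2^n model evaluations, whereas the point of a sparse recovery method is to find the few large coefficients without exhaustive ablation. The function `toy_output` and its coefficients are hypothetical.

```python
from itertools import product

N = 4  # number of maskable inputs in this toy example


def toy_output(mask):
    """Hypothetical model output when only inputs with mask[i] == 1 are kept."""
    return 1.0 + 2.0 * mask[0] + 1.5 * mask[1] * mask[2]


def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f over all 2**n binary masks."""
    masks = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in masks:
        total = 0.0
        for m in masks:
            parity = sum(si * mi for si, mi in zip(s, m)) % 2
            total += f(m) * (-1.0 if parity else 1.0)
        # Key each coefficient by the tuple of input indices it involves.
        coeffs[tuple(i for i, si in enumerate(s) if si)] = total / len(masks)
    return coeffs


coeffs = fourier_coefficients(toy_output, N)
significant = {s: c for s, c in coeffs.items() if abs(c) > 1e-9}
```

In this example only 5 of the 16 coefficients are nonzero, and the largest set involved has two inputs, which is the kind of sparse, low-degree structure the summary describes.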

4 · arXiv · cs.AI · 6d ago

LEAF-X: Entropy-guided explainability framework for transformer-based ASR models

Researchers introduce LEAF-X (Listening with Entropy-guided Attention for Faithful explainability), a model-intrinsic XAI framework for transformer-based automatic speech recognition systems like Whisper. The method combines entropy-guided attention weighting, multi-layer attention rollout, and optional causal ablations to produce sparse token-to-frame attributions. Evaluations show 32% improved faithfulness and 35-39% stronger locality/sparsity compared to perturbation-based explainers and raw attention maps, enabling more auditable ASR.
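
For readers unfamiliar with attention rollout, here is a minimal sketch of the general technique with a simple entropy-based head weighting added. The weighting rule (sharper, lower-entropy heads count more), the toy attention matrices, and all function names are assumptions for illustration; they are not taken from LEAF-X, which targets token-to-frame attributions in ASR models.

```python
from math import log


def row_entropy(row):
    """Shannon entropy of one attention distribution."""
    return -sum(p * log(p) for p in row if p > 0)


def head_weights(heads):
    """Hypothetical rule: heads with lower mean entropy get larger weights."""
    scores = [1.0 / (1.0 + sum(row_entropy(r) for r in h) / len(h)) for h in heads]
    total = sum(scores)
    return [s / total for s in scores]


def mix_heads(heads):
    """Entropy-weighted average of the heads in one layer."""
    w = head_weights(heads)
    n = len(heads[0])
    return [[sum(w[h] * heads[h][i][j] for h in range(len(heads)))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rollout(layers):
    """Attention rollout: multiply (0.5 * A + 0.5 * I) across layers, first layer first."""
    n = len(layers[0][0])
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = identity
    for heads in layers:
        mixed = mix_heads(heads)
        with_residual = [[0.5 * mixed[i][j] + 0.5 * identity[i][j]
                          for j in range(n)] for i in range(n)]
        result = matmul(with_residual, result)
    return result


# Toy data: two layers, each with one sharp head and one uniform head over 3 tokens.
sharp = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
layers = [[sharp, uniform], [sharp, uniform]]

weights = head_weights(layers[0])
rollout_matrix = rollout(layers)
```

Each row of `rollout_matrix` remains a probability distribution over input positions, so it can be read as how much each output position draws on each input after both layers.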

5 · arXiv · cs.AI · 27d ago

Human Decision-Making with Persuasive and Narrative LLM Explanations

A large-scale behavioral experiment evaluated how LLM-generated narrative explanations of varying persuasiveness affect human decision-making accuracy in classification tasks. Results showed that persuasiveness level did not meaningfully improve decision accuracy over a simple AI prediction alone, consistent with prior explainable AI research using feature importance methods. Narratives increased AI reliance regardless of whether the AI prediction was correct or incorrect, and more persuasive narratives may have slowed response times and reduced ability to discriminate correct from incorrect AI predictions. The study concludes that narrative explanations involve tradeoffs and warrant further investigation into when and how they should be deployed.

4 · arXiv · cs.CL · 2d ago

Survey proposes four-layer architecture for token-operations-oriented LLM inference optimization

A new arXiv preprint introduces a four-layer technical architecture—Multi-model Fusion, Model Optimization, Compute-Model Fusion, and Compute-Network-Model Fusion—for systematically organizing LLM inference optimization techniques. The paper reviews key technologies and industry status at each layer and analyzes their application in real-world business scenarios. The framing around 'token operations' positions inference optimization as an operational discipline analogous to traditional IT operations.

5 · Hugging Face Blog · 1mo ago

Open-source LLMs as LangChain Agents

This Hugging Face blog post explores using open-source LLMs as agents within the LangChain framework. It examines the capability of various open-weight models to perform tool use, reasoning, and multi-step task execution in agentic settings. The post likely benchmarks or compares several models on agent-relevant tasks, providing practical guidance for deploying open-source alternatives to proprietary models in agent pipelines.

4 · arXiv · cs.CL · 10d ago

Zero-shot LLMs fail to beat baselines on stock prediction; explainability signals retain practical value

A new arXiv preprint evaluates zero-shot NLP pipelines for predicting short-term stock movements from financial news. Across multiple models and prediction horizons, the zero-shot approaches consistently fail to outperform simple baselines, with especially weak performance on negative price movements. The authors also introduce a multi-layered explainability framework that links predictions to token-, article-, and aggregate-level evidence, and find that explainability signals can reliably distinguish trustworthy from unreliable predictions even when accuracy is low. The work argues for a shift toward decision-support systems that emphasize transparency and uncertainty awareness rather than raw predictive accuracy.

5 · arXiv · cs.CL · 3d ago

ClaMPAPP: Hybrid LLM-ML system uses language models as interfaces for pediatric appendicitis diagnosis

Researchers introduce ClaMPAPP, a hybrid clinical decision support system that uses an LLM solely for structured feature extraction from free-text clinical notes, then passes validated features to an XGBoost classifier for final diagnosis. Evaluated on two independent German pediatric appendicitis cohorts, ClaMPAPP outperformed end-to-end LLM baselines on diagnostic performance and showed greater robustness to narrative reordering. The work formalizes an 'LLM-as-interface, ML-as-predictor' design pattern that separates natural-language usability from predictive inference, offering a more auditable pathway for clinical AI.

4 · arXiv · cs.CL · 2d ago

Mechanistic analysis of how LLMs encode essay quality in internal representations

Researchers systematically probe the hidden representations of eight LLMs across three essay datasets (ASAP++, CSEE, ENEM) to understand how automated essay scoring (AES) works internally. Using linear probing, dimensionality reduction, and neuron-level analysis, they find essay quality is encoded in a linearly accessible form that emerges progressively across layers and partially transfers across prompts. Individual 'essay scoring neurons' are identified whose activations correlate with scores and respond to targeted interventions, with longer essays relying more on deeper layers. The work contributes to mechanistic interpretability of LLM-based scoring systems.
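
The neuron-level step can be pictured as correlating each neuron's activation with the essay score and keeping the strongly correlated ones. The sketch below does this with a plain Pearson correlation on made-up data; the activations, the scores, and the 0.8 threshold are hypothetical, and the paper's actual selection criteria and intervention procedure are not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up essay scores and per-essay activations for two neurons.
scores = [1, 2, 3, 4, 5, 6]
activations = {
    "neuron_a": [1.5, 2.0, 2.5, 3.0, 3.5, 4.0],    # rises linearly with the score
    "neuron_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # alternates regardless of score
}

correlations = {name: pearson(acts, scores) for name, acts in activations.items()}

# Hypothetical threshold for flagging a neuron as a candidate scoring neuron.
candidates = [name for name, r in correlations.items() if abs(r) > 0.8]
```

Correlation alone only flags candidates; the summary notes that the identified neurons also respond to targeted interventions, which is a stronger test than this sketch performs.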