Almanac
4 · arXiv cs.AI (Artificial Intelligence) · 16h ago

TopoTTA integrates persistent homology into test-time adaptation for anomaly segmentation

Researchers introduce TopoTTA, a framework that incorporates persistent homology from topological data analysis into test-time adaptation (TTA) for anomaly segmentation. The method applies multi-level cubical complex filtration to anomaly score maps to generate topological pseudo-labels that guide a lightweight test-time classifier, avoiding pixel-level heuristics like confidence thresholding. Evaluated across six benchmarks including MVTec AD, VisA, and MVTec 3D-AD, TopoTTA achieves an average 15% F1 improvement over state-of-the-art unsupervised methods, with the largest gains on geometrically complex defects.
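The idea of deriving pseudo-labels from the topology of an anomaly score map, rather than from a fixed per-pixel threshold, can be illustrated with a minimal sketch. It is not the paper's implementation: it computes only 0-dimensional persistence of a single superlevel-set filtration, treats pixels as grid vertices with 4-connectivity, and omits the multi-level filtration and the test-time classifier. The function names, the `min_persistence` parameter, and the midpoint rule used to delineate each region are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)


def superlevel_persistence(score: List[List[float]]) -> List[Tuple[float, float, Pixel]]:
    """Return (birth, death, peak_pixel) for each connected component that
    appears as the threshold sweeps from the highest score to the lowest."""
    h, w = len(score), len(score[0])
    order = sorted(((score[r][c], r, c) for r in range(h) for c in range(w)),
                   reverse=True)
    parent = {}   # union-find forest over pixels already in the filtration
    birth = {}    # root pixel -> (birth value, peak pixel)
    pairs = []

    def find(x: Pixel) -> Pixel:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = (value, p)
        for dr, dc in NEIGHBOURS:
            q = (r + dr, c + dc)
            if q not in parent:
                continue
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the component with the lower peak dies at this level.
            if birth[a][0] < birth[b][0]:
                a, b = b, a
            if birth[b][0] > value:  # skip zero-persistence pairs
                pairs.append((birth[b][0], value, birth[b][1]))
            parent[b] = a

    # The last surviving component is paired with the global minimum.
    root = find((order[0][1], order[0][2]))
    pairs.append((birth[root][0], order[-1][0], birth[root][1]))
    return pairs


def topological_pseudo_labels(score: List[List[float]],
                              min_persistence: float) -> List[List[int]]:
    """Mark pixels that belong to a sufficiently persistent peak (label 1)."""
    h, w = len(score), len(score[0])
    labels = [[0] * w for _ in range(h)]
    for b, d, peak in superlevel_persistence(score):
        if b - d < min_persistence:
            continue
        level = (b + d) / 2.0  # illustrative choice: halfway between birth and death
        seen, stack = set(), [peak]
        while stack:
            r, c = stack.pop()
            if (r, c) in seen or not (0 <= r < h and 0 <= c < w):
                continue
            if score[r][c] < level:
                continue
            seen.add((r, c))
            labels[r][c] = 1
            stack.extend((r + dr, c + dc) for dr, dc in NEIGHBOURS)
    return labels


score_map = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1, 0.1],
    [0.1, 0.8, 0.7, 0.1, 0.3],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
pairs = superlevel_persistence(score_map)
labels = topological_pseudo_labels(score_map, min_persistence=0.5)
```

On this toy map, the strong 2×2 blob forms one long-lived component and is labelled, while the isolated 0.3 pixel forms a short-lived component and is discarded as noise. The selection depends on how long a feature survives across thresholds, not on any single cut-off value.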

Related events (8)

6 · arXiv · cs.AI · 18d ago

Atlas H&E-TME: AI system matches expert pathologist accuracy for scalable tumor microenvironment profiling

Researchers present Atlas H&E-TME, an AI system built on the Atlas family of pathology foundation models that generates over 4,500 quantitative readouts per whole-slide H&E image at cell-level resolution across multiple cancer types. The system is validated using a novel dual framework: an IHC-informed multi-pathologist consensus protocol for depth, and benchmarking against 200,000+ annotations across 1,500+ cases from 25+ sources spanning eight cancer types. Atlas H&E-TME matches or exceeds pathologist H&E-only performance, demonstrating that standard histopathology slides can serve as a scalable quantitative window into the tumor microenvironment. The work advances computational pathology by enabling tissue-based biomarker discovery without requiring specialized staining modalities.

5 · arXiv · cs.LG · 27d ago

ProtoAda: Prototype-Guided Adaptive Adapter Expansion for Multimodal Continual Instruction Tuning

ProtoAda is a new framework for Multimodal Continual Instruction Tuning (MCIT) that addresses a key failure mode in sparse Mixture-of-LoRA-Experts architectures: image-text similarity routing is format-blind and incorrectly merges tasks with similar semantics but different output structures (e.g., coordinate prediction vs. VQA). The method introduces format-aware task prototypes to guide both routing and adapter expansion, then consolidates compatible updates geometrically to reuse and refine existing parameters. Experiments across multiple benchmarks show improved performance, particularly on tasks whose answer formats are vulnerable to corruption by sequential fine-tuning.

5 · arXiv · cs.LG · 20d ago

Topo-Omni: Topographic multimodal model discovers functionally selective brain regions consistent with human neuroimaging

Researchers introduce Topo-Omni, a topographic multimodal model that jointly represents visual, auditory, and language/cognitive processing on a single contiguous in-silico cortical sheet, built by fine-tuning a pretrained foundation model with a spatial smoothness objective. The model develops clusters consistent with human neuroimaging data, and driving or suppressing clusters selectively biases or impairs perception in ways that parallel human intervention studies. The authors use the model to screen for novel cortical networks in-silico and validate discoveries — including natural landscape and animal networks — in human neuroimaging data. The work bridges deep learning architectures and computational neuroscience, offering testable hypotheses about cortical organization.

4 · arXiv · cs.LG · 20d ago

Topological Neural Operators: operator learning on cell complexes via Discrete Exterior Calculus

Researchers introduce Topological Neural Operators (TNOs), a framework that extends neural operators from point/edge functions to general topological domains (cell complexes) using Discrete Exterior Calculus. The design decouples fixed topological information flow from learned transformations, enabling models that respect geometric structure and conservation laws. A hierarchical variant (HTNOs) adds learned coarse complexes for long-range propagation. TNOs subsume existing neural operators as a special case and show accuracy improvements on PDE benchmarks including irregular-geometry flow problems.

6 · arXiv · cs.AI · 1mo ago

PGT: Procedurally Generated Tasks for Improving Visual Grounding in MLLMs

This paper introduces Procedurally Generated Tasks (PGT), a data-driven framework that overlays geometric primitives on images to create dense supervision signals for fine-grained visual grounding in multimodal large language models. PGT serves both as a training augmentation method and a diagnostic tool to isolate perception failures from semantic priors. Instruction tuning on LLaVA-v1.5-Instruct augmented with PGT data yields gains of up to +20% on the What'sUp benchmark and +13.3% on CV-Bench-2D. The results suggest that spatial reasoning deficits in MLLMs stem primarily from inadequate supervision rather than architectural or resolution constraints.

5 · arXiv · cs.AI · 1mo ago

VisAnomReasoner: Efficient VLM for Time-Series Anomaly Detection via VisAnomBench

Researchers introduce VisAnomBench, a curated benchmark augmenting public time-series anomaly datasets with natural-language rationales generated by multiple large VLMs and selected using task-specific rewards. Fine-tuning on this benchmark produces VisAnomReasoner, a parameter-efficient vision-language model that outperforms all baselines on VisAnomBench by at least 21.23 and 23.87 percentage points in precision and F1, respectively. Cross-benchmark evaluation on TSB-AD-U shows further generalization gains of 9.57 and 13.39 percentage points in precision and F1, respectively.

5 · arXiv · cs.LG · 1mo ago

TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning

TrajTok is a trajectory encoder that learns transferable GPS trace representations via multi-resolution hexagonal spatial tokenization and masked-token pretraining. It uses a factorized transformer with per-modality self-attention, cross-attention fusion, and spatiotemporal rotary position embeddings (ST-RoPE) to jointly encode geometry and kinematics. A single frozen TrajTok encoder with lightweight adapters outperforms task-specific methods on trajectory similarity search, classification, ETA, and travel-time regression on the Porto dataset. The work positions learned spatial tokenization plus masked pretraining as a viable path toward general-purpose trajectory foundation models.

5 · arXiv · cs.AI · 18d ago

TAHOE: Error-driven hint learning system substantially improves Text-to-SQL on Spider 2.0

TAHOE is a Text-to-SQL system that treats prompt optimization as a dynamic data management problem, building a structured Hint Bank from compiler, execution, and user feedback without updating model parameters. On the Spider 2.0-Snow benchmark using GPT-5.5, it raises pass rate from 61.95% to 79.42% and achieves 100% Snowflake syntax compliance while reducing compiler-feedback rounds from 2.79 to 0.12. The learned Hint Bank transfers to weaker models, yielding a 19.7 percentage-point gain on Doubao-2.0-lite. The approach targets the production deployment gap between Text-to-SQL prototypes and real-world database environments with strict dialects and large schemas.