VADAOrchestra: Neurosymbolic framework combining LLM orchestration with Datalog+/- symbolic reasoning for adaptive workflows
VADAOrchestra is a neurosymbolic framework that combines LLM-based orchestration with a symbolic Datalog+/- inference engine to model complex, adaptive workflows. An LLM incrementally plans workflow steps and encodes them as logic programs, while a dedicated symbolic engine performs all inference, decoupling orchestration from execution. The design targets auditability, scalability over large datasets, and explainability, three areas where pure LLM-agent architectures fall short, and is evaluated on real-world financial use cases. The authors position the work as a bridge between the rigidity of traditional Business Process Management (BPM) and the flexibility of LLM agents.
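To make the split between orchestration and inference concrete, here is a toy sketch of the symbolic side only. The LLM would emit rules, and a deterministic engine computes every consequence by naive bottom-up (fixpoint) evaluation. The sketch assumes plain Datalog and omits the existential quantifiers in rule heads that distinguish Datalog+/-. The `owns`/`control` predicates and all function names are hypothetical, and this is not VADAOrchestra's engine.

```python
# Toy plain-Datalog engine (no existential rules, so NOT full Datalog+/-).
# Terms are strings; a term starting with "?" is a variable.
# A fact is a ground tuple (predicate, arg1, ...); a rule is (head, [body atoms]).
# Rules are assumed safe: every head variable also appears in the body.

def unify(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`; return None on mismatch."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            # Bind a fresh variable, or check consistency with an existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def substitute(atom, binding):
    """Replace bound variables in `atom` with their values."""
    return tuple(binding.get(term, term) for term in atom)

def match_body(body, facts, binding):
    """Yield every binding that satisfies all body atoms against `facts`."""
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match_body(rest, facts, extended)

def fixpoint(facts, rules):
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    known = set(facts)
    while True:
        new = {
            substitute(head, b)
            for head, body in rules
            for b in match_body(body, known, {})
        } - known
        if not new:
            return known
        known |= new

# Hypothetical rules an LLM orchestrator might emit for one workflow step:
#   control(X, Y) :- owns(X, Y).
#   control(X, Z) :- control(X, Y), owns(Y, Z).
RULES = [
    (("control", "?X", "?Y"), [("owns", "?X", "?Y")]),
    (("control", "?X", "?Z"), [("control", "?X", "?Y"), ("owns", "?Y", "?Z")]),
]
FACTS = {("owns", "A", "B"), ("owns", "B", "C")}
derived = fixpoint(FACTS, RULES)
```

After evaluation, `derived` contains `control(A, B)`, `control(B, C)`, and the transitively inferred `control(A, C)`. Every derived fact comes from a named rule applied to stored facts, so in principle each one can be traced back to its justification. That is the auditability argument for keeping inference out of the LLM.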
LabVLA: Vision-Language-Action model and RoboGenesis data engine for scientific laboratory robotics
Researchers introduce LabVLA, a Vision-Language-Action model designed to bridge written scientific protocols and physical robot execution in laboratory settings. To address data scarcity, they build RoboGenesis, a simulation-based data engine that composes lab workflows from atomic skills and generates structured demonstrations across robot embodiments. LabVLA uses a two-stage training recipe: FAST action-token pretraining on a Qwen3-VL-4B-Instruct backbone, followed by flow-matching post-training with a DiT action expert. On the LabUtopia benchmark, LabVLA achieves a higher average success rate than all evaluated baselines in both in-distribution and out-of-distribution settings.
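The summary mentions flow-matching post-training for the action expert. As a rough illustration of that objective only, the sketch below implements generic conditional flow matching with straight-line (rectified-flow style) interpolation between Gaussian noise and an action vector, plus Euler sampling. It is not LabVLA's DiT action expert. The model is a stand-in Python function, the time convention (t = 0 noise, t = 1 action) is an assumption, and all names are hypothetical.

```python
import random

# Convention assumed here: t = 0 is pure Gaussian noise, t = 1 is the data
# (an action vector). Some papers use the reverse direction.

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 to action x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def flow_matching_loss(model, action, rng):
    """One-sample conditional flow-matching loss (mean squared error)."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # t ~ Uniform[0, 1)
    x_t = interpolate(noise, action, t)
    # The straight path has constant velocity (action - noise); regress onto it.
    target = [b - a for a, b in zip(noise, action)]
    pred = model(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def sample(model, dim, steps, rng):
    """Generate an action by Euler-integrating the velocity field from noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in range(steps):
        t = i / steps
        v = model(x, t)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x

# Stand-in "model": the exact velocity field for a single training action,
# (a - x) / (1 - t). A trained action expert would only approximate a field like this.
TARGET = [0.2, -0.5, 1.0]

def oracle(x, t):
    return [(a - xi) / (1.0 - t) for a, xi in zip(TARGET, x)]
```

With the oracle field, Euler integration lands on `TARGET` up to floating-point error, because the velocity is constant along each straight path. A learned network would only approximate this.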
WorkflowView: LLM-based framework abstracts user action logs into interpretable workflows
Researchers introduce WorkflowView, a framework using LLMs to convert low-level interaction logs into high-level activity descriptions across diverse domains. The system achieves strong results on three tasks: zero-shot browser log reconstruction (semantic similarity 0.91), few-shot MOOC dropout prediction (F1=0.90 with five examples), and privacy-preserving analysis of AI tool usage in Microsoft Word. The work addresses limitations of prior deep learning clustering approaches, which struggled with noise and cross-application generalization, and discusses deployment considerations including computational efficiency and privacy.
OrchRM: Self-supervised reward modeling for multi-agent orchestration without human annotations
Researchers propose Orchestration Reward Modeling (OrchRM), a self-supervised framework that trains reward models for LLM-based multi-agent orchestrators using intermediate execution artifacts to construct win-lose pairs for Bradley-Terry training. The approach avoids costly sub-agent rollouts by operating directly at the orchestration level, achieving up to 10x improvement in training token efficiency and up to 8% accuracy gains in test-time scaling. Results generalize across mathematical reasoning, web-based QA, and multi-hop reasoning tasks.
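Bradley-Terry training on win-lose pairs reduces to a simple loss: for a winner w and a loser l, minimise -log σ(r(w) - r(l)). The sketch below assumes the pairs already exist; it does not reproduce how OrchRM builds them from intermediate execution artifacts. To stay self-contained, it also replaces the neural reward model with one learned scalar per hypothetical orchestration trace ID.

```python
import math

def bt_probability(r_win, r_lose):
    """Bradley-Terry probability that `win` is preferred: sigmoid(r_win - r_lose)."""
    m = r_win - r_lose
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)  # stable branch for negative margins
    return e / (1.0 + e)

def neg_log_sigmoid(m):
    """Numerically stable -log(sigmoid(m)), i.e. softplus(-m)."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def bt_loss(pairs, reward):
    """Mean Bradley-Terry loss over (winner, loser) pairs for a reward function."""
    return sum(neg_log_sigmoid(reward(w) - reward(l)) for w, l in pairs) / len(pairs)

def fit_scores(pairs, items, lr=0.5, epochs=200):
    """Fit one scalar score per item by SGD on the Bradley-Terry loss."""
    scores = {i: 0.0 for i in items}
    for _ in range(epochs):
        for w, l in pairs:
            # d(loss)/d(r_w) = -(1 - sigmoid(margin)); the loser gets the opposite sign.
            g = 1.0 - bt_probability(scores[w], scores[l])
            scores[w] += lr * g
            scores[l] -= lr * g
    return scores

# Hypothetical trace IDs; each pair is (preferred trace, dispreferred trace).
PAIRS = [("trace_a", "trace_b"), ("trace_b", "trace_c"), ("trace_a", "trace_c")]
scores = fit_scores(PAIRS, ["trace_a", "trace_b", "trace_c"])
```

If the lookup table is swapped for a neural network that scores an orchestration trace, the loss stays the same. Only the gradient path through `reward` changes.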
Neurosymbolic Learning for Inference-Time Argumentation in Claim Verification
This paper introduces Inference-Time Argumentation (ITA), a trainable neurosymbolic framework for ternary claim verification (true/false/uncertain) that integrates formal argumentation semantics with LLM training. The framework uses argumentation semantics both to guide LLM training for argument generation and scoring, and to compute final predictions deterministically from explicit argumentative structures. Unlike conventional reasoning models that rely on potentially unfaithful post-hoc explanations, ITA produces verdicts that are faithful by construction to the underlying arguments. Experiments on two ternary claim verification datasets show that ITA outperforms argumentative baselines and is competitive with non-argumentative direct-prediction approaches.
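ITA computes verdicts deterministically from explicit argumentative structures. The summary does not specify the exact semantics, so the sketch below assumes Dung-style abstract argumentation with grounded semantics. It also adds a hypothetical rule for mapping the accepted arguments to a ternary verdict: only supporting arguments accepted gives `true`, only refuting ones gives `false`, and anything else gives `uncertain`. The argument names and stances are invented.

```python
def grounded_extension(arguments, attacks):
    """Least fixed point of Dung's characteristic function, iterated from the empty set."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is defended if each of its attackers is attacked by the current set.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension) for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended

def verdict(arguments, attacks, stance):
    """Hypothetical ternary mapping from accepted arguments to a claim label."""
    accepted = grounded_extension(arguments, attacks)
    supported = any(stance[a] == "support" for a in accepted)
    refuted = any(stance[a] == "refute" for a in accepted)
    if supported and not refuted:
        return "true"
    if refuted and not supported:
        return "false"
    return "uncertain"

# Invented example: s1 supports the claim, r1 rebuts s1, u1 undercuts r1.
ARGS = {"s1", "r1", "u1"}
ATTACKS = {("r1", "s1"), ("u1", "r1")}
STANCE = {"s1": "support", "r1": "refute", "u1": "neutral"}
```

Here u1 is unattacked, it defeats r1, and so s1 is reinstated, giving the verdict `true`. Because the verdict is a pure function of the argument graph, the explanation (the accepted arguments) cannot diverge from the prediction. That is the faithfulness-by-construction property the summary describes.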
AIR: Adaptive Interleaved Reasoning with Code in Multimodal LLMs via Reinforcement Learning
Researchers propose AIR, a system that trains multimodal large language models to adaptively interleave reasoning with code execution for numerical computation tasks, extending prior work that focused only on visual operations. The approach combines a two-stage cold-start data pipeline, RL dataset filtering, and a group-constrained reward function for tool-invocation decisions. Experiments show an average improvement of 6.1 percentage points across the evaluation benchmarks, a 9.9 pp gain on interleaved-reasoning samples, and a tool-use success rate above 95%.
DEFINED: Data-efficient framework for fine-grained creativity assessment in debate using LLMs
DEFINED is a computational framework for automated creativity assessment in debate scenarios, operationalizing creativity through an eight-dimensional hierarchical metric system implemented via a pretrained autoregressive language model with a hierarchical scoring head. The system addresses data scarcity through constrained data augmentation and mixed-granularity training from limited expert-annotated data. It outperforms prompt-based LLM evaluators and existing debate scoring methods on authentic competition data. The work is relevant to AI evaluation methodology and the broader question of whether LLMs can reliably assess complex human cognitive outputs.
CHORUS: Single VLA policy enables decentralized multi-robot collaboration without inter-robot communication
CHORUS is a framework that adapts a single vision-language-action (VLA) backbone to control diverse multi-robot teams in a fully decentralized manner: each robot runs an independent copy conditioned only on its own observations and a robot-identifying prompt. Real-world experiments on tasks such as tape measurement, book handovers, and laundry-basket lifting show a 64-percentage-point improvement over decentralized models trained from scratch and a 40-point improvement in reactivity to teammate behavior, and CHORUS also outperforms centralized baselines. The key insight is that pretrained VLA visuomotor priors suffice for reactive coordination without explicit inter-robot communication or alignment procedures at inference time.
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
This paper introduces the stochastic-deterministic boundary (SDB) as a foundational architectural primitive for production LLM agent runtimes, defining it as a four-part contract (proposer, verifier, commit step, reject signal) that governs how LLM outputs become system actions. The authors organize agent runtime design around three concerns (Coordination, State, and Control) and present a catalog of six runtime patterns applicable to conversational, autonomous, and long-horizon agents. They also contribute a five-step pattern-selection methodology, a diagnostic procedure that maps production failures to pattern weaknesses, and a newly named failure mode, replay divergence, in which LLM consumers of deterministic event logs produce inconsistent outputs across model versions or prompt changes. The paper argues that as model variance decreases, the choice of architectural pattern and the strength of the SDB become the dominant reliability levers.
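The four-part SDB contract maps naturally onto a small control loop. The sketch below is one possible reading of that contract, not the paper's reference design. A stand-in proposer plays the LLM, and a deterministic verifier either accepts a proposal or returns a reason. Only accepted proposals reach the commit step, and rejections are fed back to the proposer as a reject signal. The refund scenario, the 100-unit limit, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

@dataclass
class Reject:
    """Reject signal returned to the proposer when verification fails."""
    reason: str

def run_boundary(
    propose: Callable[[Optional[Reject]], Any],  # stochastic side (e.g. an LLM call)
    verify: Callable[[Any], Optional[str]],      # deterministic check: None means accept
    commit: Callable[[Any], Any],                # the only path that mutates state
    max_attempts: int = 3,
) -> Union[Any, Reject, None]:
    """Run propose/verify until a proposal is committed or attempts run out."""
    feedback: Optional[Reject] = None
    for _ in range(max_attempts):
        proposal = propose(feedback)
        problem = verify(proposal)
        if problem is None:
            return commit(proposal)
        feedback = Reject(problem)  # fed back into the next proposal
    return feedback  # last rejection (None only if max_attempts is 0)

# Hypothetical usage: a canned "LLM" first proposes an over-limit refund.
ledger = []
seen_feedback = []
canned = iter([{"refund": 500}, {"refund": 80}])

def propose(feedback):
    seen_feedback.append(feedback)
    return next(canned)

def verify(p):
    return None if 0 < p["refund"] <= 100 else f"refund {p['refund']} exceeds limit"

def commit(p):
    ledger.append(p["refund"])
    return len(ledger)

result = run_boundary(propose, verify, commit)
```

The key property is that `commit` is reachable only through `verify`. Model variance can change what gets proposed, but not what gets written to state.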

