PolicyGuard: Neuro-symbolic framework converts organizational policies into executable compliance review engines
PolicyGuard is a neuro-symbolic framework that converts organizational policy documents into typed relational logic rules and atom-level extraction questions, enabling auditable LLM-assisted document compliance review. During review, LLMs answer local questions grounded in retrieved document evidence, while a symbolic evaluator applies formal rules to detect non-compliance. The system is instantiated and evaluated on NDA compliance review against company-specific negotiation policies. The separation of policy formalization, document interpretation, and symbolic evaluation addresses the opacity and maintainability problems of end-to-end LLM prompting for compliance tasks.
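The division of labor described above can be illustrated with a minimal sketch. Everything named here is hypothetical: the atom names (`term_years`, `governing_law`), the two rules, and their limits are invented for illustration, and the LLM extraction step is replaced by a plain dictionary of answers. The summary does not describe PolicyGuard's actual rule language, so this only shows the general shape of a symbolic check over extracted atoms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# An "atom" is one local fact extracted from the document. In the real system
# an LLM would answer these from retrieved evidence; here they are given directly.
AtomValue = Union[bool, int, str, None]
Atoms = Dict[str, AtomValue]


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    violated: Callable[[Atoms], bool]  # True means the document is non-compliant


# Hypothetical policy rules, invented for this example only.
RULES: List[Rule] = [
    Rule(
        "R1",
        "Confidentiality term must not exceed 5 years",
        lambda a: a.get("term_years") is not None and a["term_years"] > 5,
    ),
    Rule(
        "R2",
        "Governing law must be on the approved list",
        lambda a: a.get("governing_law") not in {"Delaware", "New York"},
    ),
]


def review(atoms: Atoms, rules: List[Rule] = RULES) -> List[str]:
    """Return the ids of all rules the extracted atoms violate."""
    return [rule.rule_id for rule in rules if rule.violated(atoms)]


if __name__ == "__main__":
    extracted = {"term_years": 7, "governing_law": "Delaware"}
    for rule_id in review(extracted):
        print("violation:", rule_id)
```

Because the rules are ordinary data evaluated deterministically, each flagged violation can be traced back to a specific rule and the specific atoms it read, which is the kind of auditability the summary attributes to the approach.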
Related events (8)
AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems
ServiceNow AI has released AprielGuard, a guardrail system designed to improve safety and adversarial robustness in LLM deployments. The system targets prompt injection, jailbreaks, and other adversarial inputs that bypass standard safety measures. It is presented as a component for enterprise LLM pipelines seeking more robust content moderation and safety filtering.
P4IR framework uses SFT + GRPO to improve LLM-based automated building code compliance
Researchers introduce P4IR, a two-stage framework combining supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO) to improve LLM accuracy in automated code compliance (ACC) for building regulations. The approach reduces tree edit distance and token-level Levenshtein distance by up to 23.8% and 38.6% respectively versus SFT baselines, and outperforms Claude Opus/Sonnet 4.5, GPT-5.2, Qwen-3-Max, and GLM-4.7 in zero-shot settings. The work targets a narrow but practically important domain where LLM hallucinations carry real regulatory consequences.
GraphPO: Graph-based Policy Optimization reduces redundancy in LLM reasoning RL
GraphPO is a new reinforcement learning framework that represents reasoning rollouts as directed acyclic graphs rather than independent chains or trees, merging semantically equivalent reasoning paths into equivalence classes to share suffixes and reduce redundant exploration. The approach assigns efficiency advantages to incoming edges and correctness advantages to outgoing edges, deriving process supervision from outcome rewards. Experiments on three LLMs across reasoning and agentic search benchmarks show consistent improvements over chain- and tree-based baselines under equal token or response budgets. The method also provides theoretical guarantees on reduced advantage-estimation variance.
LLawCo framework teaches embodied multi-agent LLMs to derive and follow cooperation laws
Researchers from MERL propose LLawCo (Learning Laws of Cooperation), a framework that enables embodied LLM-based agents to autonomously align with partners and task objectives in decentralized, partially observable environments. Agents reflect on past failures to extract misaligned behavioral patterns and derive high-level behavioral laws (e.g., 'Talk when necessary', 'Wait for partner'), which are incorporated into reasoning via supervised fine-tuning. The authors also introduce PARTNR-Dialog, a new large-scale multi-agent communicative planning benchmark, and report average success rate improvements of 4.5% on PARTNR-Dialog and 6.8% on TDW-MAT over state-of-the-art open-source communicative agent frameworks across four backbone LLMs.
VADAOrchestra: Neurosymbolic framework combining LLM orchestration with Datalog+/- symbolic reasoning for adaptive workflows
VADAOrchestra is a neurosymbolic framework that hybridizes LLM-based orchestration with a symbolic Datalog+/- inference engine to model complex, adaptive workflows. An LLM incrementally plans and encodes workflow steps as logic programs, while a dedicated symbolic engine handles all inference, decoupling orchestration from execution. The approach targets auditability, scalability over large datasets, and explainability, three areas where pure LLM-agent architectures fall short, and is evaluated on real-world financial use cases. The work positions itself as a bridge between traditional Business Process Management rigidity and LLM agent flexibility.
DemoPSD: Disagreement-modulated policy self-distillation to fix privileged information leakage in LLM reasoning training
DemoPSD is a new training framework for LLMs that addresses two failure modes in on-policy self-distillation (OPSD): overfitting to in-domain patterns and privileged information leakage, where the student model learns answer-dependent shortcuts unavailable at test time. The method steers the student toward a reverse-KL barycenter target (a weighted geometric blend of teacher and student distributions) with token-level blending weights derived from the disagreement between the two distributions. Experiments on SciKnowEval across four scientific domains show DemoPSD outperforms GRPO and SDPO while maintaining higher training entropy and generalizing to out-of-distribution GPQA benchmarks.
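For readers unfamiliar with the term, a weighted geometric blend of two distributions can be sketched as follows. This is a generic illustration rather than DemoPSD's implementation: the function name and the example distributions are hypothetical, and the blending weight `w` is passed in directly because the summary does not say how it is computed from teacher-student disagreement.

```python
from typing import List, Sequence


def geometric_blend(teacher: Sequence[float], student: Sequence[float], w: float) -> List[float]:
    """Normalized weighted geometric mean: target[i] is proportional to
    teacher[i]**w * student[i]**(1 - w).

    w = 1 recovers the teacher distribution, w = 0 recovers the student.
    """
    if len(teacher) != len(student):
        raise ValueError("distributions must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    unnormalized = [(t ** w) * (s ** (1.0 - w)) for t, s in zip(teacher, student)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("distributions have disjoint support")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    teacher_probs = [0.7, 0.2, 0.1]  # hypothetical next-token distribution
    student_probs = [0.3, 0.4, 0.3]  # hypothetical next-token distribution
    print(geometric_blend(teacher_probs, student_probs, w=0.5))
```

A blend of this form is the distribution q that minimizes w·KL(q‖teacher) + (1 − w)·KL(q‖student), which is why it is described as a reverse-KL barycenter.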
Online Safety Monitoring for LLMs via Threshold-Based Risk Control
A new arXiv preprint proposes a real-time safety monitor for LLMs that converts an external verifier signal into an alarm by thresholding, with the threshold calibrated via risk control. The authors evaluate the approach on mathematical reasoning and red-teaming datasets, finding it competitive with more complex sequential hypothesis testing monitors. The work addresses the practical deployment problem of detecting unsafe outputs after alignment training.
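The thresholding idea can be sketched in a few lines. This is a simplified stand-in, not the preprint's procedure: it assumes a higher verifier score means higher risk, picks the threshold from labeled calibration data so that the empirical miss rate on unsafe examples stays at or below a target `alpha`, and omits the finite-sample correction that a formal risk-control calibration would typically include. The function names and data are hypothetical.

```python
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], unsafe: Sequence[bool], alpha: float) -> float:
    """Pick the largest threshold (among observed unsafe scores) such that at most
    a fraction `alpha` of unsafe calibration examples score below it.

    Assumption: a higher verifier score means a riskier output.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    unsafe_scores = sorted(s for s, u in zip(scores, unsafe) if u)
    if not unsafe_scores:
        raise ValueError("calibration set contains no unsafe examples")
    # Number of unsafe examples we are allowed to miss on the calibration set.
    allowed_misses = int(alpha * len(unsafe_scores))
    # Alarming at score >= unsafe_scores[allowed_misses] misses at most that many.
    return unsafe_scores[allowed_misses]


def alarm(score: float, threshold: float) -> bool:
    """Raise an alarm when the verifier score reaches the calibrated threshold."""
    return score >= threshold


if __name__ == "__main__":
    cal_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6]
    cal_unsafe = [False, True, False, True, True, False, True]
    threshold = calibrate_threshold(cal_scores, cal_unsafe, alpha=0.25)
    print("threshold:", threshold, "alarm on 0.7:", alarm(0.7, threshold))
```

The appeal of this design is that the monitor itself is a single comparison at inference time; all of the statistical work happens once, offline, when the threshold is chosen.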
An Introduction to AI Secure LLM Safety Leaderboard
Hugging Face introduces the DecodingTrust-based LLM Safety Leaderboard, a benchmark framework for evaluating large language models across multiple safety and trustworthiness dimensions. The leaderboard aims to provide standardized, reproducible safety assessments covering areas such as toxicity, stereotype bias, adversarial robustness, and privacy. It offers a public ranking of models to help researchers and practitioners compare safety properties across different LLMs.

