5 · arXiv cs.AI (Artificial Intelligence) · 2d ago

PolicyGuard: Neuro-symbolic framework converts organizational policies into executable compliance review engines

PolicyGuard is a neuro-symbolic framework that converts organizational policy documents into typed relational logic rules and atom-level extraction questions, enabling auditable LLM-assisted document compliance review. During review, LLMs answer local questions grounded in retrieved document evidence, while a symbolic evaluator applies formal rules to detect non-compliance. The system is instantiated and evaluated on NDA compliance review against company-specific negotiation policies. The separation of policy formalization, document interpretation, and symbolic evaluation addresses the opacity and maintainability problems of end-to-end LLM prompting for compliance tasks.
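The split between LLM extraction and symbolic evaluation can be illustrated with a minimal sketch. This is not the paper's rule language or implementation: the `Rule` type, the two NDA rules, the atom names (`term_years`, `is_mutual`), and the hard-coded answers are all hypothetical stand-ins. In a real pipeline an LLM would supply the atom values from retrieved evidence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical atoms: each key stands for one local question that an LLM
# would answer from retrieved document evidence. Values here are hard-coded.
Atoms = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    violated: Callable[[Atoms], bool]  # True when the atoms show non-compliance


# Illustrative policy rules (assumptions, not taken from the paper).
RULES: List[Rule] = [
    Rule("R1", "Confidentiality term must not exceed 5 years",
         lambda a: a["term_years"] > 5),
    Rule("R2", "Agreement must be mutual",
         lambda a: not a["is_mutual"]),
]


def review(atoms: Atoms, rules: List[Rule] = RULES) -> List[str]:
    """Return the ids of all rules that the extracted atoms violate."""
    return [r.rule_id for r in rules if r.violated(atoms)]


# Stand-in for LLM answers to the atom-level questions.
extracted = {"term_years": 7, "is_mutual": True}
print(review(extracted))  # ['R1']
```

Because the rules are plain data evaluated deterministically, each finding traces back to one rule and the atoms it read, which is the auditability argument the summary makes.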

Enterprise Deployment Patterns Agent and Tool Ecosystem PolicyGuard PolicyGuard: From Organizational Policies to Neuro-Symbolic Compliance Review Engines

Related guides (2)

Enterprise Deployment Patterns (Topic guide)

Enterprise Deployment Patterns: From AI Demo to Production Reality


Agent and Tool Ecosystem (Topic guide)

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

5 · Hugging Face Blog · 1mo ago

AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems

ServiceNow AI has released AprielGuard, a guardrail system designed to improve safety and adversarial robustness in LLM deployments. The system targets prompt injection, jailbreaks, and other adversarial inputs that bypass standard safety measures. It is presented as a component for enterprise LLM pipelines seeking more robust content moderation and safety filtering.

AI Safety Research Enterprise Deployment Patterns ServiceNow AI AprielGuard Hugging Face +1 more

4 · arXiv · cs.CL · 10d ago

P4IR framework uses SFT + GRPO to improve LLM-based automated building code compliance

Researchers introduce P4IR, a two-stage framework combining supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO) to improve LLM accuracy in automated code compliance (ACC) for building regulations. The approach reduces tree edit distance and token-level Levenshtein distance by up to 23.8% and 38.6% respectively versus SFT baselines, and outperforms Claude Opus/Sonnet 4.5, GPT-5.2, Qwen-3-Max, and GLM-4.7 in zero-shot settings. The work targets a narrow but practically important domain where LLM hallucinations carry real regulatory consequences.

Enterprise Deployment Patterns Alignment and RLHF GPT-5.2 Claude Opus 4.6 Claude Sonnet 4.5 +4 more

6 · arXiv · cs.CL · 15d ago

GraphPO: Graph-based Policy Optimization reduces redundancy in LLM reasoning RL

GraphPO is a new reinforcement learning framework that represents reasoning rollouts as directed acyclic graphs rather than independent chains or trees, merging semantically equivalent reasoning paths into equivalence classes to share suffixes and reduce redundant exploration. The approach assigns efficiency advantages to incoming edges and correctness advantages to outgoing edges, deriving process supervision from outcome rewards. Experiments on three LLMs across reasoning and agentic search benchmarks show consistent improvements over chain- and tree-based baselines under equal token or response budgets. The method also provides theoretical guarantees on reduced advantage-estimation variance.

Frontier Model Releases Alignment and RLHF GraphPO GraphPO: Graph-based Policy Optimization for Reasoning Models

5 · arXiv · cs.AI · 4d ago

LLawCo framework teaches embodied multi-agent LLMs to derive and follow cooperation laws

Researchers from MERL propose LLawCo (Learning Laws of Cooperation), a framework that enables embodied LLM-based agents to autonomously align with partners and task objectives in decentralized, partially observable environments. Agents reflect on past failures to extract misaligned behavioral patterns and derive high-level behavioral laws (e.g., 'Talk when necessary', 'Wait for partner'), which are incorporated into reasoning via supervised fine-tuning. The authors also introduce PARTNR-Dialog, a new large-scale multi-agent communicative planning benchmark, and report average success rate improvements of 4.5% on PARTNR-Dialog and 6.8% on TDW-MAT over state-of-the-art open-source communicative agent frameworks across four backbone LLMs.

Evaluation and Benchmarking Agent and Tool Ecosystem LLawCo MERL PARTNR +2 more

5 · arXiv · cs.CL · 10d ago

VADAOrchestra: Neurosymbolic framework combining LLM orchestration with Datalog+/- symbolic reasoning for adaptive workflows

VADAOrchestra is a neurosymbolic framework that hybridizes LLM-based orchestration with a symbolic Datalog+/- inference engine to model complex, adaptive workflows. An LLM incrementally plans and encodes workflow steps as logic programs, while a dedicated symbolic engine handles all inference, decoupling orchestration from execution. The approach targets auditability, scalability over large datasets, and explainability — limitations of pure LLM-agent architectures — and is evaluated on real-world financial use cases. The work positions itself as a bridge between traditional Business Process Management rigidity and LLM agent flexibility.

Enterprise Deployment Patterns Agent and Tool Ecosystem Datalog VADAOrchestra

5 · arXiv · cs.LG · 16h ago

DemoPSD: Disagreement-modulated policy self-distillation to fix privileged information leakage in LLM reasoning training

DemoPSD is a new training framework for LLMs that addresses two failure modes in on-policy self-distillation (OPSD): overfitting to in-domain patterns and privileged information leakage, where the student model learns answer-dependent shortcuts unavailable at test time. The method steers the student toward a reverse-KL barycenter target — a weighted geometric blend of teacher and student distributions — with token-level blending weights derived from the disagreement between the two distributions. Experiments on SciKnowEval across four scientific domains show DemoPSD outperforms GRPO and SDPO while maintaining higher training entropy and generalizing to out-of-distribution GPQA benchmarks.
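The "weighted geometric blend" has a simple closed form: minimizing w·KL(q‖teacher) + (1−w)·KL(q‖student) over q gives q proportional to teacher^w · student^(1−w). The sketch below computes that blend for one token position. How DemoPSD derives the weight from teacher-student disagreement is not given in this summary, so `w` is passed in as a plain parameter; the function name and the example distributions are illustrative assumptions.

```python
import math
from typing import List


def geometric_blend(teacher: List[float], student: List[float], w: float) -> List[float]:
    """Normalized weighted geometric mean of two distributions.

    q_i is proportional to teacher_i**w * student_i**(1 - w).
    w = 1 recovers the teacher, w = 0 recovers the student.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(teacher) != len(student):
        raise ValueError("distributions must have the same length")
    unnorm = [t ** w * s ** (1.0 - w) for t, s in zip(teacher, student)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Illustrative next-token distributions over a three-token vocabulary.
teacher = [0.7, 0.2, 0.1]
student = [0.3, 0.4, 0.3]
target = geometric_blend(teacher, student, w=0.5)
print(target)
```

In a per-token scheme, a different `w` would be supplied at each position, so the target can lean toward the teacher at some tokens and toward the student at others.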

Evaluation and Benchmarking Alignment and RLHF SciKnowEval GRPO SDPO +2 more

5 · arXiv · cs.CL · 16h ago

Online Safety Monitoring for LLMs via Threshold-Based Risk Control

A new arXiv preprint proposes a real-time safety monitor for LLMs that converts an external verifier signal into an alarm by thresholding, with the threshold calibrated via risk control. The authors evaluate the approach on mathematical reasoning and red-teaming datasets, finding it competitive with more complex sequential hypothesis testing monitors. The work addresses the practical deployment problem of detecting unsafe outputs after alignment training.
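A threshold monitor of this kind can be sketched in a few lines. The version below is a simplified empirical calibration, not the paper's procedure: it assumes higher verifier scores mean higher risk, picks the threshold from a labeled calibration set so that the fraction of unsafe examples falling below it is at most `alpha`, and omits any finite-sample correction. The function names and data are hypothetical.

```python
from typing import Sequence


def calibrate_threshold(scores: Sequence[float], unsafe: Sequence[bool], alpha: float) -> float:
    """Pick a threshold whose empirical miss rate on calibration data is <= alpha.

    Miss rate = fraction of unsafe calibration items scoring below the threshold.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    unsafe_scores = sorted(s for s, u in zip(scores, unsafe) if u)
    if not unsafe_scores:
        raise ValueError("need at least one unsafe calibration example")
    # Allow at most k misses; the k lowest unsafe scores may fall below the threshold.
    k = int(alpha * len(unsafe_scores))
    return unsafe_scores[k]


def alarm(score: float, threshold: float) -> bool:
    """Raise an alarm when the verifier score reaches the threshold."""
    return score >= threshold


# Hypothetical calibration data: verifier scores and ground-truth unsafe labels.
cal_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_unsafe = [False, False, True, False, True, True, True, True]
threshold = calibrate_threshold(cal_scores, cal_unsafe, alpha=0.2)
print(threshold, alarm(0.65, threshold), alarm(0.5, threshold))
```

Raising `alpha` tolerates more missed unsafe outputs in exchange for fewer alarms; at deployment time only the single comparison in `alarm` runs per output.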

Evaluation and Benchmarking AI Safety Research Online Safety Monitoring for LLMs

5 · Hugging Face Blog · 1mo ago

An Introduction to AI Secure LLM Safety Leaderboard

Hugging Face introduces the DecodingTrust-based LLM Safety Leaderboard, a benchmark framework for evaluating large language models across multiple safety and trustworthiness dimensions. The leaderboard aims to provide standardized, reproducible safety assessments covering areas such as toxicity, stereotype bias, adversarial robustness, and privacy. It offers a public ranking of models to help researchers and practitioners compare safety properties across different LLMs.

Evaluation and Benchmarking AI Safety Research LLM Safety Leaderboard Hugging Face DecodingTrust