4 · arXiv cs.AI (Artificial Intelligence) · 10d ago

SECDA-DSE: LLM-guided design space exploration for FPGA accelerator generation

SECDA-DSE is a framework that integrates LLMs into the SECDA hardware-software co-design ecosystem to automate design space exploration (DSE) of FPGA-based AI accelerators. The system combines a structured architecture candidate generator with an LLM Stack using retrieval-augmented generation and chain-of-thought prompting, plus an iterative feedback loop. Evaluation demonstrates end-to-end synthesis and execution of three accelerator designs on real FPGA hardware, with results showing the approach captures kernel-specific compute/memory trade-offs while reducing manual design effort.

Training Infrastructure · Agent and Tool Ecosystem · chain-of-thought prompting · SECDA-DSE · Retrieval-Augmented Generation
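The generate/evaluate/feed-back loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not SECDA-DSE's implementation: the two design knobs (`pe_count`, `buffer_kb`), the roofline-style `toy_evaluate` that stands in for synthesis and on-board execution, and the rule-based `propose` that stands in for the LLM Stack are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pe_count: int   # hypothetical knob: number of processing elements
    buffer_kb: int  # hypothetical knob: on-chip buffer size in KiB


@dataclass
class Result:
    candidate: Candidate
    latency: float
    fits: bool
    bottleneck: str  # "compute" or "memory"


def toy_evaluate(c: Candidate, ops: float, bytes_moved: float,
                 max_pe: int = 256, max_buffer_kb: int = 512) -> Result:
    """Hypothetical stand-in for synthesis + on-board execution (toy roofline model)."""
    fits = c.pe_count <= max_pe and c.buffer_kb <= max_buffer_kb
    compute_time = ops / c.pe_count
    # Toy assumption: a larger buffer gives more reuse, so less off-chip traffic time.
    memory_time = bytes_moved / (4 * c.buffer_kb)
    bottleneck = "compute" if compute_time >= memory_time else "memory"
    return Result(c, max(compute_time, memory_time), fits, bottleneck)


def propose(last: Result) -> Candidate:
    """Hypothetical placeholder for the LLM step: read feedback, suggest the next design."""
    c = last.candidate
    if last.bottleneck == "compute":
        return Candidate(c.pe_count * 2, c.buffer_kb)
    return Candidate(c.pe_count, c.buffer_kb * 2)


def explore(ops: float, bytes_moved: float,
            start: Candidate = Candidate(8, 16), max_iters: int = 10) -> Result:
    """Iterate candidate -> evaluation -> feedback; return the best design that fits."""
    history = []
    current = start
    for _ in range(max_iters):
        result = toy_evaluate(current, ops, bytes_moved)
        history.append(result)
        if not result.fits:
            break  # feedback: the design no longer fits the device, stop growing it
        current = propose(result)
    return min((r for r in history if r.fits), key=lambda r: r.latency)
```

Running `explore` on a compute-heavy kernel grows the PE array, while a memory-heavy kernel grows the buffer instead, which is the kind of kernel-specific compute/memory trade-off the summary says the real system captures.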

Related guides (3)

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI


Retrieval-Augmented Generation · Concept

Retrieval-Augmented Generation (RAG): Giving AI a Library Card


Related events (8)

5 · arXiv · cs.CL · 11d ago

AGENTSERVESIM: Hardware-aware simulator for multi-turn LLM agent serving policies

Researchers introduce AGENTSERVESIM, a simulation framework designed to evaluate serving policies for multi-turn LLM agents without requiring dedicated accelerator hardware. The simulator models program-level execution including turn dependencies, tool-induced gaps, and KV-cache residency across HBM, host DRAM, and CXL memory hierarchies. It reproduces real-system behavior within 6% error on key performance metrics while running on commodity CPUs, enabling cost-effective exploration of scheduling, routing, and cache management policies for agentic workloads.

Training Infrastructure · Inference Economics · AGENTSERVESIM
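To make "KV-cache residency across HBM, host DRAM, and CXL" concrete, here is a toy residency model of the kind such a simulator tracks. It is a sketch only: the LRU demotion policy, the capacities counted in whole sessions, and the class name `TieredKVCache` are assumptions, not AGENTSERVESIM's actual policy or API.

```python
from collections import OrderedDict
from typing import Dict, Optional

TIERS = ("HBM", "DRAM", "CXL")  # fastest to slowest


class TieredKVCache:
    """Hypothetical toy model: LRU demotion HBM -> DRAM -> CXL -> dropped."""

    def __init__(self, capacities: Dict[str, int]):
        self.capacities = capacities  # max number of session KV caches per tier
        self.tiers = {t: OrderedDict() for t in TIERS}  # oldest entry first

    def locate(self, session: str) -> Optional[str]:
        for tier in TIERS:
            if session in self.tiers[tier]:
                return tier
        return None

    def access(self, session: str) -> Optional[str]:
        """Start a turn: report where the cache was (None = miss, re-prefill), promote to HBM."""
        found = self.locate(session)
        if found is not None:
            del self.tiers[found][session]
        self._insert(0, session)
        return found

    def _insert(self, level: int, session: str) -> None:
        if level == len(TIERS):
            return  # fell off the slowest tier: KV cache dropped
        tier = self.tiers[TIERS[level]]
        tier[session] = None  # newest entry goes last
        if len(tier) > self.capacities[TIERS[level]]:
            victim, _ = tier.popitem(last=False)  # least recently used
            self._insert(level + 1, victim)
```

During a tool-induced gap between turns, other sessions keep running and can push an idle session's cache down the hierarchy; a fuller simulator would attach a transfer or recompute cost to each tier the next `access` finds it in.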

4 · arXiv · cs.CL · 12d ago

DEFINED: Data-efficient framework for fine-grained creativity assessment in debate using LLMs

DEFINED is a computational framework for automated creativity assessment in debate. It operationalizes creativity as an eight-dimensional hierarchical metric system, scored by a pretrained autoregressive language model fitted with a hierarchical scoring head. The system addresses data scarcity through constrained data augmentation and mixed-granularity training on limited expert-annotated data. It outperforms prompt-based LLM evaluators and existing debate scoring methods on authentic competition data. The work is relevant to AI evaluation methodology and to the broader question of whether LLMs can reliably assess complex human cognitive outputs.

Evaluation and Benchmarking · DEFINED

5 · OpenAI Blog · 1mo ago

A Hazard Analysis Framework for Code Synthesis Large Language Models

OpenAI published a hazard analysis framework specifically targeting code synthesis LLMs, addressing the safety and risk dimensions of models that generate executable code. The framework likely identifies threat categories, failure modes, and mitigation strategies relevant to deploying code-generating AI systems. This represents an early structured attempt to apply safety engineering methodology to a specific LLM capability domain. The work is relevant to both AI safety research and enterprise deployment considerations for coding assistants.

AI Safety Research · Agent and Tool Ecosystem · hazard analysis framework · code synthesis LLMs · OpenAI

6 · arXiv · cs.CL · 24d ago

SAERL: Using Sparse Autoencoders to Guide LLM Reinforcement Learning Data Engineering

SAERL is a post-training data engineering framework that uses Sparse Autoencoders (SAEs), a mechanistic interpretability tool, to extract intrinsic model signals for controlling data diversity, difficulty, and quality during RL fine-tuning. The framework applies SAE-space clustering for batch diversity, a difficulty proxy for curriculum ordering, and a quality probe for data filtering. On Qwen2.5-Math-1.5B with GRPO, SAERL achieves an average accuracy improvement of 3% and reaches target accuracy with 20% fewer training steps. SAE representations transfer across model families and scales, suggesting broad applicability as a lightweight data engineering tool.

Training Infrastructure · Evaluation and Benchmarking · mechanistic interpretability · GRPO · Reinforcement Learning from Human Feedback
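The three SAERL controls summarized above (quality filter, difficulty curriculum, cluster diversity) compose naturally into one batching step. The sketch below assumes the SAE-derived signals are already computed per example; the `Example` fields, the `min_quality` threshold, and the round-robin interleaving are hypothetical choices, not SAERL's published procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    name: str
    cluster: int       # assumed: cluster id of the example's SAE feature vector
    difficulty: float  # assumed: SAE-derived difficulty proxy (higher = harder)
    quality: float     # assumed: quality-probe score in [0, 1]


def build_batches(examples: List[Example], batch_size: int,
                  min_quality: float = 0.5) -> List[List[Example]]:
    # 1. Quality filter: drop examples the probe scores below the threshold.
    kept = [e for e in examples if e.quality >= min_quality]
    # 2. Curriculum: easy-to-hard order inside each SAE cluster.
    by_cluster = defaultdict(list)
    for e in sorted(kept, key=lambda e: e.difficulty):
        by_cluster[e.cluster].append(e)
    queues = [by_cluster[c] for c in sorted(by_cluster)]
    # 3. Diversity: take one example per cluster in turn, so batches mix clusters.
    ordered = []
    while any(queues):
        for q in queues:
            if q:
                ordered.append(q.pop(0))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Round-robin interleaving trades a strict global easy-to-hard order for cluster diversity within each batch; a real pipeline would tune that balance.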

4 · arXiv · cs.LG · 11d ago

Constrained LLM interface for multi-physics finite element simulations in FEniCS

Researchers present a natural-language interface for FEniCS finite element simulations that deliberately constrains LLM involvement to front-end parsing and geometry generation, while a deterministic dispatcher routes validated specifications to human-written solver templates. The system achieves a 100% final valid-parse rate on a 15-prompt benchmark and a 90% success rate on custom geometry generation. Validation against analytical solutions shows agreement within 1% for smooth cases and within 2–5% for harder nonlinear cases. The architecture is positioned as a reliability-focused alternative to open-ended LLM code generation for scientific computing.

Agent and Tool Ecosystem · FEniCS · A Constrained Natural-Language Interface for Variational Multi-Physics Finite Element Simulations in FEniCS · Gmsh
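The key design choice in the FEniCS interface above is that the LLM only produces a structured spec, and plain code decides what runs. A minimal sketch of such a deterministic dispatcher follows; the spec fields, physics names, and template functions are hypothetical, and the templates return strings where real ones would assemble and solve FEniCS variational problems.

```python
from typing import Callable, Dict


class SpecError(ValueError):
    """Raised when a parsed specification fails validation."""


def poisson_template(spec: dict) -> str:
    # Hypothetical stub: a real template would build and solve a FEniCS variational form.
    return f"poisson on {spec['mesh']} with source {spec['source']}"


def heat_template(spec: dict) -> str:
    # Hypothetical stub for a transient heat-equation template.
    return f"heat on {spec['mesh']} with k={spec['conductivity']} until t={spec['t_end']}"


TEMPLATES: Dict[str, Callable[[dict], str]] = {
    "poisson": poisson_template,
    "heat": heat_template,
}
REQUIRED_FIELDS = {
    "poisson": {"mesh", "source"},
    "heat": {"mesh", "conductivity", "t_end"},
}


def dispatch(spec: dict) -> str:
    """Deterministically route a validated spec to a human-written template."""
    physics = spec.get("physics")
    if physics not in TEMPLATES:
        raise SpecError(f"unsupported physics: {physics!r}")
    missing = REQUIRED_FIELDS[physics] - set(spec)
    if missing:
        raise SpecError(f"missing fields for {physics}: {sorted(missing)}")
    return TEMPLATES[physics](spec)
```

Because validation failures raise instead of guessing, a caller could return the error message to the LLM for another parsing attempt, which is one plausible way a "final" valid-parse rate can reach 100% after retries.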

5 · arXiv · cs.LG · 22d ago

SchGen: LLM-Based PCB Schematic Generation via Semantic Code Representations

SchGen is presented as the first large language model system capable of generating editable PCB schematics from natural-language descriptions. The approach introduces a semantically grounded code representation that replaces verbose, geometry-heavy schematic formats with relative placement and pin-name-based wiring primitives, reframing the problem as a semantics-driven matching task. A large-scale dataset was constructed via a human-agent collaborative pipeline converting open-source hardware designs into the new representation. Experiments show SchGen outperforms alternative representations and larger general-purpose LLMs on wire connectivity accuracy and functional correctness.

Frontier Model Releases · Agent and Tool Ecosystem · semantic code representation · human-agent collaborative pipeline · wire connectivity accuracy
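Pin-name-based wiring, as described for SchGen above, makes connectivity easy to check: wires name endpoints like `"U1.VCC"` instead of coordinates, so nets fall out of a union-find. The sketch covers wiring only (relative placement primitives are omitted), and the `net_match_rate` score is a hypothetical stand-in, not SchGen's published wire-connectivity metric.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Wire = Tuple[str, str]  # pin-name endpoints such as ("U1.VCC", "C1.1")


def nets(wires: List[Wire]) -> Set[FrozenSet[str]]:
    """Group pins into electrical nets with a union-find over pin-name wires."""
    parent: Dict[str, str] = {}

    def find(pin: str) -> str:
        parent.setdefault(pin, pin)
        while parent[pin] != pin:
            parent[pin] = parent[parent[pin]]  # path halving
            pin = parent[pin]
        return pin

    for a, b in wires:
        parent[find(a)] = find(b)  # merge the two pins' nets
    groups: Dict[str, Set[str]] = {}
    for pin in list(parent):
        groups.setdefault(find(pin), set()).add(pin)
    return {frozenset(g) for g in groups.values()}


def net_match_rate(predicted: List[Wire], reference: List[Wire]) -> float:
    """Hypothetical metric: share of reference nets reproduced exactly."""
    ref = nets(reference)
    return len(nets(predicted) & ref) / len(ref)
```

Comparing nets rather than individual wires makes the check insensitive to wire order and direction, which suits a representation where many wire lists describe the same circuit.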

4 · Hugging Face Blog · 1mo ago

Faster Assisted Generation Support for Intel Gaudi

Hugging Face has published a blog post detailing assisted generation (speculative decoding) support optimized for Intel Gaudi accelerators. The post covers implementation details and the performance improvements achieved by running assisted/speculative decoding on Gaudi hardware. This is an inference-optimization development relevant to deploying AI on non-NVIDIA accelerators.

Training Infrastructure · Inference Economics · speculative decoding · Assisted Generation · Intel Gaudi
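For readers new to assisted generation: a small draft model proposes several tokens, and the large target model verifies them, keeping the longest agreeing prefix plus one token of its own. In `transformers` this is enabled by passing an `assistant_model` to `generate()`. The toy below shows only the greedy variant of the algorithm with integer "models"; it is not the Gaudi-optimized code from the post, and `speculative_decode` and the toy models are hypothetical.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # toy "model": context -> greedy next token


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (new tokens, number of verification rounds)."""
    out = list(prompt)
    generated, rounds = 0, 0
    while generated < max_new:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(min(k, max_new - generated)):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target checks every proposed position (one parallel pass on real hardware).
        rounds += 1
        accepted, ctx = [], list(out)
        for t in proposal:
            expected = target(ctx)
            if expected != t:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target(ctx))  # all accepted: target adds a bonus token
        accepted = accepted[: max_new - generated]
        out.extend(accepted)
        generated += len(accepted)
    return out[len(prompt):], rounds
```

With greedy decoding the output is identical to what the target alone would produce; the speed-up comes from needing fewer target verification rounds than generated tokens when the draft usually agrees.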

6 · arXiv · cs.CL · 2d ago

Decoupled Search Grounding (DSG): vendor-agnostic MCP-compatible architecture for LLM agent retrieval

Researchers introduce Decoupled Search Grounding (DSG), an architecture that moves real-time search grounding outside the reasoning model via an MCP-compatible gateway, exposing provider routing, caching, and retrieval-depth as explicit controls. Evaluated across five frontier models on SimpleQA, FreshQA, and HotpotQA, DSG nearly matches native search accuracy on SimpleQA (86.1% vs. 87.7%) while achieving 91% lower search cost and 68% lower latency via a 99.4% warm-cache hit rate. In a production e-commerce deployment, DSG cuts search cost by over 98% while matching or slightly exceeding native-search accuracy. The work frames real-time grounding as an optimizable interface boundary rather than a fixed model feature, with direct relevance to MCP-based agent infrastructure.

Inference Economics · Enterprise Deployment Patterns · FreshQA · HotpotQA · Decoupled Search Grounding
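The DSG idea of exposing provider routing, caching, and retrieval depth as explicit controls outside the model can be illustrated with a small gateway object. This is a sketch under stated assumptions: the MCP transport is omitted entirely, and `SearchGateway`, its TTL cache keyed on normalized query text, and the injectable clock are hypothetical, not DSG's actual interface.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

SearchFn = Callable[[str, int], List[str]]  # (query, depth) -> result snippets


class SearchGateway:
    """Hypothetical search gateway: provider routing + TTL cache + depth control."""

    def __init__(self, providers: Dict[str, SearchFn], default: str,
                 ttl_s: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.providers = providers
        self.default = default
        self.ttl_s = ttl_s
        self.clock = clock
        self.cache: Dict[Tuple[str, str, int], Tuple[float, List[str]]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str, depth: int = 3,
               provider: Optional[str] = None) -> List[str]:
        name = provider or self.default  # routing is an explicit, caller-visible choice
        key = (name, " ".join(query.lower().split()), depth)  # normalized cache key
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self.hits += 1  # warm hit: no provider call, no search cost
            return entry[1]
        self.misses += 1
        results = self.providers[name](query, depth)
        self.cache[key] = (now, results)
        return results

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Every cache hit skips a paid provider call, which is the mechanism behind the cost and latency reductions reported above; the TTL bounds how stale a grounded answer can be.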