Entity · technique

chain-of-thought prompting

technique · active · chain-of-thought-prompting-4c0e0cfe · 4 events · first seen May 27, 2026

Aliases: chain-of-thought prompting

Co-occurring entities

Medical Subject Headings · MeSH-Rel-4K · SECDA-DSE · Retrieval-Augmented Generation · BenHalluScore · Bengali · BenHalluEval · GPT-5.5 · ENPMR-Bench · Maslow's Hierarchy of Needs · Emotional Need-aware Proactive Memory Retrieval

More like this (12)

chain-of-thought monitoring · latent chain-of-thought · clarifying-question prompting · Chain-of-Thought Reasoning · few-shot prompting · chain-of-thought training data generation · student-teacher prompting · checklist prompting · Program-of-Thought · Agentic Chain-of-Thought Steering · knowledge graph prompting · embodiment-aware prompt conditioning

Recent events (4)

4 · arXiv · cs.CL · Jul 21, 2026

Benchmarking small open-source LLMs for biomedical ontology generation with MeSH-Rel-4K dataset

Researchers evaluate five small open-source LLMs (up to 9B parameters) on identifying semantic relationships between biomedical concepts and introduce MeSH-Rel-4K, a dataset of 4,000 relationships derived from Medical Subject Headings. They compare three adaptation strategies: standard prompting, chain-of-thought prompting, and fine-tuning. Fine-tuning improves F1 by an average of 34.1 percentage points, indicating that targeted fine-tuning can overcome the reasoning limitations of parameter-constrained models on specialized domain tasks.

Evaluation and Benchmarking · Open Weights Progress · chain-of-thought prompting · Medical Subject Headings · MeSH-Rel-4K
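The difference between the first two adaptation strategies lies entirely in the prompt. The sketch below is illustrative only: the relation labels, template wording, and helper names (`build_standard_prompt`, `build_cot_prompt`, `parse_label`) are hypothetical and are not taken from the paper. It shows a direct prompt, a chain-of-thought prompt that asks for reasoning before a final labelled line, and a parser that works for both response styles.

```python
from typing import Optional, Sequence

# Hypothetical relation labels; MeSH-Rel-4K's actual label set may differ.
LABELS = ("broader", "narrower", "related", "none")


def build_standard_prompt(concept_a: str, concept_b: str,
                          labels: Sequence[str] = LABELS) -> str:
    """Direct prompt: ask for the label with no intermediate reasoning."""
    return (
        f"Concept A: {concept_a}\n"
        f"Concept B: {concept_b}\n"
        f"Which relation holds from A to B? Choose one of: {', '.join(labels)}.\n"
        "Answer:"
    )


def build_cot_prompt(concept_a: str, concept_b: str,
                     labels: Sequence[str] = LABELS) -> str:
    """Chain-of-thought prompt: request reasoning first, then a final labelled line."""
    return (
        f"Concept A: {concept_a}\n"
        f"Concept B: {concept_b}\n"
        f"Which relation holds from A to B? Choose one of: {', '.join(labels)}.\n"
        "Think step by step about what each concept denotes, then give your "
        "final choice on a last line of the form 'Answer: <label>'."
    )


def parse_label(response: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Read the label from the last non-empty line; None if it is not a valid label."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1]
    if last.lower().startswith("answer:"):
        last = last.split(":", 1)[1]  # drop the 'Answer:' prefix used by CoT replies
    candidate = last.strip().strip(".").lower()
    return candidate if candidate in labels else None
```

Because the standard prompt already ends with `Answer:`, a model typically replies with a bare label, while a chain-of-thought reply ends with an `Answer: ...` line; `parse_label` accepts either, so the same scoring code (for example, an F1 computation) can be applied to both strategies.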

4 · arXiv · cs.AI · Jun 10, 2026

SECDA-DSE: LLM-guided design space exploration for FPGA accelerator generation

SECDA-DSE is a framework that integrates LLMs into the SECDA hardware-software co-design ecosystem to automate design space exploration (DSE) for FPGA-based AI accelerators. It pairs a structured generator of architecture candidates with an LLM Stack that uses retrieval-augmented generation and chain-of-thought prompting, and adds an iterative feedback loop. The evaluation demonstrates end-to-end synthesis and execution of three accelerator designs on real FPGA hardware, and the results show that the approach captures kernel-specific compute/memory trade-offs while reducing manual design effort.

Training Infrastructure · Agent and Tool Ecosystem · chain-of-thought prompting · SECDA-DSE · Retrieval-Augmented Generation
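The combination of a candidate generator, a proposer, and an iterative feedback loop follows a generic propose-evaluate-feedback pattern. The sketch below illustrates that pattern under stated assumptions. It is not SECDA-DSE's implementation. `Candidate`, `toy_latency`, and `first_untried` are hypothetical stand-ins: the candidate fields are invented, the cost model is a toy formula, and the proposer is a trivial function occupying the slot where SECDA-DSE places its LLM Stack.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical accelerator configuration (not SECDA-DSE's real schema)."""
    pe_count: int   # number of processing elements
    buffer_kb: int  # on-chip buffer size in KiB


Feedback = Tuple[Candidate, float]  # (candidate, measured cost; lower is better)


def explore(
    candidates: Sequence[Candidate],
    propose: Callable[[List[Candidate], List[Feedback]], Candidate],
    evaluate: Callable[[Candidate], float],
    budget: int,
) -> Optional[Feedback]:
    """Generic propose-evaluate-feedback loop; returns the best result seen."""
    history: List[Feedback] = []
    tried = set()
    for _ in range(budget):
        remaining = [c for c in candidates if c not in tried]
        if not remaining:
            break
        # In SECDA-DSE this choice comes from an LLM stack (RAG + CoT);
        # here `propose` is any callable that sees the untried pool and history.
        choice = propose(remaining, history)
        tried.add(choice)
        history.append((choice, evaluate(choice)))  # feedback for the next round
    return min(history, key=lambda fb: fb[1]) if history else None


def toy_latency(c: Candidate) -> float:
    """Hypothetical cost: compute time + memory stalls + control overhead."""
    return 1000 / c.pe_count + 512 / c.buffer_kb + 2 * c.pe_count


def first_untried(remaining: List[Candidate], history: List[Feedback]) -> Candidate:
    """Trivial stand-in for an LLM proposer: ignores history, takes the next candidate."""
    return remaining[0]


grid = [Candidate(p, b) for p in (4, 8, 16, 32) for b in (32, 64, 128)]
best = explore(grid, first_untried, toy_latency, budget=len(grid))
```

The proposer is deliberately a plain callable. A function that formats `history` into a retrieval-augmented, chain-of-thought prompt could replace `first_untried` without any change to the loop, and it could reach a good design in fewer evaluations than this exhaustive stub.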

4 · arXiv · cs.CL · Jun 1, 2026

BenHalluEval: Multi-Task Hallucination Evaluation Framework for Bengali LLMs

BenHalluEval introduces the first systematic hallucination benchmark for Bengali. It covers four tasks (generative QA, code-mixed QA, summarization, and reasoning) and includes 12,000 hallucinated candidates generated with GPT-5.4 across twelve hallucination types. Seven LLMs are evaluated under a dual-track protocol that separates the false-positive rate on ground-truth instances from the hallucination detection rate on hallucinated candidates. The proposed BenHalluScore metric reveals substantial variation (7.72%–55.42%) across models and tasks, and chain-of-thought prompting shifts response distributions without consistently improving hallucination discrimination. The work highlights gaps in hallucination evaluation for low-resource languages and critiques single-track and prompting-only evaluation approaches.

Evaluation and Benchmarking · BenHalluScore · chain-of-thought prompting · Bengali · +2 more
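The two tracks of the protocol are two separate rates computed on disjoint instance sets. The minimal sketch below assumes each judged instance reduces to a boolean "flagged as hallucinated". The function name `dual_track_rates` is hypothetical, and the code does not attempt to reproduce BenHalluScore, whose formula is not given here.

```python
from typing import Dict, Sequence


def dual_track_rates(gt_flags: Sequence[bool],
                     halluc_flags: Sequence[bool]) -> Dict[str, float]:
    """Compute the two rates separately.

    gt_flags:     for each ground-truth (faithful) instance, did the model flag it?
    halluc_flags: for each hallucinated candidate, did the model flag it?
    """
    if not gt_flags or not halluc_flags:
        raise ValueError("both tracks need at least one judged instance")
    return {
        # Flagging a faithful answer is a false positive.
        "false_positive_rate": sum(gt_flags) / len(gt_flags),
        # Flagging a hallucinated candidate is a correct detection.
        "hallucination_detection_rate": sum(halluc_flags) / len(halluc_flags),
    }


# A judge that flags everything looks perfect on the hallucinated track alone.
always_flag = dual_track_rates([True] * 4, [True] * 4)
```

The `always_flag` case shows why a single track can mislead: detection rate and false-positive rate are both 1.0, which is visible only when the ground-truth track is reported alongside the hallucinated one.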

4 · arXiv · cs.CL · May 27, 2026

ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents

This paper introduces ENPMR-Bench, a benchmark for evaluating Emotional Need-aware Proactive Memory Retrieval in memory-augmented language agents deployed for emotional support. The benchmark includes over 1,800 memory-augmented dialogues grounded in Maslow's hierarchy of needs, with structured mappings between emotional needs and supportive memory types. Experiments show that both embedding-based and LLM-driven retrieval paradigms fall significantly short of the golden-memory condition on empathy scores, and that chain-of-thought prompting helps but leaves a substantial performance gap. The work points to a systematic gap in current agent memory systems when they are applied to affective, rather than purely factual, retrieval tasks.

Evaluation and Benchmarking · Agent and Tool Ecosystem · ENPMR-Bench · chain-of-thought prompting · Maslow's Hierarchy of Needs · +1 more
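The embedding-based paradigm typically ranks stored memories by vector similarity to the current turn. The sketch below shows that mechanism under stated assumptions. It uses top-k cosine similarity over hand-written toy vectors in place of real embeddings. `cosine` and `top_k` are illustrative helpers and are not the benchmark's baselines.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def top_k(query: Sequence[float],
          memory: Sequence[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the texts of the k memories most similar to the query vector."""
    scored = [(cosine(query, vec), text) for text, vec in memory]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    return [text for _, text in scored[:k]]


# Toy 3-d "embeddings" (hypothetical; a real system would use an encoder model).
memories = [
    ("likes hiking", [1.0, 0.0, 0.0]),
    ("lost job last month", [0.0, 1.0, 0.0]),
    ("sister visiting", [0.7, 0.7, 0.0]),
]
```

Nothing in this scoring refers to the user's emotional needs. Similarity to the current turn is the only signal, and the need-to-memory-type mappings in ENPMR-Bench are designed to probe exactly this dimension.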