Almanac
chain-of-thought prompting

technique · active · provisional · chain-of-thought-prompting-4c0e0cfe · 3 events · first seen 21d ago

Aliases: chain-of-thought prompting

Recent events (3)

4 · arXiv · cs.CL · 21d ago

ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents

This paper introduces ENPMR-Bench, a benchmark for evaluating Emotional Need-aware Proactive Memory Retrieval in memory-augmented language agents deployed for emotional support applications. The benchmark includes over 1,800 memory-augmented dialogues grounded in Maslow's hierarchy of needs, with structured mappings between emotional needs and supportive memory types. Experiments show that both embedding-based and LLM-driven retrieval paradigms fall significantly short of golden memory conditions on empathy scores, and while chain-of-thought prompting helps, a substantial performance gap remains. The work highlights a systematic gap in current agent memory systems when applied to affective rather than purely factual retrieval tasks.

4 · arXiv · cs.CL · 16d ago

BenHalluEval: Multi-Task Hallucination Evaluation Framework for Bengali LLMs

BenHalluEval introduces the first systematic hallucination benchmark for Bengali, covering four tasks (generative QA, code-mixed QA, summarization, reasoning) with 12,000 hallucinated candidates generated via GPT-5.4 across twelve hallucination types. Seven LLMs are evaluated under a dual-track protocol separating false-positive rate on ground-truth instances from hallucination detection rate on hallucinated candidates. The proposed BenHalluScore metric reveals substantial variation (7.72%–55.42%) across models and tasks, and chain-of-thought prompting is found to shift response distributions without consistently improving hallucination discrimination. The work highlights gaps in low-resource language hallucination evaluation and critiques single-track and prompting-only evaluation approaches.
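
The dual-track idea in the summary above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Judgement` record and the `dual_track_rates` function are hypothetical names, not taken from the paper, and the sketch computes only the two per-track rates. It does not attempt to reproduce BenHalluScore, whose definition is not given here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judgement:
    """One model verdict on one candidate answer (hypothetical record)."""
    is_hallucinated: bool  # gold label: True if the candidate is a hallucinated one
    flagged: bool          # model verdict: True if the model called it a hallucination


def dual_track_rates(judgements: Iterable[Judgement]) -> Tuple[float, float]:
    """Return (false_positive_rate, detection_rate), one rate per track.

    Track 1: ground-truth instances, where any flag is a false positive.
    Track 2: hallucinated candidates, where a flag is a correct detection.
    """
    items = list(judgements)
    truthful = [j for j in items if not j.is_hallucinated]
    hallucinated = [j for j in items if j.is_hallucinated]

    # An empty track has no defined rate, so report NaN instead of dividing by zero.
    fpr = sum(j.flagged for j in truthful) / len(truthful) if truthful else float("nan")
    hdr = sum(j.flagged for j in hallucinated) / len(hallucinated) if hallucinated else float("nan")
    return fpr, hdr


# Toy data: 4 ground-truth answers (1 wrongly flagged), 4 hallucinated (3 caught).
sample = (
    [Judgement(False, True)] + [Judgement(False, False)] * 3
    + [Judgement(True, True)] * 3 + [Judgement(True, False)]
)
fpr, hdr = dual_track_rates(sample)
print(f"false-positive rate: {fpr:.2f}, detection rate: {hdr:.2f}")
```

Reporting the two rates separately makes it visible when a prompting change, such as chain-of-thought, simply makes a model flag more of everything: the detection rate rises, but so does the false-positive rate.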

4 · arXiv · cs.AI · 7d ago

SECDA-DSE: LLM-guided design space exploration for FPGA accelerator generation

SECDA-DSE is a framework that integrates LLMs into the SECDA hardware-software co-design ecosystem to automate design space exploration (DSE) of FPGA-based AI accelerators. The system combines a structured architecture candidate generator with an LLM Stack using retrieval-augmented generation and chain-of-thought prompting, plus an iterative feedback loop. Evaluation demonstrates end-to-end synthesis and execution of three accelerator designs on real FPGA hardware, with results showing the approach captures kernel-specific compute/memory trade-offs while reducing manual design effort.