Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

Recent events (1)

arXiv · cs.AI · 30h ago

Study finds GitHub Copilot dialogue accuracy low for HIPAA compliance NFR assessment despite high developer agreement

In a controlled study, 49 programmers used GitHub Copilot to assess 148 HIPAA-derived non-functional requirements (NFRs) against a real codebase. Developers tended to agree with the LLM's assessments, but those assessments had low accuracy against expert ground truth. The paper evaluates multi-turn dialogue quality along three dimensions: requirement satisfaction, reasoning, and code localization. Its user satisfaction modeling indicates that longer responses and more information-providing turns reduce satisfaction, while proactive interactions increase it. The work highlights a gap in current LLM evaluation benchmarks, which focus on functional correctness and single-turn accuracy rather than multi-turn NFR assessment.