Almanac
product

iTrust

product · active · provisional · itrust-1ce1df3a · 1 event · first seen 31h ago

Aliases: iTrust

Recent events (1)

4 · arXiv · cs.AI · 31h ago

Study finds GitHub Copilot dialogue accuracy low for HIPAA compliance NFR assessment despite high developer agreement

A controlled study with 49 programmers using GitHub Copilot to assess 148 HIPAA-derived non-functional requirements (NFRs) against a real codebase finds that developers tend to agree with LLM assessments, but accuracy against expert ground truth is low. The paper evaluates multi-turn dialogue quality across requirement satisfaction, reasoning, and code localization dimensions. User satisfaction modeling reveals that longer responses and more information-providing turns hurt satisfaction, while proactive interactions help. The work highlights a gap in current LLM evaluation benchmarks, which focus on functional correctness and single-turn accuracy rather than multi-turn NFR assessment.