technique
Plausibility Evaluation
technique · active · provisional
plausibility-evaluation-92633037 · 1 event · first seen 16d ago
Aliases: Plausibility Evaluation
Co-occurring entities
More like this (12)
Faithfulness Evaluation
Cranfield evaluation paradigm
ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents
Pragmatic Reasoning
false-premise detection
Formal Proof Search
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
political bias evaluation
Evaluation on the Hub
Causally Evaluating the Learnability of Formal Language Tasks
frontier model evaluation
Recent events (1)
Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection
This paper investigates human disagreement in token-level rationale annotations for hate speech detection, a dimension less studied than label disagreement. The authors unify diverse models, training strategies, loss functions, and evaluation metrics under a single protocol, systematically comparing hard and soft label/rationale representation spaces. Results show that both hard and soft metrics favor softer representations, suggesting that soft supervision better captures human reasoning variation in subjective NLP tasks. The work calls for rethinking evaluation frameworks for classification and explainability in subjective NLP.