person

Owain Evans

person·active·owain-evans-7918199f·1 event·first seen 28d ago

Aliases: Owain Evans

Co-occurring entities

TruthfulQA·Stephanie Lin·Jacob Hilton·OpenAI

More like this (12)

Owkin·Qwen·Ethan Mollick·AdamW·OWL·Jacob Hilton·Simon Willison·Harrison Edwards·Adam·Scott Wu·Bret Taylor·David Chen

Recent events (1)

6·OpenAI Blog·28d ago

TruthfulQA: Measuring how models mimic human falsehoods

OpenAI introduced TruthfulQA, a benchmark that measures whether language models answer truthfully or instead reproduce common human falsehoods. Its questions target topics where people frequently answer incorrectly because of misconceptions, conspiracy theories, or other false beliefs. Results showed that larger models were not necessarily more truthful and in some cases performed worse than smaller ones, which highlights a key alignment challenge.

Evaluation and Benchmarking·AI Safety Research·TruthfulQA·Stephanie Lin·Jacob Hilton·+3 more