Almanac
person

Owain Evans

person · active · owain-evans-7918199f · 1 event · first seen 28d ago

Aliases: Owain Evans

Recent events (1)

6 · OpenAI Blog · 28d ago

TruthfulQA: Measuring how models mimic human falsehoods

OpenAI introduced TruthfulQA, a benchmark that measures whether language models give truthful answers or instead reproduce common human falsehoods. Its questions target topics where people often answer incorrectly because of misconceptions, conspiracy theories, or other false beliefs. The results showed that larger models were not necessarily more truthful and in some cases performed worse than smaller ones, which highlights a key alignment challenge.