4 · One Useful Thing (Ethan Mollick) · 1mo ago

Giving your AI a Job Interview

This commentary piece argues that as AI-generated advice becomes more consequential, users need systematic methods to evaluate AI reliability and quality—analogous to a job interview process. The author proposes frameworks for assessing AI outputs before trusting them for important decisions. The piece addresses the practical challenge of calibrating trust in AI systems across different use cases.

Evaluation and Benchmarking Enterprise Deployment Patterns Ethan Mollick One Useful Thing

Related guides (2)

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

4 · One Useful Thing · 1mo ago

Real AI Agents and Real Work

A commentary piece from One Useful Thing examining the practical deployment of AI agents in real work contexts, framing the tension between human-centered work and AI-generated productivity outputs. The piece appears to analyze how autonomous AI agents are changing knowledge work workflows. Published by a tier-2 source known for applied AI analysis aimed at practitioners and researchers.

Enterprise Deployment Patterns Agent and Tool Ecosystem One Useful Thing

3 · One Useful Thing · 1mo ago

Making AI Work: Leadership, Lab, and Crowd

This commentary from One Useful Thing proposes a framework for organizational AI adoption centered on three elements: leadership commitment, structured experimentation (lab), and distributed employee engagement (crowd). The piece offers practical guidance for companies navigating AI integration. As a tier-2 commentary source, it reflects practitioner thinking on enterprise AI deployment patterns rather than reporting new technical developments.

Enterprise Deployment Patterns Ethan Mollick One Useful Thing

6 · OpenAI Blog · 1mo ago

AI Safety via Debate

OpenAI proposes a safety technique in which two AI agents debate a topic and a human judge determines the winner, with the goal of making it easier for humans to supervise AI systems that may be more capable than themselves. The core intuition is that it is easier to verify a correct argument than to generate one, so a dishonest agent can be caught by an honest opponent. The paper introduces debate as a scalable oversight mechanism applicable to complex tasks where direct human evaluation is infeasible.

Evaluation and Benchmarking AI Safety Research AI Safety via Debate Debate (AI safety technique) OpenAI +2 more

4 · One Useful Thing · 1mo ago

An Opinionated Guide to Using AI Right Now

A tier-2 commentary piece from One Useful Thing offering opinionated guidance on which AI tools to use in late 2025. The piece likely surveys the current landscape of frontier models and recommends specific tools for specific tasks. As a practitioner-facing guide, it reflects the state of the AI tooling ecosystem as perceived by an influential commentator.

Enterprise Deployment Patterns Agent and Tool Ecosystem One Useful Thing

6 · arXiv · cs.CL · 25d ago

AI-Assisted Systematization for Evaluating GenAI Systems

This paper addresses a foundational gap in GenAI evaluation: the underspecification of broad, contested concepts like 'reasoning,' 'fairness,' or 'creativity.' The authors introduce a structured artifact called a 'concept spec' and a validation worksheet, then build two AI-assisted systematizers—a zero-shot approach and a multi-agent approach—to convert vague evaluation targets into measurable, structured accounts. They apply these tools to hate-based rhetoric and digital empathy, assessing the resulting specs on content validity and information recoverability. The work positions AI assistance as a scalable aid for the cognitively demanding process of evaluation design.

Evaluation and Benchmarking AI Safety Research hate-based rhetoric concept spec digital empathy +4 more

4 · One Useful Thing · 1mo ago

A Guide to Which AI to Use in the Agentic Era

A tier-2 commentary piece from One Useful Thing offering guidance on selecting AI systems in the current agentic era, signaling a shift in framing from chatbots to agents as the primary use-case paradigm. The piece appears to survey the landscape of available AI tools and their appropriate applications. As a practitioner-oriented guide, it reflects the growing complexity of the AI tooling ecosystem as agentic capabilities proliferate.

Enterprise Deployment Patterns Agent and Tool Ecosystem Ethan Mollick One Useful Thing

5 · AI Snake Oil · 1mo ago

New Paper: Towards a Science of AI Agent Reliability

A new paper proposes a framework for quantifying the gap between AI agent capability and reliability, aiming to establish a more rigorous science of agent dependability. The work addresses the observation that agents may demonstrate high capability on benchmarks while failing unpredictably in deployment. The piece is published via the normaltech.ai newsletter, associated with the AI Snake Oil research commentary tradition.

Evaluation and Benchmarking AI Safety Research Towards a Science of AI Agent Reliability normaltech.ai AI Snake Oil +2 more

3 · One Useful Thing · 1mo ago

On Working with Wizards

A commentary piece from One Useful Thing exploring the metaphor of AI systems as 'wizards' and the challenge of working with them on the 'jagged frontier' of capabilities. The piece likely addresses how users can effectively verify and leverage AI outputs given the uneven and unpredictable nature of current model capabilities. As a tier-2 commentary source, it offers practitioner-level perspective on human-AI collaboration patterns.

Evaluation and Benchmarking Enterprise Deployment Patterns jagged frontier Ethan Mollick One Useful Thing