4·arXiv cs.CL (Computation and Language)·37h ago

UnBias-Plus: open-source toolkit for bias detection, explanation, and rewriting in text

UnBias-Plus is an open-source toolkit that unifies segment-level multi-class bias classification, biased span localization, neutral text rewriting, and reasoning explanations for each decision. It targets bias in both human-written and AI-generated content across journalism, education, and AI research domains. The toolkit is available via Python, CLI, REST API, and web interfaces, with models and datasets publicly released.

Evaluation and Benchmarking AI Safety Research UnBias-Plus

Related guides (2)

AI Safety Research·Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints


Evaluation and Benchmarking·Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

4·The Batch·1mo ago

Abeba Birhane on Bias in Web-Scraped Training Datasets

Researcher Abeba Birhane examines how large-scale web-scraped datasets used to train trillion-parameter NLP and vision models propagate bias and antisocial content. The commentary highlights that performance gains in deep neural networks come alongside inherited societal biases from web training data. Two posts from The Batch summarize her work on cleaning up web datasets and the specific mechanisms by which NLP models absorb web-sourced biases.

Evaluation and Benchmarking AI Safety Research DeepLearning.AI Abeba Birhane The Batch

6·OpenAI Blog·1mo ago

Defining and Evaluating Political Bias in LLMs

OpenAI has published a post describing its methodology for evaluating political bias in ChatGPT, introducing new real-world testing approaches aimed at improving objectivity and reducing bias. The piece outlines how OpenAI defines political bias in the context of large language models and the evaluation frameworks it is developing to measure it. This represents OpenAI's public commitment to systematic bias measurement as a component of responsible deployment.

Evaluation and Benchmarking AI Safety Research political bias evaluation ChatGPT OpenAI +1 more

4·Hugging Face Blog·1mo ago

Evaluating Language Model Bias with 🤗 Evaluate

This Hugging Face blog post introduces tooling and methodology for evaluating bias in language models using the Evaluate library. It covers bias measurement approaches and how practitioners can apply them to assess fairness properties of LLMs. The post is oriented toward applied practitioners working with open-source models.

Evaluation and Benchmarking AI Safety Research Hugging Face Evaluate Hugging Face

6·arXiv · cs.CL·9d ago

Computational audit finds ClinicalBERT amplifies demographic bias beyond training data distributions

Researchers present a systematic audit of representational bias in ClinicalBERT, a BERT-based model pretrained on MIMIC-III clinical discharge summaries, using two probing methodologies: Log Probability Bias Analysis and Masked Language Model probing across 98 clinical sentence templates and eight intersectional race-gender combinations. Of 32 statistically significant findings, 65.6% contradict observed corpus distributions, rising to 80% for Black patients and 87.5% for agency attribution under MLM probing. The key finding is that bias in ClinicalBERT operates predominantly through model-internal amplification rather than simple inheritance from training data, which has direct implications for clinical AI safety and deployment. This challenges the assumption that auditing training corpora is sufficient to characterize model bias.
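As a quick arithmetic check on the headline figure, the sketch below recovers the number of findings implied by "65.6% of 32". The helper name `counts_matching_percentage` is hypothetical and not taken from the paper; the only assumption is that the percentage was rounded to one decimal place. The 80% and 87.5% figures are left out because the summary does not give their denominators.

```python
def counts_matching_percentage(total, reported_pct, decimals=1):
    """Return every integer count k in [0, total] whose share of `total`
    is consistent with `reported_pct` when rounded to `decimals` places."""
    # Half a unit in the last reported decimal place.
    tolerance = 0.5 * 10 ** (-decimals)
    return [
        k
        for k in range(total + 1)
        if abs(100 * k / total - reported_pct) <= tolerance
    ]


# 32 statistically significant findings, 65.6% reported as contradicting
# the corpus distributions (both numbers come from the summary above).
matches = counts_matching_percentage(32, 65.6)
print(matches)  # [21]
```

Only one count fits, so the summary's figure corresponds to 21 of the 32 significant findings (21/32 = 65.625%).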

Evaluation and Benchmarking AI Safety Research A Computational Audit of Demographic Association Encoding in ClinicalBERT Language Predictions MIMIC-III ClinicalBERT +1 more

4·Hugging Face Blog·1mo ago

Ethics and Society Newsletter #4: Bias in Text-to-Image Models

Hugging Face's Ethics and Society team publishes their fourth newsletter focusing on bias in text-to-image generative models. The piece examines how these models encode and reproduce societal biases in visual outputs, likely covering evaluation methods, documented failure modes, and mitigation approaches. As a Tier 2 commentary piece from a major ML platform, it contributes to ongoing discourse around fairness and safety in multimodal AI systems.

Evaluation and Benchmarking AI Safety Research Hugging Face Ethics and Society Team text-to-image models Hugging Face +1 more

6·arXiv · cs.CL·19h ago

Unified framework reveals systematic bias amplification in comparative LLM evaluation settings

A new arXiv paper introduces a unified framework for standardizing social bias benchmarks across isolated and forced-choice comparative evaluation settings. The study finds a large 'paradigm gap': comparative settings act as aggressive catalysts for latent discrimination compared to isolated assessments, and Chain-of-Thought reasoning exacerbates this effect rather than mitigating it. Critically, this comparative bias persists even when models are given neutral fallback options or claim to answer randomly, and scales positively with model size. The authors recommend comparative settings for auditing but warn practitioners against using comparative deployments in ambiguous real-world tasks.

Evaluation and Benchmarking AI Safety Research To Compare, or Not to Compare: On Methodological Practices in Evaluating Social Bias Chain-of-Thought Reasoning

5·arXiv · cs.LG·19d ago

OpAI-Bench: Benchmark for detecting AI text across progressive human-AI co-editing workflows

Researchers introduce OpAI-Bench, a benchmark for studying AI-text detection across progressive human-to-AI document revision workflows, covering document, sentence, token, and span granularities. Starting from human-written documents, the benchmark constructs nine sequentially revised versions per sample under five AI edit operations and varying AI coverage levels across four domains. Key findings include that mixed-authorship intermediate versions are often harder to detect than fully human or heavily AI-edited endpoints, revealing non-monotonic detection patterns absent from existing benchmarks. The work addresses a gap in AI-text detection research as real-world documents increasingly result from iterative human-AI co-editing rather than pure generation.

Evaluation and Benchmarking AI Safety Research VILA-Lab OpAI-Bench

7·arXiv · cs.CL·29d ago

Automated Benchmark Auditing for AI Agents and Large Language Models (ABA)

The paper introduces Auto Benchmark Audit (ABA), an agentic framework that systematically audits AI benchmark tasks for issues such as ambiguous specifications, environment conflicts, and incorrect ground truths. Applied to 168 benchmarks across nine domains including NeurIPS publications, ABA identifies critical issues in over 25.7% of evaluated tasks. The authors demonstrate that filtering out flawed tasks materially shifts model rankings and improves average performance on SWE-bench Verified and Terminal-Bench 2 by 9.9% and 9.6% respectively, indicating that current benchmark scores are significantly distorted by task quality problems. The agentic tool and annotations are released publicly.

Frontier Model Releases Evaluation and Benchmarking NeurIPS Auto Benchmark Audit (ABA) SWE-Bench Verified +2 more