Entity · benchmark

AlpacaEval 2

benchmarkactivealpacaeval-2-957808fc·3 events·first seen May 18, 2026

Aliases: AlpacaEval 2, AlpacaEval 2.0, AlpacaEval2

Merged from

AlpacaEval2

Co-occurring entities

MT-Bench Llama3-8B-Instruct WildBench General Preference Reinforcement Learning SimPO SPPO Arena-Hard General Preference Model Meta-Llama-3-70B Alibaba Qwen1.5-110B StruQ SecAlign Berkeley AI Research (BAIR)Instruction Hierarchy Direct Preference Optimization (DPO)OpenAI Sizhe Chen

More like this (12)

CyberSecEval 2 Llama 2 AlphaFold2 OpenAI Evals HumanEval HypoEval ProActEval ParaEval olmo-eval SemEval-2010 Task 8 Community Evals NPHardEval

Recent events (3)

7arXiv · cs.CL·May 19, 2026·source ↗

General Preference Reinforcement Learning (GPRL): Bridging Online RL and Preference Optimization for Open-Ended Tasks

GPRL proposes a new alignment framework that replaces scalar reward models with a General Preference Model (GPM) embedding responses into k skew-symmetric subspaces to capture multi-dimensional, intransitivity-aware preferences. The method computes per-dimension group-relative advantages, normalizes across axes, and uses a closed-loop drift monitor to detect and correct single-axis reward hacking during training. Starting from Llama-3-8B-Instruct, GPRL achieves a 56.51% length-controlled win rate on AlpacaEval 2.0 and outperforms SimPO and SPPO on Arena-Hard, MT-Bench, and WildBench. The work directly addresses the gap between verifiable-reward online RL (strong on math/code) and preference optimization (strong on open-ended tasks).

Frontier Model Releases Evaluation and Benchmarking WildBench MT-Bench General Preference Reinforcement Learning +7 more

7Qwen Research·May 18, 2026·source ↗

Qwen1.5-110B: Alibaba Releases First 100B+ Model in Qwen1.5 Series

Alibaba's Qwen team released Qwen1.5-110B, their first open-weights model exceeding 100 billion parameters. The model claims comparable performance to Meta's Llama-3-70B on base model benchmarks, with strong results on MT-Bench and AlpacaEval 2 chat evaluations. The release follows a wave of large open-source models exceeding 100B parameters from various organizations.

Frontier Model Releases Evaluation and Benchmarking MT-Bench Meta-Llama-3-70B Alibaba +3 more

6Berkeley Ai Research (Bair) Blog·May 18, 2026·source ↗

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Researchers from BAIR propose two fine-tuning-based defenses against prompt injection attacks: StruQ (Structured Instruction Tuning) and SecAlign (Special Preference Optimization). Both methods use a Secure Front-End with special delimiter tokens to separate trusted prompts from untrusted data, then fine-tune LLMs to ignore injected instructions. SecAlign, which uses DPO-style preference optimization, reduces attack success rates to under 15% against strong optimization-based attacks—more than 4x better than prior SOTA—while preserving model utility on AlpacaEval2.

AI Safety Research Agent and Tool Ecosystem StruQ SecAlign Berkeley AI Research (BAIR)+7 more