Entity · benchmark

GenEval

benchmark · active · geneval-0b75c699 · 4 events · first seen May 26, 2026

Aliases: GenEval

More like this (12)

G-Eval · SummEval · DeepEval · HumanEval · SearchGen-Bench · ValueEval · CharacterEval · ParaEval · L-Eval · SemEval-2010 Task 8 · GIFT-Eval · SearchGen-20K

Recent events (4)

4 · arXiv · cs.AI · Jun 24, 2026

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

Researchers propose Implicit Visual Chain-of-Thought (IV-CoT), a latent visual reasoning framework that decomposes visual conditioning queries into a structural-to-semantic cascade for text-to-image generation. The method uses training-only sketch supervision to guide structural queries without requiring sketch extraction at inference time, enabling implicit CoT reasoning in a single forward pass. IV-CoT achieves improved results on GenEval and T2I-CompBench benchmarks, targeting persistent weaknesses in multimodal LLMs around object counts, spatial relations, and attribute binding.

Evaluation and Benchmarking · Multimodal Progress · GenEval · T2I-CompBench · IV-CoT

6 · arXiv · cs.LG · Jun 2, 2026

Drifting Preference Optimization (DrPO) for One-Step Text-to-Image Generators

DrPO is a new online preference fine-tuning method designed specifically for deterministic one-step text-to-image generators like SD-Turbo and SDXL-Turbo, which are difficult to align with standard RLHF methods that require policy likelihoods or differentiable reward gradients. The method samples candidates per prompt, ranks them with a target reward, and synthesizes a feature-space update direction via a non-parametric dipole preference field plus a reference drift from the frozen base model. Because the reward is used only for ranking, DrPO supports black-box and non-differentiable reward functions while keeping inference as a single forward pass. Evaluations on HPSv3 and GenEval show improved alignment over reward-gradient-free baselines and a 3.51× reduction in training compute by eliminating reward-model backpropagation.

Inference Economics · Alignment and RLHF · SDXL Turbo · HPSv3 · GenEval · +4 more

5 · The Batch · May 29, 2026

Meta Research Improves Image Generation via Staged Planning and Self-Revision Fine-Tuning

Researchers from Meta and collaborating universities propose a fine-tuning method that teaches image generators to compose images through discrete plan-sketch-inspect-refine cycles rather than generating all at once. Starting from BAGEL-7B, they construct ~62,000 training examples using GPT-4o and FLUX.1 Kontext to supervise each stage, achieving 83% on GenEval versus 77% for the base model and a competing method (PARM) that required 11x more training data and ~8x more inference steps. The approach improves spatial relationship accuracy, object attribute fidelity, and real-world knowledge grounding in generated images.

Evaluation and Benchmarking · Agent and Tool Ecosystem · University of California San Diego · WISE · FLUX.1 Kontext · +10 more

6 · arXiv · cs.AI · May 26, 2026

Channel-wise Vector Quantization (CVQ): A New Image Tokenization Paradigm with Next-Channel Prediction

Researchers introduce Channel-wise Vector Quantization (CVQ), which replaces conventional patch-wise discrete tokens with channel-wise tokens that represent an image as discrete levels of visual detail. Built on CVQ, the Channel-wise Autoregressive (CAR) model uses a 'next-channel prediction' objective, generating images by progressively refining from global structure to fine-grained attributes. CVQ achieves 100% codebook utilization with a 16K+ codebook and the CAR model scores 86.7 on DPG and 0.79 on GenEval for text-to-image generation. The approach offers a structural alternative to raster-order patch-based autoregressive image generation.

Frontier Model Releases · Evaluation and Benchmarking · Channel-wise Vector Quantization · DPG Benchmark · GenEval · +4 more