PIXLRelight: Controllable Single-Image Relighting via Intrinsic Conditioning
PIXLRelight is a feed-forward method for physically controllable single-image relighting that bridges physically based rendering (PBR) and learned image synthesis through shared intrinsic conditioning. At training time, multi-illumination photographs are decomposed into albedo, diffuse shading, and non-diffuse residuals; at inference time, the conditioning is instead derived from a path-traced render of a coarse 3D reconstruction under user-specified PBR lights. A transformer-based neural renderer applies the target illumination via per-pixel affine modulation and achieves state-of-the-art quality in under 100 ms per image. Code and models are publicly released.
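The two ideas in this summary, intrinsic conditioning and per-pixel affine modulation, can be illustrated with a minimal sketch. It assumes the common intrinsic image model in which an image is recomposed as albedo times diffuse shading plus a non-diffuse residual, and a FiLM-style modulation in which a scale and a shift are supplied for every pixel. The function names and the tiny single-channel "images" are hypothetical and do not reflect PIXLRelight's released code.

```python
from typing import List

# Single-channel H x W grid of floats, used here only for illustration.
Image = List[List[float]]


def recompose(albedo: Image, shading: Image, residual: Image) -> Image:
    """Assumed intrinsic model, applied per pixel: I = A * S + R."""
    return [
        [a * s + r for a, s, r in zip(row_a, row_s, row_r)]
        for row_a, row_s, row_r in zip(albedo, shading, residual)
    ]


def affine_modulate(features: Image, gamma: Image, beta: Image) -> Image:
    """Per-pixel affine modulation: f' = gamma * f + beta.

    In a neural renderer, gamma and beta would be predicted from the
    target-illumination conditioning; here they are passed in directly.
    """
    return [
        [g * f + b for f, g, b in zip(row_f, row_g, row_b)]
        for row_f, row_g, row_b in zip(features, gamma, beta)
    ]
```

Because gamma and beta vary per pixel rather than per channel, the modulation can brighten one region while darkening another, which is what spatially varying relighting requires.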
Related events
Pose-ICL: 3D-aware in-context learning for pose-controllable image generation of custom subjects
Researchers introduce Pose-ICL, a tuning-free framework for generating images of user-specified subjects with accurate pose control. The method uses Surface-Anchored Position Embedding (SAPE) to give 2D diffusion models explicit 3D awareness by anchoring image tokens to volumetric bounding box surface coordinates. Evaluations on 3D assets and real-world subjects show improvements over existing methods in both pose accuracy and identity consistency. The framework is designed for compatibility with existing Diffusion Transformer (DiT) models.
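One plausible reading of "anchoring image tokens to volumetric bounding box surface coordinates" is: cast each token's camera ray against the subject's 3D bounding box, take the surface point it hits, normalize that point to box coordinates, and encode it sinusoidally. The sketch below assumes an axis-aligned box, the standard slab intersection test, and a NeRF-style frequency encoding; all names and the zero embedding for rays that miss are hypothetical and not taken from the Pose-ICL paper.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def ray_box_surface_point(origin: Vec3, direction: Vec3,
                          box_min: Vec3, box_max: Vec3) -> Optional[Vec3]:
    """Slab-method ray/AABB test; returns the surface point hit, or None on a miss."""
    if all(abs(d) < 1e-12 for d in direction):
        raise ValueError("direction must be non-zero")
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:  # ray parallel to this slab and outside it
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_far < 0:
        return None
    t = t_near if t_near >= 0 else t_far  # origin inside the box: use the exit point
    return tuple(o + t * d for o, d in zip(origin, direction))


def surface_embedding(point: Optional[Vec3], box_min: Vec3, box_max: Vec3,
                      num_freqs: int = 4) -> List[float]:
    """Sinusoidal encoding of the hit point in normalized [0, 1]^3 box coordinates."""
    if point is None:
        return [0.0] * (3 * 2 * num_freqs)  # hypothetical "no surface" embedding
    emb: List[float] = []
    for p, lo, hi in zip(point, box_min, box_max):
        u = (p - lo) / (hi - lo)
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            emb.extend([math.sin(freq * u), math.cos(freq * u)])
    return emb
```

Because the embedding depends on where a ray meets the box rather than on the token's 2D grid index, rotating the box changes which tokens receive which coordinates, which is the kind of explicit 3D signal a 2D diffusion model otherwise lacks.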
Instruction-tuning Stable Diffusion with InstructPix2Pix
This Hugging Face blog post describes a methodology for instruction-tuning Stable Diffusion using the InstructPix2Pix framework, enabling image editing via natural language instructions. The approach adapts techniques from language model instruction-tuning to the image generation domain. The post covers dataset construction, training procedures, and evaluation of the resulting models.
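At sampling time, the original InstructPix2Pix paper combines three noise predictions with two classifier-free guidance scales, one for the input image and one for the text instruction. The sketch below shows only that combination step; the noise predictions are plain lists standing in for the denoiser's outputs under each conditioning, and the function name is hypothetical rather than part of the blog post's training code.

```python
from typing import List


def dual_cfg(eps_uncond: List[float], eps_img: List[float], eps_full: List[float],
             image_scale: float, text_scale: float) -> List[float]:
    """Two-scale classifier-free guidance.

    eps_uncond: prediction with neither image nor text conditioning
    eps_img:    prediction with image conditioning only
    eps_full:   prediction with both image and text conditioning
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_img, eps_full)
    ]
```

With both scales set to 1 the expression reduces to the fully conditioned prediction; raising `image_scale` keeps the edit closer to the input image, while raising `text_scale` follows the instruction more strongly. Diffusers' `StableDiffusionInstructPix2PixPipeline` exposes the two knobs as `image_guidance_scale` and `guidance_scale`.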
Efficient Controllable Generation for SDXL with T2I-Adapters
Hugging Face published a blog post detailing T2I-Adapters for Stable Diffusion XL (SDXL), a lightweight conditioning mechanism that enables controllable image generation without full fine-tuning. The approach allows users to guide SDXL outputs using structural signals such as depth maps, edge detection, and pose estimation. T2I-Adapters offer a parameter-efficient alternative to ControlNet for the SDXL architecture, with integration into the Diffusers library.
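T2I-Adapter keeps the base UNet frozen and trains a small network that maps the structural signal (depth, edges, pose) to feature maps at several resolutions; those maps are added to the UNet encoder's intermediate features. The sketch below shows only that additive, scaled injection on flattened per-level feature lists. The function name and data layout are hypothetical, and the adapter network that would produce `adapter_feats` is omitted.

```python
from typing import List, Sequence

# One flattened feature map per UNet encoder level (illustrative layout).
Feature = List[float]


def inject_adapter_features(encoder_feats: Sequence[Feature],
                            adapter_feats: Sequence[Feature],
                            scale: float = 1.0) -> List[Feature]:
    """Add scaled adapter features to the frozen encoder's features, level by level."""
    if len(encoder_feats) != len(adapter_feats):
        raise ValueError("one adapter feature map is needed per encoder level")
    out: List[Feature] = []
    for enc, ada in zip(encoder_feats, adapter_feats):
        if len(enc) != len(ada):
            raise ValueError("feature maps must match in size at each level")
        out.append([e + scale * a for e, a in zip(enc, ada)])
    return out
```

Setting `scale` to 0 recovers the unconditioned encoder features, which is the intuition behind the `adapter_conditioning_scale` argument on Diffusers' `StableDiffusionXLAdapterPipeline`. Since only the adapter is trained, the parameter cost stays small compared with duplicating encoder blocks.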
Illumination-robust rPPG heart-rate estimation via spatial-temporal transformer for robot-mounted cameras
A new arXiv paper presents an end-to-end spatial-temporal transformer framework for remote photoplethysmography (rPPG) heart-rate estimation that is robust to illumination variation, targeting robot-mounted RGB cameras. The system integrates 3D face alignment, illumination augmentation, a Residual Temporal Standardization Module, and a hybrid waveform-plus-spectral loss. On a new dataset spanning three illumination levels, the method achieves 0.79 bpm MAE and 0.982 HR correlation, reducing error by 93.6% relative to the PhysFormer baseline. The work is relevant to physiological sensing in service and assistive robotics.
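The two reported metrics, MAE in bpm and HR correlation, are usually computed on heart rates read from the dominant spectral peak of the predicted waveform. The sketch below assumes that convention, with a 0.7 to 4 Hz search band and a direct DFT scan; the helper names are hypothetical, and it omits the transformer, face alignment, the Residual Temporal Standardization Module, and the hybrid loss.

```python
import math
from typing import Sequence


def estimate_hr_bpm(signal: Sequence[float], fps: float,
                    f_min: float = 0.7, f_max: float = 4.0, step: float = 0.01) -> float:
    """Heart rate (bpm) from the strongest frequency in [f_min, f_max] Hz."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]  # remove the DC component
    best_f, best_power = f_min, -1.0
    n_steps = int(round((f_max - f_min) / step))
    for k in range(n_steps + 1):
        f = f_min + k * step
        # DFT magnitude at an arbitrary frequency f (not restricted to FFT bins)
        re = sum(v * math.cos(2 * math.pi * f * t / fps) for t, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * f * t / fps) for t, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error, e.g. in bpm."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between predicted and reference heart rates."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

For example, a clean 1.2 Hz pulse sampled at 30 fps for 10 seconds should come out at about 72 bpm.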
Glow: Better reversible generative models
OpenAI introduces Glow, a reversible generative model using invertible 1x1 convolutions that extends prior work on normalizing flows. The model generates realistic high-resolution images, supports efficient sampling, and learns disentangled features for attribute manipulation. Code and an online visualization tool are released alongside the paper.
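In Glow, an invertible 1x1 convolution is a learned C x C matrix W applied to the channel vector at every spatial position, so it is inverted with W⁻¹ and contributes h · w · log|det W| to the log-likelihood. Glow also proposes an LU-decomposed parameterization to make that determinant cheap; this pure-Python sketch substitutes plain Gaussian elimination, targets small C only, and uses hypothetical helper names.

```python
import math
from typing import List

Matrix = List[List[float]]


def invertible_1x1_conv(pixels: List[List[float]], w: Matrix) -> List[List[float]]:
    """Apply W to the channel vector of every pixel (pixels flattened over h x w)."""
    return [[sum(w[i][j] * p[j] for j in range(len(p))) for i in range(len(w))]
            for p in pixels]


def inverse(w: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(w)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(w)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("W is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * c for v, c in zip(a[r], a[col])]
    return [row[n:] for row in a]


def log_abs_det(w: Matrix) -> float:
    """log|det W| as the sum of log|pivot| from Gaussian elimination."""
    n = len(w)
    a = [row[:] for row in w]
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("W is singular")
        a[col], a[pivot] = a[pivot], a[col]  # row swaps flip the sign only
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [v - factor * c for v, c in zip(a[r], a[col])]
    return total


def flow_logdet(height: int, width: int, w: Matrix) -> float:
    """Log-determinant contribution of the layer: h * w * log|det W|."""
    return height * width * log_abs_det(w)
```

Because the same W is shared across all h · w positions, the determinant is computed once per layer, not once per pixel, which keeps the cost of this learned channel permutation low.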
ETCHR: Decoupled Image Editing for Visual Chain-of-Thought Reasoning in MLLMs
ETCHR introduces a question-conditioned, reasoning-aware image editing model that decouples visual transformation from downstream understanding in multimodal LLMs. It addresses two identified gaps, a language-side gap (mapping abstract questions to visual edits) and a generation-side gap (edit quality degrading with reasoning depth), through a two-stage training recipe that combines supervised fine-tuning on edit trajectories with VLM-derived reward signals. Because the editor is decoupled, it plugs into arbitrary MLLMs without retraining, yielding Pass@1 gains of roughly +4.6 to +5.5 points across five task families when paired with Qwen3-VL-8B, Gemini-3.1-Flash-Lite, and Kimi K2.5. The work advances the 'think with images' paradigm beyond fixed toolkits and unified multimodal approaches.
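Because the editor is decoupled from the reasoning model, integrating it amounts to an interface boundary: the editor maps (image, question) to an edited image, and any MLLM then answers from the original and edited views. The sketch below assumes that two-image hand-off and represents images as opaque strings; the Protocol names are hypothetical and are not ETCHR's actual API. Pass@1 is shown under a simple exact-match reading, which may differ from the paper's scoring.

```python
from typing import List, Protocol


class Editor(Protocol):
    def edit(self, image: str, question: str) -> str: ...


class MLLM(Protocol):
    def answer(self, question: str, images: List[str]) -> str: ...


def think_with_images(editor: Editor, mllm: MLLM, image: str, question: str) -> str:
    """Edit first, then let any MLLM reason over the original and edited views."""
    edited = editor.edit(image, question)
    return mllm.answer(question, [image, edited])


def pass_at_1(predictions: List[str], references: List[str]) -> float:
    """Fraction of problems whose single sampled answer matches the reference."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```

Swapping the reasoning model only means passing a different object that satisfies `MLLM`; the editor side is untouched, which mirrors the "no retraining" property described above.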
Representation-Conditioned Diffusion Models for Controllable Image Generation
This paper explores conditioning diffusion models on representations from pre-trained self-supervised models as an alternative to text prompts or semantic maps, which require large annotated datasets. The self-conditioning mechanism improves unconditional image generation quality and provides a controllable representation space. The authors identify directions of variation in this space and demonstrate smoothness and disentanglement properties, suggesting potential for fine-grained generative control without heavy annotation overhead.
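The summary does not say how the directions of variation are found. One common technique, swapped in here as an assumption, is principal component analysis over a batch of representation vectors, followed by traversal along the top component; each shifted representation would then condition the generator. The function names and the power-iteration approach are illustrative, not the paper's method.

```python
import math
import random
from typing import List

Vector = List[float]


def top_direction(reps: List[Vector], iters: int = 200, seed: int = 0) -> Vector:
    """Leading principal direction of the representations, via power iteration."""
    dim = len(reps[0])
    mean = [sum(r[i] for r in reps) / len(reps) for i in range(dim)]
    centered = [[r[i] - mean[i] for i in range(dim)] for r in reps]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Multiply by X^T X (unnormalized covariance) without forming the matrix.
        proj = [sum(c[i] * v[i] for i in range(dim)) for c in centered]
        v = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def traverse(rep: Vector, direction: Vector, alphas: List[float]) -> List[Vector]:
    """Representations r + alpha * d, one per step size alpha."""
    return [[r + a * d for r, d in zip(rep, direction)] for a in alphas]
```

Smoothness, in this framing, would mean that images generated from evenly spaced alphas change gradually; disentanglement would mean that moving along one direction alters a single visual attribute.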
SymbolicLight V1: Spike-Gated Dual-Path Language Model with High Activation Sparsity
SymbolicLight V1 is a 194M-parameter spiking language model that combines binary Leaky Integrate-and-Fire spike dynamics with a continuous residual stream, replacing dense self-attention with a dual-path module using exponential-decay aggregation and spike-gated local attention. Trained from scratch on a 3B-token Chinese-English corpus, it achieves validation perplexity of 8.88–8.93 at over 89% per-element activation sparsity, trailing GPT-2 201M by 7.7% in PPL. Ablations indicate that temporal integration via LIF dynamics contributes more to performance than sparsity alone, and a 0.8B-parameter scale-up on 48.8B tokens demonstrates optimization stability. Current dense-hardware inference is slower than GPT-2; neuromorphic deployment is framed as a future opportunity.
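The summary names three building blocks: binary Leaky Integrate-and-Fire (LIF) spikes, exponential-decay aggregation, and per-element activation sparsity. A minimal forward-pass sketch of each in isolation follows, assuming reset-by-subtraction and a scalar decay; the parameter values and function names are hypothetical, and the code is not SymbolicLight's dual-path module (there is no gating, residual stream, or training logic).

```python
from typing import List


def lif_spikes(inputs: List[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Binary LIF neuron: leaky integration, fire at threshold, soft reset."""
    v, spikes = 0.0, []
    for x in inputs:
        v = beta * v + x  # leak the membrane potential, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # reset by subtraction keeps the residual charge
        else:
            spikes.append(0)
    return spikes


def exp_decay_aggregate(xs: List[float], decay: float = 0.5) -> List[float]:
    """Causal running summary h_t = decay * h_{t-1} + x_t over a token sequence."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def activation_sparsity(spikes: List[int]) -> float:
    """Fraction of positions that did not fire."""
    return spikes.count(0) / len(spikes)
```

For instance, a constant input of 0.6 followed by silence fires only on the second step, giving a sparsity of 0.75. The recurrence in `exp_decay_aggregate` needs constant state per step, which is why such aggregation can stand in for dense self-attention over a sequence.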

