Robust Neural Rendering in the Wild
with Asymmetric Dual 3D Gaussian Splatting

1McMaster University, 2Xi'an Jiaotong University

Figure: Neural rendering in the wild: in-the-wild inputs and our renderings (without distractors).

Abstract

3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle low-quality training data, which often struggle to produce stable, consistent reconstructions and frequently introduce visual artifacts.

In this work, we propose Asymmetric Dual 3DGS, a novel framework that exploits the stochastic nature of these artifacts: they tend to vary across training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes through confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask. This yields an asymmetric training process for the two models and reduces shared error modes. In addition, to improve training efficiency, we introduce a lightweight variant, Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy and employs an alternating masking strategy to preserve divergence.
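To make the training recipe concrete, below is a minimal PyTorch-style sketch of one optimization step of the dual-model setup. The `g.render(camera)` and `soft_mask_net` interfaces are hypothetical stand-ins, not the authors' released code; the multi-cue mask is reduced here to a single photometric cue for brevity, and the soft-mask network's own optimizer and mask loss (Eq. 7) are omitted.

```python
import torch
import torch.nn.functional as F


def multi_cue_mask(image, rendering, tau=0.1):
    # Hypothetical stand-in for the multi-cue adaptive mask: keep pixels
    # whose photometric residual is small, treating large residuals as
    # transient distractors (the paper combines several cues).
    residual = (image - rendering).abs().mean(dim=-3, keepdim=True)
    return (residual < tau).float()


def masked_l1(pred, target, mask):
    # Per-pixel L1 loss weighted by an inlier mask (1 = static scene).
    return (mask * (pred - target).abs()).mean()


def dual_step(g1, g2, opt1, opt2, soft_mask_net, camera, image, lam=0.5):
    """One asymmetric optimization step for the two 3DGS models."""
    r1 = g1.render(camera)  # rendering from model G1
    r2 = g2.render(camera)  # rendering from model G2

    # Divergent masking: a hard multi-cue mask for G1, a learned
    # self-supervised soft mask for G2.
    m_hard = multi_cue_mask(image, r1.detach())
    m_soft = soft_mask_net(image)

    # Masked reconstruction losses (Eq. 4 in the paper).
    loss = masked_l1(r1, image, m_hard) + masked_l1(r2, image, m_soft)

    # Mutual consistency (Eq. 6): each model is pulled toward the
    # other's detached rendering, suppressing run-specific artifacts.
    loss = loss + lam * (F.l1_loss(r1, r2.detach()) + F.l1_loss(r2, r1.detach()))

    opt1.zero_grad()
    opt2.zero_grad()
    loss.backward()
    opt1.step()
    opt2.step()
    return loss.item()
```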

Figure: Overview of the Asymmetric Dual 3DGS framework. Two 3DGS models $\mathbb{G}_1$ and $\mathbb{G}_2$ are optimized concurrently with the masked reconstruction losses $\mathcal{L}_{r1}^{\mathbf{M}_h}$ and $\mathcal{L}_{r2}^{\mathbf{M}_s}$ (Eq. 4), along with the mutual consistency losses $\mathcal{L}_{m1}$ and $\mathcal{L}_{m2}$ (Eq. 6). In addition, we apply a mask loss (Eq. 7) to learn the soft mask in a self-supervised manner. For improved efficiency, we also propose an EMA variant of our framework that replaces $\mathbb{G}_2$ with a dynamic EMA proxy. Both the mask loss and the EMA proxy are omitted from the figure for clarity. Note that the color transform shown here is for illustration purposes; in practice it is applied through the rasterization process, as described in Section 3.1.
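As a rough illustration of the efficient variant, the sketch below shows a conventional EMA parameter update together with an alternating mask selection. Both the fixed momentum and the even/odd alternation schedule are simplifying assumptions for this sketch, not the paper's exact dynamic update rules.

```python
import torch


@torch.no_grad()
def update_ema_proxy(params, ema_params, momentum=0.999):
    # Standard EMA update; the paper's *dynamic* proxy adapts how the
    # proxy is updated during training, which is reduced here to a
    # fixed momentum for brevity.
    for name, p in params.items():
        ema_params[name].mul_(momentum).add_(p, alpha=1.0 - momentum)


def pick_mask(step, m_hard, m_soft):
    # Alternating masking: switch between the multi-cue hard mask and
    # the self-supervised soft mask across iterations so the model and
    # its EMA proxy do not share error modes.
    return m_hard if step % 2 == 0 else m_soft
```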

Results

Appearance Change

Distractor Removal

Comparison with SOTA

BibTeX

@inproceedings{li2025asymgs,
    title={Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting},
    author={Chengqi Li and Zhihao Shi and Yangdi Lu and Wenbo He and Xiangyu Xu},
    booktitle={NeurIPS},
    year={2025},
}