🌓 InvSplat

Inverse Feed-Forward Scene Splatting

¹University of Tübingen, Tübingen AI Center  ²ETH Zurich

InvSplat reconstructs 3D scene geometry and material parameters from posed images in real time, enabling novel view synthesis and relighting.

Abstract

Inverse rendering aims to recover both 3D geometry and physically meaningful material properties from images, enabling applications such as relighting and novel view synthesis. Optimization-based methods achieve high fidelity but require costly per-scene fitting, while image-space learning-based approaches often suffer from multi-view inconsistencies and lack an explicit 3D representation for stable novel view rendering.

We present a feed-forward multi-view reconstruction framework for inverse rendering that directly predicts a structured 3D Gaussian representation with intrinsic material attributes. Each Gaussian primitive is parameterized by mean, normal, opacity, rotation, scale, albedo, metallic, and roughness, enabling a disentangled and physically grounded scene representation. Our model integrates priors from a material estimation network with a multi-view 3D reconstruction backbone, allowing joint prediction of geometry and reflectance parameters in a single forward pass.
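The per-primitive parameterization above can be pictured as a fixed channel layout. The sketch below is a minimal illustration that assumes a hypothetical raw head output of 19 channels per Gaussian. The field order, the `MaterialGaussians` / `from_raw` names, and the activations (sigmoid for bounded values, exp for scale, normalization for normals and quaternions) are assumptions for illustration, not the authors' code:

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical channel layout per Gaussian (order and sizes are assumptions).
FIELD_DIMS = {"mean": 3, "normal": 3, "opacity": 1, "rotation": 4,
              "scale": 3, "albedo": 3, "metallic": 1, "roughness": 1}
NUM_CHANNELS = sum(FIELD_DIMS.values())  # 19


@dataclass
class MaterialGaussians:
    mean: np.ndarray       # (N, 3) centers, passed through unchanged in this sketch
    normal: np.ndarray     # (N, 3) unit normals
    opacity: np.ndarray    # (N, 1) in (0, 1)
    rotation: np.ndarray   # (N, 4) unit quaternions
    scale: np.ndarray      # (N, 3) positive per-axis extents
    albedo: np.ndarray     # (N, 3) base color in (0, 1)
    metallic: np.ndarray   # (N, 1) in (0, 1)
    roughness: np.ndarray  # (N, 1) in (0, 1)


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _unit(x, eps=1e-8):
    # Normalize each row to unit length; eps guards against zero vectors.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def from_raw(raw: np.ndarray) -> MaterialGaussians:
    """Split a raw (N, 19) prediction and map each field to its valid range."""
    if raw.ndim != 2 or raw.shape[1] != NUM_CHANNELS:
        raise ValueError(f"expected (N, {NUM_CHANNELS}), got {raw.shape}")
    # Cumulative split points: [3, 6, 7, 11, 14, 17, 18].
    splits = np.cumsum(list(FIELD_DIMS.values()))[:-1]
    parts = dict(zip(FIELD_DIMS, np.split(raw, splits, axis=1)))
    return MaterialGaussians(
        mean=parts["mean"],
        normal=_unit(parts["normal"]),
        opacity=_sigmoid(parts["opacity"]),
        rotation=_unit(parts["rotation"]),
        scale=np.exp(parts["scale"]),
        albedo=_sigmoid(parts["albedo"]),
        metallic=_sigmoid(parts["metallic"]),
        roughness=_sigmoid(parts["roughness"]),
    )
```

Keeping geometry fields (mean, normal, rotation, scale, opacity) separate from reflectance fields (albedo, metallic, roughness) is what allows each to be edited or supervised on its own.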

Experiments on synthetic and real-world datasets demonstrate improved multi-view consistency compared to 2D baselines, accurate material recovery, and stable novel view rendering. Our representation further supports physically-based relighting and more faithful modeling of view-dependent effects compared to existing RGB-based feed-forward reconstruction methods.

Method

Given $N$ posed images, a single feed-forward network $f_\theta(\{I_i\}_{i=1}^{N}, \{P_i\}_{i=1}^{N}) \rightarrow (\mathcal{G}, \mathcal{M})$ jointly recovers scene geometry $\mathcal{G}$ and intrinsic materials $\mathcal{M}$ via a dual-branch design.

InvSplat architecture overview

A Geometry branch (ResNet → multi-view geometry encoder → feature matching / cost volume) and an Intrinsic branch (DINOv2 → multi-view intrinsic translator) feed a shared set of decoding heads. The heads predict depth, Gaussian normals, rotation/scale/opacity, and material maps (albedo, metallic, roughness), which are unprojected into a 3D Gaussian scene and rendered differentiably. Training supervises all rendered properties with L1 + LPIPS, an affine-invariant depth loss, and a cosine normal loss.
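The non-perceptual loss terms can be sketched as follows. This is a minimal illustration under stated assumptions. The page does not specify how the affine-invariant depth loss aligns prediction and target, so the sketch assumes a least-squares fit of a scale $s$ and shift $t$ followed by L1. LPIPS is omitted because it requires a pretrained network. The loss weights `w_depth` and `w_normal` are hypothetical:

```python
import numpy as np


def l1_loss(pred, gt):
    """Mean absolute error, e.g. for rendered albedo/metallic/roughness maps."""
    return float(np.mean(np.abs(pred - gt)))


def affine_invariant_depth_loss(pred, gt):
    """L1 after aligning pred to gt with a least-squares scale s and shift t."""
    p, g = pred.reshape(-1), gt.reshape(-1)
    design = np.stack([p, np.ones_like(p)], axis=1)       # (M, 2): columns [pred, 1]
    (s, t), *_ = np.linalg.lstsq(design, g, rcond=None)   # argmin ||s*p + t - g||^2
    return float(np.mean(np.abs(s * p + t - g)))


def cosine_normal_loss(pred, gt, eps=1e-8):
    """Mean (1 - cos angle) between normal maps whose last axis is xyz."""
    pn = pred / np.maximum(np.linalg.norm(pred, axis=-1, keepdims=True), eps)
    gn = gt / np.maximum(np.linalg.norm(gt, axis=-1, keepdims=True), eps)
    return float(np.mean(1.0 - np.sum(pn * gn, axis=-1)))


def total_loss(out, target, w_depth=1.0, w_normal=1.0):
    """Hypothetical weighted sum of the terms above (LPIPS omitted)."""
    material = sum(l1_loss(out[k], target[k]) for k in ("albedo", "metallic", "roughness"))
    return (material
            + w_depth * affine_invariant_depth_loss(out["depth"], target["depth"])
            + w_normal * cosine_normal_loss(out["normal"], target["normal"]))
```

Because the depth term is invariant to a global scale and shift, a prediction such as `2.0 * gt + 0.5` incurs (numerically) zero depth loss. This is useful when absolute scene scale is ambiguous.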

① Geometry branch

Frozen ResNet feature pyramid; a transformer geometry encoder applies cross-view attention, and feature matching builds a cost volume $C$ over depth candidates.
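A depth-candidate cost volume can be illustrated with a two-view plane sweep. This is a sketch, not the authors' matching module. It assumes shared intrinsics `K`, a known relative pose with $X_{src} = R X_{ref} + t$, nearest-neighbour sampling instead of bilinear interpolation, and a dot-product similarity scaled by $1/\sqrt{C}$. None of these details are stated on this page:

```python
import numpy as np


def plane_sweep_cost_volume(feat_ref, feat_src, K, R, t, depths):
    """Correlation cost volume over depth candidates (two-view sketch).

    feat_ref, feat_src: (C, H, W) feature maps; K: (3, 3) shared intrinsics;
    R: (3, 3), t: (3,) map reference-camera points into the source camera;
    depths: (D,) candidates. Returns (D, H, W) dot-product similarities.
    """
    C, H, W = feat_ref.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pix              # (3, HW) rays with z = 1
    ref_flat = feat_ref.reshape(C, -1)
    cost = np.zeros((len(depths), H, W))
    for i, d in enumerate(depths):
        pts_src = R @ (rays * d) + t[:, None]  # hypothesized 3D points in source frame
        proj = K @ pts_src
        z = proj[2]
        valid = z > 1e-6                       # in front of the source camera
        safe_z = np.where(valid, z, 1.0)
        us = np.round(proj[0] / safe_z).astype(int)
        vs = np.round(proj[1] / safe_z).astype(int)
        valid &= (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
        sampled = np.zeros_like(ref_flat)      # out-of-view pixels keep zero features
        sampled[:, valid] = feat_src[:, vs[valid], us[valid]]
        cost[i] = (np.sum(ref_flat * sampled, axis=0) / np.sqrt(C)).reshape(H, W)
    return cost
```

For each pixel, the depth whose warped source feature best matches the reference feature produces the highest score. A network can then turn this volume into a depth distribution.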

② Intrinsic branch

DINOv2 ViT-L/14 encodes each view; a 36-block translator alternates intra- and inter-view attention to produce material features $F_m$.
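The alternation between intra- and inter-view attention comes down to how tokens are grouped before attention is applied. The minimal sketch below assumes per-view tokens of shape `(V, T, D)`, single-head attention with identity Q/K/V projections, pre-layer-norm residuals, and no MLP sublayers. It also assumes that each of the 36 blocks performs one intra-view step followed by one inter-view step. All of these are simplifications rather than the model's actual layers:

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def attention(x):
    """Single-head softmax attention with identity Q/K/V: (..., T, D) -> (..., T, D)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def intra_view(tokens):
    """tokens: (V, T, D). Each view attends only to its own T tokens."""
    return tokens + attention(layer_norm(tokens))


def inter_view(tokens):
    """All V*T tokens attend to each other, so information flows across views."""
    V, T, D = tokens.shape
    mixed = attention(layer_norm(tokens).reshape(1, V * T, D))
    return tokens + mixed.reshape(V, T, D)


def translator(tokens, num_blocks=36):
    """Alternate intra- and inter-view attention for num_blocks blocks."""
    for _ in range(num_blocks):
        tokens = inter_view(intra_view(tokens))
    return tokens
```

Only the inter-view step lets one view's material features depend on the others. This is the mechanism that encourages consistent material predictions across views.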

③ Decoding

Six DPT / Point-Transformer heads predict depth, normals, rotation/scale/opacity, and albedo/metallic/roughness; unprojection lifts them into 3D Gaussians.
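For pixel-aligned Gaussians, unprojection places one center per pixel at $X = R\,(d\,K^{-1}[u, v, 1]^\top) + t$, where $(R, t)$ is the camera-to-world pose and $d$ is the predicted depth. The sketch below assumes that depth is z-depth along the optical axis, that pixel coordinates are integer indices, and that normals are predicted in camera space and rotated into world space. The function names are hypothetical:

```python
import numpy as np


def unproject_to_gaussian_means(depth, K, cam_to_world):
    """Lift an (H, W) z-depth map to (H*W, 3) world-space Gaussian centers."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # camera-space directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                   # camera -> world


def normals_to_world(normals_cam, cam_to_world):
    """Rotate camera-space normals into world space and renormalize."""
    n = normals_cam.reshape(-1, 3) @ cam_to_world[:3, :3].T
    return n / np.maximum(np.linalg.norm(n, axis=1, keepdims=True), 1e-8)
```

As a quick check, the principal-point pixel lies on the optical axis. For that pixel the unprojected center is simply the camera position plus $d$ along the viewing direction.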

④ Rendering & loss

A single differentiable rasterization pass renders material, normal, and depth maps for supervision and novel views.
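One pass can produce material, normal, and depth maps together because every attribute is composited with the same per-pixel weights. The sketch below illustrates 3DGS-style front-to-back alpha compositing for a single pixel, with all attributes stacked as channels. It assumes the 2D projected means and covariances are already given. It uses a per-pixel loop instead of a tiled GPU rasterizer and borrows the 0.99 alpha clamp from the original 3D Gaussian Splatting rasterizer. NumPy does not provide gradients here; the same arithmetic is differentiable in an autodiff framework:

```python
import numpy as np


def gaussian_alpha(pixel, means2d, covs2d, opacities):
    """alpha_i = o_i * exp(-0.5 * d^T Sigma_i^-1 d) for projected 2D Gaussians."""
    d = pixel[None, :] - means2d                       # (N, 2) offsets
    maha = np.einsum("ni,nij,nj->n", d, np.linalg.inv(covs2d), d)
    return np.clip(opacities * np.exp(-0.5 * maha), 0.0, 0.99)


def composite_pixel(alphas, depths, attributes):
    """Front-to-back compositing of stacked per-Gaussian attributes.

    alphas: (N,), depths: (N,) camera-space depth, attributes: (N, C),
    e.g. C = albedo(3) + metallic(1) + roughness(1) + normal(3) + depth(1).
    Returns (composited attributes (C,), accumulated alpha).
    """
    order = np.argsort(depths)                          # nearest Gaussian first
    a = alphas[order]
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - a)[:-1]])
    weights = a * transmittance                         # alpha_i * prod_{j<i} (1 - alpha_j)
    return weights @ attributes[order], weights.sum()
```

For example, two Gaussians with alpha 0.5 each contribute weights 0.5 (front) and 0.25 (back), leaving 0.25 of the pixel to the background.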

NVS Results

From a small number of input views, InvSplat reconstructs a 3D Gaussian scene and renders smooth novel-view trajectories.

InteriorVerse (2 views)
RE10K (2 views)


Video consistency comparison (RE10K)

We compare predicted intrinsics against 2D image-space methods on 32-frame videos from the RE10K dataset. For our method, we reconstruct the scene from only the first and last frames of the camera path and render it along the same trajectory as the video; the baselines are run on the full input video sequences. Because our scene is a single 3D Gaussian field, predictions stay multi-view consistent across the camera path, while image-space baselines flicker on highlights and reflective surfaces.

Compared: Input · InvSplat (ours, 3D) · MVInverse (2D) · DiffusionRenderer (2D)

Relighting

Because each Gaussian carries physically-based materials and a normal, we can render the scene under different lighting conditions. We insert a point light source into each scene.
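Shading from a point light needs only the per-Gaussian albedo, metallic, roughness, normal, and position. The page does not name the BRDF. The sketch below therefore assumes the widely used metallic-roughness Cook-Torrance model: a GGX distribution with $\alpha = \text{roughness}^2$, Schlick-GGX geometry with $k = (r+1)^2/8$ for direct lights, Schlick Fresnel with $F_0 = 0.04$ for dielectrics, and inverse-square falloff. Shadows are ignored. The function name and argument layout are hypothetical:

```python
import numpy as np


def normalize(v):
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8)


def shade_point_light(pos, normal, albedo, metallic, roughness,
                      cam_pos, light_pos, light_intensity):
    """Radiance toward the camera under one point light (no shadows).

    pos, normal, albedo: (N, 3); metallic, roughness: (N, 1);
    cam_pos, light_pos, light_intensity: (3,).
    """
    n = normalize(normal)
    l_vec = light_pos - pos
    dist2 = np.sum(l_vec ** 2, axis=-1, keepdims=True)   # inverse-square falloff
    l, v = normalize(l_vec), normalize(cam_pos - pos)
    h = normalize(l + v)                                  # half vector
    n_l = np.clip(np.sum(n * l, -1, keepdims=True), 0.0, 1.0)
    n_v = np.clip(np.sum(n * v, -1, keepdims=True), 1e-4, 1.0)
    n_h = np.clip(np.sum(n * h, -1, keepdims=True), 0.0, 1.0)
    v_h = np.clip(np.sum(v * h, -1, keepdims=True), 0.0, 1.0)

    a = np.maximum(roughness, 0.04) ** 2                  # GGX alpha = roughness^2
    d = a ** 2 / (np.pi * (n_h ** 2 * (a ** 2 - 1.0) + 1.0) ** 2)
    k = (roughness + 1.0) ** 2 / 8.0                      # Schlick-GGX k (direct light)
    g = (n_l / (n_l * (1 - k) + k)) * (n_v / (n_v * (1 - k) + k))
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic      # dielectric vs metal base
    f = f0 + (1.0 - f0) * (1.0 - v_h) ** 5                # Schlick Fresnel

    specular = d * g * f / (4.0 * n_l * n_v + 1e-8)
    diffuse = (1.0 - f) * (1.0 - metallic) * albedo / np.pi
    return (diffuse + specular) * light_intensity * n_l / dist2
```

Moving `light_pos` while keeping the Gaussians fixed yields a relit view. Points facing away from the light receive zero direct radiance.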

Relighting panels: Input views · Camera move · Light move · All roughness · All metallic

BibTeX

@inproceedings{invsplat2026,
  title         = {InvSplat: Inverse Feed-Forward Scene Splatting},
  author        = {Karpikova, Polina and Bian, Wenjing and Xu, Haofei and Lensch, Hendrik P. A. and Geiger, Andreas},
  booktitle     = {arXiv},
  year          = {2026},
  eprint        = {2607.02301},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}