Topologically-Aware Deformation Fields for
Single-View 3D Reconstruction

Carnegie Mellon University
CVPR 2022


TARS is a state-of-the-art single-view 3D reconstruction system that can be trained on unpaired image collections from the internet (given camera poses). Alongside generating high-fidelity shapes, it produces a dense, topologically-aware 3D correspondence field for free, i.e., without any direct correspondence supervision.

Abstract

We present a framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection. The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner, solely from unaligned image collections and their camera poses, without any 3D supervision.

Image collections on the internet generally contain several intra-category geometric and topological variations; for example, different chairs can have different topologies, which makes joint shape and correspondence estimation much more challenging. Because of this, prior works either focus on learning each 3D object shape individually, without modeling cross-instance correspondences, or perform joint shape and correspondence estimation only on categories with minimal intra-category topological variations.

We overcome these restrictions by learning a topologically-aware implicit deformation field that maps a 3D point in the object space to a higher-dimensional point in the category-specific canonical space. At inference time, given a single image, we reconstruct the underlying 3D shape by first implicitly deforming each 3D point in the object space to the learned category-specific canonical space using the topologically-aware deformation field, and then reconstructing the 3D shape as a canonical signed distance field. Both the canonical shape and the deformation field are learned end-to-end in an inverse-graphics fashion, using a learned recurrent ray marcher (SRN) as the differentiable rendering module. Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets: ShapeNet, Pascal3D+, CUB, and Pix3D chairs.
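To make the two implicit modules concrete, below is a minimal PyTorch sketch of an object-space SDF query built from a deformation field and a canonical shape field. All names (DeformationField, CanonicalSDF, query_sdf), layer sizes, and the choice of a single extra canonical dimension are illustrative assumptions, not the released TARS implementation.

import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Maps 3D object-space points (conditioned on an image latent)
    to higher-dimensional canonical-space points. The extra
    coordinates let instances with different topologies occupy
    distinct regions of the shared canonical space."""
    def __init__(self, latent_dim=256, extra_dims=1, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + extra_dims),  # canonical point
        )

    def forward(self, points, latent):
        # points: (B, N, 3); latent: (B, latent_dim)
        latent = latent[:, None, :].expand(-1, points.shape[1], -1)
        return self.mlp(torch.cat([points, latent], dim=-1))

class CanonicalSDF(nn.Module):
    """Category-level signed distance field defined over the
    (3 + extra_dims)-dimensional canonical space."""
    def __init__(self, in_dims=4, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dims, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance
        )

    def forward(self, canonical_points):
        return self.mlp(canonical_points)

def query_sdf(points, latent, deform, sdf):
    """Object-space SDF query: deform to canonical space,
    then evaluate the shared canonical SDF."""
    return sdf(deform(points, latent))

Mapping each point to a higher-dimensional canonical point (here 3 + 1 dimensions) is what allows chairs with different topologies, e.g., with and without armrests, to map to separate regions of the canonical space rather than being forced onto a single fixed-topology template.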

Single Image 3D Reconstruction: CUB-200-2011


[Figure: qualitative comparisons on CUB-200-2011; columns show Input Image, CMR, SDF-SRN, and TARS (Ours).]

Single Image 3D Reconstruction: Pix3D chairs (trained on ShapeNet)


[Figure: qualitative comparisons on Pix3D chairs; columns show Input Image, SDF-SRN, and TARS (Ours).]


Method Overview


Overview of TARS: Given a single image, we first map a 3D point in object space to a higher-dimensional canonical space using our learned topologically-aware deformation field. The canonical point is then mapped to its SDF value using the Canonical Shape Generator module. We leverage an LSTM-based differentiable renderer to guide the learning of deformation and signed distance fields.
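For intuition about the rendering module, here is a minimal sketch of an SRN-style LSTM ray marcher, assuming an object-space SDF query such as query_sdf from the sketch above. The class name, hidden size, and number of marching steps are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class RecurrentRayMarcher(nn.Module):
    """SRN-style learned ray marcher (a sketch, not the TARS code):
    an LSTM predicts how far to step along each camera ray,
    conditioned on the SDF value at the current 3D position."""
    def __init__(self, hidden=16, n_steps=10):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size=1, hidden_size=hidden)
        self.step = nn.Linear(hidden, 1)  # predicted step length
        self.n_steps = n_steps

    def forward(self, origins, dirs, sdf_fn):
        # origins, dirs: (R, 3) ray origins and unit directions
        depth = torch.zeros(origins.shape[0], 1, device=origins.device)
        state = None
        for _ in range(self.n_steps):
            points = origins + depth * dirs  # current 3D positions
            sdf = sdf_fn(points)             # (R, 1) signed distances
            h, c = self.lstm(sdf, state)
            state = (h, c)
            depth = depth + self.step(h)     # march along the rays
        return origins + depth * dirs        # estimated surface points

The marcher's output surface points can then be shaded by a pixel generator and compared against the input image, so the rendering loss backpropagates through the marcher into both the deformation field and the canonical signed distance field.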

Video

BibTeX

@inproceedings{duggal2022tars3D,
  author    = {Duggal, Shivam and Pathak, Deepak},
  title     = {Topologically-Aware Deformation Fields for Single-View 3D Reconstruction},
  booktitle = {CVPR},
  year      = {2022},
}

Acknowledgements

We would like to thank Shamit Lal, Jason Zhang, Alex Li, and Ananye Agarwal for feedback on the paper, and Chen-Hsuan Lin for providing details on the SoftRas baseline and the Pascal3D+ dataset. We are grateful to Ankit Ramchandani for help before the deadline with painting chair meshes for the texture transfer experiment. This work is supported by the DARPA Machine Common Sense program.