HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models

1Northeastern University, 2Stability AI
(* denotes equal contribution)
HouseCrafter automatically converts a floorplan to a 3D house.

Abstract

We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete, large-scale 3D indoor scene (e.g., a house).

Our key insight is to adapt a 2D diffusion model, trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations in the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise manner along locations sampled from the floorplan, with previously generated images serving as conditions for the diffusion model when producing images at nearby locations. The global floorplan and the attention design in the diffusion model ensure the consistency of the generated images, from which a 3D scene can be reconstructed.
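The autoregressive batch-wise generation loop can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the helper names (`sample_camera_locations`, `denoise_batch`) and the nearest-neighbor reference selection are hypothetical stand-ins for the real sampling strategy and the diffusion model's reverse process.

```python
import numpy as np

def sample_camera_locations(floorplan_mask, spacing=2):
    """Sample a coarse grid of camera locations inside the free space of a
    top-down floorplan occupancy mask (hypothetical helper)."""
    h, w = floorplan_mask.shape
    return [(y, x) for y in range(0, h, spacing)
                   for x in range(0, w, spacing)
                   if floorplan_mask[y, x]]  # keep free-space cells only

def generate_scene_rgbd(floorplan_mask, denoise_batch, batch_size=4, k_ref=3):
    """Autoregressive batch-wise generation: each new batch of RGB-D views is
    denoised conditioned on the floorplan and on the nearest previously
    generated views. `denoise_batch` stands in for one reverse-diffusion run."""
    locs = sample_camera_locations(floorplan_mask)
    generated = {}  # camera location -> RGB-D image, shape (H, W, 4)
    for i in range(0, len(locs), batch_size):
        batch = locs[i:i + batch_size]
        # Collect the k nearest already-generated views as reference conditions
        # (empty for the very first batch, which is generated unconditionally).
        refs = []
        for loc in batch:
            done = sorted(generated,
                          key=lambda p: (p[0] - loc[0])**2 + (p[1] - loc[1])**2)
            refs.extend(generated[p] for p in done[:k_ref])
        images = denoise_batch(batch, refs, floorplan_mask)
        generated.update(zip(batch, images))
    return generated
```

With a stub denoiser, the loop simply fills every sampled location with an RGB-D image, each batch seeing the outputs of earlier batches as conditions.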

Through extensive evaluation on the 3D-Front dataset, we demonstrate that HouseCrafter can generate high-quality house-scale 3D scenes. Ablation studies also validate the effectiveness of different design choices. We will release our code and model weights.

Method

Our method first generates multi-view 2D observations of the scene and then reconstructs it in 3D. We train a diffusion model that performs novel-view synthesis with RGB-D images. Specifically, the model takes multiple RGB-D images from nearby locations, together with an encoded floorplan, as conditions, and outputs a batch of RGB-D images that are consistent with each other and with the conditions.

The core of our method is a 2D diffusion model that can generate consistent multi-view RGB-D images of a scene. Our model architecture is inspired by state-of-the-art object-centric novel-view synthesis models, but redesigned for the geometric and semantic complexity of scene-level content. First, we change both the reference conditioning and the image generation from RGB-only to the RGB-D setting, as RGB-D images provide strong cues for 3D scene reconstruction. Second, we insert a layout attention layer at the beginning of each U-Net block to encourage the generated images to be faithful to the floorplan, ensuring global consistency when generating a house-scale scene. Moreover, the cross-attention layer, introduced in prior works for reference-to-novel-view attention, is updated to leverage geometry from the reference depth, leading to higher-quality image generation. Please refer to our paper for more details.
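The layout attention described above amounts to cross-attention from image feature tokens to encoded floorplan tokens. Below is a minimal single-head sketch in NumPy, assuming flattened token sequences and learned projection matrices; it is an illustration of the mechanism, not the paper's actual layer (which sits inside a U-Net and would use multi-head attention with normalization).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layout_attention(img_tokens, layout_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: image feature tokens (queries) attend to
    encoded floorplan tokens (keys/values), so each spatial image feature can
    read out the room layout relevant to it. Shapes: img_tokens (N, d),
    layout_tokens (M, d), projection weights (d, d)."""
    q = img_tokens @ Wq                     # (N, d) queries from image features
    k = layout_tokens @ Wk                  # (M, d) keys from floorplan encoding
    v = layout_tokens @ Wv                  # (M, d) values from floorplan encoding
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, M) attention weights
    return img_tokens + attn @ v            # residual connection keeps features
```

Because the floorplan tokens are shared across all views being generated, attending to them at every U-Net block is one way such a design can anchor each view to the same global scene layout.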

More Results

Inside of the house

We show the scene's appearance and geometry from inside the house.

In-scene Rendering

Geometric Interaction

Editability

Users can easily edit the scene layout by moving furniture on the floorplan. The generated scene changes accordingly.

Original Layout


Generated Mesh

Edited Layout (moving the sofa on the floorplan)


Generated Mesh

BibTeX

@misc{nguyen2024housecrafterliftingfloorplans3d,
      title={HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model}, 
      author={Hieu T. Nguyen and Yiwen Chen and Vikram Voleti and Varun Jampani and Huaizu Jiang},
      year={2024},
      eprint={2406.20077},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.20077}, 
}