GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan

If you find GeometryCrafter useful, please ⭐ the [Github Repo]; stars are important to open-source projects. Thanks! [arXiv] [Project Page]
Parameters (the demo's slider range is given in parentheses):

  • process length (-1 to 280): process only the first process length frames for geometry estimation; -1 processes the whole video
  • max resolution (512 to 2048): if the long side of the video exceeds this value, it is downsampled to this value before processing, to reduce memory usage
  • num denoising steps (1 to 25): the number of denoising iterations; 5 is enough for most cases
  • cfg scale (1.0 to 1.2): classifier-free guidance scale; the default value 1.0 is recommended
  • shift window size (10 to 110): the default value 110 is recommended
  • decode chunk size (1 to 16): chunk size for the VAE decoder; set it to 4 or 6 to reduce memory usage
  • overlap (1 to 50): the default value 25 is recommended
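To make the parameters above more concrete, here is a minimal Python sketch of how settings like these are commonly applied to a video. It is an illustration under stated assumptions, not GeometryCrafter's code. All helper names are hypothetical. `plan_windows` assumes that shift window size is the number of frames per window and that overlap is the number of frames shared by consecutive windows. `target_size` does not round to any size multiple the model may require.

```python
from __future__ import annotations


def target_size(height: int, width: int, max_res: int) -> tuple[int, int]:
    """Scale (height, width) so the long side is at most max_res, keeping the aspect ratio.

    Frames whose long side is already within the limit are returned unchanged.
    """
    long_side = max(height, width)
    if long_side <= max_res:
        return height, width
    return round(height * max_res / long_side), round(width * max_res / long_side)


def select_frames(num_frames: int, process_length: int) -> range:
    """Indices of the frames to process; process_length == -1 means the whole video."""
    if process_length == -1:
        return range(num_frames)
    return range(min(process_length, num_frames))


def plan_windows(num_frames: int, window_size: int, overlap: int) -> list[tuple[int, int]]:
    """Split [0, num_frames) into (start, end) windows of at most window_size frames.

    ASSUMPTION (hypothetical scheme): consecutive windows share `overlap` frames,
    so each window starts window_size - overlap frames after the previous one.
    The last window may be shorter than window_size.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    stride = window_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window_size, num_frames)
        windows.append((start, end))
        if end == num_frames:  # this window reaches the final frame
            return windows
        start += stride


def decode_chunks(num_frames: int, chunk_size: int) -> list[range]:
    """Group frame indices into chunks of at most chunk_size frames each,
    e.g. so a decoder handles a few frames at a time and peak memory stays bounded."""
    return [range(i, min(i + chunk_size, num_frames)) for i in range(0, num_frames, chunk_size)]


if __name__ == "__main__":
    print(target_size(1080, 1920, 1024))           # (576, 1024)
    print(len(select_frames(300, -1)))             # 300
    print(plan_windows(200, 110, 25))              # [(0, 110), (85, 195), (170, 200)]
    print([len(c) for c in decode_chunks(10, 4)])  # [4, 4, 2]
```

With 200 frames and the recommended 110/25 settings, this sketch yields three windows. The last window is left shorter instead of being shifted back to full length, which is only one of several reasonable design choices.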
Note: To stay within the time quota, the default parameters here favor efficiency, at the cost of a shorter processed video and slightly lower quality. If you have enough time quota, you can adjust the parameters as described in our [Github Repo] for better results. Because this page does not support point cloud sequences, it only provides a simplified visualization. For better visualization, download the npz file and open it with the Viser backend in our repo.
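Before opening the downloaded npz file in the Viser viewer, it can help to check what the file contains. Below is a minimal sketch that uses NumPy's `np.load`. This page does not document which arrays the file stores, so the script lists whatever it finds. `summarize_npz` and the script name `inspect_npz.py` are hypothetical.

```python
from __future__ import annotations

import sys

import numpy as np


def summarize_npz(path) -> dict[str, tuple[tuple[int, ...], str]]:
    """Return {array_name: (shape, dtype)} for every array stored in an .npz archive."""
    summary = {}
    with np.load(path) as data:  # for .npz input, np.load returns an NpzFile
        for name in data.files:
            array = data[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # Usage (hypothetical script name): python inspect_npz.py path/to/downloaded.npz
    for name, (shape, dtype) in summarize_npz(sys.argv[1]).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```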