GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan

If you find GeometryCrafter useful, please help ⭐ the [GitHub Repo], which is important to open-source projects. Thanks! [arXiv] [Project Page]

Demo panels: Input Video, Disparity, and Point Map 0 through Point Map 5.

Parameters:

process length: number of frames to process, counted from the first frame; -1 processes the whole video
max resolution: if the long side of the video exceeds this value, it is downsampled to this value before processing to reduce memory usage
num denoising steps: number of denoising iterations; 5 is enough for most cases
cfg scale: classifier-free guidance scale; the default value of 1.0 is recommended
shift window size: the default value of 110 is recommended
decode chunk size: chunk size for the VAE decoder; smaller values such as 4 or 6 reduce memory usage
overlap: the default value of 25 is recommended
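How these parameters shape the work can be pictured with a small sketch. It assumes, hypothetically, that process length truncates the clip, that max resolution rescales frames so the long side is at most that value while keeping the aspect ratio, that shift window size and overlap split the clip into sliding windows whose neighbours share overlap frames, and that decode chunk size is the number of frames decoded together. The helpers `frames_to_process`, `resize_to_max`, `window_starts`, and `decode_chunks` are illustrative names only, not GeometryCrafter's API.

```python
from __future__ import annotations


def frames_to_process(num_frames: int, process_length: int) -> int:
    """Frames kept for estimation: all of them when process_length is -1."""
    if process_length == -1:
        return num_frames
    return min(num_frames, process_length)


def resize_to_max(width: int, height: int, max_resolution: int) -> tuple[int, int]:
    """Shrink so the long side is at most max_resolution, preserving aspect ratio."""
    long_side = max(width, height)
    if long_side <= max_resolution:
        return width, height  # already small enough, no downsampling
    scale = max_resolution / long_side
    return round(width * scale), round(height * scale)


def window_starts(num_frames: int, window_size: int = 110, overlap: int = 25) -> list[int]:
    """Start frames of hypothetical sliding windows; neighbours share at least `overlap` frames."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    if num_frames <= window_size:
        return [0]  # a single window covers the whole clip
    stride = window_size - overlap
    starts = list(range(0, num_frames - window_size + 1, stride))
    if starts[-1] + window_size < num_frames:
        starts.append(num_frames - window_size)  # extra window ending at the last frame
    return starts


def decode_chunks(num_frames: int, chunk_size: int) -> list[tuple[int, int]]:
    """(start, end) frame ranges decoded together; smaller chunks lower peak memory."""
    return [(s, min(s + chunk_size, num_frames)) for s in range(0, num_frames, chunk_size)]
```

For example, under these assumptions a 1920×1080 clip of 300 frames with max resolution 1024 would be processed at 1024×576, and `window_starts(300)` returns `[0, 85, 170, 190]`. The real pipeline may treat the final window differently; the GitHub repo is the authority on its actual behaviour.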

Examples

Each example supplies an input video together with values for process length, max resolution, num denoising steps, cfg scale, shift window size, decode chunk size, and overlap.

Note: To save time quota, the default parameters here favor efficiency, at the cost of shorter processed videos and slightly lower quality. If you have enough time quota, you can adjust the parameters as described in our [GitHub Repo] for better results. Because this page does not support point cloud sequences, it provides only a simplified visualization; for better visualization, download the npz file and open it with the Viser backend in our repo.
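Before opening the downloaded npz file with the Viser backend, it can help to check what it contains. The sketch below lists every stored array with its shape and dtype using NumPy's standard `np.load`; it makes no assumption about which array names the demo writes, and `describe_npz` is a hypothetical helper, not part of the GeometryCrafter repo.

```python
from __future__ import annotations

import numpy as np


def describe_npz(path: str) -> dict[str, tuple[tuple[int, ...], str]]:
    """Return {array name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    # np.load returns an NpzFile for .npz archives; the context manager closes it.
    with np.load(path) as data:
        for name in data.files:
            array = data[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

Calling `describe_npz("downloaded.npz")` (the file name is a placeholder) returns a dictionary you can print to see the available arrays before loading them into a viewer.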