One4D

Unified 4D Generation and Reconstruction via Decoupled LoRA Control

Zhenxing Mi, Yuxin Wang, Dan Xu
The Hong Kong University of Science and Technology (HKUST)
Accepted to ECCV 2026
Teaser figure: One4D takes a single image, a full video, sparse frames, or a text prompt as input and produces 4D outputs.

Unified Framework

One4D is a unified framework for 4D generation and reconstruction. Through Unified Masked Conditioning (UMC), a single model switches seamlessly among four tasks: 4D generation from a single image, 4D reconstruction from a full video, mixed generation and reconstruction from sparse frames, and 4D generation from a text prompt. Decoupled LoRA Control (DLC) uses two modality-specific LoRA adapters to form decoupled computation branches for RGB frames and pointmaps. The branches are connected by lightweight, zero-initialized control links that gradually learn mutual pixel-level consistency. Together, these components let One4D produce high-quality RGB frames and accurate pointmaps across both generation and reconstruction tasks.

Methodology

One4D Framework

Figure 1: The One4D Unified Framework architecture.

Unified Masked Conditioning

Enables seamless transitions between 4D generation from a single image, 4D reconstruction from a full video, mixed generation and reconstruction from sparse frames, and 4D generation from a text prompt using a single unified model.
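
The four input settings can be viewed as different observation patterns over the same frame sequence. The sketch below illustrates that idea only. The page does not say how One4D encodes its conditioning, so the per-frame binary mask, the zero-filling of unobserved frames, and the names `conditioning_mask` and `build_condition` are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def conditioning_mask(num_frames, observed):
    """Hypothetical helper: 1.0 where an input frame is given, 0.0 where it must be generated."""
    mask = np.zeros(num_frames, dtype=np.float32)
    idx = list(observed)
    if idx:
        mask[idx] = 1.0
    return mask

def build_condition(frames, mask):
    """Hypothetical helper: zero out unobserved frames and append the mask as an extra channel.

    frames: array of shape (T, H, W, 3); mask: array of shape (T,).
    Returns an array of shape (T, H, W, 4).
    """
    m = mask[:, None, None, None]                       # (T, 1, 1, 1)
    masked_frames = frames * m                          # unobserved frames become zeros
    mask_channel = np.broadcast_to(m, frames.shape[:3] + (1,))
    return np.concatenate([masked_frames, mask_channel], axis=-1)

T = 8  # assumed clip length, for illustration only
tasks = {
    "text_to_4d": conditioning_mask(T, []),            # nothing observed
    "image_to_4d": conditioning_mask(T, [0]),          # first frame only
    "sparse_to_4d": conditioning_mask(T, [0, 3, 7]),   # a few frames
    "video_to_4d": conditioning_mask(T, range(T)),     # every frame
}

frames = np.ones((T, 4, 4, 3), dtype=np.float32)       # dummy video
cond = build_condition(frames, tasks["sparse_to_4d"])
```

Under this assumed encoding, the task is selected entirely by which frames are marked as observed, which is one way a single model could cover all four settings without task-specific inputs.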

Decoupled LoRA Control

Decouples RGB and XYZ (pointmap) computation to minimize interference while maintaining pixel-wise cross-modal control.
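
To make the role of zero initialization concrete, here is a minimal NumPy sketch of one layer with two LoRA branches and a pair of control links. It is an illustration under stated assumptions, not the One4D architecture. The classes `LoRALinear` and `ControlLink`, the choice of a plain linear map for each link, the point where the links attach, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Frozen base weight W plus a low-rank update B @ A, with B starting at zero."""
    def __init__(self, W, rank):
        d_out, d_in = W.shape
        self.W = W                                        # shared, frozen base weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))                  # zero init: no change at start

    def __call__(self, x):
        return x @ self.W.T + x @ self.A.T @ self.B.T

class ControlLink:
    """Assumed form of a control link: a zero-initialized linear map between branches."""
    def __init__(self, dim):
        self.P = np.zeros((dim, dim))

    def __call__(self, x):
        return x @ self.P.T

dim, rank, tokens = 16, 4, 5                              # illustrative sizes
W = rng.normal(size=(dim, dim))
rgb_branch = LoRALinear(W, rank)                          # adapter for RGB frames
xyz_branch = LoRALinear(W, rank)                          # adapter for pointmaps
xyz_to_rgb = ControlLink(dim)
rgb_to_xyz = ControlLink(dim)

def dlc_layer(x_rgb, x_xyz):
    """Run both branches, then exchange features token by token."""
    h_rgb = rgb_branch(x_rgb)
    h_xyz = xyz_branch(x_xyz)
    # Token i in both branches is assumed to cover the same pixel location,
    # so adding the linked features is a pixel-wise exchange.
    return h_rgb + xyz_to_rgb(h_xyz), h_xyz + rgb_to_xyz(h_rgb)

x_rgb = rng.normal(size=(tokens, dim))
x_xyz = rng.normal(size=(tokens, dim))
out_rgb, out_xyz = dlc_layer(x_rgb, x_xyz)
```

Because both the LoRA `B` matrices and the link weights start at zero, each branch initially reproduces the frozen base layer exactly. Any cross-modal influence appears only as the link weights move away from zero during training, which matches the description of links that gradually learn consistency.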

Architecture Comparison

Figure 2: Comparison of Decoupled LoRA Control against other architectures.

Results Showcase

Single image to 4D

Generating a consistent 4D scene from a single input image, ensuring high-quality RGB frames and consistent geometry for the entire 4D output.

Sparse frames to 4D

Reconstructing the 4D scene from only a few sparse frames. One4D generates the missing frames using Unified Masked Conditioning.

Full video to 4D

High-fidelity reconstruction from a full video input, ensuring temporal consistency and accurate geometry estimation.

Text to 4D

Generating a consistent 4D scene from a text prompt alone, ensuring high-quality RGB frames and consistent geometry for the entire 4D output.

BibTeX

@inproceedings{mione4d2026,
  title={One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control},
  author={Mi, Zhenxing and Wang, Yuxin and Xu, Dan},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2026}
}