Computer Science Thesis Proposal
In Person and Virtual - ET - Traffic21 Classroom, Gates Hillman 6501 and Zoom
Ph.D. Student, Computer Science Department, Carnegie Mellon University
Towards City-Scale Neural Rendering
Advances in neural rendering techniques have led to significant progress towards photo-realistic novel view synthesis. Combined with increases in data processing and compute capability, this progress promises to unlock numerous VR applications, from search and rescue to autonomous driving. Large-scale virtual reality, long the domain of science fiction, feels markedly more tangible.
This proposal aims to advance the frontier of large-scale neural rendering by building upon Neural Radiance Fields (NeRFs), a family of methods attracting attention due to their state-of-the-art rendering quality and conceptual simplicity. As of July 2023, at least 3,000 papers have been published by research groups around the world, spanning numerous use cases. However, several shortcomings remain. The first is scale itself. Only a handful of existing methods capture scenes larger than a single object or room, and those that do handle only static scenes, which limits their applicability. Another is quality: NeRF assumes ideal viewpoint conditions that are unrealistic in practice, and its renderings degrade when those assumptions are violated. Renderings are especially poor in under-observed regions, which is problematic for dynamic city-scale scenes where it is impossible to densely sample every location and time step. Speed is a third issue, as rendering falls below interactive thresholds. Current acceleration methods either remain too slow or degrade quality at high resolution.
To address scaling, we design a sparse network structure that specializes parameters to different regions of the scene and can be trained in parallel, allowing us to scale linearly as we increase model capacity (versus quadratically in the original NeRF). We then extend our approach to build the largest dynamic NeRF representation to date. As a first step towards improving quality, we propose an anti-aliasing method with minimal performance overhead. To accelerate rendering, we improve sampling efficiency through a hybrid surface-volumetric approach that encourages the model to represent as much of the world as possible through surfaces (which require few samples per ray) while maintaining the freedom to render transparency and finer details (which pure surface representations cannot capture). Finally, we propose to further improve quality in under-observed regions through diffusion models, which show promising results on single-object reconstructions.
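As a rough illustration of why surfaces require few samples per ray, the sketch below applies the standard NeRF volume rendering quadrature, w_i = T_i * (1 - exp(-sigma_i * delta_i)), to two toy rays. It is not the proposal's method: the function names, the density values, and the 99% weight threshold are assumptions chosen only for illustration.

```python
import math

def render_weights(sigmas, deltas):
    """Per-sample compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def samples_needed(weights, mass=0.99):
    """Count how many of the largest weights cover `mass` of the total weight."""
    total = sum(weights)
    covered = 0.0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        covered += w
        if covered >= mass * total:
            return count
    return len(weights)

n = 64
deltas = [1.0 / n] * n  # uniform sample spacing along a unit-length ray

# Hypothetical densities: a thin opaque "surface" versus a diffuse "fog".
surface = [0.0] * n
surface[20] = 1e4
fog = [5.0] * n

surface_count = samples_needed(render_weights(surface, deltas))
fog_count = samples_needed(render_weights(fog, deltas))
print(surface_count, fog_count)
```

In this toy setup the surface-like ray concentrates essentially all of its weight in a single sample, while the fog-like ray spreads it across most of the ray. That gap is the kind of sampling saving a surface-biased representation could exploit.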
Deva Ramanan (Chair)
Angjoo Kanazawa (University of California, Berkeley)
Jon Barron (Google Research)