DynMF: Neural Motion Factorization
for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Abstract


Accurately and efficiently modeling dynamic scenes and motions is a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework, consisting of a tiny set of learned bases queried only in time, allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while requiring only double the storage of a static scene. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene, leading to effective and fast optimization. This is achieved by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, control them independently, and generate novel motion combinations that have never been seen before. We reach state-of-the-art rendering quality within just 5 minutes of training, and in less than half an hour we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in both monocular and multi-view scenarios.

Overview Video

Method

Overview of DynMF: The underlying dense motion field (top-left) of a dynamic scene is factorized into a set of globally shared learnable motion bases (bottom-left) and their motion coefficients stored on each Gaussian (bottom-right). Given a query time t, the deformation can be efficiently computed via a single global forward pass of the motion bases and motion-coefficient blending (bottom-middle) to recover the deformed scene (top-right).
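A minimal sketch of this factorized deformation, assuming a tiny MLP that maps a query time t to K basis displacements and a per-Gaussian coefficient matrix; the names (MotionBasis, deform_gaussians), shapes, and layer sizes are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class MotionBasis(nn.Module):
    """Tiny MLP queried only in time: t -> K shared basis displacements in R^3."""
    def __init__(self, num_bases=10, hidden=64):
        super().__init__()
        self.num_bases = num_bases
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases * 3),
        )

    def forward(self, t):
        # t: scalar time -> (K, 3) basis displacements at time t
        t = torch.as_tensor([[float(t)]], dtype=torch.float32)
        return self.net(t).view(self.num_bases, 3)

def deform_gaussians(means, coeffs, basis, t):
    """Blend the shared basis trajectories with per-Gaussian coefficients.

    means:  (N, 3) canonical Gaussian centers
    coeffs: (N, K) per-Gaussian motion coefficients
    t:      scalar query time
    """
    basis_disp = basis(t)             # (K, 3), one global forward pass per frame
    offsets = coeffs @ basis_disp     # (N, 3) blended per-Gaussian displacements
    return means + offsets            # deformed centers at time t
```

Because the basis MLP is evaluated once per frame and the blending is a single matrix product, the per-frame deformation cost is essentially independent of how complex the motion is.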

Trajectory tracking

Through a simple and interpretable framework, we expressively model the deformation of scene elements under complex dynamics. Our method enables robust per-point tracking, overcoming displacement ambiguities, overlaps, and complex non-rigid motions.
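Since every Gaussian's motion is just its coefficient-weighted blend of the shared bases, a point track can be read out by sweeping the query time. This sketch reuses the hypothetical MotionBasis module from above and is illustrative only.

```python
import torch

def track_point(mean, coeffs, basis, times):
    """Return the trajectory of a single Gaussian center over a list of times.

    mean:   (3,) canonical center of the tracked Gaussian
    coeffs: (K,) its motion coefficients
    times:  iterable of scalar query times
    """
    track = []
    for t in times:
        basis_disp = basis(t)                    # (K, 3) basis displacements at t
        track.append(mean + coeffs @ basis_disp) # (3,) position at time t
    return torch.stack(track)                    # (T, 3) full point trajectory
```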

Motion Decomposition

A key component of our proposed representation is its ability to explicitly decompose each dynamic scene into its core independent motions. Specifically, by applying a sparsity loss, we encourage each Gaussian to choose only one of the few available trajectories. This design, combined with the inherent rigidity of our representation, drives all nearby Gaussians to choose the same single trajectory. This greatly increases the controllability of the dynamic scene by disentangling motions, allowing for novel scene creation, interactively choosing which part of the scene moves, and so on. Notice how only the blue or the green ball is moving in the 'Bouncingballs' scene, or only the left or the right hand is moving in the 'Mutant' D-NeRF scene.
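One plausible way to push each Gaussian toward a single trajectory is an L1/L2-ratio penalty on its coefficient row, which is minimized when only one coefficient is non-zero; the exact sparsity loss used in the paper may differ, so treat this as an assumption-laden sketch.

```python
def sparsity_loss(coeffs, eps=1e-8):
    """Penalize Gaussians that spread their weight over many basis trajectories.

    coeffs: (N, K) per-Gaussian motion coefficients.
    The ratio ||c||_1 / ||c||_2 reaches its minimum (1.0) exactly when a
    single coefficient per Gaussian is non-zero, encouraging hard assignment.
    """
    l1 = coeffs.abs().sum(dim=-1)
    l2 = coeffs.pow(2).sum(dim=-1).sqrt()
    return (l1 / (l2 + eps)).mean()
```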

Motion editing

Being able to efficiently factorize all the motions of a dynamic scene into a few basis trajectories allows us to control these trajectories, enabling or disabling them, which opens up new ways of video editing. Here, we show that in the original rendering of the 'Flame steak' DyNeRF scene the window blinds are moving. With our motion decomposition framework, we can isolate this movement, disable it, and obtain the original dynamic rendering without such potentially unwanted background motion.
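With such a factorization, disabling a motion (e.g., the window blinds) can be as simple as zeroing out the corresponding coefficient column before blending. This is a hypothetical sketch reusing the deform_gaussians helper assumed earlier, not the paper's exact editing interface.

```python
def render_with_disabled_bases(means, coeffs, basis, t, disabled=()):
    """Deform the scene at time t with selected basis trajectories frozen.

    disabled: indices of the basis trajectories whose motion should be removed.
    """
    edited = coeffs.clone()
    for k in disabled:
        edited[:, k] = 0.0   # Gaussians assigned to basis k keep their canonical position
    return deform_gaussians(means, edited, basis, t)
```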

Visualization of basis trajectories

This figure depicts the learned trajectories that model the dynamics of a scene. Specifically, each Gaussian can choose one or more of these 10 trajectories to model its unique motion in the dynamic scene. When we allow a linear combination of the 10 trajectories, the basis functions are spread uniformly across the 3D world (left video), since each Gaussian can model its unique motion by linearly combining them and can therefore move freely through the space of the dynamic scene. If we instead restrict each Gaussian to choose only one trajectory, the bases become more distinct and specialized to the motion needs of the corresponding scene (right video).
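Restricting each Gaussian to a single trajectory could be approximated, for instance, by hard-assigning every Gaussian to its dominant basis; the sketch below is one such illustrative scheme and is not necessarily how the paper implements the constraint.

```python
import torch

def harden_coefficients(coeffs):
    """Keep only each Gaussian's strongest coefficient, zeroing the rest.

    coeffs: (N, K) -> (N, K) with exactly one non-zero entry per row.
    """
    top = coeffs.abs().argmax(dim=-1, keepdim=True)          # (N, 1) dominant basis index
    hard = torch.zeros_like(coeffs)
    hard.scatter_(1, top, coeffs.gather(1, top))              # copy only the dominant value
    return hard
```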

Dynamic Rendering Results

Citation