
Actors’ worst fears come true? New 4D Gaussian splatting method captures human motion

Originally posted on venturebeat.

As the Hollywood actors’ strike marches toward its 100th day with no resolution in sight, a technological leap has just made one of the actors’ biggest fears more plausible: 3D scanning of human bodies in motion, which could allow actors’ performances and mannerisms to be captured and stored as 3D models that studios could re-use in perpetuity.

Although 3D scanning technology has been around in Hollywood for decades, it has typically involved a complex and time-consuming setup: multiple cameras arranged 360 degrees around an actor’s body or, in the case of capturing motion, ping-pong-ball-like “markers” placed directly on the actor and on a tight-fitting bodysuit. Even recent AI-based advances, such as those from the UK startup Move AI, generally rely on multiple cameras (though Move has a new single-camera app now in limited, invitation-only release).

But now there is a new method. Gaussian splatting, a mathematical technique that has in recent years been used to capture static 3D imagery from a single 2D camera moved in a sequence around an object, has been modified by researchers at Huawei and the Huazhong University of Science and Technology in China to capture dynamic motion in 3D as well, including human body motions.

Their method is called “4D Gaussian splatting,” because time, being the fourth dimension, is the new feature, allowing for the image to change over time.

Why motion is so tricky for Gaussian splatting

3D Gaussian splatting was devised for scanning objects with lasers in 2001 by researchers at MIT, ETH Zurich, and Mitsubishi.

It uses collections of particles to represent a 3D scene, each with its own position, rotation, and other attributes. Each point is also assigned an opacity and a color, which can change depending on the view direction. In recent years, Gaussian splatting has come a long way and can now be rendered in modern web browsers and made from a collection of 2D images on a user’s smartphone.
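To make that description concrete, here is a minimal sketch of what one such particle might look like as a data structure. It is an illustration only, not the representation used by any particular splatting implementation: the `Splat` class, the `scale` and `view_tint` fields, and the `color_from` function are all hypothetical names, and the view-dependent color rule is a deliberately crude stand-in for the more sophisticated functions real renderers use.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Splat:
    """One particle in the scene (hypothetical layout, for illustration)."""
    position: Vec3                               # center of the Gaussian in world space
    rotation: Tuple[float, float, float, float]  # orientation as a unit quaternion (w, x, y, z)
    scale: Vec3                                  # assumed "other attribute": extent along each local axis
    opacity: float                               # 0.0 (transparent) to 1.0 (opaque)
    base_color: Vec3                             # RGB seen from any direction
    view_tint: Vec3                              # assumed: extra RGB added when viewed along +z


def color_from(splat: Splat, view_dir: Vec3) -> Vec3:
    """Return the RGB color for a unit view direction, clamped to [0, 1]."""
    # Toy view dependence: the tint grows as the view direction aligns with +z.
    facing = max(0.0, view_dir[2])
    return tuple(
        min(1.0, max(0.0, base + facing * tint))
        for base, tint in zip(splat.base_color, splat.view_tint)
    )


splat = Splat(
    position=(0.0, 0.0, 0.0),
    rotation=(1.0, 0.0, 0.0, 0.0),
    scale=(1.0, 1.0, 1.0),
    opacity=0.8,
    base_color=(0.5, 0.2, 0.1),
    view_tint=(0.25, 0.0, 1.0),
)
head_on = color_from(splat, (0.0, 0.0, 1.0))   # tint fully applied
side_on = color_from(splat, (1.0, 0.0, 0.0))   # base color only
```

The same particle returns a different color depending on where the viewer stands, which is the property the paragraph above describes.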

However, as the researchers write in a new paper published October 12 simultaneously on GitHub and the open-access site arXiv.org, “3D-GS [Gaussian splatting] still focuses on the static scenes. Extending it to dynamic scenes as a 4D representation is a reasonable, important but difficult topic. The key challenge lies in modeling complicated point motions from sparse input.”

The main challenge is that when multiple sets of Gaussian splats are joined together across different timestamps to create a moving image, each point “deforms” from image to image, creating inaccurate representations of the shapes and volumes of the objects (and subjects) in the images.

However, the researchers were able to overcome this by maintaining only “one set of canonical 3D Gaussians” and predicting where and how those Gaussians would move from one timestamp to the next, rather than storing a separate set for every moment in time.
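The idea of keeping one canonical set and predicting its motion can be sketched in a few lines. This is not the researchers’ method: the hand-written `toy_deformation` function below is a hypothetical stand-in for whatever predictor maps a point and a timestamp to a displacement, and `CANONICAL` and `centers_at` are names invented for this example.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]

# One canonical set of Gaussian centers, stored once for the whole clip.
CANONICAL: List[Vec3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]


def toy_deformation(center: Vec3, t: float) -> Vec3:
    """Hypothetical predictor: the offset to apply to a center at time t."""
    x, _, _ = center
    # Points farther along x sway more in z, completing one cycle per unit of time.
    return (0.0, 0.0, x * math.sin(2.0 * math.pi * t))


def centers_at(
    t: float,
    canonical: List[Vec3] = CANONICAL,
    deform: Callable[[Vec3, float], Vec3] = toy_deformation,
) -> List[Vec3]:
    """Return the deformed centers at time t without modifying the canonical set."""
    moved = []
    for center in canonical:
        offset = deform(center, t)
        moved.append(tuple(c + o for c, o in zip(center, offset)))
    return moved


start = centers_at(0.0)      # no displacement at t = 0
quarter = centers_at(0.25)   # the point at x = 1 has moved in z
```

Only the canonical list is stored. Every other moment in the clip is derived from it on demand, so the shapes stay tied to a single reference instead of drifting between independently captured frames.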

What this looks like in practice is a 3D image of a person cooking on a pan, including chopping and stirring ingredients, as well as a dog moving nearby. Another example shows human hands breaking a cookie in half and yet another opening a toy egg to reveal a nested toy chick inside. In all cases, the researchers were able to achieve a 3D rotational effect, allowing a viewer to move the “camera” around the objects in the scene in 3D and see them from multiple angles and vantage points.

According to the researchers, their 4D Gaussian splatting method “achieves real-time rendering on dynamic scenes, up to 70 FPS at a resolution of 800×800 for synthetic datasets and 36 FPS at a resolution of 1352×1014 in real datasets, while maintaining comparable or superior performance than previous state-of-the-art (SOTA) methods.”

Next steps

While the initial results are impressive, each scene of motion captured by the researchers in 3D takes 20 minutes to produce and lasts only a few seconds, far short of the running time needed to cover an entire feature film, for example.

But for studios looking to capture a few of an actor’s motions and re-use them, it’s a great start. And for video game and XR/VR designers, it’s hard to imagine that this technique will not be useful.

And, as with many promising technological advances, the quality and quantity of what can be captured, and over what time frame, are only likely to increase.

As the researchers write at the end of their paper, “this work is still in progress and we will explore higher rendering quality on complex real scenes in the subsequent development.”
