The method is called inverse rendering and uses AI to approximate how light behaves in the real world, turning 2D images taken from different angles into 3D scenes.
Nvidia's researchers applied their novel approach to a popular new technology called neural radiance fields (NeRF). Nvidia calls its product Instant NeRF and claims it is the fastest NeRF technique to date, running more than 1,000 times faster than some methods.
The neural model takes just seconds to train on a few dozen still photos, though it also requires data on the camera angles they were taken from.
David Luebke, VP for graphics research at Nvidia, said:
“If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene. In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.”
Using neural networks, NeRFs can render realistic 3D scenes based on an input collection of 2D images. However, the most interesting part is how the neural networks used to create them are able to fill in the blanks between the 2D images even when the objects or people in them are blocked by obstructions.
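For readers who want a feel for how a trained NeRF turns into pixels, here is a minimal sketch of the standard volume-rendering step used in the NeRF literature, not Nvidia's code. It assumes a network has already returned a density and a colour for each sample along one camera ray; the function name `composite_ray` and its arguments are made up for illustration.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Blend per-sample colours along one ray into a single pixel colour.

    densities: (N,) non-negative density at each sample (assumed network output)
    colors:    (N, 3) RGB colour at each sample (assumed network output)
    deltas:    (N,) distance between neighbouring samples
    """
    densities = np.asarray(densities, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Opacity of each sample: denser or longer segments block more light.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Fraction of light that survives to reach each sample from the camera.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = transmittance * alpha
    return weights @ colors, weights
```

Samples hidden behind something dense get a weight near zero, which is roughly how occluded parts of a scene stop contributing to the pixel.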
Normally, creating a 3D scene with traditional methods can take hours or longer, depending on the complexity and resolution of the visualization. By bringing AI into the picture, though, even early NeRF models were capable of rendering crisp scenes without artifacts in a few minutes, after being trained for several hours.
Luebke claimed that Nvidia's Instant NeRF cuts the required rendering time by several orders of magnitude by using a technique developed by the company, called multi-resolution hash grid encoding, which has been optimised to run efficiently on Nvidia GPUs.
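The idea behind multi-resolution hash grid encoding can be sketched roughly as follows. This is a simplified CPU illustration in NumPy, not Nvidia's GPU implementation: a 3D point is looked up in several grids of increasing resolution, each grid corner is hashed into a small feature table, and the interpolated features from every level are concatenated. The function names, table sizes, growth factor and hash constants below are assumptions chosen for illustration, and the random tables stand in for values that would be learned during training.

```python
import numpy as np

# Multipliers for the spatial hash (assumed values, for illustration only).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_corner(corner, table_size):
    """Map integer 3D grid coordinates to an index into a feature table."""
    c = np.asarray(corner, dtype=np.uint64)
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1]) ^ (c[..., 2] * PRIMES[2])
    return int(h % np.uint64(table_size))

def make_tables(num_levels=4, table_size=2**14, feat_dim=2, seed=0):
    """One small feature table per resolution level (random placeholders)."""
    rng = np.random.default_rng(seed)
    return [rng.uniform(-1e-4, 1e-4, size=(table_size, feat_dim))
            for _ in range(num_levels)]

def encode(point, tables, base_res=16, growth=1.5):
    """Encode a point in [0, 1)^3 as concatenated per-level features."""
    point = np.asarray(point, dtype=np.float64)
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)   # grid gets finer each level
        scaled = point * res
        lo = np.floor(scaled).astype(np.int64)  # lower corner of the cell
        frac = scaled - lo                      # position inside the cell
        acc = np.zeros(table.shape[1])
        for offset in np.ndindex(2, 2, 2):      # the 8 corners of the cell
            offset = np.array(offset)
            # Trilinear interpolation weight for this corner.
            weight = np.prod(np.where(offset == 1, frac, 1.0 - frac))
            acc += weight * table[hash_corner(lo + offset, len(table))]
        features.append(acc)
    return np.concatenate(features)

tables = make_tables()
print(encode([0.3, 0.7, 0.1], tables).shape)  # (8,): 4 levels x 2 features
```

In this sketch the output vector would be fed to a small neural network. Because the tables are compact and each lookup touches only a handful of entries, the approach plausibly maps well onto GPU memory, which fits the speed-up Luebke describes.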
He said that Instant NeRF technology could be used to quickly create avatars or scenes for virtual worlds, to capture video conferencing participants and their environments in 3D or to reconstruct scenes for 3D digital maps.
The technology could also be used to train robots and self-driving cars to better understand the size and shape of real-world objects by capturing 2D images or video footage of them.
Nvidia said its boffins are exploring how their new input encoding technique could be used to accelerate various AI challenges, such as reinforcement learning, language translation and general-purpose deep learning algorithms.