
Global streaming service Netflix has introduced VOID, an open-source framework designed to remove objects from video while preserving the physical interactions they create, addressing limitations seen in traditional inpainting and object-erasing tools.
Historically, removing an object from a scene has been straightforward, but ensuring the environment behaves realistically afterward has posed significant challenges. For instance, deleting a person holding a guitar leaves the instrument suspended unnaturally, and removing a diver from a pool can leave the water unmoved. Visual effects teams have traditionally corrected such issues manually, a time-consuming process that can extend from days to weeks for a single scene.
VOID, short for Video Object and Interaction Deletion, is intended to resolve these complications. Unlike conventional methods that merely fill in missing pixels, the system predicts physically consistent outcomes for the scene once the object is removed.
VOID combines several existing models to achieve this. Google’s Gemini analyzes the scene to identify areas that will be affected by the deletion, while Meta’s SAM2 segments the objects to be removed. These outputs are encoded into a quadmask, a four-value map indicating which areas to erase, which overlap, which are physically impacted, and which remain untouched. A video diffusion model built on Alibaba’s CogVideoX then reconstructs the scene in a physically plausible manner. An optional second pass applies optical flow to correct any distortions from the initial reconstruction.
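To make the quadmask idea concrete, here is a minimal sketch of how two binary masks could be merged into a single four-value map. It is an illustration only, not VOID's code: the function name build_quadmask, the Region labels, and the integer codes 0 to 3 are all hypothetical, and the article does not state which values VOID actually uses. The sketch also assumes that "overlap" means pixels covered by both the object to be removed and the region it physically affects.

```python
from enum import IntEnum


class Region(IntEnum):
    # Hypothetical codes chosen for this sketch; VOID's real values may differ.
    KEEP = 0      # untouched background
    AFFECTED = 1  # physically impacted by the removal, but not part of the object
    OVERLAP = 2   # covered by both the object and the affected region
    ERASE = 3     # part of the object to remove only


def build_quadmask(object_mask, affected_mask):
    """Merge two same-sized binary masks (lists of rows) into a four-value map."""
    if len(object_mask) != len(affected_mask):
        raise ValueError("masks must have the same number of rows")

    quadmask = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        if len(obj_row) != len(aff_row):
            raise ValueError("mask rows must have the same length")
        row = []
        for obj, aff in zip(obj_row, aff_row):
            if obj and aff:
                row.append(int(Region.OVERLAP))
            elif obj:
                row.append(int(Region.ERASE))
            elif aff:
                row.append(int(Region.AFFECTED))
            else:
                row.append(int(Region.KEEP))
        quadmask.append(row)
    return quadmask


# Toy 2x4 frame: the object covers the left two columns,
# the affected region covers the middle two columns.
object_mask = [[1, 1, 0, 0],
               [1, 1, 0, 0]]
affected_mask = [[0, 1, 1, 0],
                 [0, 1, 1, 0]]

print(build_quadmask(object_mask, affected_mask))
# prints [[3, 2, 1, 0], [3, 2, 1, 0]]
```

In the real pipeline the two input masks would come from the segmentation and scene-analysis stages described above, and the map would be built per video frame rather than for a single toy grid.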
Demonstrating Physically Consistent Object Removal In Video Production
Demonstrations of VOID show compelling results: balloons ascend naturally when a holder is removed, blocks maintain stability when unrelated blocks are deleted, and pool surfaces remain unaffected after a person is erased. In a human preference study with 25 participants, VOID was favored 64.8 percent of the time, outperforming Runway, a leading commercial alternative, which achieved just 18.4 percent.
This release is Netflix Research’s first publicly available AI tool. Licensed under Apache 2.0, VOID can be used commercially and is hosted on Hugging Face. Hardware requirements currently limit access: running the model requires a GPU with 40GB of VRAM, although future optimizations and lower infrastructure costs may broaden availability. VOID represents a shift in video production technology, from simple erasure tools toward systems that understand a scene and reconstruct it realistically, which could reduce the manual cleanup that visual effects teams currently perform.
The post Netflix Unveils VOID: Open-Source Framework For Physically Consistent Video Object Removal appeared first on Metaverse Post.
Source: Mpost.io