Microsoft presents AI that makes photos talk and move

A team of researchers at Microsoft's research lab in Beijing, China, has introduced VASA-1, a new artificial intelligence (AI) model capable of generating realistic videos of human faces in real time from a single photo and an audio clip. The result is short videos that give the still image extremely convincing movements, facial expressions and lip synchronization, capable of fooling a less attentive viewer.

The technology can also animate artistic images, such as fictional characters and works of art, and can lip-sync speech in any language and even singing. The speech itself comes from an audio clip of the desired voice, which the user must supply and which may or may not belong to the individual in the photo.

The tool also gives the user control over facial expressions, gaze direction, the head's distance from the camera, camera angle, and other granular adjustments. All of this can be manipulated in real time, much like the character creation screen in a video game. The generated videos have a resolution of 512×512 pixels and reach up to 40 frames per second.

A demonstration of the tool in operation can be seen on Microsoft's website.

Recognizing the risks

The researchers behind the tool say they are aware that the technology could be used to deceive people through the creation of deepfakes, and that they will therefore only commercialize the product when there is certainty that it “will be used responsibly and in accordance with appropriate regulations.”

Among the positive applications the team foresees for the technology are “improving educational equity, increasing accessibility for people with communication challenges, offering companionship or therapeutic support to those who need it, among many others.”



Source: CNN Brasil
