Researchers Create AI That Guesses What You Look Like, Just From Your Voice

Posted by Eyerys on June 2nd, 2019

When we listen to someone speaking without seeing them, we often build a mental model of what that person might look like.

Each person's voice is unique because voice is a result of "the mechanics of speech production". Factors such as age, gender, the shape of the mouth, facial bone structure, and thin or full lips can all affect the sound we generate.

In addition, the way we sound is also affected by the language we speak, our accent, how fast we speak, and how we pronounce words.

Artificial Intelligence (AI) is a field under intensive research and development.

While researchers around the world have built AIs for many different purposes, researchers from the Massachusetts Institute of Technology (MIT) have pulled off a notable feat: an AI capable of predicting and reconstructing what a person looks like based on his or her voice.

 
Figure: Speech2Face model and training pipeline

The AI is called 'Speech2Face'. It analyzes a short audio clip of a subject and then reconstructs what the person might look like in real life.

While the AI is far from perfect, it clearly shows how terrifyingly sophisticated AIs can be, even when learning from a tiny snippet of data.

In a paper published on arXiv, the team at MIT said:

"We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking."

"During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly."

"We evaluate and numerically quantify how--and in what manner--our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers."
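To give a feel for what "self-supervised" means here, below is a toy sketch in Python. It is not the researchers' model: the feature vectors are random stand-ins, and the function names (make_toy_pairs, fit_voice_to_face) and the simple linear map are assumptions made purely for illustration. The only point it shows is that each video supplies its own voice/face pair, so the training target comes from the data itself rather than from human-written labels.

```python
import numpy as np


def make_toy_pairs(n_videos=200, voice_dim=8, face_dim=4, seed=0):
    """Hypothetical stand-in for features extracted from videos.

    Each row is one 'video': a voice feature vector and the face feature
    vector that co-occurs with it. Nobody labels age, gender, etc.
    """
    rng = np.random.default_rng(seed)
    voice = rng.normal(size=(n_videos, voice_dim))
    hidden_map = rng.normal(size=(voice_dim, face_dim))  # unknown to the learner
    face = voice @ hidden_map + 0.01 * rng.normal(size=(n_videos, face_dim))
    return voice, face


def fit_voice_to_face(voice_feats, face_feats):
    """Least-squares linear map W so that voice_feats @ W approximates face_feats."""
    W, *_ = np.linalg.lstsq(voice_feats, face_feats, rcond=None)
    return W


voice, face = make_toy_pairs()

# Train on the first 150 'videos', using each video's own face as the target.
W = fit_voice_to_face(voice[:150], face[:150])

# Predict face features for 50 voices the learner has never seen.
pred = voice[150:] @ W
error = float(np.mean((pred - face[150:]) ** 2))
print(f"held-out mean squared error: {error:.5f}")
```

The real system, per the quote above, uses a deep neural network and millions of videos; this sketch swaps in ordinary least squares on synthetic numbers only so the pairing idea fits in a few lines.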

And here, the results are astonishing.
