Microsoft has introduced VALL-E, a neural network that can simulate a person's voice from just a three-second audio sample of that person speaking. Its features do not end there: unlike alternative developments, VALL-E can also imitate the speaker's emotions and tone, even when voicing text that the person never said.
The neural network was trained on 60,000 hours of English speech. Its results are already quite impressive (samples can be heard on the project's demo page on GitHub), but the simulated voice still sometimes sounds machine-made.
Although VALL-E is not publicly available, journalists are already worried about such a tool falling into the wrong hands, especially if it continues to improve. For example, attackers could use this technology to make realistic spam calls that imitate the voices of a person’s relatives and friends.
Source: Trash Box
Charles Grill is a tech-savvy writer with over 3 years of experience in the field. He writes on a variety of technology-related topics and has a strong focus on the latest advancements in the industry. He is connected with several online news websites and is currently contributing to a technology-focused platform.