Mon Feb 08 2021
It’s a completely artificial… but human-sounding voice software system
Microsoft have just released their Custom Neural Voice text-to-speech service to general availability.
Unlike the standard, robotic “press… 1… for… customer… services” systems, which needed over 10,000 lines of text to be set up before they would work, Microsoft’s Custom Neural Voice requires far less initial input to get it working with your desired voice, and the final result sounds far more human than before.
This new technology allows companies to spend a tenth of the effort traditionally needed to prepare training data.
The new software comes with a code of conduct, urging users not to use “photo realistic avatars with synthetic voices to represent real people” and warning against “using a synthetic voice with contents without editorial control.”
The technology itself comes in three parts:
Working together, they take the input text, analyse it, convert it into phoneme sequences (phonemes are the basic units of sound used to construct words), and run those sequences through the Neural Acoustic model to better simulate a real human voice, before releasing the result as a real-time audio file.
The Neural model is trained using neural networks and real voice recordings to better simulate real speech.
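To make the flow described above more concrete, here is a minimal toy sketch in Python of a text-to-speech pipeline with the same stages: text analysis, phoneme conversion, an acoustic model step, and audio output. It is not Microsoft's code or API. Every name in it (`TOY_LEXICON`, `analyse_text`, `to_phonemes`, `acoustic_model`, `render_audio`, `synthesise`) is hypothetical, and the "model" is a stand-in that returns dummy numbers rather than a trained neural network.

```python
from typing import List

# Hypothetical two-word lexicon, for illustration only; a real system
# would use a full pronunciation dictionary or a learned model.
TOY_LEXICON = {
    "press": ["P", "R", "EH", "S"],
    "one": ["W", "AH", "N"],
}


def analyse_text(text: str) -> List[str]:
    """Normalise raw text into lowercase word tokens, dropping punctuation."""
    cleaned = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in text)
    return cleaned.lower().split()


def to_phonemes(words: List[str]) -> List[str]:
    """Map each word to its phonemes; unknown words become a placeholder."""
    phonemes: List[str] = []
    for word in words:
        phonemes.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phonemes


def acoustic_model(phonemes: List[str]) -> List[List[float]]:
    """Stand-in for a neural acoustic model: one dummy 2-value frame per phoneme."""
    return [[float(len(p)), float(i)] for i, p in enumerate(phonemes)]


def render_audio(frames: List[List[float]]) -> List[float]:
    """Stand-in for waveform generation: flatten the frames into a sample list."""
    return [value for frame in frames for value in frame]


def synthesise(text: str) -> List[float]:
    """Run the whole toy pipeline from text to 'audio' samples."""
    return render_audio(acoustic_model(to_phonemes(analyse_text(text))))


if __name__ == "__main__":
    print(to_phonemes(analyse_text("Press one!")))
    print(len(synthesise("Press one!")))
```

The point of the sketch is only the shape of the pipeline: each stage consumes the previous stage's output, so a more natural-sounding voice comes from improving the acoustic model stage without changing how text is analysed or split into phonemes.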
Microsoft, however, are requiring “every customer to obtain explicit written permission from the voice talent before creating a voice model”, because once a model of a person's voice is created it could be used to say almost anything.