Voice assistants get samples of our voice that can be remixed and faked
The policeman said they needed $4,000 in bail – immediately.
The old woman hung up, but the phone rang again, and the policeman said she could speak to her grandson: he came on the line, pleading with his grandmother for bail money. She said she wouldn’t do it – not because she was hard-hearted, but because something didn’t “feel right”.
At that point, a man identifying himself as her grandson’s defence attorney came on the line, exclaiming, “I don’t need this case – I have 10 others!” But the grandmother remained adamant – no bail money – and so the call ended.
It didn’t take her long to contact her grandson and learn the whole thing had been a setup, a scam from masters of the craft. Yet the level of detail the scammers possessed felt weirdly new. They’d tracked down this woman, obtained her phone number, then somehow worked out both the name of her grandson and the fact that he lived in another town, some miles away.
Could they get all of that personal information from Facebook? Probably not. But it wouldn’t be terribly hard to find enough personal details on the social sharing site to make it relatively straightforward to trawl through other public databases and assemble a more-or-less complete family picture. Names, addresses, phone numbers: everything you need to defraud an old lady in the middle of the night.
That it wasn’t quite good enough to pass the sniff test says less about the current state of the art than about the capabilities of these particular scammers. The last few months have seen a wealth of reports about “deepfake” videos – famous celebrities’ faces mapped onto the bodies of performers in pornographic films. The technology behind these deepfakes – computer vision and machine learning algorithms – has been publicly available for long enough, and mastery of it has spread widely enough, that forgeries which once required painstaking, highly expert labour can now be handed off to a piped-together set of command-line tools.
Adobe VoCo – deepfake for speech
What deepfakes are to video, Adobe VoCo – the company’s “Photoshop for audio” – is to speech. Fed a sufficiently long sample of any speaker – Barack Obama, say, who has provided plenty of source material – it can generate arbitrary speech endlessly. Obama can be made to say anything at all.
Imagine if those scammers had got hold of a voice sample of that grandson: when his grandmother spoke to his vocal simulacrum, it would have responded in just the right tones to make her believe – and pay.
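VoCo itself has only ever been demonstrated, not shipped, but freely available tools can already pull off the same trick. Purely as an illustration – a minimal sketch assuming the open-source Coqui TTS library and its XTTS voice-cloning model, plus a hypothetical recording of the grandson, none of which features in the story above – the forgery fits in a few lines of Python:

```python
# Illustrative sketch only: this uses the open-source Coqui TTS library
# (pip install TTS) and its XTTS model, which clones a voice from a short
# reference recording. Adobe VoCo itself was never publicly released.
from TTS.api import TTS

# Load the multilingual voice-cloning model (weights download on first use).
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

# "grandson_sample.wav" is a hypothetical few-second clip of the target
# speaker; the text can be anything at all.
tts.tts_to_file(
    text="Grandma, it's me. Please send the bail money tonight.",
    speaker_wav="grandson_sample.wav",
    language="en",
    file_path="cloned_speech.wav",
)
```

The point isn’t this particular library; it’s that the barrier has dropped from research lab to afternoon project.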
All of which points to a big penny-drop moment concerning our “personal” data. With so many devices now under voice control – Google Home, Amazon Alexa and Apple HomePod all selling like hotcakes – capturing a sample of speech long enough that it can be weaponised and used against us has become easy. Sure, the big players will take all the right steps to ensure what’s said in the home stays in the home, but with speech as the new interface, the opportunities to record us at scale have already multiplied enormously.
We’re approaching a point where we will have to both guard our speech carefully and be very cautious before we believe anything anyone else says. We may soon see individuals with a particular need for security adopt a different vocal register when talking to voice assistants – something analogous to the register one might have used 100 years ago when communicating with staff “below stairs”.
That’s not a bad policy for any of us: now that our faces can be captured by depth-sensing smartphone cameras, and our voices recorded by pretty much every connected device with a microphone, we need to give a thought to how we can disguise ourselves. We can out-fake the fakers – or at least learn from the attempt.
From here on in, the question of “Is it real?” will linger over anything particularly outrageous – whether Russian kompromat of an American president, or something as prosaic as a bit of showy science. As programmer Benno Rice tweeted after Falcon Heavy blasted into orbit, “We’ve gotten really good at faking rocket launches, eh?” ®