Insikt Report – Deepfakes: I Have No Mouth, And I Must Do Crime
Posted: Monday, May 22

From KBI

Recorded Future warns about spread of AI-generated voice cloning technologies for banking fraud, disinformation, executive impersonation, family emergency scams and more

Intelligence company Recorded Future today released the findings of its latest deepfakes research. Titled “I Have No Mouth, And I Must Do Crime”, the report looks at how the spread of deepfake technologies, specifically through advancements in AI-based voice cloning, is pushing scams to a new level, particularly when it comes to:

  • Banking fraud
  • Disinformation
  • Executive impersonation
  • Family emergency scams
  • Callback scams
  • AI music and copyright infringements

More information on each of these use cases, along with mitigation strategies, appears further below.

Deepfake voice cloning technology is an emerging risk to organisations, which represents an evolution in the convergence of artificial intelligence (AI) threats. When leveraged in conjunction with other AI technologies — such as deepfake video technology, text-based large language models (LLMs such as GPT), generative art, and others — the potential for impact increases.

Voice cloning technology is currently being abused by threat actors in the wild. It has been shown to be capable of defeating voice-based multi-factor authentication (MFA), enabling the spread of misinformation and disinformation, and increasing the effectiveness of social engineering.

One of the most popular voice cloning platforms on the market is ElevenLabs’s Prime Voice AI (elevenlabs[.]io), a browser-based text-to-speech (T2S; TTS) software that allows users to upload “custom” voice samples for a premium fee. While there are a number of voice cloning platforms referenced in this report (such as MetaVoice, Speechify, and so on), ElevenLabs is one of the most accessible, popular, and well-documented, and thus served as the case study for this research.

Recent Advancements In Voice Cloning Technology And Its Use In Scams

Voice cloning technologies, such as ElevenLabs, lower the barrier to entry for inexperienced English-speaking cybercriminals seeking to engage in low-risk impersonation schemes and provide opportunities for more sophisticated actors to undertake high-impact fraudulent schemes.

Currently, the most effective use of voice cloning technologies is in generating one-time samples that can be used in extortion scams, disinformation, or executive impersonation. Limitations to the use of voice cloning technologies, especially for enabling real-time, extended conversations and generating prompts in languages other than English, mean that extensive planning is required for fraudulent operations with a higher impact.

Threat actors have begun to monetise voice cloning services, developing their own cloning tools that are available for purchase on Telegram and giving rise to voice-cloning-as-a-service (VCaaS).

Public interest in AI, including voice cloning technology, has prompted interest on dark web and special-access sources in AI platforms’ potential for abuse. Threat actors are also interested in leveraging multiple AI platforms in concert, thus enabling the convergence of AI threats.

Use Cases And How To Mitigate Current And Future Threats

In order to mitigate current and future threats, organisations must address the risks associated with voice cloning while such technologies are in their infancy. As these technologies will only get better over time, an industry-wide approach is required immediately in order to preempt further threats from future advances in voice cloning technology.

Risk mitigation strategies need to be multidisciplinary, addressing the root causes of social engineering, phishing and vishing, disinformation, and more. Voice cloning technology is still leveraged by humans with specific intentions — it does not conduct attacks on its own. Therefore, adopting a framework that educates employees, users, and customers about the threats it poses will be more effective in the short-term than fighting abuse of the technology itself — which should be a long-term strategic goal.

Banking Fraud

  • Voice-based authentication is a common security measure implemented by banks that have automated helplines. Voice cloning has limits as a tool for banking fraud: if a cybercriminal encounters a live human representative, using a cloned voice to initiate fraud becomes much more difficult.
  • There are also pending legal hurdles to implementing voice-based authentication that may signal a future shift away from the technology and its use by financial institutions.
  • Mitigation strategies include: implementing real-time voice analysis software to detect anomalies in voice recordings; deploying anti-spoofing technology such as “liveness” detection to prevent fraudulent actors from using pre-recorded or synthetic voices to impersonate customers; using biometric authentication with multiple modalities; training employees on the risks associated with voice cloning and how to identify suspicious activity related to voice cloning attacks; and developing a rapid response plan to address incidents of voice cloning fraud, including clear guidelines on how to respond to suspected fraud incidents and procedures for customer notification and remediation.
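
As a rough illustration of how biometric authentication with multiple modalities might be combined with liveness detection, the Python sketch below makes an access decision from three signals. The `AuthSignals` fields, the threshold values, and the `decide` function are hypothetical and are not taken from the report or from any vendor product; in practice the scores would come from dedicated voice analysis and anti-spoofing software.

```python
from dataclasses import dataclass


@dataclass
class AuthSignals:
    """Hypothetical per-call signals; names and ranges are illustrative."""
    voice_match: float      # 0.0-1.0 similarity to the enrolled voiceprint
    liveness: float         # 0.0-1.0 confidence the audio is live, not replayed or synthetic
    second_factor_ok: bool  # e.g. a one-time passcode confirmed on another channel


def decide(signals: AuthSignals,
           voice_threshold: float = 0.90,
           liveness_threshold: float = 0.80) -> str:
    """Return 'allow', 'step_up', 'refer', or 'deny'.

    A voice match alone never grants access: without a second factor the
    caller only earns a step-up challenge.
    """
    if signals.liveness < liveness_threshold:
        return "refer"      # possible recording or synthetic voice: hand to a human agent
    if signals.voice_match < voice_threshold:
        return "deny"       # voice does not match the enrolled customer
    if not signals.second_factor_ok:
        return "step_up"    # ask for another modality before proceeding
    return "allow"
```

The design choice worth noting is that a low liveness score routes the call to a live representative rather than silently failing, reflecting the point above that fraud becomes much harder once a human is involved.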



Disinformation

  • Voice cloning technology can be used to spread disinformation by creating realistic audio recordings of public figures appearing to say things they never actually said. These can be used to create fake news reports, manipulate audio, or spread disinformation through social media platforms.
  • This technology can be particularly dangerous in the context of political campaigns or national emergencies.
  • In addition to reputational damage to individuals, doctored audio clips can be used to inflict reputational damage to companies and institutions with potential financial impact.
  • Fake voice samples that contain negative sentiments by reputable individuals within a company can be used to jeopardise investors’ confidence, thereby decreasing the value of the company’s stock. This method can be weaponised by fraudulent actors to further already existing impersonation schemes and commit investment fraud by feeding investors fake information to drive down the price of a stock and then buying the shares themselves at the artificially low price.
  • Mitigation strategies include: launching or funding public awareness campaigns to educate the general public about the risks and consequences of using voice cloning technology to manipulate public opinion; monitoring and analysing voice recordings to detect signs of voice cloning technology, such as unnatural pauses, inflections and other abnormalities; and implementing voice cloning detection and prevention measures, such as machine-learning algorithms and AI, to help detect suspicious activity in real time and take action before any damage is done.
    • The developers of voice cloning technologies must also enforce content moderation policies that prohibit the dissemination of false or misleading information, with penalties for those who violate these policies.
    • Combating the abuse of voice cloning technology for disinformation requires a collaborative effort across governments, technology companies, civil society, organisations and the media. The cybersecurity industry must foster collaboration between these stakeholders by promoting information-sharing, supporting collaborative research efforts and convening multi-stakeholder and public forums.

Executive Impersonation

  • From 2019 to 2023 there have been multiple high-profile scams in which scammers used voice cloning technology to impersonate high-level executives and defraud enterprises and banking institutions.
  • Over the past year, we have continued to monitor a pair of Russian comedians and pranksters whose activity often aligns with the interests of the Russian state. They have leveraged their presence on social media platforms to promote their efforts to use phishing emails or other social engineering methods to target persons of interest who have spoken out about the Russian state. The aim of their efforts is to trick targets into participating in recorded phone calls or video chats in an attempt to embarrass them.
  • This reporting highlights the concern and mistrust deepfakes can generate within communications at an international level.
  • Mitigation strategies include: organisations must enforce executives’ use of MFA, which requires multiple forms of identification to access sensitive information or conduct high-risk transactions. They can develop unique voiceprints for their executives to make it harder for fraudulent actors to replicate their voices. Finally, they can educate their employees on the risks associated with voice cloning and how to identify suspicious activity related to executive impersonation, as well as monitor communications for signs of executive impersonation, such as unusual requests or behaviour.

Callback Scams

  • Callback scams, also known as “Wangiri”, are a popular fraud technique used to target both individuals and enterprises. Scammers call victims and disconnect after one ring; they do not intend to connect the call but instead hope that the victim will return it. A victim who calls back is manipulated into staying on the line as long as possible, often by being placed on hold. The return call is routed through an international number and accumulates international calling fees, unbeknownst to the victim.
  • Voice cloning technology, like the freemium version of ElevenLabs, can be used to bolster these scam efforts since the premade voices offered on the platform are similar to automated voice assistants used by legitimate services on customer support phone lines and other customer-facing communications.
  • Mitigation strategies include: organisations should implement call authentication protocols, such as the “Secure Telephone Identity Revisited” (STIR) and “Signature-based Handling of Asserted information using toKENs” (SHAKEN) framework, to verify the authenticity of incoming calls. They can also monitor call patterns on company devices and should use call blocking technology to prevent fraudulent calls from reaching employees and customers.
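
To make the idea of monitoring call patterns more concrete, here is a minimal Python sketch that flags numbers behaving like one-ring callers in a set of call records. The `CallRecord` fields, the `flag_one_ring_callers` function, the default thresholds, and the assumption that "+1" is the organisation's home country prefix are all hypothetical; real call detail records vary by carrier and telephony platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical, simplified call detail record."""
    caller: str          # calling number in E.164 form
    callee: str          # company number that was called
    ring_seconds: float  # how long the call rang before it ended
    answered: bool


def flag_one_ring_callers(records, home_prefix="+1", max_ring=3.0, min_targets=3):
    """Return international callers that left short missed calls on many lines."""
    targets = defaultdict(set)
    for r in records:
        is_short_missed = (not r.answered) and r.ring_seconds <= max_ring
        is_international = not r.caller.startswith(home_prefix)
        if is_short_missed and is_international:
            targets[r.caller].add(r.callee)
    # A caller is suspicious once it has "one-ringed" enough distinct lines
    return sorted(c for c, callees in targets.items() if len(callees) >= min_targets)
```

Numbers flagged this way could then be fed into call blocking technology, or used to warn staff not to return the call. The thresholds are a starting point only and would need tuning against real traffic to avoid flagging legitimate international callers.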

Family Emergency Scams

  • A family emergency scam is a type of fraud where the scammer poses as a family member or friend in need of urgent financial assistance due to an emergency. Scammers can also involve fake authority figures, such as a law enforcement officer, lawyer or doctor, to make the lure more convincing and scare the victim.
  • In an escalation of this method, scammers have been observed using voice cloning to replicate the voices of victims’ loved ones.
  • In addition to the 9 premade voices offered by ElevenLabs, users with the Starter+ subscription model can upload 1-minute-long voice samples which can then be used in family emergency scams to make the fraudulent call more believable.
  • Mitigation strategies include: engaging callers in conversation, since voice cloning currently struggles with real-time, extended exchanges; asking questions related to the caller’s personal information or other unique data points to verify their identity; and limiting the amount of sensitive information disclosed over the phone. Organisations can establish emergency contact verification protocols to confirm the identity of family members who are calling on behalf of employees or customers in emergency situations.

AI Music And Copyright Infringement

  • Voice cloning technology can enable copyright infringement in the creation of AI music by allowing an AI system to produce music that closely mimics the style and sound of an existing artist, without their permission or involvement. While this technology has the potential to revolutionise the music industry, it can also lead to copyright infringement if the generated music is too similar to existing songs.

  • As AI music continues to gain in popularity, it is important for organisations to be aware of the potential copyright implications and take steps to ensure that they are not infringing on the rights of existing artists. This can include using original compositions, obtaining permission from artists and record labels, and conducting regular audits of AI-generated music to ensure that it is not too similar to existing works.

  • Mitigation strategies include: organisations should monitor online content to detect instances of copyrighted material being used without permission. Using AI, organisations can create unique voice signatures for their content creators to make it harder for fraudsters to replicate their voices. They can also use watermarking technology to embed identifying information into their audio recordings. Finally, organisations can educate the public on copyright laws and the risks associated with using voice cloning technology to infringe on copyrighted material.
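
As a toy illustration of embedding identifying information into an audio recording, the Python sketch below hides a short tag in the least significant bit of successive PCM samples and reads it back. This is a deliberately simple technique chosen for clarity, not the method described in the report: least-significant-bit marks are fragile and generally do not survive lossy compression or re-recording, so production watermarking relies on more robust schemes. The function names and the representation of audio as a list of integer samples are assumptions made for this example.

```python
def embed_watermark(samples: list[int], tag: bytes) -> list[int]:
    """Return a copy of `samples` with `tag` written into the lowest bit of each sample."""
    # Expand the tag into bits, most significant bit of each byte first
    bits = [(byte >> shift) & 1 for byte in tag for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("audio too short for this tag")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the tag bit (changes a sample by at most 1)
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_watermark(samples: list[int], n_bytes: int) -> bytes:
    """Read `n_bytes` of tag data back from the lowest bits of `samples`."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (samples[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Because each sample changes by at most one quantisation step, the mark is inaudible in practice, which is also why it is so easy to destroy.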

The Production Team
The KBI Production Team is a staff of specialist technology professionals with a detailed understanding across much of cybersecurity and emerging technology. With many decades of collective industry experience, as well as expertise in marketing & communications, we bring news and analysis of the cybersecurity industry.