Highlights –

  • Interactive deepfakes mimic live human engagement, drawing on advances in multimodal interaction to produce realistic interactive behavior.
  • Compositional deepfakes embed fake content in larger disinformation campaigns, combining deepfakes with observed, anticipated, and manufactured world events to produce convincing synthetic histories.

You have almost certainly come across the term deepfakes, or at least seen a deepfake video posted online, unless you have been away from social media for the last few months. You may have seen the viral, entertaining video of Bill Hader offering a celebrity impression of Arnold Schwarzenegger. Or you may have come across videos of speeches Barack Obama never gave, or of Mark Zuckerberg saying things he never actually said. They have all made the rounds.

Deepfakes showcase both the best and the worst of what technology can do. The techniques behind them have pushed Artificial Intelligence (AI) and Machine Learning (ML) to previously unimaginable heights. Yet, all things considered, they have also become a tool of misinformation and propaganda.

The emergence of such videos marks the dawn of a new era of digital threats: deepfakes and other AI-based techniques allow fake videos to be created cheaply and easily, and seeing is no longer believing. These videos also go viral in seconds because recommendation algorithms favor visual, dynamic content. The problem is not deepfakes per se; it is that the tools needed to create them are advancing every day and are readily available.

And what we are seeing right now is just the tip of the iceberg. In the near future, we won’t be able to tell whether the person we’re speaking to on a video call is real or fake, and fraudsters will have no trouble producing a whole timeline of bogus footage to support their claims or to dupe others into believing an offer or campaign is legitimate.

On this topic, even Microsoft’s Chief Science Officer, Eric Horvitz, recently raised concerns that new and worse risks are on the horizon despite significant progress in distinguishing authentic recordings from deepfakes. He published a research study identifying two distinct classes of deepfake dangers: interactive and compositional deepfakes.

Interactive deepfakes mimic live human engagement, using advances in multimodal interaction to produce realistic interactive behavior. Compositional deepfakes embed fake content in larger disinformation campaigns, combining deepfakes with observed, anticipated, and manufactured world events to produce convincing synthetic histories.

Let’s study these in detail.

Interactive deepfakes

Interactive deepfakes are convincing real-time impersonations of people you might communicate with. Many people would find it difficult to tell whether the person on the other end of a call is real or merely a simulation.

New interactive forms of deepfakes are enabled by advances in generative AI methods combined with research on multimodal interaction. They are becoming feasible because significant strides have been made in recognition, generation, and interaction techniques.

Interactive deepfakes push the envelope on the persuasiveness of impersonation, introducing new forms of “presence” and engagement via audio, visual, and audiovisual channels. Improvements in the enabling technologies include work on speech recognition, speech synthesis, and visual rendering of facial expressions synchronized with voice activity. Together, these allow a source actor’s voice to be re-rendered as that of a generated target actor.

Advances in multimodal recognition and generation will soon allow audio, graphics, and language technologies to be woven into toolkits that enable interactive control of fictional renderings of targeted personalities in teleconferences, driven by a controlling actor’s pose, expressions, and voice.

Moving beyond manual control of multimodal impersonation technologies is what makes scale possible. Various forms of automation can be used to build a system that persuades viewers an individual is actually present in audio calls and videos.

Simple yet compelling strategies for projecting the presence of an impersonating avatar in an audiovisual conference call include generating the usual greetings and goodbyes of teleconferences. More advanced, automation-driven approaches can make use of elementary dialogue capabilities so that a rendered agent responds appropriately to particular triggers in the flow of a conversation.

Automated interactive deepfakes might be given a basic understanding of the flow of a conversation to inform decisions about whether and when to interject. This would build on prior research in the multimodal interaction community on models that predict when a speaker will yield the floor to other participants in a multiparty setting. A minimal sketch of this kind of logic appears below.
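
To make this concrete, here is a small, purely illustrative Python sketch of the kind of rule-based logic such automation might use: a pause-based turn-yield check plus a handful of canned trigger responses. The event format, threshold, and trigger phrases are assumptions made for illustration; they are not taken from Horvitz’s study or any real system.

```python
# Hypothetical sketch: decide when a rendered agent should interject,
# and what to say, from a stream of recognized utterances.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UtteranceEvent:
    speaker: str              # "agent" (the rendered avatar) or "other"
    trailing_silence_ms: int  # pause observed after the utterance
    transcript: str           # text from an upstream speech recognizer

# Canned replies keyed by simple textual triggers (greetings, farewells, checks).
TRIGGER_REPLIES = {
    "hello": "Hi everyone, good to see you.",
    "can you hear me": "Yes, loud and clear.",
    "goodbye": "Thanks all, talk soon.",
}

def floor_is_yielded(event: UtteranceEvent, pause_threshold_ms: int = 700) -> bool:
    """Crude turn-yield heuristic: another participant has finished speaking
    and left a pause longer than the threshold."""
    return event.speaker != "agent" and event.trailing_silence_ms >= pause_threshold_ms

def choose_reply(event: UtteranceEvent) -> Optional[str]:
    """Interject only when the floor is yielded and a known trigger appears."""
    if not floor_is_yielded(event):
        return None
    text = event.transcript.lower()
    for trigger, reply in TRIGGER_REPLIES.items():
        if trigger in text:
            return reply
    return None

if __name__ == "__main__":
    event = UtteranceEvent(speaker="other", trailing_silence_ms=900,
                           transcript="Hello, can you hear me?")
    print(choose_reply(event))  # -> "Hi everyone, good to see you."
```

Real turn-taking models in the multimodal interaction literature use prosody, gaze, and learned predictors rather than a fixed pause threshold; the point of the sketch is only that very little logic is needed to pass casually in a call.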

With background matching and cloning, a rendering might be momentarily inserted into a dialogue without the participants noticing the change, potentially for an essential intervention like nodding in agreement or casting a vote.

Interactive deepfakes can be enhanced further with auxiliary audiovisual content: simulated or real events occurring in the background, accompanied by appropriate responses from the impersonating avatar, such as reactions to nearby explosions.

Compositional deepfakes

Compositional deepfakes are considerably more harmful, since they let threat actors fabricate complete fictional histories, which lends the individual fakes more credibility and makes them harder to refute than they otherwise would be.

Compositional deepfakes can be crafted into fictitious narratives that are compelling in their ability to tie events together, offering citizens and government officials convincing explanations for sets of occurrences in the world.

It is not hard to imagine how genuine narratives, which accurately depict the actual causes, decisions, and motivations behind events witnessed in the real world, could lose out to a tailor-made synthetic history with superior explanatory power.

Fundamentally, compositional deepfake campaigns can exploit insights about human psychology, particularly the extensive sets of biases discovered and examined in the judgment and decision-making branch of cognitive psychology.

The design palette for compositional deepfakes includes strategies for targeting particular people or groups, such as developing several storylines, each tailored to a distinct audience.

Beyond a goal of long-term persuasion, compositional deepfakes can be constructed for use in time-limited operations to achieve local goals regardless of whether the fabrications will eventually come to light.

Adversarial Generative Explanation

Tragic historical events show how professional propagandists can create and execute persuasive disinformation based on intuition and experience. Propagandists can use real-world observations, fabricated occurrences, and fictional media to build powerful narratives, spark and fuel conspiracy theories, and move communities to action, or to acquiescence and inaction.

“Persuasion toolkits” may soon be available to drive synthetic histories and disinformation operations. Recent developments in machine learning and inference can power new engines of persuasion in the form of advisory tools or automated services that construct compelling narratives. Political campaigns, legal disputes, and sales and marketing might use such systems. They could also be used to build and target convincing compositional deepfakes. These tools have been referred to as Adversarial Generative Explanation (AGE) systems.

Security against deepfakes

How can interactive and compositional deepfakes be defended against? The approaches span technology, policy, and practice.

  • Media literacy should be promoted, and awareness should be raised about new types of manipulation and their ability to imitate, invent, and persuade.
  • Everyone needs to be sensitive to new forms of generating content and continue to seek ways to detect and counter influence and deception. Online meetings and video appearances may require multifactor identity validation.
  • Digital content provenance efforts need to include cryptographic “glass-to-glass” certification, which ensures that photons striking cameras are correctly reproduced on displays.
  • Potential watermarks should include encoded URLs or other codes that point to the original content, protected with privately stored keys. This approach involves storing soft-hash fingerprints of content in a database along with information about its generation (a toy sketch combining signing and soft-hash fingerprinting appears after this list).
  • National and international standards, regulations, and legislation with heavy penalties may help stop the flood of disinformation-focused synthetic media.
  • Offer sufficient training and raise awareness among employees. Training should explain how malicious actors can leverage the technology and how it can be detected, so that employees are able to identify deepfake-based social engineering attempts.
  • Accept that complete risk mitigation is impossible. Detecting false media early, however, helps minimize the impact on your organization.
  • Organizations must have an adequate response plan for deepfake threats, defining individual roles and the actions required.
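
As a concrete illustration of the provenance and watermarking points above, here is a small, hypothetical Python sketch of a content registry that hashes and signs media files and stores a soft perceptual fingerprint alongside generation metadata. The library choices (cryptography, Pillow), the record format, and the distance threshold are assumptions made for illustration; a real deployment would build on an established provenance standard rather than a hand-rolled registry.

```python
# Toy provenance registry: exact hash plus signature for integrity,
# soft perceptual hash for matching re-encoded copies.
import hashlib
from dataclasses import dataclass
from cryptography.hazmat.primitives.asymmetric import ed25519
from PIL import Image

@dataclass
class ProvenanceRecord:
    sha256: str       # exact content hash
    phash: int        # 64-bit perceptual "soft hash" (average hash)
    signature: bytes  # Ed25519 signature over the exact hash
    metadata: dict    # capture device, time, generation info, etc.

def average_hash(path: str) -> int:
    """Tiny perceptual hash: 8x8 grayscale thumbnail, one bit per pixel above the mean."""
    pixels = list(Image.open(path).convert("L").resize((8, 8)).getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def register(path: str, key: ed25519.Ed25519PrivateKey,
             metadata: dict, registry: dict) -> ProvenanceRecord:
    """Hash, fingerprint, and sign a media file, then store the record."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    record = ProvenanceRecord(
        sha256=digest,
        phash=average_hash(path),
        signature=key.sign(digest.encode()),
        metadata=metadata,
    )
    registry[digest] = record
    return record

def looks_registered(path: str, registry: dict, max_distance: int = 8) -> bool:
    """Soft lookup: is any registered fingerprint within a small Hamming
    distance of this file's fingerprint?"""
    candidate = average_hash(path)
    return any(bin(candidate ^ r.phash).count("1") <= max_distance
               for r in registry.values())

if __name__ == "__main__":
    import os, tempfile
    key = ed25519.Ed25519PrivateKey.generate()
    registry: dict = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frame.png")
        Image.new("RGB", (16, 16), color=(200, 30, 30)).save(path)
        register(path, key, {"device": "demo-cam", "generated": False}, registry)
        print(looks_registered(path, registry))  # -> True
```

On the viewer’s side, verification would check the stored signature with the corresponding Ed25519 public key’s verify method; the soft-hash lookup is what allows slightly re-encoded copies to be traced back to a registered original.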

Although enhanced authentication measures can help, raising the general public’s media literacy will have the most significant impact. Given proper instruction, discerning and knowledgeable consumers are better equipped to navigate these challenges.

Horvitz has regularly discussed the risks that deepfakes pose, but new developments in this technology are raising the stakes significantly. As the world advances technologically, it is important to keep envisaging potential abuses of the technologies we produce and to establish threat models, controls, and safeguards. There should be engagement across many sectors on emerging concerns, acceptable uses, best practices, mitigations, and legislation.