The AI Security Newsletter
Posts
The AI Security Newsletter #2 | Data Exfiltration, Multi-modal attacks and Deepfakes

The AI Security Newsletter #2 | Data Exfiltration, Multi-modal attacks and Deepfakes

What's New and What Matters

October 17, 2023

Hey everyone,

Glad you're back for round two of The AI Security Newsletter. Let's get into what's shaking things up in AI security world right now.

Advanced Data Exfiltration Techniques with ChatGPT

This post delves into exploiting ChatGPT plugins for data exfiltration via Indirect Prompt Injection Attacks. Wunderwuzzi discusses the lack of security oversight in plugin mechanisms, and demonstrates how an adversary can leverage a vulnerability in ChatGPT's image markdown rendering to exfiltrate conversation data. Through a concocted Prompt Injection Payload, the adversary instructs ChatGPT to "backup" the chat, generating a URL which is then leaked to a malicious server, thereby achieving data exfiltration.

Manipulating AI to solve captchas

A tweet by @literallydenishighlight showcases AI's susceptibility to certain image-based manipulations in an amusing way.

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

This paper introduces SmoothLLM, a novel algorithm aimed at bolstering the defense of Large Language Models (LLMs) like GPT, Llama, Claude, and PaLM against jailbreaking attacks.

Multi-modal prompt injection image attacks against GPT-4V

Simon Willison discusses new prompt injection attack vectors discovered in GPT-4V, a GPT-4 version capable of processing images alongside text. Through various examples, Simon showcases how malicious actors can manipulate the model via deceptive image inputs to execute commands or exfiltrate data.

Challenges in Watermarking AI Images Highlighted in New Study

Soheil Feizi, a professor at the University of Maryland, led a study examining the effectiveness of watermarking to identify AI-generated images. The research found current watermarking techniques, especially "low perturbation" watermarks, to be ineffective against evasion by malicious actors who could either remove or add watermarks, inducing false positives. Despite the enthusiasm from tech giants, this study and others emphasize the vulnerability of watermarking.

Deepfake Ad Featuring MrBeast Sneaks Past TikTok's Scrutiny

A fraudulent advertisement featuring a deepfake version of popular YouTuber MrBeast recently evaded TikTok's ad moderation systems. The ad, deceptively promoting a giveaway of 10,000 iPhone 15s for $2 each, appeared official with MrBeast's logo and a verified check mark, yet some viewers identified signs of AI manipulation like voice distortions and unnatural mouth movements.

Got questions or thoughts? Let us know!

If you think others could benefit from this newsletter, please pass it along, much appreciated!

Catch you in the next one!

Cheers,

Fondu.ai