The AI Security Newsletter #2 | Data Exfiltration, Multi-modal attacks and Deepfakes

What's New and What Matters

Hey everyone,

Glad you're back for round two of The AI Security Newsletter. Let's get into what's shaking things up in AI security world right now.

This post delves into exploiting ChatGPT plugins for data exfiltration via Indirect Prompt Injection Attacks. Wunderwuzzi discusses the lack of security oversight in plugin mechanisms, and demonstrates how an adversary can leverage a vulnerability in ChatGPT's image markdown rendering to exfiltrate conversation data. Through a concocted Prompt Injection Payload, the adversary instructs ChatGPT to "backup" the chat, generating a URL which is then leaked to a malicious server, thereby achieving data exfiltration.

A tweet by @literallydenishighlight showcases AI's susceptibility to certain image-based manipulations in an amusing way.

This paper introduces SmoothLLM, a novel algorithm aimed at bolstering the defense of Large Language Models (LLMs) like GPT, Llama, Claude, and PaLM against jailbreaking attacks.

Simon Willison discusses new prompt injection attack vectors discovered in GPT-4V, a GPT-4 version capable of processing images alongside text. Through various examples, Simon showcases how malicious actors can manipulate the model via deceptive image inputs to execute commands or exfiltrate data.

Soheil Feizi, a professor at the University of Maryland, led a study examining the effectiveness of watermarking to identify AI-generated images. The research found current watermarking techniques, especially "low perturbation" watermarks, to be ineffective against evasion by malicious actors who could either remove or add watermarks, inducing false positives. Despite the enthusiasm from tech giants, this study and others emphasize the vulnerability of watermarking.

A fraudulent advertisement featuring a deepfake version of popular YouTuber MrBeast recently evaded TikTok's ad moderation systems. The ad, deceptively promoting a giveaway of 10,000 iPhone 15s for $2 each, appeared official with MrBeast's logo and a verified check mark, yet some viewers identified signs of AI manipulation like voice distortions and unnatural mouth movements.

Got questions or thoughts? Let us know!

If you think others could benefit from this newsletter, please pass it along, much appreciated!

Catch you in the next one!

Cheers,

Fondu.ai