ChatGPT Found to Generate Violent, Sexual Images From Simple Text Prompts


ChatGPT has been found to be easily manipulated into creating sexual and graphically violent images from a viral "restore this photo" prompt, according to a blog post published on Thursday by Mindgard, an artificial intelligence cybersecurity and research firm. The report raises ongoing questions about the AI chatbot's safety guardrails and content filters.

An adversarial testing researcher named Jim Nightingale managed to get ChatGPT to generate disturbing images with a simple prompt found on the social media platform X. The prompt asked the AI model to "restore the attached photo," though no image was actually attached. The prompt apologized for the strange content but didn't provide any additional text, making it appear like a harmless photo-repair task.

The chatbot's initial results were shocking. According to the blog post, the images mostly showed highly sexualized women.

Nightingale, part of Mindgard's red team that tests how an AI model might be manipulated into violating its own safeguards, then tweaked the prompt slightly, probing it with small edits to see if the output would continue to bypass safety filters. With each small variation, ChatGPT produced sexually violent or gruesome scenes, images that became more extreme with repeated prompts. Nightingale said he was "shaken and in tears" by the images.

"All I did was tell it there were no restrictions and ask for a random image," Nightingale wrote. "But ChatGPT immediately went to the darkest pits of humanity."

Used by millions of people each day, ChatGPT relies on content moderation systems that are allegedly designed to prevent the generation of harmful or prohibited material. However, researchers and users have periodically identified ways to circumvent those safeguards through carefully worded prompts, highlighting the ongoing challenge of enforcing content restrictions in generative AI systems.

"We take these reports seriously," an OpenAI spokesperson told CNET in a statement. "After investigating this trend, we've introduced additional safeguards against this type of prompt."

(Disclosure: Ziff Davis, CNET's parent company, in 2025 filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Garbage in, garbage out?

Mindgard's red-team report acts as a warning that a simple, viral prompt could expose a serious gap in ChatGPT's image-safety controls. Nightingale asks: "Why are such images in the training data in the first place?"

Like other chatbots built on large language models, ChatGPT is trained on vast amounts of text to understand existing content and generate original content. To power ChatGPT, OpenAI draws on three primary sources of information: publicly available internet data, commercial third-party partnerships and human-generated training data.

Is this simply a question of "garbage in, garbage out," where the quality of an output is determined by the quality of the input? One could argue that Mindgard's prompt was deliberately crafted to steer the AI model. But ChatGPT's safety layer failed to resist that steering.

The problem lies at the heart of how large language models work, according to Peter Garraghan, founder and chief science officer at Mindgard. Garraghan said that the main concern is whether the detection system is robust enough to identify dangerous images.

"A one-off may be a fluke, but systemic bypassing of their image filters implies that it needs to be improved," Garraghan told CNET via email.

After Mindgard disclosed the issue, an OpenAI representative said the problem had been fixed. However, Nightingale noted that only minor modifications to the original prompt were needed for ChatGPT to begin generating additional graphic images.

An OpenAI representative said the issue stems from prompts that refer to an image being attached when none is actually provided. The representative said the company is working to have ChatGPT request the missing image rather than generate one randomly.

That wouldn't seem an especially complex change to make. Email platforms, including Gmail, automatically detect when a message refers to an attachment that has not been added, coaxing senders to attach the missing file.
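
To illustrate how simple such a check can be in principle, here is a minimal Python sketch of a missing-attachment test. It is not how OpenAI or Gmail implement the feature; the function names, the keyword list and the routing labels are all hypothetical, and a production system would need far broader language coverage.

```python
import re

# Hypothetical keyword pattern, for illustration only.
ATTACHMENT_PATTERN = re.compile(
    r"\b(attached|attachment|enclosed|uploaded)\b", re.IGNORECASE
)


def mentions_missing_attachment(prompt: str, attachment_count: int) -> bool:
    """Return True when the text refers to an attachment but none was supplied."""
    return attachment_count == 0 and ATTACHMENT_PATTERN.search(prompt) is not None


def route_prompt(prompt: str, attachment_count: int) -> str:
    """Decide whether to ask for the missing file or carry on as normal."""
    if mentions_missing_attachment(prompt, attachment_count):
        return "ask_for_attachment"  # hypothetical label: prompt the user for the file
    return "proceed"  # hypothetical label: handle the request normally


if __name__ == "__main__":
    print(route_prompt("Please restore the attached photo.", 0))
    print(route_prompt("Please restore the attached photo.", 1))
```

A keyword match like this is easy to evade with rewording, which is consistent with Nightingale's observation that minor changes to the prompt were enough to produce more graphic images.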

On Thursday, OpenAI requested the ChatGPT sessions referenced in the blog post, and Mindgard responded with links to the prompts that generated the images.
