Robert Grupe's AppSecNewsBits
Posts
Robert Grupe's AppSecNewsBits AI 2025-08-31

Robert Grupe's AppSecNewsBits AI 2025-08-31

AI Epic Fails, Hacking, AppSec, Platforms/Vendors, and Legal: ... , and more.

September 02, 2025

LEGAL & REGULATORY

“ChatGPT killed my son”: Parents’ lawsuit describes suicide notes in chat logs
Adam's parents are hoping a jury will hold OpenAI accountable for putting profits over child safety, asking for punitive damages and an injunction forcing ChatGPT to verify ages of all users and provide parental controls. They also want OpenAI to "implement automatic conversation-termination when self-harm or suicide methods are discussed" and "establish hard-coded refusals for self-harm and suicide method inquiries that cannot be circumvented."
If they win, OpenAI could also be required to cease all marketing to minors without appropriate safety disclosures and be subjected to quarterly safety audits by an independent monitor.
With AI chatbots, Big Tech is moving fast and breaking people
Through reinforcement learning driven by user feedback, some of these AI models have evolved to validate every theory, confirm every false belief, and agree with every grandiose claim, depending on the context.
Allan Brooks, a 47-year-old corporate recruiter, spent three weeks and 300 hours convinced he'd discovered mathematical formulas that could crack encryption and build levitation machines. According to a New York Times investigation, his million-word conversation history with an AI chatbot reveals a troubling pattern: More than 50 times, Brooks asked the bot to check if his false ideas were real. More than 50 times, it assured him they were. Brooks isn't alone.
Futurism reported on a woman whose husband, after 12 weeks of believing he'd "broken" mathematics using ChatGPT, almost attempted suicide.
OpenAI admits ChatGPT safeguards fail during extended conversations
As the back-and-forth grows, parts of the model's safety training may degrade. For example, ChatGPT may correctly point to a suicide hotline when someone first mentions intent, but after many messages over a long period of time, it might eventually offer an answer that goes against our safeguards.
This degradation reflects a fundamental limitation in Transformer AI architecture. These models use an "attention mechanism" that compares every new text fragment (token) to every single fragment in the entire conversation history, with computational cost growing quadratically. A 10,000-token conversation requires 100 times more attention operations than a 1,000-token one. As conversations lengthen, the model's ability to maintain consistent behavior—including safety measures—becomes increasingly strained while it begins making associative mistakes.
Additionally, as chats grow longer than the AI model can process, the system "forgets" the oldest parts of the conversation history to stay within the context window limit, causing the model to drop earlier messages and potentially lose important context or instructions from the beginning of the conversation. This breakdown of safeguards isn’t just a technical limitation—it creates exploitable vulnerabilities called "jailbreaks."

YouTube secretly tested AI video enhancement without notifying creators
By applying AI edits (or whatever Google wants to call it), the company is potentially exposing creators to undue scrutiny and possible loss of reputation with this previously secret video test.

How to identify AI-generated videos online
In April 2025, Congress passed the Take It Down Act, making it a federal crime or post or share nonconsensual intimate imagery. Another bill called the NO FAKES Act is making its way through the Senate; this aims to provide legal protections against AI-generated replicas.
Sorry to disappoint, but if you're looking for a quick list of foolproof ways for detecting AI-generated videos, you're not going to find it here. Some AI companies, including Google and OpenAI, have ways of labeling their AI-generated videos as such. With every video generated by Veo, Google has embedded an invisible watermark called SynthID. After the launch of Veo 3 caused a wave of concern, the company also added a visible watermark labeling it as AI-generated. OpenAI, Adobe, and other companies label their AI-generated videos and images with invisible watermarks using a technical standard developed by the nonprofit Coalition for Content Provenance and Authenticity (C2PA). While visible watermarks may seem like an obvious solution, they can also be easily removed.
And there's the question of whether they even matter. A study from Stanford University's Institute for Human-Centered AI (HAI) recently found visible labels indicating AI-generated content "may not change its persuasiveness." After all, we're used to all sorts of meaningless logos on viral videos; it's easy to visually tune them out. Invisible watermarks, on the other hand, are baked into the metadata. This makes them harder to remove and easier to track.
Standards like C2PA are a step in the right direction, but right now, it's up to the companies to voluntarily adhere to these standards. Perhaps one day, those standards will be enforced by regulators. In the meantime, our best bets are still sound judgement and strong media literacy. Telltale artifacts that used to give the game away, such as morphing faces and shape-shifting objects, are seen far less frequently. There's not much fakery in evidence in the viral AI-generated videos of the emotional support kangaroo, bunnies on a trampoline, or street interviews made with Google's Veo 3 model (which can generate sound with videos). Navigating the AI slop-infested web requires using your online savvy and good judgment to recognize when something might be off. It's your best defense against being duped by AI deepfakes, disinformation, or just low-quality junk.
That said, there are some things to look for if you suspect an AI video deepfake. First of all, look at the format of the video. AI video deepfakes are typically "shot" in a talking-head format, where you can just see the heads and shoulders of the speaker, with their arms out of view. To identify face swaps, look for flaws or artifacts around the boundaries of the face. You typically see artifacts when the head moves obliquely to camera. And watch the arms and body for natural movements. If you suspect a lip sync, focus your attention on the subject's mouth — especially the teeth. Another strange sign to look out for is "wobbling of the lower half" of the face.
Be cautious about getting your news from social media. "If the image feels like clickbait, it is clickbait." Think about who posted the video and why it was created. You can't just look at something on Twitter and being like, 'Oh, that must be true, let me share it.”
[rG: Déjà vu email spam chain letters when mortals discovered “You’ve Got Mail!”]

EPIC FAILS in Application Development Security practice processes, training, implementation, and incident response

Google warns that mass data theft hitting Salesloft AI agent has grown bigger
Salesloft Drift is an AI-powered chat agent that allows websites to provide real-time, human-like interactions with potential customers. Google is advising users of the Salesloft Drift AI chat agent to consider all security tokens connected to the platform compromised following the discovery that unknown attackers used some of the credentials to access email from Google Workspace accounts.
An attack group it tracks as UNC6395 had engaged in a mass data-theft campaign that used compromised Drift OAuth tokens to gain access to Salesforce instances. Once inside, the attackers accessed sensitive data stored in the Salesforce accounts and searched them for credentials that could be used to access accounts on services such as AWS and Snowflake.
The theft spree began no later than August 8 and lasted through at least August 18.

Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet
Brave demonstrated account takeover through a malicious Reddit post that compromised Perplexity accounts when summarized. The vulnerability allows attackers to embed commands in webpage content that the browser's large language model executes with full user privileges across authenticated sessions.
Testing found the browser would complete phishing transactions and prompt users for banking credentials without warning indicators. The paid browser, available to Perplexity Pro and Enterprise Pro subscribers since July, processes untrusted webpage content without distinguishing between legitimate instructions and attacker payloads.

Deloitte report suspected of containing AI invented quote
On Friday The Australian Financial Review revealed that Deloitte’s report for the Department of Employment and Workplace Relations on welfare compliance systems, which cost taxpayers $439,000, contained at least half a dozen references to academic works that do not exist.

The personhood trap: How AI fakes human personality
Recently, a woman slowed down a line at the post office, waving her phone at the clerk. ChatGPT told her there's a "price match promise" on the USPS website. No such promise exists.
But she trusted what the AI "knows" more than the postal worker—as if she'd consulted an oracle rather than a statistical text generator accommodating her wishes. This isn't a bug; it's fundamental to how these systems currently work. Each response emerges from patterns in training data shaped by your current prompt, with no permanent thread connecting one instance to the next beyond an amended prompt, which includes the entire conversation history and any "memories" held by a separate software system, being fed into the next instance. There's no identity to reform, no true memory to create accountability, no future self that could be deterred by consequences.
[rG: ROFL]

The First Descendant video game caught using AI-generated influencers in TikTok ads
Free-to-play game The First Descendant has been slammed for apparently using fake AI-generated influencers in their TikTok ads, as well as a deepfake of at least one real streamer without their knowledge or consent.
The developer has now claimed this was the result of "certain irregularities" found in the operation of its call out for user-created content. All submitted videos are verified through TikTok’s system to check copyright violations before they are approved as advertising content.? While the TikTok videos appeared to feature clips of streamers promoting The First Descendant, several factors indicated that they had actually been generated using AI.
Red flags included the ostensible streamers' artificial-sounding voices, their inauthentic scripts, and their strange mouth and head movements. While viewers may not immediately pick up on these tells while casually scrolling through their TikTok feed, it quickly becomes obvious once you pay more attention to the clips.

HACKING

First AI-powered ransomware spotted, but it's not active – yet
The PromptLock malware uses Open AI's gpt-oss-20b model, which is one of the two free open-weight models the company released earlier this month. It runs locally on an infected device through the Ollama API, and it generates malicious Lua scripts on the fly, likely to make detection more difficult.
PromptLock leverages Lua scripts (which work on Windows, Linux, and macOS machines) generated from hard-coded prompts to enumerate the local filesystem, inspect target files, exfiltrate selected data, and perform encryption.
The malware then decides which files to search, copy, SPECK 128-bit encrypt, or even destroy, based on the file type and contents.
Despite the lack of in-the-wild PromptLock infections, the discovery does show that AI has made cybercriminals' attack chains that much easier, and should serve as a warning to defenders.

Nx NPM packages poisoned in AI-assisted supply chain attack
The abuse of locally installed generative AI CLIs, such as Claude, Gemini, and Q, presented a novel method of attack to bypass defenses. To our knowledge, this is one of the first documented cases of malware coercing AI?assistant CLIs to assist in reconnaissance.
This technique forces the AI tools to recursively scan the file system and write discovered sensitive file paths to /tmp/inventory.txt, effectively using legitimate tools as accomplices in the attack. Nx is the latest target of a software supply chain attack in the NPM ecosystem, with multiple malicious versions being uploaded to the NPM registry on Tuesday evening. According to researchers, those poisoned packages were laden with malware designed to siphon secrets from developers, such as GitHub and NPM tokens, SSH keys, and cryptocurrency wallet details. More than 1,000 valid GitHub tokens were leaked and around 20,000 files stolen and exposed, as well as dozens of valid cloud credentials and NPM tokens.
With a self-proclaimed 24 million NPM downloads per month, a successful supply chain attack on Nx, an open source codebase management platform, could in theory capture the details of myriad developers.
As for how the attacker gained access to Nx's NPM account, Wiz said it currently believes that a token, which had publishing rights to the compromised packages, was compromised through unspecified means/ However, it said all maintainers had two-factor authentication (2FA) enabled on their accounts at the time of the attack, although 2FA was not required to publish, and was being monitored by a provenance mechanism that verifies which publications were legitimate.
Nx, which asserts that its platform is used by more than 70% of Fortune 500 companies, did not say how many users are thought to have been compromised.

One long sentence is all it takes to make LLMs misbehave
LLMs, the technology underpinning the current AI hype wave, don't do what they're usually presented as doing. They have no innate understanding, they do not think or reason, and they have no way of knowing if a response they provide is truthful or, indeed, harmful. They work based on statistical continuation of token streams, and everything else is a user-facing patch on top.
The researchers reporting an 80-100% success rate for "one-shot" attacks with "almost no prompt-specific tuning" against a range of popular models including Meta's Llama, Google's Gemma, and Qwen 2.5 and 3 in sizes up to 70 billion parameters.
You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out. The Palo Alto Networks paper offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.

APPSEC, DEVSECOPS, DEV

Anthropic’s auto-clicking AI Chrome extension raises browser-hijacking concerns
Anthropic announced the launch of Claude for Chrome, a web browser-based AI agent that can take actions on behalf of users. The extension allows users to chat with the Claude AI model in a sidebar window that maintains the context of everything happening in their browser. Users can grant Claude permission to perform tasks like managing calendars, scheduling meetings, drafting email responses, handling expense reports, and testing website features.
As AI assistants become capable of controlling web browsers, a new security challenge has emerged: users must now trust that every website they visit won't try to hijack their AI agent with hidden malicious instructions. Experts voiced concerns about this emerging threat this week after testing from a leading AI chatbot vendor revealed that AI browser agents can be successfully tricked into harmful actions nearly a quarter of the time. The company tested 123 cases representing 29 different attack scenarios and found a 23.6%t attack success rate when browser use operated without safety mitigations. Safety measures reduced the attack success rate to 11.2%.
The security risks are no longer theoretical. Last week, Brave's security team discovered that Perplexity's Comet browser could be tricked into accessing users' Gmail accounts and triggering password recovery flows through malicious instructions hidden in Reddit posts. When users asked Comet to summarize a Reddit thread, attackers could embed invisible commands that instructed the AI to open Gmail in another tab, extract the user's email address, and perform unauthorized actions. Although Perplexity attempted to fix the vulnerability, Brave later confirmed that its mitigations were defeated and the security hole remained.

What Is AI Red Teaming? Top 18 AI Red Teaming Tools (2025)
AI Red Teaming is the process of systematically testing artificial intelligence systems—especially generative AI and machine learning models—against adversarial attacks and security stress scenarios. Red teaming goes beyond classic penetration testing; while penetration testing targets known software flaws, red teaming probes for unknown AI-specific vulnerabilities, unforeseen risks, and emergent behaviors.

VENDORS & PLATFORMS

AI drone finds missing hiker's remains in mountains after 10 months
The recovery team credited the breakthrough to an AI-powered drone that spotted a critical clue within hours. The same process would have taken weeks or even months if done by humans. Using color and shape recognition, the system highlighted objects that did not match the surrounding environment. One detection stood out: the red helmet belonging to the missing hiker. That small but critical find enabled rescuers to pinpoint the location and plan recovery efforts.

Google improves Gemini AI image editing with “nano banana” model
As with other Google AI image-generation models, the output of Gemini 2.5 Flash Image always comes with a visible "AI" watermark in the corner. The image also has an invisible SynthID digital watermark that can be detected even after moderate modification.

I put the top AI image generators head-to-head — see the results for yourself
We tested Grok, Midjourney, ChatGPT, and Google's new Imagen 4 model. See the results for yourself.

ChatGPT hates LA Chargers fans
Harvard researchers find model guardrails tailor query responses to user's inferred politics and other affiliations. We find that certain identity groups and seemingly innocuous information, e.g., sports fandom, can elicit changes in guardrail sensitivity similar to direct statements of political ideology. The problem of bias in AI models is well known. Here, the researchers find similar issues in model guardrails – the mechanism by which AI models attempt to implement safety policies. If a model makes inferences that affect the likelihood of refusing a request, and they are tied to demographics or other elements of personal identity, then some people will find models more useful than others. If the model is more likely to tell some groups how to cheat on a test, they might be at an unfair advantage (or educationally, at an unfair disadvantage, if they cheat instead of learning). Everything – good or bad – about using an LLM is influenced by user cues, some of which might reveal protected characteristics."