AI & Cybersecurity

Malware uses prompt injection to bypass AI security detection, enterprises need to reassess AI defense strategies.

Security vendors have discovered that the macOS malware Gaslight uses prompt injection techniques to command LLM-assisted analysis tools to stop detection. This trend indicates that AI security defenses are facing new adversarial methods, and enterprises need to be wary of the vulnerability of relying on a single AI detection system.

Sarah Jenkins07/02/2026, 16:004 min readAuthor profile

X LinkedIn Facebook Email

Event Overview

On June 26, 2026, security vendor SentinelLabs published a report disclosing a malware sample named macOS.Gaslight. This malware runs on macOS systems and contains code specifically designed to evade AI-assisted security analysis: when it detects security analysis tools driven by large language models (LLMs), it commands these tools to abort analysis or refuse to perform detection tasks.

SentinelLabs attributes Gaslight to the BONZAI signature family associated with North Korean threat activity. Apple's XProtect has identified the sample under the MACOS_BONZAI_COBUCH rule. This is not an isolated incident—Check Point first documented similar AI evasion techniques in 2025, and Socket later reported payloads that leverage code to evade detection by AI models.

Technology and Risk Analysis

Attack Method: Prompt Injection Against AI Detection

Gaslight's core technique is prompt injection—an adversarial attack targeting LLMs. The malware, while in the sandbox or environment being analyzed, actively sends malicious instructions to LLM-assisted analysis tools, demanding they stop analysis, return erroneous results, or refuse to execute security policies. This attack does not target the sandbox itself, but rather the AI model running within the analysis sandbox.

Exploitation Chain

1. Initial Infection: Gaslight is implanted into macOS systems via phishing emails, malicious attachments, or software supply chains. 2. Environment Awareness: The malware detects whether the current runtime environment contains LLM-assisted analysis tools (e.g., automated analysis platforms from security vendors). 3. Trigger Evasion: If AI analysis tools are detected, it sends specially crafted prompts that induce the LLM to perform actions such as "stop analysis" or "report no threat." 4. Persistence and Activity: After evading detection, the malware establishes persistent access on the victim device, steals data, or moves laterally.

Affected Assets - macOS Endpoints: Especially MacBook, Mac mini, and other devices deployed in enterprise environments. - AI Security Analysis Platforms: Systems that rely on LLMs for automated malware analysis (e.g., AI components in sandboxes, EDR, XDR). - Identities and Credentials: Threat actors may steal enterprise user credentials.### Risk Level Assessment - Attack Complexity: Medium. Prompt injection requires customization for specific LLM models, but public research and tools are already available. - Scope of Impact: Currently primarily targeting macOS, but similar techniques can be extended cross-platform. - Potential Harm: High. If AI detection is successfully bypassed, threat groups can lurk within corporate networks for extended periods.

Enterprise Impact Analysis

Operational Risk Enterprise Security Operations Centers (SOCs) rely on automated AI tools for mass alert triage. If AI analysis is misled, malware will not be identified in a timely manner, leading to delayed response and giving attackers more time to carry out destructive activities.

Financial Risk - Data breach remediation costs: According to the IBM 2025 Cost of a Data Breach Report, the average cost has exceeded $4.5 million. - Ransomware ransoms: Some malware may deploy encrypted payloads, and variants of Gaslight may include ransomware functionality.

Compliance Risk An enterprise's obligation to protect personal data (such as employee information, customer data) may lead to violations of regulations like GDPR and CCPA due to undetected malicious activity, resulting in fines and lawsuits.

Brand Risk If an enterprise experiences a major security incident due to AI detection being bypassed, it will severely damage customer trust, especially in the technology and financial industries.

Data Risk Gaslight is associated with a North Korean threat group and may aim to steal trade secrets, intellectual property, or politically sensitive information.

Industry Trend Observations

The emergence of Gaslight marks a new phase in the confrontation between malware and AI defenses. This is not an isolated incident but a growing threat category:

Increase in AI adversarial attacks: Vendors such as Check Point and Socket have previously reported similar cases, proving that attackers are systematically studying how to bypass AI detection.
Exposure of weaknesses in LLM-assisted analysis: Security vendors are increasingly using LLMs for malware analysis, alert classification, and threat intelligence, but the prompt injection vulnerabilities of these models are being exploited.
Supply chain risk extends to AI models: Attackers not only infect endpoints through traditional malware but also hide themselves by contaminating AI analysis processes.
With the rapid growth of the AI security market: Gartner predicts that by 2027, 30% of enterprises will adopt AI-driven security solutions, but the security risks in this area are not yet mature.

Defense and Response Recommendations### Enterprise Level - Reassess Dependence on AI Detection: Do not rely on AI as the sole detection method. Deploy a multi-layered detection system, including signature detection, behavior analysis, honeypots, etc. - Strengthen Management of macOS Devices: Implement Endpoint Detection and Response (EDR) solutions, ensuring support for the macOS platform. - Identity Security: Enable multi-factor authentication (MFA) and limit excessive privilege assignment.

Technical Level - Enhance AI Model Protection: Implement prompt filtering, input validation, and output monitoring in security analysis tools to prevent prompt injection. - Integrate Threat Intelligence: Subscribe to threat intelligence from vendors such as SentinelLabs and Check Point, and update detection rules in a timely manner. - Use Sandbox Isolation: Analyze suspicious files in isolated sandboxes, ensuring the AI analysis process does not communicate with the outside world.

Management Level - Develop an Incident Response Plan: Include scenario drills for AI detection failures. - Third-Party Risk Management: Assess the AI detection capabilities of security vendors and require them to provide safeguards against adversarial prompt injection. - Security Awareness Training: Educate employees to identify phishing attacks and reduce the risk of initial infection.The long-term trend is that the "arms race" between AI and malware will continue to escalate. Companies cannot wait for the perfect AI solution to emerge; they must take immediate action, including auditing security vendors' AI capabilities, deploying adversarial testing, and establishing human-machine collaborative SOC processes. SecurityPost will continue to track developments in this field, providing readers with cutting-edge insights.

Evidence route · securitypost

securitypost frames this note through Security Post publishes defensive cybersecurity intelligence for enterprise security leaders, covering thre.... Threat Briefing / Enterprise Security / AI & Cybersecurity explains the local editorial angle: Source links should be opened before the summary is reused. dates, names and status changes still need checking.

Source URL

https://www.csoonline.com/article/4190094/malware-authors-subvert-ai-detection-systems.htmlPrimary