Beyond ‘Vibe Coding’: Securing AI-Generated Software Requires a Risk-First Strategy
Organisations face significant security risks in the era of "vibe coding" and AI-generated software. A recent benchmark test conducted by Armis revealed that every model struggles to produce secure code consistently. This reality necessitates that Application Security (AppSec) programs shift away from fragmented vulnerability scanner management and adopt a risk-first strategy.
Posted: Thursday, May 07


Introduction

The era of “vibe coding” is officially here, and Asia Pacific is central to this trend, already hosting a rising number of vibe coding start-ups. AI coding has accelerated software delivery to a breakneck pace, but this velocity has come with a steep security tax.

While the industry is enamoured with the speed of delivery, those of us in the trenches of Application Security (AppSec) are seeing a different reality. A recent benchmark test of 18 leading generative AI models reveals that every single model struggles to generate secure code consistently.

As leading LLM providers like Anthropic and OpenAI begin to move into the AppSec space with AI-driven SAST capabilities, the industry is at a crossroads. We are seeing a shift from rule-based pattern matching to models that “read and reason” like human researchers. However, the critical takeaway for any modern security program is that cybersecurity must be about risk management, not scanner management.

The Reality of AI-Generated Vulnerabilities

Rapid enterprise adoption of AI-native development is outpacing critical security safeguards, leaving organisations exposed to systemic vulnerabilities.

Recent research conducted by Armis Labs highlights a pervasive security gap in AI-native development. Even the most capable models currently produce vulnerable code in over 30% of atomic use-case scenarios. The benchmark findings point to several “universal blind spots” where 100% of tested models failed to generate secure code, particularly in high-risk areas like memory buffer overflows, file uploads, and authentication systems.
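To make the file-upload blind spot concrete, here is a minimal sketch (not from the benchmark itself, all names hypothetical) of the kind of validation that generated upload handlers routinely omit: an extension allow-list, a size cap, and path-traversal stripping.

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MiB cap

def validate_upload(filename: str, data: bytes) -> str:
    """Reject uploads that fail allow-list and size checks.

    Returns a sanitised filename safe to write to disk.
    """
    # Strip directory components to block path traversal (e.g. "../../etc/passwd")
    safe_name = os.path.basename(filename)
    ext = os.path.splitext(safe_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"disallowed file type: {ext!r}")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")
    return safe_name
```

An AI-generated handler that skips any one of these three checks is exploitable; the benchmark's point is that models tend to skip all of them unless explicitly prompted.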

Key insights from the benchmark include:

  • The Model Performance Gap – There is a dramatic variance in security posture across model families. For instance, Gemini 3.1 Pro emerged as a leader with the lowest rate of OWASP Top 10 or Armis Early Warning CWEs-related vulnerabilities (38.71%), while older proprietary models like Claude Sonnet 4.5 and Claude Haiku 4.5 showed significantly higher vulnerability counts and a lack of baseline security guardrails.
  • Common Technical Pitfalls – AI models rarely implement resource limits or throttling by default. CWE-770 (Allocation of Resources Without Limits or Throttling) was the most frequent vulnerability found across all models.
  • Cost vs. Security – Low-cost open-source models (such as Qwen 3.5 and Minimax M2.5) provide highly competitive security performance at a fraction of the price, suggesting that robust code safety is accessible regardless of budget.
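The CWE-770 pitfall above is the absence of exactly this kind of guardrail. As an illustrative sketch (not taken from the benchmark), a token-bucket limiter is the sort of default throttle that models rarely emit unprompted:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter.

    Callers consume one token per request; tokens refill at a fixed
    rate, bounding resource allocation (a CWE-770 mitigation).
    """

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)       # start full
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Wrapping an AI-generated endpoint or worker loop in a check like `if not bucket.allow(): reject()` is a one-line defence against unbounded allocation.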

The Failure Mode of Tool Sprawl

While a single vulnerability scanner was adequate for security teams two decades ago, today’s average enterprise is overwhelmed by a multitude of fragmented scanners and feeds across cloud, containers, identity, and code. This “tool sprawl” results in fragmented signals, lack of ownership, and subjective prioritisation.

Even a best-in-class scanner is only one part of an effective program. The real win isn’t just finding a bug; it’s building a system that turns those findings into risk-reduced outcomes that are meaningful to the business.

Shift to True Risk Management

Organisations leveraging AI for code generation should prioritise newer, next-generation coding models for production-bound software, but they must recognise that no model is currently sufficient for autonomous development. To mitigate the inherent security debt created by AI, teams should:

  1. Implement AI-Native AppSec Controls – Traditional pattern-matching tools often lack the depth to catch complex logic flaws in AI-generated code. AI-native scanning and quality gating are necessary to prevent insecure code from reaching production.
  2. Shift to Contextual Risk Management – Prioritise findings from every scanner based on production reachability and business impact, so that one risk-ranked queue replaces tool sprawl and alert fatigue.
  3. Validate Remediations – Adopt frameworks that use multi-stage agentic loops to independently verify that fixes actually reduce risk without introducing new flaws.
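Step 2 above can be sketched as a simple scoring function. This is an illustrative model only (the field names and weights are assumptions, not an Armis formula): reachability gates the score, so a critical finding in unreachable code ranks below a moderate one on a live path.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cwe: str
    severity: float           # 0-10 CVSS-like base score
    reachable_in_prod: bool   # is the vulnerable path actually deployed?
    asset_criticality: float  # 0-1 business-impact weight (assumed input)

def risk_score(f: Finding) -> float:
    # Unreachable findings are heavily discounted rather than dropped,
    # since reachability analysis can be wrong
    reach = 1.0 if f.reachable_in_prod else 0.1
    return f.severity * reach * f.asset_criticality

def prioritise(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered by contextual risk, highest first."""
    return sorted(findings, key=risk_score, reverse=True)
```

Feeding every scanner's output through one function like this is what turns a pile of fragmented alerts into a single, defensible work queue.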

Closing the Loop on Risk

While AI-native scanners can find more vulnerabilities, the most successful security programs will understand that an AI scanner is a tool, not a complete strategy. Organisations that are embracing AI for code generation must immediately implement AI-native application security controls to reduce risk.

Zak Menegazzi
Zak is an accomplished leader with extensive experience in senior sales leadership roles across various cybersecurity and technology firms. In his current role as Cybersecurity Specialist, ANZ at Armis, the cyber exposure management & security company, Zak serves as a trusted advisor to ANZ and APJ enterprises. He focuses on providing guidance to Armis customers in the region to drive greater adoption of cybersecurity best practices. Prior to Armis, Zak held territory, channel sales and management roles.