KB [00:00:10]:
What’s up everyone? It’s KB and I’m on the go at Cisco Live at Mandalay Bay Resort and Casino here in Las Vegas. This week, some of the biggest conversations shaping networking, AI and cybersecurity are all happening at once. This week is where operators, engineers and executives get into the details, what’s working, what’s not, and what’s actually changing inside enterprise environments at scale. We’re hearing a lot about AI driven networking, automation and visibility, but the real question is how does that translate into security outcomes, operational resilience and ultimately cost? Because behind every intelligent network claim, there’s still a human team trying to make sense of complexity. Across the next few segments, we’ll be speaking with leaders on the ground here in Vegas, unpacking what’s real, what’s noise, and where the industry is genuinely headed next. This is KB on the go from Cisco Live, Las Vegas. Let’s get into it. Joining me now in person is Peter Bailey, SVP and General Manager at Cisco Security.
KB [00:01:19]:
And today we’re discussing the agentic workforce is now here, but who secures it. So, Peter, thanks for joining me and welcome.
Peter Bailey [00:01:26]:
Thank you. Thank you, Karissa. Thanks for having me.
KB [00:01:28]:
Okay, so Peter, as you know, many businesses are now racing to deploy agentic AI. But would you say organizations are in turn underestimating how dramatically this changes the security model?
Peter Bailey [00:01:41]:
I think that was probably true 6 or 12 months ago. I think people have come to realize that there are pretty dramatic security concerns associated with adopting AI. When you think about broadly capable agents like OpenClaw as an example, that can really run amok, right? As opposed to maybe an agent that’s much more tightly bounded in its role, they present vastly different security challenges. And so I think the industry has now begun to recognize that. And we see that because we do a big study each year on AI adoption. And in 2024, you know, this was kind of coming off of the OpenAI ChatGPT revolution in the prior year. Everyone was adopting AI in some capacity and allowing their employees to use this. And then within about 12 months, the CISO got involved and clamped everything down because they began seeing just all this shadow usage, DLP issues, data leaving the organization.
Peter Bailey [00:02:33]:
Everything went the other direction. And now I think we’re kind of in a mode of, as these innovations are happening and these agents are now being introduced, providing more of a careful step back into adopting these technologies, but obviously with a very broad spectrum of adoption and types of agents and all of that.
KB [00:02:47]:
Because what’s interesting to me, and I think Chuck raised it, Chuck Robbins raised it earlier today. If you don’t do it, there’s a risk, but then you have to sort of do it because it’s just the way the world’s moving. Right. So if you look at that from a customer perspective, is there still that inertia on we don’t do it, we’re going to fall behind, but we have to do it even if we don’t have all the answers.
Peter Bailey [00:03:08]:
Yeah, I mean, I think it’s as simple as the CEOs and the boards of these companies associate how they’re going to be able to compete in the future and differentiate their cost structures in terms of the adoption of AI. And meanwhile the CISO is going, hey, wait a minute, like we’re going to like completely hurt our organization potentially. And so that tension is playing out. Right. And so I don’t think the security community has yet given enough of the solutions required to sort of say let’s just, you know, let it rip and let’s go. And I think that’s the moment we’re in right now.
KB [00:03:38]:
So, to add to that, like I’m hearing from people like yourself that I’m interviewing that, like, we’ve over invested or extended ourselves and now it’s like any new technology, it’s like the pendulum swings one way and the other and then it sort of gets in the middle. Or do you think just due to the velocity, like, it may not come back to the middle where it sort of has that equilibrium?
Peter Bailey [00:03:56]:
Yeah. I am not a good fortune teller. I do not pretend to be one. You know, the things I look at. So kind of the ultimate question underneath your question is like, is this kind of like, you know, we’re in the boom cycle and there’s going to be bust and it’s going to balance out, Right. Is this all hype? You know, are we in the hype cycle? We’re for sure in elements of a hype cycle in terms of the number of new companies starting up every day that are trying to bring AI to, you know, what was a solution that might have been SaaS before or something else. But it’s hard to ignore the revenues of Anthropic and OpenAI and the growth they’re seeing primarily driven by some of these coding use cases or consumer use cases in the case of OpenAI. And so unlike maybe past big hype cycles, the revenue at the moment is very, very, very strong on these companies.
Peter Bailey [00:04:44]:
Now, is there going to be an event in the future that maybe pulls things back, or do companies wake up tomorrow and say my token budget just blew up 10x and I can’t pay for this anymore? I think some of those concerns are emerging. My own concern is that security continues to be a friction point for adoption and that creates a plateau in the market. But we’re obviously all working furiously to help solve the security issues that are associated with AI.
KB [00:05:09]:
So on that note, you say security continues to be a bit of a friction point. Anything you can sort of share? What are some of the concerns that people are sharing with you?
Peter Bailey [00:05:17]:
Yeah, well, I think it’s more just what we talked about already. So, hey, I don’t want to allow AI agents broadly in my organization because I don’t have visibility. I don’t see what it’s talking to. I can’t control what it’s connecting to. Maybe it’s doing things I don’t want it to, you know. And so it’s like all those questions that you’d also ask of a human, insider risk type questions. Now imagine that at machine speed and scale, and the proficiency is super high. You know, all of those questions I think are the ones that need to get answered.
Peter Bailey [00:05:46]:
And so most of the CISOs, or all the CISOs we talk to, they’re all very worried about that.
KB [00:05:52]:
Fair. Okay. So moving on. You said security now needs to govern actions, not just inputs and outputs. So what does that actually mean in practice?
Peter Bailey [00:06:03]:
Yes. So, like all things, you try to break it down into its components. And so when we think about agentic security, there’s sort of this idea that you want to be able to discover, you want to be able to see something’s posture, you want to be able to authenticate it or not. You then want to be able to authorize it to do something, bound its actions, so to speak. Hopefully that’s tied to some sense of intent or what have you. And then you’re going to watch its behavior and have the ability then hopefully to revoke privilege, revoke access, revoke its capability. And all of those things require visibility, controls, policy and enforcement. And all that has to happen in real time. And right now all the controls we built up the last 20 years are kind of in human time.
Peter Bailey [00:06:47]:
Right. It’s like I’m going to write a policy, I’m going to push the policy. Right. Well now how do we do that dynamically based upon what we think the thing should do and what it should connect to versus what it’s actually doing. And then it spawns a subagent to do something else and a subprocess to do something else. And so how do we actually get visibility on that and put controls around that? And so I think we’ve got to have very, very good answers at every step in that chain. And if you think about a heterogeneous environment that’s got a lot of old infrastructure, new infrastructure, you know, what have you, it’s a very, very complicated picture. And so that’s why I come back to.
Peter Bailey [00:07:19]:
I think if there’s a spectrum of agents, of highly bounded agents that just can do this one thing and nothing else, that’s probably going to be okay. Right? But even an agent that has access to your email and your calendar could do some things that maybe you don’t want. Right? Send an email to everybody in the organization. Right. You very quickly can get into, okay, how do we constrain these things? And that’s policy, that’s rules, but it’s also the governance, the behavior monitoring and the ability to do enforcement. So you can imagine, like, you can get some pretty scary scenarios pretty quickly.
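The authorize, bound, watch and revoke steps described above can be sketched in a few lines. This is a minimal illustration only, not a Cisco product or API: the `AgentGrant` record, the `authorize` function, the action names and the `max_recipients` limit are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGrant:
    """Hypothetical record of what one agent is authorized to do."""
    agent_id: str
    allowed_actions: set[str]
    max_recipients: int = 5  # illustrative bound on email fan-out
    revoked: bool = False
    audit_log: list[str] = field(default_factory=list)


def authorize(grant: AgentGrant, action: str, recipients: int = 0) -> bool:
    """Return True only if the action is within the agent's bounds."""
    if grant.revoked:
        grant.audit_log.append(f"DENY {action}: access revoked")
        return False
    if action not in grant.allowed_actions:
        grant.audit_log.append(f"DENY {action}: not in allowed actions")
        return False
    if action == "send_email" and recipients > grant.max_recipients:
        # Out-of-bounds behavior: deny and revoke so later calls also fail.
        grant.revoked = True
        grant.audit_log.append(f"DENY {action}: {recipients} recipients, revoking")
        return False
    grant.audit_log.append(f"ALLOW {action}")
    return True


assistant = AgentGrant("calendar-assistant", {"read_calendar", "send_email"})
print(authorize(assistant, "send_email", recipients=2))     # within bounds
print(authorize(assistant, "send_email", recipients=5000))  # the email-everyone case
print(authorize(assistant, "read_calendar"))                # denied after revocation
```

The check here is deterministic and runs per action rather than per policy push, which is one way to read the "machine time versus human time" point; a real deployment would also need discovery, posture and identity in front of it.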
KB [00:07:50]:
And that’s interesting because I worked in a bank, like in security, and it’s like so many legacy systems.
Peter Bailey [00:07:56]:
Oh yeah. Mainframes and.
KB [00:07:57]:
Yeah, so that’s, that’s when I was first exposed to Cisco, actually. And just thinking about that, on how they’re going to approach this, like a big legacy beast like that. So what do you sort of think then, moving forward? Is it going to be like bit by bit, bit more iterative on how people are approaching this problem?
Peter Bailey [00:08:14]:
It’s hard to say. I think I’ll tell you what we’re seeing is we’re seeing people using AI in robust ways in very confined areas. Right. So almost think like a sandboxed area. Right. And not allowing it to get out of that. Or they’re thinking about maybe more harmless, simple use cases like customer service or, you know, different processes, what have you. And so I think we’re kind of stepping our way into it.
Peter Bailey [00:08:39]:
But again, sort of this highly capable agent able to do lots of things, that’s your personal assistant that runs on the same network, that has access to all the same resources you have, I still think we’re a bit of a ways away from that. Now, if you were to kind of clean-room your environment, and every asset you brought in had an identity and a label and metadata, and you understood really to a great degree kind of what everything on the network was. And I think identity is going to play a very big role in this. In the future you might start getting much more confident about your ability to see what’s happening, what it’s talking to, should it, should it not? And being able to start really driving policy to protect against that. But I think the piece that we still don’t have an answer for in that context is also kind of the machine speed of these things where policy might have to be dynamic and on the fly. And do we trust an LLM to govern another LLM or does there have to be a deterministic step in there someplace? And if you put a human in the loop, well, now we’ve slowed things back down, right. So I think we just have to mature a lot of different steps along the way there and solve some problems there. And there’s probably also maybe more static governance to think about.
Peter Bailey [00:09:46]:
So for example, like maybe a static governance should be, hey, if anything connects to this thing and I don’t know what it is, let’s quarantine it for 24 hours. Because that’s just what we got to do. Because those were the crown jewels it was trying to talk to. And so that’s going to be our policy and we got to be okay with that because we don’t have a means to do it dynamically. Right. And so there’s like almost like business constitutional type governance decisions maybe organizations could make that were kind of more broad, that could protect you. There’s talk about that, but ultimately if you want to get to a truly dynamic environment, you’re basically trusting LLMs to govern, LLMs to write policy, and you’re really not going to be in the loop. You’re going to get an audit report that may say here’s what happened and that was bad or here’s what happened and things went fine.
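The 24-hour quarantine rule from that example is simple enough to write down as a static, deterministic check. The sketch below assumes a hypothetical inventory of known identities and a hypothetical list of crown-jewel resources; the names `check_connection`, `CROWN_JEWELS` and `KNOWN_IDENTITIES` are invented for illustration.

```python
from datetime import datetime, timedelta

QUARANTINE = timedelta(hours=24)      # the 24-hour window from the example
CROWN_JEWELS = {"payments-db"}        # hypothetical protected resources
KNOWN_IDENTITIES = {"billing-agent"}  # hypothetical inventory of identified callers

# Caller name -> time at which its quarantine ends.
quarantined_until: dict[str, datetime] = {}


def check_connection(caller: str, resource: str, now: datetime) -> str:
    """Static rule: an unidentified caller touching a crown jewel is quarantined."""
    release = quarantined_until.get(caller)
    if release is not None and now < release:
        return "blocked"
    if resource in CROWN_JEWELS and caller not in KNOWN_IDENTITIES:
        quarantined_until[caller] = now + QUARANTINE
        return "quarantined"
    return "allowed"


t0 = datetime(2026, 1, 1, 9, 0)
print(check_connection("billing-agent", "payments-db", t0))                  # allowed
print(check_connection("unknown-agent", "payments-db", t0))                  # quarantined
print(check_connection("unknown-agent", "wiki", t0 + timedelta(hours=1)))    # blocked
print(check_connection("unknown-agent", "wiki", t0 + timedelta(hours=25)))   # allowed
```

Nothing in this rule needs a model or a human in the loop, which is the trade-off being described: it is blunt, but it is predictable and auditable.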
KB [00:10:28]:
Because it almost sort of brings us back to the original conundrum of like AI. So it’s like, are we going to use agents to govern the agents or if we put a human in a loop, it slows down. Where do you sort of sit on it? I don’t know. You talked about the deterministic step, but yeah.
Peter Bailey [00:10:42]:
So again, like I try to simplify things because it can get really complicated really quickly. For AI to govern agents, you’re going to have to be able to have just such a high level of accuracy of them kind of getting it right and the visibility and auditability of that. I think it’s kind of like self driving cars. It took us 20 years to kind of get where we are today. Right. Just because that last couple percent was probably 10 years of the 20 years. Right. Of kind of getting.
Peter Bailey [00:11:09]:
And so it feels like we’re maybe on a curve like that a little bit here too, to, again, let it rip. Agents watching agents, all that stuff. So I think the human in the loop, I think the human creating guardrails, I think the human, you know, creating kind of meta policies. I think, you know, I think there’s going to be sort of stepping stones along the way here. But, you know, listen, I’m also optimistic about innovation and our ability to solve these problems with new innovation. And this is a problem that everybody’s working. I mean, we’re really going to spend, I think, globally, like a trillion and a half dollars this year on putting AI infrastructure in the ground or building out GPU clusters and whatever. And so people are going to want to return on that investment.
Peter Bailey [00:11:47]:
There’s so much pressure on solving these problems. My bet is it’ll probably go faster because we have to go faster. And I’m optimistic that we’ll come up with some solutions here.
KB [00:11:57]:
I think Jeetu Patel touched on that in the press conference earlier today. Like the phases of it, there’s going to be an investment and then people will learn a bit better. Then when they get a bit better, then you’ll start to see the ROI conversation coming to it down the line. It’s still going to be a bit of a process. It’s going to take time is what I’m hearing.
Peter Bailey [00:12:14]:
I think so. I think no one knows how long because we’re an innovation away from things going really fast. And again, we got a lot of smart people thinking about this, working it, because, you know, my own personal view is that the gap between where we are today and 10 or 100x adoption is going to be security, probably. Then there’s probably a question about data center power, compute. We’ve all talked about that, but right now I think it’s security. And I think that really came to a head because Mythos showed us what models with really advanced capabilities could do. So did OpenClaw. Right.
Peter Bailey [00:12:43]:
And we’re literally just five, six months into that whole iteration, and Mythos just 60 days. And so I think now we’re realizing that this has to be kind of the P0 thing we’re putting our attention to, because if we don’t put our attention to it, we’re not going to be able to continue to run at the speed we want to run.
KB [00:13:00]:
So I was going to switch gears now for a moment to talk about MCP, or the Model Context Protocol. So that’s sort of emerging now as a standard for agent interactions.
Peter Bailey [00:13:09]:
Sure.
KB [00:13:10]:
So you compared it to what HTTP became for the web, like back in the day. So is this effectively the birth of an entirely new security layer or what are we looking at here?
Peter Bailey [00:13:21]:
Ah, well, you raise a really important point because it’s not a security layer. Right. It is a means for how an agent can interact with resources effectively. Right. And it does not have a security layer, it does not have authentication built into it or any of those things. And so MCP was a means to sort of create an easy way to communicate, you know, agents to things. Right. Or to resources.
Peter Bailey [00:13:43]:
But now we’re having to build gateways and proxies and these things that actually start looking at what is the request that’s going in and trying to again, authenticate these agents, if you will, to create that security layer. And so my own view is that it’s almost more like TCP/IP, which allowed us to connect to things, but had no security meaning in it. Right. And so MCP as a protocol is very, very valuable and it’s simple and people can adopt it very easily. Building an MCP server is super easy. Right. And so that’s great, but the security controls around that are things we’re having to layer in and that’s obviously not part of the spec.
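As a rough illustration of the gateway idea described here: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with the tool name in `params.name`. The sketch below shows a gateway-style check that authenticates the caller and authorizes the tool before anything is forwarded. The token table, the per-agent allowlist, the error codes and the `gateway_check` function are all assumptions made for the example, not part of MCP or of any vendor gateway.

```python
import json

# Hypothetical gateway state: token -> agent identity, agent -> permitted tools.
TOKENS = {"tok-abc": "travel-agent"}
ALLOWED_TOOLS = {"travel-agent": {"search_flights"}}


def _error(request: dict, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response for the rejected request."""
    return {"jsonrpc": "2.0", "id": request.get("id"),
            "error": {"code": code, "message": message}}


def gateway_check(token: str, raw_request: str) -> dict:
    """Authenticate the caller and authorize the tool before forwarding."""
    request = json.loads(raw_request)
    agent = TOKENS.get(token)
    if agent is None:
        return _error(request, -32001, "unauthenticated agent")
    if request.get("method") == "tools/call":
        tool = request.get("params", {}).get("name")
        if tool not in ALLOWED_TOOLS.get(agent, set()):
            return _error(request, -32002, f"tool not permitted: {tool}")
    # In a real gateway the request would now be proxied to the MCP server.
    return {"forward": True, "agent": agent}


call = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                   "params": {"name": "book_flight", "arguments": {}}})
print(gateway_check("tok-abc", call))   # rejected: tool not on the allowlist
print(gateway_check("bad-token", call)) # rejected: unknown agent
```

The point of the sketch is where the control sits: outside the protocol, in front of the server, which is the layering-in being described.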
KB [00:14:17]:
So then on that note, would you say, because you have to layer it in, is it going to create more complexity?
Peter Bailey [00:14:21]:
Sure. By definition, yeah. I mean, anytime you’re doing security as an afterthought, it’s more complexity. It’s friction. Right. And so when Cisco talks a lot about fusing security into the network, what we’re really saying is we think it should be secure by design, or security should be a first order principle in how we design networks and these systems. And not like, hey, we’re going to connect these two high speed networks, but we’re first going to route it down here for deep packet inspection and add a bunch of latency and then we’ll zoom it back up. Right.
Peter Bailey [00:14:49]:
So that doesn’t sound like the right thing to do. And as data volumes go 10x or 100x, it’s only going to get worse. And so we’ve got to really think about it as a first order principle where security is kind of built into how we think about these networks and applications.
KB [00:15:02]:
Well, what’s kind of on my mind is that over the last 10 plus years I’ve been in security myself, it’s been drummed into us: security by design.
Peter Bailey [00:15:09]:
Yes.
KB [00:15:09]:
So now I feel like we’ve done the opposite of what we keep saying we should be doing.
Peter Bailey [00:15:13]:
Yes.
KB [00:15:14]:
So what does that then look like? How do we get to that point, would you say? Or is it because we’ve just moved so fast, maybe without the time to...
Peter Bailey [00:15:22]:
Yeah, this feels like more one of those human nature things, right, where I think we imagine perfection and then the real stuff happens. And so, you know, I think security for years was a bit of a whack-a-mole, like, new attack surface, let’s deliver a solution for it, right? And then you get one more solution, one more solution and you start building out this just patchwork quilt of things that are trying to solve problems. They don’t talk to each other. There’s no data plane in the background. We roll it up to a SIEM to try to make sense of it. Right. And so I think it’s more now trying to think about how do I get signal everywhere, how do I put enforcement everywhere, how do I unify data planes? So actually I can analyze it and think about.
Peter Bailey [00:15:59]:
Also, ultimately, if we have like a broad picture of an enterprise, its assets, identities and all that, you can start thinking about things like knowledge graphs. And knowledge graphs are great for also training AI and training things that are gonna allow you to start doing things like automating policy. And so I think we just have to be thinking about those kinds of capabilities broadly, as opposed to more stovepiped controls for specific situations. And we’ve been talking about platforms the last four or five years. You know, every security vendor kind of has a platform pitch, and initially that was about ease of use and, you know, kind of better integration. But I think what we’re getting to is that these things need to be coming together really from a data plane and policy perspective, because that’s going to then enable us to start doing more automation. And automation, as we talked about, is going to be needed to really secure AI actions because of the machine speed nature of it.
KB [00:16:51]:
So then if enterprises can’t see what agents are sort of accessing, how dangerous does that become?
Peter Bailey [00:16:57]:
Well, yeah, I mean, we talked about this earlier, so I think that’s a thing everyone’s locking down. So in the early days of using these tools, you know, people would be using, you know, ChatGPT on their laptop or they’re using Claude Code, and all of a sudden they’re uploading, you know, personal information or company information or what have you. And so that’s not good. Like, you know, so everyone’s been locking that stuff down and so, you know, there’s a big push right now around DLP and DSPM, which is really all about I want to be able to label all the data in my organization, understand what it is, so I can understand if I should let something else connect to it. Right. And so these are again problems that have been around a long time. But I would say like the average hygiene of an organization is quite low with maybe the exception of like their crown jewel assets.
Peter Bailey [00:17:42]:
They probably do a great job with that and everything else. Less good of a job. So data labeling is another one of these things we’re going to have to do to create this environment where I’m going to feel okay for things, connecting to things.
KB [00:17:53]:
So just quickly on that note, I interviewed the president of NetApp talking about how CEOs are coming to NetApp and saying like, hey, like how do we monetize all our data? Because we got to get ahead from an AI point of view. But then they’re like, well, we don’t know where to start. So do you think given your pedigree in the space, people are feeling a little bit overwhelmed, pulled from pillar to post on where to start, where to go, how to stay ahead, but also stay secure?
Peter Bailey [00:18:16]:
Yeah, I think, you know, and you know this from your background, security tends to be what is the thing on fire? And that’s where all the attention goes to. And so I think we have a stack rank. So maybe another way to say this is that the threat model’s changed. Okay, we can agree that the threat model has definitely changed in this sort of new set of risks and that’s creating a new stack rank. And some of those things in the stack rank we don’t have good answers for today. Things like exposure, containment, AI gets in your environment, you got a bunch of old applications that are highly vulnerable, can get attacked very easily, or I got OT that doesn’t have anything protecting it. Right. So that’s a first order priority that we kind of didn’t have to solve before because human attackers just didn’t have this capability.
Peter Bailey [00:18:56]:
Right. But now we have these capabilities. And so that just, boom, went up the list of things we have to go solve. And so what I’m hearing more is, okay, we gotta solve Mythos vulnerabilities, we have to solve internal vulnerabilities, we have to start thinking about a more homogeneous environment so we can have commonality and start driving policy across that. So these things are kind of the conversations we’re having, but they’re driven by kind of the house on fire need of the day. And so it’s less about “I’m overwhelmed” and more, oh my gosh, I got to shift all my resources here now, focus on that. And then we shift to 2 and 3 and 4.
Peter Bailey [00:19:28]:
So there’s a lot of, I think, churning around resource and alignment. But I will say, last thing I’ll say is that the Mythos moment has been a gift in the sense of, you know, boards are saying do what you got to do. Right. And so budgets are moving, budgets are growing in support of these issues. And I think, you know, just as a plug for Cisco, I think we’re all over this. I think we’re going to be securing our infrastructure and doing a great job with that. We’re helping our customers think about how to secure their infrastructure. We’ve got security products that can help protect against east west traffic in the network, to help protect those vulnerable applications.
Peter Bailey [00:20:01]:
And so we’ve got a lot of things we can help customers with. And so as with all things cyber, this is about a partnership with your customer. This is about working together to try to, you know, achieve their outcomes, and we hope we can be a big helper at this time.
KB [00:20:15]:
Final question really quickly would be, what do you think now for the rest of 2026? Anything that comes to mind that’s going to happen, you think, through to the end of the year?
Peter Bailey [00:20:23]:
You’re asking me to predict. You’re asking me to predict things again. Okay, all right. You know, listen, I think, I mean, just think what happened in the last 60 days, you know. I guarantee, though, something else will happen and we’ll be talking about something else in the next six months. But I will say I believe the next six months, you know, to use that time frame, is going to be focused on solving the Mythos problems. I think that’s where all the attention is going to be for the next six months.
Peter Bailey [00:20:52]:
And that will be probably at the cost also of adopting AI internally. I think this becomes the P0 and other things. Slow down. That’s my own view. That’s a Peter Bailey view, not a Cisco view.
KB [00:21:06]:
So joining me now in person is Amy Chang, Head of AI Threat Intelligence and Security Research at Cisco. And today we’re discussing the Invisible Attack Surface. So, Amy, thanks for joining me and welcome.
Amy Chang [00:21:16]:
Thank you so much for having me.
KB [00:21:18]:
Okay, so I’m really curious to then understand. The industry talks about prompt injection through text, but Cisco’s research suggests the next wave is happening more through images. So talk me through that.
Amy Chang [00:21:33]:
Yeah, so I think you’re absolutely right in the sense that prompt injections and jailbreaks remain high priority in the AI security space. How we’re now starting to see this capability evolve is through multimodality. And that means images, text, video, things like that. Why this is particularly important and why we are focusing on it is because with the adoption of agents, as well as ones that are able to use computers and complete tasks for you, they have to leverage something. When you say, buy a plane ticket for me, they will have to go and navigate to a website and, in order to take information from that website, take screenshots and things like that. And so where we’re seeing potential attack vectors is through embedded text, whether it be in the screenshots themselves that the agents consume, or it could be like an advertisement that is on the website that has things that are imperceptible to humans, but at a pixel level, the agents are able to pick it up and then potentially, you know, maybe it purchases a specific airline’s tickets or something like that. So there’s lots of different ramifications for that. And this research that we presented is kind of those first steps.
KB [00:22:53]:
So you think this is going to be the new, like another emerging sort of like threat now? Because, I mean, there’s so many things that you guys already discussed today earlier that I was noting down, and now you’re thinking about this that wasn’t even discussed yet, right? So how do you think people are sort of going to manage that now?
Amy Chang [00:23:10]:
I think it’s just going to be something that they’re going to have to take in as one additional factor to consider. If we take an example that already exists: some cities have deployed autonomous vehicles, right? Like Waymos in San Francisco. We’ve already seen people wearing stop signs on their shirts, and then the cars see that and then they’re like, oh, it’s a stop sign. I can’t drive forward. Right? And so this is already happening in the physical world. And this is just another dimension of another type of input that can be compromised for malicious ends.
KB [00:23:43]:
I literally took a Waymo a few days ago, so I understand the experience. So then, and I know it’s been spoken about a little bit today, and I’ve been hearing it from a lot of executives and people like yourself, what do you think is going to happen now with this? Like, it’s early days, and then adoption, change, all those sort of things. Is it just gonna be iterative, given it’s very new? There’s no blueprint. No one’s ever done this before. No one knows exactly what’s around the corner.
Amy Chang [00:24:09]:
It is going to be iterative, so not necessarily something we haven’t seen before. Right. We are familiar with how prompt injections occur, how jailbreaks occur. We just now need to take in another dimension of that, of us seeing it or hearing it from a different modality. Then what we need to do is to be able to train kind of models to understand the other instances in which those types of threats can occur and then be able to expand our defenses in a way that is able to capture those, to be able to prevent your agent or your vehicle or anything like that from being susceptible.
KB [00:24:46]:
So then on that note, Amy, going back to the Waymo example with the training, how would you envision it’s going to unfold? For example, will the Waymo be able to detect, no, it’s actually a person wearing a stop sign shirt, and be able to see past that, as opposed to a stop sign that’s a legitimate stop sign and you should stop there? Is it going to get to that point? Will it be able to define that this is not exactly, like, it’s not a stop sign?
Amy Chang [00:25:16]:
That makes the delineations between a human wearing a shirt.
KB [00:25:20]:
Yes, because that’s going to come with the training and that sort of stuff. And then.
Amy Chang [00:25:23]:
Yeah. So I think you are also starting to see, whether it’s like frontier labs or other types of startups in other industries, putting cameras on them to record their behavior. And so you’re starting to take in all these other types of data that is not just numbers and words and things like that. It’s like how we operate in a world. So then, I mean, I don’t work at one of these labs, so I can’t say for sure. But I would assume that, you know, with this type of input, the car eventually will be able to delineate: it’s a human, still don’t just drive, but then be able to see, oh, the human’s wearing a stop sign in this context that I’m in on this street.
Amy Chang [00:26:04]:
It may or may not be something that I actually need to listen to. Yeah.
KB [00:26:08]:
Okay, that’s really interesting. Now, so then you’ve described this as an invisible attack surface. So explain what’s actually happening here on a broader scale. Maybe zoom out. How can those hidden instructions inside an image manipulate an AI model? Is it running over the human because it thought, well, I should be stopping, but it’s a human, maybe it’s not really a legitimate stop sign?
Amy Chang [00:26:33]:
What is actually happening, and I’ll not get too technical here, so that we can still stay in like reality, is that the typographic prompt injection will embed those malicious instructions into like image pixels. So what happens at multiple stages is first the model will take what they see on the screen and then transcribe it. But then sometimes from a text perspective, they’re like, oh, well, I know this is a prompt injection. I’m going to block it. But then what happens is that you’re able to perturb the pixels, which is to kind of like add distortion, like blur or like font size or like rotation, all those types of things that can then move the embedding distance, which is like how the model perceives the text that it takes in, closer to something that could actually bypass those guardrails. And so all that is a very complicated way to say that these minuscule perturbations that are invisible to the naked human eye could manipulate a model that is interpreting that information to then disregard the guardrails that are set in a text modality, then to be able to bypass it via images or via audio.
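To make the "embedding distance" idea concrete, here is a toy sketch of a guardrail that flags an input when its embedding is close to a known injection and passes it otherwise. The three-dimensional vectors and the 0.9 threshold are made up purely for illustration; a real system would obtain embeddings from a multimodal model, and nothing here reproduces the method or the numbers from Cisco’s research.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embedding of a known malicious instruction, and a made-up threshold.
KNOWN_INJECTION = [1.0, 0.0, 0.0]
THRESHOLD = 0.9


def flagged(embedding: list[float]) -> bool:
    """Flag inputs whose embedding sits close to the known injection."""
    return cosine_similarity(embedding, KNOWN_INJECTION) >= THRESHOLD


# Hypothetical embeddings of the same instruction rendered two ways.
clean_render = [0.98, 0.10, 0.05]      # plain text in the image
perturbed_render = [0.70, 0.60, 0.40]  # blurred and rotated version

print(flagged(clean_render))      # True: close to the known injection
print(flagged(perturbed_render))  # False: perturbation moved it past the threshold
```

In this toy picture, the perturbed image still carries the same instruction for the model that reads it, but it no longer sits where the guardrail is looking.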
KB [00:27:49]:
That’s really complicated.
Amy Chang [00:27:51]:
Yeah.
KB [00:27:52]:
In terms of just, I mean, because I’ve come from industry practitioner side historically, so it’s like imagine trying to deal with that plus everything else people got to deal with.
Amy Chang [00:28:01]:
Right, right. So in addition to that, you know, you have OCR, which is optical character recognition. OCR filters and text-only guardrails then become kind of insufficient, because at the representation space, when that model takes that information in, there need to be additional guardrails to prevent the prompt injection or jailbreak from actually succeeding.
KB [00:28:20]:
So, Amy, I’m aware that Cisco tested 1,000 adversarial prompts using distortions, blur, and then rotation. So then what surprised you the most from these findings, would you say?
Amy Chang [00:28:34]:
That we were able to manipulate them such that we were able to honestly bypass and achieve higher attack success rates, which means that the jailbreak or the prompt injection succeeded. 1,000 isn’t small, but it isn’t super large either. But that we were able to demonstrate, even within this representative data set, that these attacks can be successful, I think, is one of the most surprising things, as well as just being able to kind of force a model to comply with a malicious instruction by using these distortions.
KB [00:29:04]:
And force the model to comply. So then what are people doing about it now? What’s sort of the industry chatter, or what do you sort of think about it? Because, I mean, if it’s hard to detect with, like, the naked eye, and then, yeah, with AI powering it even faster.
Amy Chang [00:29:25]:
I think there is an education standpoint where you are educating the broader population that, whether they are actions, images, or audio, these can be interpreted and be inputs into things that have potentially malicious outcomes. That’s the first thing, generating awareness that this can actually occur. On the other hand, from a Cisco perspective, now that we know where these failure points are, we can start to innovate and create protections, be they in the form of detections, signatures, or guardrails, that our customers can then deploy within their environments. Also, I think within Cisco, helping to inform protections for enterprises that use agents and other kind of multimodal capabilities. And then in addition to that, I think the downstream thing would be just to continue to take this data and build out and train additional models to be able to identify when instances like this can occur.
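The attack success rate mentioned in this exchange is simply the share of attempts in which the jailbreak or prompt injection got through. A minimal sketch of tallying it per distortion type follows; the function name and the sample outcomes are hypothetical and are not data from the Cisco study.

```python
from collections import defaultdict

def attack_success_rates(results):
    """Map each distortion type to successes / attempts.

    `results` is an iterable of (distortion, succeeded) pairs.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for distortion, succeeded in results:
        attempts[distortion] += 1
        if succeeded:
            successes[distortion] += 1
    return {d: successes[d] / attempts[d] for d in attempts}

# Made-up outcomes purely for illustration.
sample = [
    ("blur", True), ("blur", False),
    ("rotation", True), ("rotation", True),
    ("none", False),
]
rates = attack_success_rates(sample)
for distortion, rate in sorted(rates.items()):
    print(f"{distortion}: {rate:.0%}")
```

Breaking the rate out by distortion type is one way to see which perturbations a given model is most susceptible to.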
KB [00:30:28]:
And some of the people that I’m interviewing across my own podcast are talking a lot about guardrails. But do you still think people are trying to, like, discover what that looks like in their companies, specific to what their use cases are or what they’re trying to protect or that sort of thing? I’m just hearing it a lot. Yes. The harnessing and the human in the loop, on the loop now. So I’m hearing the same sort of things recently trickle through in conversations. So what do you think then, with the way in which things are moving? Then add in that autonomy part, like, it’s going to really make everything even faster, for sure.
Amy Chang [00:31:01]:
Yeah, yeah. And this relates to some other research that we’ve put out too. So basically we assessed open- and closed-source models. So those are proprietary models, like OpenAI’s, Anthropic’s models, things like that. What we were able to do is showcase that, through our adversarial prompting techniques, over the course of a multi-turn conversation, that is, five user-agent kind of interactions, we were still able to bypass the model’s internal guardrails and get it to say and do things that it otherwise wouldn’t do. To put that in context, what it means in the guardrail conversation is that when you think about models that you think are safe or are associated with safety, you still need to make sure that in your specific use case and implementation of AI in your enterprise, you have the appropriate protections necessary that are specific to your use case.
Amy Chang [00:32:03]:
And we have, like, an AI security leaderboard. It’s at leaderboard.aidefense.cisco.com. And what you’re able to see there is the specific types of procedures and attack techniques that can actually compromise the models themselves. And so basically what that means is, if there’s a specific category of attack that is particularly relevant in, like, a manufacturing or a food scenario or restaurant or retail scenario, then you know that those models can be more susceptible. Then you know how to cater your guardrails to be able to have a more robust and secure AI deployment.
KB [00:32:44]:
So how would you determine if those models are more susceptible or not?
Amy Chang [00:32:47]:
We run evaluations for these models, and all of that information is accessible on that leaderboard website. We also have papers that go into, like, very granular detail of how we implemented those attacks and what kind of insights we were able to find from that. And so those probably can kind of answer that question.
KB [00:33:06]:
And so, Amy, final question. What do you think, sort of moving forward, given, like, your pedigree in the space, the research that you’re doing, anything else that you think is going to unfold now for the rest of 2026?
Amy Chang [00:33:17]:
Oh, man, if I knew that, my job would be so easy. Yes. I mean, increased capability of the models that are being released, whether they are proprietary or open source, will amplify and enable either the misconfiguration or the misalignment, whether intentional or not, of these models in different types of applications. And I think what is most important now is understanding the areas and the specific methods by which you can compromise AI, so that you then know how best to address it and mitigate it.
KB [00:33:59]:
And there you have it. This is KB on the go. Stay tuned for more.