[00:00:00] Vincent: Nvidia’s CEO said it: we are entering the inference era, but it’s a different game because it’s not about building. It’s about scaling, protecting, and measuring the ROI of this infrastructure. It’s all about tokens.
[00:00:21] KB: From KBI Media, I’m Karissa Breen and this is KBKast.
My guest today is Vincent Laverne, Global AI Leader at F5, who spends his days inside the AI infrastructure decisions of enterprise customers across the globe. We talk about the gap between AI hype and AI reality, why most AI experiments are quietly failing, and the unresolved question of whether companies racing into inference actually know what they’re measuring.
Before we get into it, do me a favor and hit follow wherever you’re listening. It genuinely helps the show reach more people who need to hear these conversations.
All right, let’s get into it.
So, Vincent, I really want to start today with everyone keeps talking about AI like it’s already transformed businesses overnight. But when you sort of get behind the hype and you dig a little bit deeper, like how immature is the market in reality? How do you see it?
[00:01:23] Vincent: I agree there is a lot of noise, a lot of news.
I mean, whenever we switch on the TV or read an article, everybody’s talking about AI. And I think it’s important to differentiate between the buzz, I would say, and the reality. But there are a lot of things happening in what I discuss with customers.
There is a spectrum, I would say, of AI maturity within our customers, some of them already quite advanced. And this is what we see sometimes on the news, et cetera.
But there are also many that are experimenting; there are different maturity phases. I would say it starts with customers, and we started like this probably three or four years ago at F5, leveraging AI assistants to help write emails, to help us accelerate repetitive tasks, et cetera.
And this is what a lot of customers that I’m speaking with internationally are doing. In the last two years, I’ve seen a lot of customers experimenting with generative AI and how they can develop generative AI applications for their own use case. And there’s been a lot of discussion here about finding the right use case, and maybe we’ll get to it later on.
I think customers learned a lot when they moved, probably 10 years ago, to public cloud. Some of them moved everything, or claimed they would move everything. So there’s a bit more, I would say, reflection from them on finding the right use case and experimenting with it. And I think it is in the last two years that a majority of customers have been experimenting, with a lot of projects failing. Maybe we’ll get into it later on.
But it’s interesting now to see that we are reaching an inflection point where a large number of customers have found the use case, the interesting one for implementing AI within their company, have selected the model and the infrastructure they want to leverage, and have trained those models with their own data. And now a number of customers are going into production. This is what the industry is calling AI inference: when AI workloads respond to generative AI requests, as distinct from the training phase, which was the past two or three years. And it was also called out, interestingly, by Jensen Huang, Nvidia’s CEO, at the last GTC, which is their flagship conference, back in March in San Jose: we are now entering the inference era, where we’re going to see a big volume of customers moving their workloads into production, and so moving into inference.
There’s also the next phase, which is agentic AI. It’s taking a lot of space in the news, in the media, et cetera, and we are seeing a number of customers already experimenting with it. It would be the next phase, probably reaching maturity in 2028 or 2029. The phase after that is physical AI, where very few customers are already there. I’ve had the pleasure, I would say, of experiencing it myself when attending the GTC conference in San Jose, with my first autonomous taxi ride, which is one implementation of physical AI.
So there’s some reality on this, but there are very few or selected use cases leveraging physical AI. So you can see there is a big variety.
The spectrum of implementation is big, but what I see in my conversations with customers is that most of them have been implementing AI assistants and are about to move into inference with generative AI workloads. So there is a reality, and I would say we need to differentiate it from the buzz. AI is a big topic, and it’s not just a technological one.
Funny enough, since I took this role about a year ago, I’ve been having not just technological or architectural discussions with my customers, but sometimes philosophical ones as well.
It’s a topic that is bringing a lot of angst as well as opportunities. But I think it will stay. And I think like in every new technology, disruptive technology, there’s always a phase where we learn how to use it, but I think this one will be here to stay.
[00:06:51] KB: Okay, so there’s a couple of things in there before we move on. So with the hype, with any sort of hype, with technology as we know, it sort of hypes up and then it sort of dissipates. But now, as you’re saying, we’re moving into this inferencing stage and then agentic and so on and so on. So do you think the hype will then stay there? And then I guess underneath that, with the hype, to use your words, there’s that angst and then there’s those opportunities.
So do you think a lot of that angst will dissipate over time or.
And it’s so, like, relatively new for a lot of companies, right? And like you said, people are experimenting, things are failing. You know, I’m speaking to people on this show asking even about the value of AI in terms of ROI. Are they using it correctly? Have they overextended? There’s all these sort of conversations. So what do you think about that? Like, will we continue to see this hype cycle for a while now? Because there’s more things coming down the line.
[00:07:47] Vincent: There are two different things. The customers have learned, as I said before, from the move to public cloud. So instead of making a bold statement like “I’m going to transform all of my applications leveraging AI,” they are really looking at what AI can bring to them and taking wise decisions, as with any new and disruptive technology.
As they are experimenting, there’s a number of projects or experiments that are failing.
I think there are multiple studies on the topic.
I’ll call out one from MIT, which said that 80% of those initiatives are failing, but part of it is normal. It’s a complex new technology which encompasses many dimensions.
What data do I train my model on?
Who has access to that data?
There’s a saying that there is no good AI without good data.
And then there are the security aspects, which are important.
And for most of those initial experiments, one of the top reasons they’ve been failing is access to data and the security of the data that the AI workloads are using.
But I think I would maybe take a very pragmatic or very simplistic view. When we were very young babies, we knew that to progress we had to stand up and walk. And before having a stable walk, we fell a lot. But we continued, and I think this is what customers are doing. They are learning from their failures and continuing those experiments, for most of them. Of course, I mean, it was a loaded question. There is the debate about the ROI of those AI experiments, and I think those experiments are demonstrating whether they are valuable. But the true ROI of those projects will come when they go into inference, when those workloads go into production.
Nvidia’s CEO said it: we are entering the inference era, but it’s a different game because it’s not about building, it’s about scaling, protecting and measuring the ROI of this infrastructure. It’s all about tokens. I can explain a little bit more after, but you measure your artificial intelligence success by how much intelligence it is creating. And the measure of this intelligence is the token. The cost to generate those tokens will vary, and as you scale, you need to manage the cost of your infrastructure and of the tokens being generated. When going into inference, you measure your return on investment as how much intelligence you’re creating versus how much it costs to generate it.
Those are specific infrastructures for AI, leveraging GPUs, leveraging specific hardware, which can be expensive.
You need to make sure that to generate this intelligence, those tokens, they are used wisely. And this is where, humbly, we think at F5 we can help our customers, making sure that this AI infrastructure is optimized, is used wisely, and that there’s no wasted compute in generating the intelligence of those AI workloads.
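To make the "intelligence created versus cost to generate it" idea concrete, here is a minimal sketch of cost per million tokens for a GPU cluster. The class name `InferenceCluster` and every figure in it (GPU count, hourly price, throughput, utilization) are hypothetical assumptions for illustration, not numbers from the conversation.

```python
from dataclasses import dataclass


@dataclass
class InferenceCluster:
    gpu_count: int
    cost_per_gpu_hour: float      # hypothetical price, in dollars
    tokens_per_gpu_second: float  # hypothetical peak throughput per GPU
    utilization: float            # fraction of peak actually used, 0..1

    def tokens_per_hour(self) -> float:
        # Tokens actually produced, scaled down by how busy the GPUs are.
        return self.gpu_count * self.tokens_per_gpu_second * 3600 * self.utilization

    def cost_per_hour(self) -> float:
        # The hardware is paid for whether or not it is busy.
        return self.gpu_count * self.cost_per_gpu_hour

    def cost_per_million_tokens(self) -> float:
        return self.cost_per_hour() / self.tokens_per_hour() * 1_000_000


# Same hardware and same bill, different utilization.
idle_heavy = InferenceCluster(8, 4.0, 1000.0, utilization=0.25)
well_used = InferenceCluster(8, 4.0, 1000.0, utilization=0.75)

for name, cluster in [("idle-heavy", idle_heavy), ("well-used", well_used)]:
    print(f"{name}: ${cluster.cost_per_million_tokens():.2f} per million tokens")
```

Under these assumed figures the hourly bill is identical for both clusters, so tripling utilization cuts the cost of each token to a third. That is the "no wasted compute" point in arithmetic form.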
[00:12:00] KB: Yeah, this is really interesting because that’s similar now to what I’m starting to hear come through in terms of what people are saying in the market, people like yourself. So then, maybe pushing down a little bit more, and again, I know you’ve sort of touched on this, but I want to get into this a bit more.
Are we, as an industry, in the learning-to-walk phase of AI right now, where companies are failing publicly, burning cash, pretending everything’s fine? And I know you sort of touched on it, but all of a sudden everyone was on board, and now I’m hearing different stories about, you know, as I mentioned before, overextending, and now people are second-guessing it. And I know you said we’re moving into the inferencing stage. So do people just have to wait it out and see, once they’re in that stage, to be like, yes, this was a good investment?
[00:12:53] Vincent: For example, in the experiments, as mentioned before, some are failing and some will continue to iterate until they find the correct use case, as well as the correct ROI for them. Some will stop here, not finding the real value. But what we are witnessing, and humbly, from the conversations that we are having with our customers, is an increasing number of customers who are ready, who have a plan for an ROI on the capex they are going to invest in this infrastructure. I think this is where they are deciding to go into inference, to go into production.
And this is where they need to constantly measure whether that estimated ROI is proving to be true once they go into inference. So they are constantly checking that the ROI holds true as they scale those workloads.
So basically Nvidia in particular, but others too, are explaining a concept called the token economy.
Tokens are, if you want, the currency of AI, and you need to maximize the number of tokens you can manage with the infrastructure that you have.
And I’ll give you an example, maybe, to illustrate it. You’ve probably been regularly using a chatbot, a generative AI application, where you put in a prompt with a very simple question: what is the capital of France? It’s going to be tokenized. It’s going to be, if you want, split into a number of tokens. And this question, as input to the model, is probably around 10 to 12 tokens.
The output from the model will be “Paris,” two tokens. If at the same time, Karissa, you are sending a prompt, “Can you generate a 30-minute video summarizing the history of Paris?”, and those two prompts end up on the same server, the first prompt about the capital of France will probably need to wait a long time, until the generation of the 30-minute video is finished, because that will consume a lot of compute, a lot of tokens. So making sure that you are spreading your inference requests to the best-suited server will help you generate more tokens and will reduce the time that you, as a user, wait to get your response. This is a KPI called time to first token. You will also not waste idle capacity, where some of your servers are waiting for requests while others are working at 100%. So this is about making sure that your infrastructure is optimized so that you use the maximum capacity it can generate, to keep your ROI as it was planned to be. That’s the type of thing that customers moving into inference need to be monitoring on a daily basis, on a regular basis, to make sure that to reach their goals they don’t have to add more capacity, which would then affect their return on investment.
So scaling inference, and that’s not me saying it, again I will quote Jensen Huang, Nvidia’s CEO, who mentioned it at GTC: moving to inference is a different game. Monitoring your tokenomics, the different KPIs for how your infrastructure generates tokens, is a very important component in making sure that you keep the ROI that you planned.
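The routing example above, a short question stuck behind a long video job, can be sketched as a toy comparison of two routing policies. This is a simplified illustration, not how any particular product works: the function `route`, the request names, and the token estimates are all hypothetical, and it assumes the router has an estimate of each request's cost, which real systems often do not know in advance. The "tokens queued ahead" figure stands in as a rough proxy for time to first token.

```python
def route(requests, num_servers, policy):
    """Assign each request to a server; return {name: tokens queued ahead of it}."""
    backlog = [0] * num_servers  # estimated tokens already queued on each server
    queued_ahead = {}
    for i, (name, est_tokens) in enumerate(requests):
        if policy == "round_robin":
            server = i % num_servers
        elif policy == "least_loaded":
            # Pick the server with the smallest estimated backlog.
            server = min(range(num_servers), key=lambda s: backlog[s])
        else:
            raise ValueError(f"unknown policy: {policy}")
        queued_ahead[name] = backlog[server]
        backlog[server] += est_tokens
    return queued_ahead


# Hypothetical workload: one very heavy request and three tiny ones.
requests = [
    ("video_summary", 50_000),
    ("short_answer_1", 2),
    ("capital_question", 2),
    ("short_answer_2", 2),
]

for policy in ("round_robin", "least_loaded"):
    result = route(requests, num_servers=2, policy=policy)
    print(policy, "-> capital_question waits behind",
          result["capital_question"], "tokens")
```

With round robin the short capital-city question lands behind the video job, while the load-aware policy sends it to the server that is nearly idle.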
[00:17:23] KB: We’ll come back to that after a quick word from our sponsor.
Handling sensitive health data? You already know security and compliance aren’t optional. Whether it’s ISO 27001, SOC 2 or GDPR, Vanta helps you build trust while staying focused on patient outcomes. Their platform automates up to 90% of the work, so you can hit your compliance goals faster and scale safely. Visit vanta.com/KBKast. That’s V A N T A.com forward slash KBKast to learn more.
So then on that note, I know you mentioned it before, constantly measuring. Do people sort of know though what they’re measuring?
And I know we’re sort of moving towards the tokens and the new inferencing is a different game. But for the average sort of company enterprise out there, is it still like they’re trying to figure it out? And then with that comes maybe trial and tribulation along the way? Or how do you sort of see that there’s no sort of like rule book to be like, well, this is sort of what you’ve got to measure against.
[00:18:29] Vincent: For example, yes, there are a number of metrics that customers are measuring when they go into inference, and I think they are very well equipped. There’s a number of solutions out in the market allowing them to measure. But maybe I’ll give a pragmatic example. Probably 20 years ago, when the Internet was booming, one of the ways to differentiate two similar Internet services was how many milliseconds I needed to wait before an online bookstore would give me an answer to my request for the latest Harry Potter. And if I needed to wait five seconds, that was at that time unacceptable, and I would probably move to the next available online bookstore. That’s a similarity that we are seeing now with customers moving into inference.
They want to make sure that their service is available, it is responding quickly as well as using their infrastructure to the maximum.
[00:19:40] KB: It’s going to obviously mature and it’ll just become relatively normal. Like, nowadays people aren’t going to wait like five seconds for a site to load, right? So obviously it’s matured and evolved over the years as things have gotten better. I’m assuming it’s the same sort of process, but obviously a lot faster, just the way in which things are going now.
[00:19:57] Vincent: Oh yeah, absolutely. On the compute side, every new generation of AI-specialized computer is able to produce more tokens, more throughput, et cetera.
But it’s also, I would say, the IT teams managing those environments becoming more knowledgeable, the models themselves evolving, consuming fewer resources, being more accurate, et cetera. So I think it keeps improving. I mean, all the different elements of this stack are improving to be more efficient. Yes.
[00:20:41] KB: Okay, I want to then perhaps zoom out for a moment and talk about future winners in like AI.
So what I mean by that is it’s not necessarily the company with the most GPUs, for example, but the one that can orchestrate workloads most intelligently. Because you said it’s all about good data, and the model then runs inference on that data once it has been trained. So is that sort of the winner long term, would you say?
[00:21:15] Vincent: We’ve touched already on a number of those points, but I would say that the winners will start with companies that think wisely about what the correct use case is. Again, I take my analogy of the cloud. We heard many companies, probably 10 years ago, saying, “I’m going to close my data center, I’m going to move everything into the cloud.” And the primary reason was cost. We’ve seen a number of those repatriating. So, first of all, not closing their data center, and repatriating.
I think there is a similarity with customers really being wise finding the correct AI use case where it makes sense.
And this is why we’ve been seeing the early adopters in the news, et cetera, going very quickly on one or two particular use cases.
But the vast majority of customers are thinking and finding the right use case. That’s, I would say, item number one. Two: once you find your use case, the one that is relevant to your specific context, where AI is going to help you, it’s about having the correct data governance, making sure that you have the correct process to extract the appropriate data that you want to train your model on, and potentially transform it before you train your model on it.
Back to what I mentioned earlier: MIT mentioned that roughly 80% of those AI experiments fail.
One of the top reasons is data, and here’s an example. Take a fictitious company that makes soda. You don’t want the secret recipe of your soda to be disclosed by the model.
So you need to make sure that this data is never returned in a response, or is not even used to train the model, and also that you have the correct people having access to this data. If you are launching an HR generative AI application, or a generative AI application that has access to HR data, it can be super powerful. But you need to make sure that when Vincent asks how much his manager is being paid and how much of a raise he got last year, he isn’t given an answer. So who has access to data is also an important point. So that’s data governance.
The third point for me is that the customers that will be successful will be the ones, as we discussed a couple of minutes ago, who are able to manage the cost of their infrastructure. When they scale their infrastructure, how do they optimize it so that when the number of tokens, or the number of inference requests and responses, increases, the cost stays linear and does not explode?
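The HR example above comes down to checking who is asking before any data reaches the model. One common way to do that, sketched here under assumptions, is to filter documents by the caller's roles before they are handed to a generative AI application. The `Document` type, the role names, and the `retrievable` helper are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to see this document


def retrievable(docs, user_roles):
    """Return only the documents the caller is entitled to see."""
    roles = set(user_roles)
    # Keep a document if the caller holds at least one allowed role.
    return [d for d in docs if d.allowed_roles & roles]


docs = [
    Document("handbook", "Leave policy ...", frozenset({"employee", "hr"})),
    Document("salaries", "Manager compensation ...", frozenset({"hr"})),
]

# An ordinary employee never gets the salary data into the model's context.
print([d.doc_id for d in retrievable(docs, {"employee"})])
print([d.doc_id for d in retrievable(docs, {"hr"})])
```

Filtering before retrieval means the model cannot leak what it was never shown, which is simpler to reason about than trying to block answers after generation.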
Then there is the security aspect. When you are moving to inference, you need to make sure that those workloads are protected. AI is generating a new attack surface and new types of attacks, and that goes beyond just vulnerabilities, et cetera. Sometimes you don’t want certain types of questions to be answered by your model.
And last but not least, I would say, is AI talent. AI is a new technology that can be complex. There are the data scientists, but even in the infrastructure you’re leveraging Kubernetes, APIs, et cetera.
Managing that talent, the people who will be able to help you define, build, manage, and protect those workloads, is an important aspect as well. And it’s not just the people building AI within your company, but also the people using it. And here we are entering another aspect: we mentioned earlier that AI is sometimes generating a lot of debate. People see the opportunities that AI can bring, but it also brings angst to some people, and sometimes conspiracy theories.
And you need to make sure that when you build those workloads, the employees will be using them.
There’s also a lot of education needed for your workforce on how to correctly leverage AI so that it can enhance their work and help them, and also sometimes to combat, maybe that’s a strong word, but to defend the fact that AI is not here to replace your work, your job; it’s here to help you.
[00:27:01] KB: When you were talking, I don’t know about you, but every day I’m reading different stuff in the news, et cetera, and I was reading something the other day, like two days ago, and it was saying that certain companies have already used up all their AI budget in, like, the first quarter. So then I really want to know, are people sort of maybe regretting those decisions now? Because like you said, the example around moving everything to the cloud, and now people have changed their ways. And I know the pendulum sort of swings really hard to one side and then the other side, and then it sort of settles in the middle. But I just really want to know that answer, because I don’t know. Are people sitting there regretting it, going, maybe we didn’t do the right thing? Do you think there’s a little bit of that going on in there?
[00:27:44] Vincent: I wouldn’t say so. I would say that this is, for me, a testimony that more and more customers are moving to inference. At F5 we run a report every year called the State of Application Strategy Survey. In a nutshell, all that we do at F5 is for applications. But those applications are evolving: an application back in the early Internet days and an application now are completely different.
And now you have AI workloads, et cetera. And I was really surprised by the result of our 2026 survey, where we asked a thousand-plus customers across the globe, and I think 77% of them responded that they are already managing AI in inference. They are not in the training phase anymore. They are in inference. So that’s a good testimony to what Jensen Huang was saying, that we have entered the inference era.
On encouraging people to use AI and inference, I’ll mention something else: some employees, particularly developers on the West Coast of the US, in the Bay Area, are given, as part of their package when they join a company, a token budget, which encourages them to leverage AI. And so, back to your question: when you are encouraging your workforce to use AI, you need to make sure you also have the controls so that you don’t go over your budget. And as I mentioned earlier, some customers are also leveraging AI agents, which can not only respond to questions like generative AI does, but also reason and act on your behalf. And this is where a number of customers have seen their token budgets explode.
So I think this is where IT professionals are learning, and they need to put some cost control on who is using AI. And sometimes you have many AI initiatives within a big organization, with different lines of business implementing different models and different, I would say, strategies. There needs to be governance here from the company on how to control the cost and control who has access to what models, et cetera, so that you have better control over all of this.
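The budget controls described above can be sketched as a simple guard that refuses requests once an allowance is spent. This is an illustrative assumption about how such a control might look, not a description of any vendor's feature; the class names `TokenBudget` and `TokenBudgetExceeded` and the limits are hypothetical.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a request would push usage past the allowed budget."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit  # hypothetical allowance for one team or one agent
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def charge(self, tokens: int) -> None:
        """Record usage, refusing any request that would exceed the limit."""
        if tokens > self.remaining:
            raise TokenBudgetExceeded(
                f"requested {tokens} tokens, only {self.remaining} left"
            )
        self.used += tokens


budget = TokenBudget(limit=1_000)
budget.charge(400)  # an ordinary chat request
budget.charge(500)  # an agent step that reasons and calls tools

try:
    budget.charge(200)  # would overshoot, so it is refused
except TokenBudgetExceeded as err:
    print(f"blocked: {err}")

print(f"used {budget.used} of {budget.limit} tokens")
```

Checking before spending, rather than reconciling the bill afterwards, matters most for agents, which can chain many model calls from a single user request.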
[00:30:34] KB: So then Vincent, final question moving forward, how do you think the year is going to unfold? And I know like each day like things change and then like each time I’m interviewing someone, like things sort of change and I know like things change a lot but it’s just more from as of this moment like what do you sort of see given your role, what you’re sort of seeing in the market, anything that you can sort of share for people who are listening?
[00:30:57] Vincent: Inference is what’s going to happen in the next two to three years.
The next phase, which is really interesting as well, is the use of AI agents. I’m going to use another very bad analogy, but think about Jarvis in Iron Man, when Iron Man asks, “Can you repaint my body armor in red?” et cetera. And it’s thinking and doing it, and advising him a couple of hours later when it’s done. Sorry for my bad analogy, but that’s the type of usage that we will see in three to five years.
There are already customers going into it. As an example, OpenClaw, the open source AI agent, is the fastest growing open source project these days, faster than Linux itself in terms of growth.
That is a testimony that a number of customers are looking into it, and I’ve had discussions with a number of customers in specific industries that have already run experiments with AI agents.
That’s what the near future is going to be, and it raises a number of questions. When you give your AI agent an intention to do something, it will act on your behalf.
So when mistakes are made, how do we stop an agent from doing something wrong? A kill switch, somehow? How do you establish what the identity of an agent is?
There are a number of questions this poses, and I could expand at length, that the industry will need to solve before this becomes widely deployed and scaled, and there are a number of solutions already on the market. That’s an interesting part as well. But I see that the next phase of AI is agentic, and it has already started. The last one is physical AI, which is a little bit further down the line, and it’s not, I would say, valid for every single industry from what I can see. But I can be wrong. When we started the conversation, I mentioned my autonomous taxi. That’s one example. We are also starting to see AI in warehouses, where it can physically help humans with heavy loads or execute a number of tasks. But that’s where it’s industry specific, I would say.
So I don’t foresee it being valid for every single industry. But yeah, definitely, agentic will be the next phase we’re going to see in the next three to five years, and a space to watch. I think it will generate a lot of interesting debate as well.
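On the kill-switch question raised above, one simple pattern is to have the agent loop check a shared stop flag before every action. The sketch below is only an illustration under assumptions: `run_agent`, the step functions, and the step cap are hypothetical, and a real agent framework would need far more than this, including the agent identity question Vincent mentions.

```python
import threading


def run_agent(steps, kill_switch, max_steps=10):
    """Run planned steps until done, the kill switch is set, or max_steps is hit."""
    completed = []
    for i, step in enumerate(steps):
        # Check before every action so a tripped switch stops the next step.
        if kill_switch.is_set() or i >= max_steps:
            break
        completed.append(step())
    return completed


kill_switch = threading.Event()


def draft_email():
    return "drafted"


def risky_action():
    # Stands in for an operator or a policy check tripping the switch.
    kill_switch.set()
    return "attempted risky action"


def send_email():
    return "sent"


result = run_agent([draft_email, risky_action, send_email], kill_switch)
print(result)  # the final step never runs once the switch is set
```

Note the limitation: this switch only takes effect between steps, so an action already in flight still completes. Stopping mid-action is a harder problem that this sketch does not address.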
[00:34:11] KB: That was Vincent Laverne. What I’m still turning over from that conversation is the MIT figure he raised on AI initiatives failing. While looking into it, I found that the actual number in the most recent GenAI Divide report is 95%. Even higher.
And what really struck me is Vincent’s point that most of these failures trace back to data access and security rather than the technology itself. So if you’re a CISO or board director listening to this, the question to bring back to your exec team this week is whether your AI ROI model was built for the training phase you just left or the inference phase you’re already in.
I read every reply. If you’ve got some thoughts on this one, send me a message on LinkedIn.
KBKast – Cyber For The C-Suite.