
Enterprises Are Pulling Inference In-House Because Trust Is the Product

The cloud AI honeymoon is ending. Once agents carry credentials, touch customer data, and make decisions, inference stops being a feature and becomes infrastructure.

May 8, 2026 · 11 min read · By Jesse Alton

Cloud AI was cute when it wrote emails.

It is not cute when it has credentials.

That is the line. That is where the demo ends and the real architecture begins.

The enterprise AI conversation is finally moving past toy copilots and into the ugly part: identity, permissions, audit logs, data locality, prompt control, token security, rollback, incident response, and accountability.

Good.

I have been saying this for years. The control plane matters more than the demo. If I cannot audit what an agent did, what it touched, what it inferred, what tool it called, what credential it used, and why it made the move, I do not have AI governance.

I have theater.

Inference Is Not a Feature Anymore

RCR Wireless covered F5's 2026 State of Application Strategy report, and the headline is the one that matters: enterprises are bringing AI inference in-house.

Not because CIOs suddenly got nostalgic for data centers.

Not because every company wants to cosplay as NVIDIA.

Because production AI is no longer a chatbot floating off to the side of the business. It is becoming part of the business execution layer.

F5 says 78% of enterprises now run AI inference as a core operation. That number is the shift. Inference moved from experiment to infrastructure. F5's full 2026 State of Application Strategy Report frames the operational reality around hybrid multicloud, model routing, security, and agentic AI as connected infrastructure problems. Not separate vendor pitches. Connected problems.

Once that happens, the questions change:

  • Where does the model run?
  • Where does the data go?
  • Who sees the prompt?
  • Who controls the token?
  • What happens when the agent takes action?
  • Can I prove what happened after the fact?
  • Can I stop it before it does something stupid?

That last one is not theoretical.

Agents Change the Risk Model

A chatbot is an interface.

An agent is an actor.

That distinction matters.

A chatbot can hallucinate a policy answer. Annoying. Risky. Manageable.

An agent can call an API, update a CRM, provision infrastructure, send a refund, modify an entitlement, open a support ticket, email a customer, rotate a key, delete a table, or approve a workflow.

That is a different animal.

The F5 report gets at this directly. According to RCR, more than 90% of organizations say production-level agentic AI introduces significant new security challenges, including credential stuffing and the difficulty of auditing what agents actually do. F5's own release says 88% of organizations have faced AI-related security challenges, while 98% are preparing for agentic AI systems that need identities, permissions, and guardrails like human users.

Read that again.

Agents need identity. Microsoft agrees. Microsoft Entra Agent ID is now generally available, providing first-class identity and access management for AI agents: authentication, authorization, governance, and security controls. Their security overview for AI explicitly describes Entra as an identity control plane for AI systems, extending governance across human and nonhuman identities. Their agent identity docs say it plainly: identity models designed for human users and traditional applications are not sufficient for autonomous AI systems.

That is Microsoft telling you agents are nonhuman users in your environment.

If that does not make your security team sit up straight, your security team is asleep.

The numbers back this up. A CSA/Aembit survey found that 68% of workers cannot reliably distinguish AI-agent activity from human activity. 74% say agents often receive more access than necessary. 79% say agents create obscure access paths. That is the auditability and least-privilege problem stated in hard data.

And Five Eyes agencies recently warned that agentic AI expands attack surfaces, creates accountability gaps, and behaves unpredictably when deployed without strong controls. That is not a vendor pitch. That is allied intelligence agencies telling you to slow down and build the guardrails.

Agents need permissions.

Agents need guardrails.

Agents need audit trails.
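
Here is what that looks like in practice: a minimal TypeScript sketch, with made-up names (this is not Entra's API, just the shape of the idea). The agent is a first-class principal with an accountable human owner, and credentials are issued per task: scoped, short-lived, and refused outright when the request exceeds policy.

```typescript
// Minimal sketch: the agent is a principal, not a shared API key.
// All names here are illustrative, not any vendor's actual API.

interface AgentIdentity {
  id: string;              // stable identity, distinct from any human user
  owner: string;           // the human or team accountable for this agent
  allowedTools: string[];  // explicit allow-list, not "whatever the service can do"
}

interface ScopedToken {
  agentId: string;
  scopes: string[];
  expiresAt: number;       // short-lived by default
}

// Issue a credential scoped to one task, never the agent's full permission set.
function issueScopedToken(
  agent: AgentIdentity,
  requestedScopes: string[],
  ttlMs = 5 * 60_000,
): ScopedToken {
  const denied = requestedScopes.filter((s) => !agent.allowedTools.includes(s));
  if (denied.length > 0) {
    throw new Error(`agent ${agent.id} requested out-of-policy scopes: ${denied.join(", ")}`);
  }
  return { agentId: agent.id, scopes: requestedScopes, expiresAt: Date.now() + ttlMs };
}

const refundsAgent: AgentIdentity = {
  id: "agent:billing-refunds",
  owner: "team-payments",
  allowedTools: ["crm.read", "refunds.create"],
};

issueScopedToken(refundsAgent, ["refunds.create"]); // fine: task-scoped, expiring
// issueScopedToken(refundsAgent, ["db.drop_table"]); // throws: never granted, cannot request
```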

The Prompt Layer Is a Control Plane

The old enterprise security perimeter was built around networks, apps, APIs, and users.

That is not enough anymore.

F5 reports that nearly 29% of organizations identify the prompt layer as the top delivery mechanism, while 23% prioritize the token layer for delivery and security. RCR describes these as new control points at the prompt and token layers. Help Net Security's coverage makes the operational point clearly: application teams are expanding routing, traffic management, identity controls, and observability across multiple AI models and environments.

This is where the next generation of enterprise architecture lives.

Not in another SaaS dashboard with an AI sparkle button.

At the control points (a rough sketch in code follows this list):

  • Prompt routing
  • Context injection
  • Tool permissioning
  • Token scoping
  • Model selection
  • Data filtering
  • Agent identity
  • Memory boundaries
  • Action approval
  • Audit logging
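
A toy TypeScript gate makes the idea concrete. Every name below is illustrative; the shape is what matters: one chokepoint that checks the agent's allow-list, checks the credential's scope, and writes an audit record either way.

```typescript
// Sketch of a single chokepoint every agent action passes through.
// Hypothetical names and policy shape; not any specific product.

interface ActionRequest {
  agentId: string;
  tool: string;                    // e.g. "crm.update_contact"
  args: Record<string, unknown>;
  tokenScopes: string[];           // what the credential actually permits
}

interface AuditEvent {
  at: string;
  agentId: string;
  tool: string;
  args: unknown;
  decision: "allowed" | "denied";
}

const auditLog: AuditEvent[] = [];

function gate(req: ActionRequest, policy: Record<string, string[]>): boolean {
  const allowedTools = policy[req.agentId] ?? [];
  const permitted =
    allowedTools.includes(req.tool) &&   // agent-level permissioning
    req.tokenScopes.includes(req.tool);  // credential-level token scoping
  auditLog.push({
    at: new Date().toISOString(),
    agentId: req.agentId,
    tool: req.tool,
    args: req.args,
    decision: permitted ? "allowed" : "denied",
  });
  return permitted;
}
```

Denied calls get logged too. The record of what an agent tried and was refused is half the governance story.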

OWASP's Top 10 for LLM Applications treats prompt injection, sensitive information disclosure, supply-chain vulnerabilities, and excessive agency as application-security risks. Their materials on insecure plugin/tool design call out how insufficient access control around external tools can create unauthorized access or execution risks. The prompt and tool layer is part of the security architecture now. OWASP already treats it that way.

That is also why I built Cadderly around coordination, intent recognition, MCP integration, and agent-to-agent orchestration. The agent is not the product. The coordination layer is the product. The policy layer is the product. The audit trail is the product.

The Cloud Honeymoon Is Ending

I am not anti-cloud.

That would be stupid.

I have spent too much of my career shipping real systems across cloud, hybrid, and government environments to pretend the answer is one ideology. Cloud is incredible for speed, elasticity, managed services, and global reach.

But the cloud AI honeymoon is ending because the risk profile changed.

When inference was a playground, sending everything to a hosted API made sense.

When inference becomes part of customer service, fraud detection, logistics, underwriting, health workflows, public sector operations, finance, and regulated enterprise systems, the calculus changes.

Now I care about:

  • Data residency
  • Latency
  • Cost predictability
  • Model governance
  • Vendor concentration
  • Regulatory exposure
  • Incident response
  • Offline continuity
  • Credential boundaries
  • Auditability

This is why in-house inference is not a vanity infrastructure move.

It is what happens when AI grows up and gets dragged into the same governance reality as every other production system.

Everything is a business.

Do your job or I will replace you.

That applies to models too.

The Database Deletion Story Was Not an Edge Case

Dark Reading recently covered the ugly version of this problem in If AI's So Smart, Why Does It Keep Deleting Production Databases?

The point was not that one model went rogue.

The point was that the industry is plugging agents into production before the safety architecture is mature enough to support them. The failure mode is premature production integration, missing safety controls, and insufficient testing. Not model intelligence.

That is the whole game.

The model is not the only risk.

The integration is the risk.

The token is the risk.

The permission boundary is the risk.

The missing confirmation step is the risk.

The absent recovery plan is the risk.

The unaudited tool call is the risk.

I do not care how smart the model is if I cannot constrain it.

I do not care how good the benchmark is if the agent can drop a production database because somebody handed it an overpowered credential.

That is not AI strategy.

That is negligence with a transformer attached.
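
The missing confirmation step is also the cheapest one to add. A hedged TypeScript sketch, with a placeholder approval hook (your real flow might be a ticket queue, a paging prompt, or a change-management gate): destructive tools never run on the agent's say-so alone.

```typescript
// Sketch: destructive actions require a human in the loop, full stop.
// "requestHumanApproval" is a placeholder for whatever approval flow you run.

const DESTRUCTIVE = new Set([
  "db.drop_table",
  "db.delete_rows",
  "infra.terminate",
  "keys.rotate",
]);

async function executeToolCall(
  tool: string,
  run: () => Promise<void>,
  requestHumanApproval: (tool: string) => Promise<boolean>,
): Promise<void> {
  if (DESTRUCTIVE.has(tool)) {
    const approved = await requestHumanApproval(tool);
    if (!approved) {
      throw new Error(`human denied destructive action: ${tool}`);
    }
  }
  await run();
}
```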

Strategy Is Not Output

There is a related problem in the way companies talk about AI.

They confuse output with strategy.

The Drum piece, The thinking machine: what actually happens when AI gets good at strategy, makes a useful distinction: most AI usage is still about making more stuff faster. More copy. More decks. More ideas. More noise.

That is not strategy.

In enterprise systems, the same mistake shows up as more automation without more accountability.

More agents.

More tool calls.

More integrations.

More autonomous workflows.

But no control plane.

That is a shit recipe for success.

The strategic question is not, "Can I automate this?"

The strategic question is, "What should be allowed to act, under what conditions, with what permissions, against what systems, using what evidence, with what human oversight, and with what audit trail?"

That is strategy.

That is governance.

That is infrastructure.
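
You can write that question down as data instead of vibes. A hypothetical TypeScript schema (illustrative, not a standard): every clause of the question becomes a field that has to be filled in before anything is allowed to act.

```typescript
// The strategic question, one field per clause. Illustrative schema.

interface ActionPolicy {
  actor: string;                // what should be allowed to act
  conditions: string[];         // under what conditions
  permissions: string[];        // with what permissions
  targetSystems: string[];      // against what systems
  requiredEvidence: string[];   // using what evidence
  humanOversight: "none" | "review-after" | "approve-before";
  auditTrail: boolean;          // with what audit trail
}

const refundPolicy: ActionPolicy = {
  actor: "agent:billing-refunds",
  conditions: ["amount <= 500", "customer.verified == true"],
  permissions: ["refunds.create"],
  targetSystems: ["payments-api"],
  requiredEvidence: ["original charge id", "support ticket id"],
  humanOversight: "approve-before",
  auditTrail: true,
};
```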

Telcos Are Back in the Conversation

One interesting angle in the RCR piece is F5 positioning telecommunications companies as critical partners in this AI infrastructure shift.

That makes sense.

If inference is moving closer to enterprise environments, users, devices, factories, hospitals, ports, campuses, and regulated networks, then connectivity providers matter again. RCR's F5 coverage ties in-house inference directly to traffic routing, security, latency, and distributed infrastructure challenges. The inference workload does not sit still in one region. It follows the data, the user, the regulation.

Edge inference is not just about speed.

It is about control.

It is about keeping sensitive workloads closer to where decisions happen.

It is about reducing round trips to distant hyperscale regions when the workload cannot tolerate latency, exposure, or dependency risk.

I do not buy every telco AI pitch. A lot of it is recycled 5G marketing with a model slapped on top.

But the underlying infrastructure point is real.

AI production will be hybrid.

Some workloads will run in hyperscale cloud.

Some will run in private cloud.

Some will run at the edge.

Some will run on-device.

The companies that can route work intelligently across that mess while preserving policy, identity, observability, and trust will be the ones still standing when the hype clears.
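
What that routing looks like, as a rough TypeScript sketch with made-up thresholds and placement names: sensitivity and autonomy decide placement before latency and cost get a vote.

```typescript
// Sketch of hybrid placement for one inference request.
// Thresholds and tiers are invented; the ordering of concerns is the point.

type Placement = "on-device" | "edge" | "private-cloud" | "hyperscale";

interface InferenceRequest {
  dataSensitivity: "public" | "internal" | "regulated";
  maxLatencyMs: number;
  agentCanAct: boolean;  // does the output drive an autonomous action?
}

function placeWorkload(req: InferenceRequest): Placement {
  // Regulated data and action-taking agents stay inside the boundary.
  if (req.dataSensitivity === "regulated" || req.agentCanAct) return "private-cloud";
  // Latency-critical, non-sensitive work runs close to the user.
  if (req.maxLatencyMs < 20) return "on-device";
  if (req.maxLatencyMs < 100) return "edge";
  // Everything else can ride the elastic option.
  return "hyperscale";
}
```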

The New Enterprise AI Stack Is Boring on Purpose

The future of enterprise AI is less magical than the keynote demos.

Good.

Production systems should be boring.

NSA, CISA, and allied agencies recommend validating AI systems before deployment, enforcing least privilege, securing APIs, using defense-in-depth, and maintaining logging and monitoring. That is the production checklist. It reads like any mature infrastructure security program. Because that is exactly what enterprise AI governance is.

The stack I care about looks like this (one slice of it is sketched in code after the list):

  • Identity for humans and agents
  • Scoped credentials by task
  • Prompt and context governance
  • Model routing by risk and cost
  • Tool registries with permission boundaries
  • Human approval for destructive actions
  • Immutable audit logs
  • Evaluation harnesses before deployment
  • Red-team testing for agent workflows
  • Rollback and recovery plans
  • Data controls at every boundary
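
That one slice, the immutable audit log, is simple enough to sketch. A minimal tamper-evident chain in TypeScript, using Node's built-in crypto: each entry commits to the hash of the one before it, so a silent edit breaks verification.

```typescript
// Sketch of an append-only, tamper-evident audit log.
// Real systems add signing, replication, and external anchoring.

import { createHash } from "node:crypto";

interface AuditEntry {
  prevHash: string;
  at: string;
  agentId: string;
  action: string;
  hash: string;
}

const chain: AuditEntry[] = [];

function append(agentId: string, action: string): AuditEntry {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : "genesis";
  const at = new Date().toISOString();
  const hash = createHash("sha256")
    .update(`${prevHash}|${at}|${agentId}|${action}`)
    .digest("hex");
  const entry: AuditEntry = { prevHash, at, agentId, action, hash };
  chain.push(entry);
  return entry;
}

// Verification recomputes every hash; any edited entry breaks the chain.
function verify(): boolean {
  let prev = "genesis";
  return chain.every((e) => {
    const expected = createHash("sha256")
      .update(`${prev}|${e.at}|${e.agentId}|${e.action}`)
      .digest("hex");
    const ok = e.prevHash === prev && e.hash === expected;
    prev = e.hash;
    return ok;
  });
}
```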

None of that will go viral on LinkedIn.

All of it will keep the business alive.

This is the same lesson I learned the hard way building companies, shipping government systems, modernizing large platforms, and watching Magick ML fail spectacularly after doing a lot of things right and a few important things wrong.

The demo is never enough.

Distribution matters.

Trust matters.

Operations matter.

Control matters.

Bring Inference Home When the Risk Demands It

I am not saying every company should run every model in-house.

That is cargo cult infrastructure.

I am saying the decision should be based on risk, not fashion.

Run hosted models when the workload is low-risk, non-sensitive, and elastic, and when speed matters more than control.

Bring inference closer when the workload touches:

  • Regulated data
  • Customer records
  • Financial decisions
  • Healthcare workflows
  • Critical infrastructure
  • Internal credentials
  • Proprietary strategy
  • High-volume cost centers
  • Autonomous actions

The moment an agent can take action with business consequences, inference becomes part of your security architecture.

Treat it that way.
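
If you want the decision rule in code, it is short. A toy TypeScript check over the risk factors above, with an invented spend threshold; the flags and numbers are mine, the principle is risk-driven placement.

```typescript
// Toy version of "risk, not fashion": a workload earns in-house inference
// by tripping any risk factor. Flags and threshold are illustrative.

interface WorkloadRisk {
  regulatedData: boolean;
  customerRecords: boolean;
  financialDecisions: boolean;
  touchesCredentials: boolean;
  autonomousActions: boolean;
  monthlyTokenSpendUsd: number;  // high-volume cost centers
}

function shouldBringInHouse(w: WorkloadRisk): boolean {
  return (
    w.regulatedData ||
    w.customerRecords ||
    w.financialDecisions ||
    w.touchesCredentials ||
    w.autonomousActions ||
    w.monthlyTokenSpendUsd > 50_000  // made-up threshold; tune to your economics
  );
}
```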

The Real Question

The real question is not whether enterprises will bring inference in-house.

They already are. 78% of them, according to F5.

The real question is whether they will build the control planes to make agentic AI safe, observable, and useful, or whether they will bolt agents onto production systems and act surprised when something catches fire.

I know which one I am betting on.

The companies that win will not be the loudest AI adopters.

They will be the ones that can prove what their agents did.

If you are building agentic systems, inference platforms, MCP integrations, or enterprise AI governance, I want to talk. Bring the real architecture. Bring the failure modes. Bring the scars.

๐Ÿ“ Posted directly to jessealton.com

Jesse Alton

Founder of Virgent AI and AltonTech. Building the future of AI implementation, one project at a time.

@mrmetaverse
