
The AI Infrastructure Layer Your Organization is Missing

Edge computing, serverless, WebLLM, MCP, A2A, AG-UI. What they are, why they matter, and what your org can do with them right now.

March 19, 2026 · 13 min read · By Jesse Alton

Most conversations about AI in the enterprise stop at the model layer.

Which LLM should we use? Should we go OpenAI or Anthropic? Can we fine-tune something?

Those are great questions, but not the only questions you should be asking.

The real competitive advantage in AI right now is not which model you pick. It is how you build the infrastructure around it. Where computation happens. How agents talk to each other. How your UI responds to user intent in real time. How you keep sensitive data on-device and out of third-party servers.

I have been building in this layer since 2022. First at MagickML (featured at Google I/O 2023 alongside LangChain), then at Virgent AI, and personally through Cadderly, my own production multi-agent platform I use daily as an AI-powered Zettelkasten. I have a long track record in open standards for interoperability through the Open Metaverse Interoperability Group, a W3C working body I co-founded and co-chair. That background is why I adopted MCP and A2A early, before most teams had heard of either.

This post is my attempt to explain the infrastructure layer clearly, as an extension of this recent case study for Virgent AI's customers. What these technologies are. Why they matter. What you can actually do with them in your organization.

Edge Computing: Computation That Travels With You

Most people think of AI as something that happens in a data center somewhere. You send a request, it goes to a server, the server runs a model, you get a response back. That model still describes most enterprise AI deployments.

Edge computing changes that. Instead of sending everything to a centralized server, edge computing moves computation closer to where data is generated and used. That might mean a device in your hand, a browser running on your laptop, a server at the edge of a CDN network, or infrastructure physically located inside your building.

Why does that matter?

Latency. Computation that happens close to the user is faster. If your AI agent is running inference at a Vercel Edge node 20 milliseconds away instead of a data center 200 milliseconds away, your users feel that.

Privacy and compliance. Data that never leaves the device never gets exposed. For legal, healthcare, government, and financial organizations, keeping sensitive data on-device is not a nice-to-have. It is a requirement.

Resilience. Edge systems can operate when centralized infrastructure is unavailable. Air-gapped environments, low-connectivity field operations, sovereign cloud requirements. These are all edge problems.

Cost. Moving large volumes of data to centralized inference is expensive. Pre-processing, filtering, and routing at the edge can dramatically reduce what actually needs to leave the network.

At Virgent AI, we have built edge-native AI workflows for clients across regulated industries. One of my favorite examples is using WebLLM to run compliance pre-screening in the browser before a prompt ever touches a server.

Serverless: Infrastructure That Scales Itself

Serverless does not mean there are no servers. It means you do not manage them.

In a traditional deployment, you provision servers, configure them, keep them running, and pay for them whether or not they are being used. In a serverless architecture, you write functions, deploy them, and the infrastructure layer handles everything else. It scales up automatically when there is demand. It scales back down, including to zero, when there is not.

For AI workloads, serverless is important because AI usage patterns are often spiky and unpredictable. A customer support agent might handle 200 conversations at 9am and three at midnight. Paying for servers that can handle 200 users at all hours is wasteful. Serverless lets you pay for what you actually use.

I build almost everything on Vercel, which offers a serverless and edge-native deployment platform that integrates cleanly with the AI stack I work in. Every deployment we ship at Virgent AI goes out with Vercel Edge Functions handling routing and latency-sensitive operations, and serverless functions handling heavier processing. It is faster to ship, cheaper to run, and easier to scale.
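The split described above can be sketched as a single route handler. This is a minimal illustration, not a Virgent deployment: it uses the Web-standard Request/Response API that Vercel Edge Functions expose, and `classifyWorkload` is a hypothetical helper that stands in for real routing logic.

```typescript
// Sketch of an edge route handler, assuming Vercel's Edge runtime
// conventions (Web-standard Request/Response, `config.runtime`).
export const config = { runtime: "edge" }; // opt this route into the Edge runtime

// Hypothetical routing rule: latency-sensitive paths are answered at the
// edge; everything else would be forwarded to a serverless function.
function classifyWorkload(path: string): "edge" | "serverless" {
  return path.startsWith("/api/route") ? "edge" : "serverless";
}

export default async function handler(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const target = classifyWorkload(url.pathname);

  if (target === "edge") {
    // Fast path: respond directly from the edge node.
    return new Response(
      JSON.stringify({ handledAt: "edge", path: url.pathname }),
      { headers: { "content-type": "application/json" } }
    );
  }
  // Heavy path: in a real deployment this branch would proxy to a
  // serverless function that talks to the model provider.
  return new Response(
    JSON.stringify({ handledAt: "serverless", path: url.pathname }),
    { headers: { "content-type": "application/json" } }
  );
}
```

Because the handler only touches Web-standard APIs, the same code runs at the edge, in a serverless function, or in a local test without modification.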

WebLLM: AI That Runs in Your Browser

WebLLM is a project from the MLC AI team that lets you run large language models directly in the browser, using WebGPU. No server. No API call. The model runs locally on the user's device.

That sounds like a neat trick. It is actually one of the most strategically important things happening in AI infrastructure right now.

Here is how I have used it in production at Virgent AI.

We had a client in a compliance-sensitive industry. They needed AI assistance for their team, but they had real concerns about sensitive data leaving the network. Instead of building an elaborate server-side filtering layer, we built a WebLLM pre-screening step that runs entirely in the browser. Before a prompt is sent to any external model, a small local model checks it for PII, compliance red flags, and data leakage risks. If it passes, it goes through. If it does not, the user gets a prompt asking them to revise.
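The gate described above can be sketched roughly as follows. The regex rules here are illustrative stand-ins, not the client's actual checks; in the real flow, a small model loaded with WebLLM (via `CreateMLCEngine` from `@mlc-ai/web-llm`) would judge anything the cheap rules cannot decide.

```typescript
// Sketch of a browser-side compliance gate. The patterns are
// illustrative; a production gate would combine rules like these with
// a local model's judgment on borderline prompts.

type ScreenResult = { allowed: boolean; reason?: string };

const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/, "possible SSN"],
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "email address"],
  [/\b(?:\d[ -]?){13,16}\b/, "possible card number"],
];

export function screenPrompt(prompt: string): ScreenResult {
  for (const [pattern, reason] of PII_PATTERNS) {
    if (pattern.test(prompt)) return { allowed: false, reason };
  }
  // Borderline prompts would be escalated to the local model here, e.g.:
  //   const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");
  //   const verdict = await engine.chat.completions.create({ ... });
  return { allowed: true };
}
```

A blocked result never leaves the browser; the user simply sees a request to revise, as described above.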

The result: AI assistance that actually works in a regulated environment, without a separate compliance review process standing in the way.

WebLLM models are not as capable as GPT-4 or Claude. They do not need to be. For triage, pre-screening, intent recognition, and lightweight inference, they are exactly right. You use the right model for the job, not the biggest model for everything.

This is the multi-model principle in practice. You can see how we implement it at virgent.ai/services/data/multi-model-architecture.

MCP: A Standard Language for Agents and Tools

Model Context Protocol, or MCP, is an open standard released by Anthropic in late 2024 that defines how AI agents connect to external tools and data sources. Think of it as USB for AI. Before USB, every device had its own connector. MCP gives agents and tools a universal interface.

Before MCP, if you wanted your AI agent to read from your database, query your calendar, or push to your CRM, you had to write a custom integration for each one. Those integrations lived in your codebase. When the tool changed, you updated the integration. When you switched agents, you rewrote everything.

MCP changes that. A tool that exposes an MCP server can be connected to any MCP-compatible agent. You build the integration once. Any agent that speaks MCP can use it.

I built MCP auto-deployment into Cadderly within two weeks of Anthropic releasing the spec. You can deploy MCP servers from a browser UI in under three minutes. That is not a demo feature. It is in production, used daily.

For your organization, MCP means your AI agents can access the systems they need, your knowledge base, your ticketing system, your calendar, your data warehouse, in a safe, standardized way, without building brittle custom integrations for each one. It also means you can enforce access control and auditability at the protocol level, not as an afterthought.
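Concretely, MCP runs over JSON-RPC 2.0: a client asks a server what tools it exposes, then invokes one. The sketch below builds those two message shapes per the public spec; the tool name and arguments are hypothetical, and a real client would send these over stdio or HTTP via an MCP SDK rather than constructing them by hand.

```typescript
// Sketch of the JSON-RPC 2.0 messages MCP uses on the wire:
// `tools/list` to discover tools, `tools/call` to invoke one.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1; // JSON-RPC ids must be unique per request

export function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

export function callToolRequest(
  name: string,
  args: Record<string, unknown>
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

The point of the standard is that these same shapes work against any MCP server, whether it fronts a database, a calendar, or a CRM.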

We build MCP-native agent architectures for clients. More at virgent.ai/services/agents/mcp-a2a.

A2A: Agents That Talk to Each Other

Agent-to-Agent, or A2A, is an emerging protocol, with Google's implementation being the most visible, that defines how AI agents coordinate with each other across systems and organizations.

MCP handles the connection between an agent and a tool. A2A handles the connection between one agent and another.

In a multi-agent system, you might have a coordinator agent that receives a task and breaks it into subtasks. It delegates one subtask to a research agent, another to a writing agent, another to a scheduling agent. Those agents may live in different systems, be built by different teams, or run on different models. A2A gives them a standard way to hand off work, share context, and report results.
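The delegation shape described above can be sketched with mock agents. This models the coordination pattern, not the A2A wire format; the agent names and the `Subtask` type are hypothetical.

```typescript
// Sketch of coordinator-style delegation: fan subtasks out to named
// agents in parallel and gather their results. Real A2A adds task
// lifecycle, context sharing, and progress reporting on top of this.

type Agent = (input: string) => Promise<string>;

interface Subtask { agent: string; input: string }

export async function coordinate(
  agents: Record<string, Agent>,
  subtasks: Subtask[]
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  await Promise.all(
    subtasks.map(async ({ agent, input }) => {
      const run = agents[agent];
      if (!run) throw new Error(`no agent registered for "${agent}"`);
      results[agent] = await run(input);
    })
  );
  return results;
}
```

In a real system each entry in `agents` would be a network call to another agent, possibly in another organization, which is exactly why a shared protocol for the handoff matters.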

I built a custom A2A coordination protocol into Cadderly before the Google spec was widely published. The implementation handles task delegation between admin agents (code execution, research, creative work) and user-facing agents, with real-time progress tracking and visual workflow representation. It is the kind of thing that required significant custom engineering in 2024 and will be a commodity capability by 2026.

For your organization, A2A matters when your AI ambitions grow beyond a single agent answering questions. Once you are orchestrating complex workflows, routing work across departments, or integrating with partner systems that also run agents, you need a coordination layer. A2A is that layer.

AG-UI: Interfaces That Build Themselves

AG-UI, or Agentic UI, is an emerging pattern and protocol for user interfaces that respond dynamically to agent behavior and user intent. Instead of building static screens with fixed components, you build interfaces that generate and adapt themselves based on what the agent knows and what the user is trying to do.

Look at the Virgent AI site. The agent widget at the bottom right is powered by our multi-model AG-UI implementation. It is not a simple chatbot overlay on a static page. It adapts its response style, surface area, and behavior based on context.

The practical application for most organizations is more modest but still significant. Instead of building a rigid form for a process, you let an agent guide the user through it conversationally, generating the UI it needs as it goes. Instead of a fixed dashboard, you build an interface that surfaces the information the user is most likely to need given what they are currently doing.
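The adaptive-form idea above can be sketched as an agent emitting a declarative UI spec for the next step, which the client then renders. The `Field` and `UiSpec` types and the onboarding fields are hypothetical, not part of any AG-UI specification.

```typescript
// Sketch of agent-driven UI: given what the agent already knows,
// surface only the inputs still missing instead of a fixed form.

interface Field { name: string; label: string; kind: "text" | "date" | "select" }
interface UiSpec { prompt: string; fields: Field[] }

export function nextStepUi(known: Record<string, string>): UiSpec {
  // Hypothetical onboarding flow; a real agent would derive this
  // field list from conversation context.
  const all: Field[] = [
    { name: "fullName", label: "Full name", kind: "text" },
    { name: "startDate", label: "Start date", kind: "date" },
    { name: "department", label: "Department", kind: "select" },
  ];
  const missing = all.filter((f) => !(f.name in known));
  return {
    prompt: missing.length
      ? `I still need: ${missing.map((f) => f.label).join(", ")}`
      : "All set. Submitting your request.",
    fields: missing,
  };
}
```

Each turn of the conversation regenerates the spec, so the interface shrinks as the agent learns more rather than presenting every field up front.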

This is early but moving fast. We are already deploying AG-UI patterns in client work. You can see examples at virgent.ai/agents.

How These Things Work Together

None of these technologies is interesting in isolation. What makes them powerful is how they compose.

Here is a real pattern we have shipped at Virgent AI.

A user opens a web application. WebLLM loads in the background and pre-screens their inputs for compliance. A serverless function on Vercel Edge handles routing. Their request reaches a coordination agent that uses A2A to delegate to a specialized research agent and a specialized writing agent. Each of those agents pulls context from their organization's knowledge base through MCP-connected tool servers. The interface updates in real time using AG-UI patterns to reflect the agent's progress. The whole thing runs in under ten seconds with no data leaving the client's approved infrastructure boundary.

That is not a prototype. That is a production architecture we have deployed.

What Cadderly Is

I want to be direct about what Cadderly is and why I built it, because it represents my own attempt to live at this infrastructure layer and understand it from the inside.

Cadderly is a production multi-tenant AI platform I built as a solo engineer. It supports 30+ LLM providers including WebLLM for browser-based inference, MCP auto-deployment from a browser UI, custom A2A agent coordination, and a three-tier memory system combining short, medium, and long-term personalized memory paired with embeddings. I built my own intent recognition layer on top of it, and I use it daily as an AI-powered Zettelkasten, a personal knowledge management system.

I built it because I wanted to understand these protocols deeply. The best way to understand infrastructure is to build on it, break it, and rebuild it. Cadderly is my production laboratory.

It is also open in its approach. My work at OMI, developing open standards for interoperability in virtual worlds, shapes how I think about AI infrastructure. Interoperability is not a nice-to-have. It is what prevents vendor lock-in and allows ecosystems to grow. MCP and A2A are interoperability standards. My enthusiasm for them is not incidental.

Cadderly is my open love letter to agentic systems. It codifies the Virgent way of building agentic solutions: multi-model by default, edge-first, protocol-native, interoperable.

What Your Organization Can Do Right Now

You do not need to rebuild your stack to take advantage of any of this. Here is where to start.

Audit your data boundaries. Understand what data your current AI tools are sending where. If the answer is unclear, that is a risk. WebLLM and on-device inference exist precisely for situations where data needs to stay local.

Inventory your integrations. If you are building custom tool integrations for every AI capability you deploy, you are accumulating technical debt. Start evaluating MCP-compatible tooling so your next integration is the last one you need to write.

Think in agents, not prompts. A single prompt to a single model is a starting point. The real leverage comes from multi-agent workflows where specialized agents handle specific tasks and hand off to each other. A2A makes those handoffs reliable.

Prototype early at the edge. Vercel and similar platforms make serverless and edge deployment accessible to small teams. You do not need a platform engineering organization to start deploying AI at the edge.

Demand interoperability. When evaluating AI vendors, ask whether their systems support open protocols. A system that locks you into a proprietary integration layer is a liability, not an asset.

The Virgent Way

As a solo engineer, I have been able to harness these technologies and approaches to codify what I call the Virgent way of building agentic solutions:

Multi-model by default. Use the right model for the job. WebLLM for pre-screening. Claude for reasoning. GPT-4 for complex analysis. Together AI for speed. Route intelligently.

Edge-first architecture. Start with what can run close to the user. Move computation to centralized infrastructure only when necessary.

Protocol-native design. Build on open standards. MCP for tool integration. A2A for agent coordination. Standards that will outlast any single vendor.

Production-ready from day one. No prototypes. No proofs of concept. Every system we ship is built to be scaled, monitored, and maintained.

This approach has delivered measurable outcomes: $120K+ in annual savings for one client, acquisition thesis definition for another, 50% ticket reduction across multiple deployments. ROI in under 60 days.

Where to Go From Here

We have live demos of production agents, multi-model orchestration, and agentic workflows at virgent.ai/agents.

Our full services across agents, edge deployment, multi-model architecture, and MCP and A2A implementations are at virgent.ai/services.

If you want to talk through what any of this looks like for your specific situation, the first call is always free: virgent.ai/contact.

The infrastructure layer is being built right now. The organizations that understand it early will have an advantage that compounds. The ones that wait will spend the next several years catching up.

Jesse Alton is the founder of Virgent AI, co-chair of the W3C Open Metaverse Interoperability Group, and an adjunct professor of Product Management and AI at Maryland Institute College of Art. He is a champion of open protocols for interoperability and has been building production agentic systems since 2022.
