Adventures in AI Agent Onboarding
We’ve seen a couple interesting launches over the last couple weeks circling around the question of how to onboard AI agents and give them access to the tools / data they need to do their jobs:
(1) Anthropic’s Model Context Protocol (MCP) - an open-source standard for two-way connections between data sources and AI tools (MCP servers, plus clients that connect to those servers)
(2) /dev/agents - highly anticipated cloud-based OS for AI agents from former Stripe CTO David Singleton that recently announced a $56M fundraise from Index Ventures and CapitalG
(3) OpenInt out of YCombinator - focused on unified APIs for read/write/sync between databases and SaaS products, deployed on Cloudflare Workers with a connection-flow UI
This area of AI agent onboarding and permissioning is really fascinating to me right now because it feels like we are on the cusp of a new standard being built - and I’m excited to see what the next generation of authentication, integration, security, and runtime tools for onboarding AI agents will look like.
Philosophically, we are also in a strange in-between time right now where you can view AI agents on a spectrum from purely machines / software (e.g. function or “copilot” view of the world) all the way to AI employees, equivalents of human workers (e.g. 11x, Artisan, Ema, Lyzr). Depending on your world view, you could think about onboarding them similar to the prior generation of software vendors or you could think about onboarding them analogous to how you would a human employee. Wanted to jot down some thoughts exploring frameworks that might be reused from the prior generation of SaaS and what new paradigms and standards might have to emerge, perhaps taking inspiration from equivalent steps in the human employee onboarding process (things are moving so quickly - would welcome thoughts on what folks are seeing and also if you are building in the space!)
Let’s walk through some dimensions of managing AI agent onboarding (deployment / run-time and authentication), starting with a more machine-centric world view that borrows technologies + analogies from the prior SaaS generation.
"On-prem is cool again, sort of” - ISV marketplaces, what’s behind the rise of single-tenant SaaS, and what we got wrong about gross margins of genAI products
Deployment / Run-Time
As you know, software made a huge shift from on-prem (software deployed on physical servers on site or managed in your customer’s data centers) to SaaS (software delivered as a service, typically hosted in a multi-tenant approach where multiple customers’ data would be housed on the vendor’s cloud), facilitated by the rise of hyperscalers like AWS, GCP, Azure that abstracted away the challenges around spinning up resources and scaling servers up / down to meet demand.
This was the big narrative that played out during the 2000s-early 2020s: the transition from a legacy on-prem software platform to a new generation of cloud-hosted SaaS products.
An interesting pattern we’ve seen emerging among AI agent companies that was somewhat unexpected was the rise of single-tenant SaaS (or “hosted” SaaS) models. In this model, AI agents deploy to their customers’ private clouds — e.g. their software is actually running on their customer’s AWS instance.
So in a sense, on-prem is kind of cool again. Never thought I’d say that.
The key drivers behind this trend seem to be:
(1) Ability to use their customer’s AWS budgets toward their solutions. Many of the hyperscalers are spending billions of dollars in CAPEX standing up new data centers - these are fixed investments so each of them is strongly incentivized to drive increased utilization of this compute (fixed asset they already invested in - also helps with the investor narrative around being a beneficiary of AI). Through programs like AWS’s ISV (independent software vendor) Co-Sell program, AWS’s sales team is getting comped commission dollar for dollar on any sales of these new AI tools from independent vendors, and the compute they drive goes toward the customer’s committed compute budgets with AWS. In some situations, AWS is even paying for professional services implementation costs for customers who don’t have the know-how to set up a new AI tool themselves - these heroes are just knocking down barriers left and right to drive more compute spend!
From the standpoint of an internal champion at the customer, this is great - you get to look like a hero for bringing in a cool new AI vendor and AWS is footing the implementation bill - and if it’s out of your existing AWS committed spend budget line item, it’s basically free right? Right?
ISV programs have the perfect incentive scheme for the moment of time we are in. Software spending goes through these bundling and unbundling cycles and we’re currently in a vendor consolidation cycle where companies are tightening budgets and putting more scrutiny on each line item. This is driven largely by broader economic conditions but also driven (in my personal opinion) by general fatigue and backlash from the previous unbundling cycle that led to budget line items for a million SaaS products that were mostly tiny stand-alone features instead of full platforms. ISV programs solve this by enabling the ISV marketplace owner to be the ultimate bundler and enabling AI vendors to sneak in under an existing budget line item.
(2) Customers are way more conscious of data privacy now. Companies are able to capitalize on their proprietary data in a way they have not been able to prior to the AI age and, as a result, are way more paranoid around data leakage / vendors using their data to train models that could be sold to their competitors or otherwise erode the defensibility of their businesses. They want all LLM queries with proprietary data to be run through their enterprise version of OpenAI and most of them have already set up an internal API endpoint to enable this - in fact, it was one of the first AI initiatives most of these companies implemented. Which sets the stage for just keeping your data within your own private cloud.
(3) For the AI vendor, it vastly improves gross margin. One of the fears investors had at the start of the genAI transition was around the gross margin hit from genAI. In their minds, these genAI products would by definition be lower gross margin than the prior generation of SaaS products because the inference costs to OpenAI / Anthropic would hit the COGS line - this was the whole “wrapper around GPT” fear. I think the defensibility fear was somewhat warranted but the fear around the GM hit turned out to be largely misplaced because of the transition to usage-based models (aligning revenue more toward how costs scale) and also because of the rise of single-tenant SaaS. In this model, all of your server bill is actually footed by your customer and all of the compute also runs through their enterprise OpenAI endpoint. You assuage your customer’s fears around their proprietary data leaking and you hand off all your costs to your customer - amazing. These models are basically 95%+ GM -- higher than the prior generation of SaaS (~80-85%)! And if AWS foots the bill for implementation and pushes your product through their sales channels - what an incredible deal! It’s like they are subsidizing your CAC and your customer is subsidizing your COGS.
There are obviously trade-offs and downsides to the single-tenant model - the biggest downsides I see for AI companies are (1) that managing deployments + upgrades across customers can be a nightmare and (2) the ISV marketplace owner owns the customer relationship. However, I think the distribution uplift these AI agent companies are getting is worth it at this point in the cycle because the biggest fear right now is getting lost in a sea of 100+ other AI SDRs or AI marketers or name your AI category and never getting the traction you need to solidify your place in the market.
The first downside is very real - it is nontrivial to ensure upgrades to your software are synced across all of these instances now hosted on your customers’ servers (e.g. what if customers don’t want to or can’t upgrade yet and you are forced to support prior versions of your software - could be a mess!) We’ll need CI/CD tools to help manage deployment, versioning, and monitoring of apps deployed on your customer’s cloud (maybe something like Ryvn). Customers will eventually want more visibility into how their compute is being spent by each of their hosted SaaS vendors, how much of their enterprise OpenAI budget is being driven by each of them, cybersecurity / permissioning / access / auditing to ensure the right level of access for these vendors, and easy onboarding tools for provisioning access and removing vendors. An interesting downstream consequence: if more implementations move in the direction of single-tenant SaaS, will we see usage-based monitoring potentially shift away from pricing / billing companies like Metronome / Orb and toward API / AI gateway providers managing metering (and potentially access) of hosted model endpoints (e.g. enterprise OpenAI) + fine-tuned open-source models on Bedrock? Would welcome thoughts.
A home for AI agents: Contenders for building the open-source standard for AI agent onboarding & runtime
There have been a lot of open-source multi-agent frameworks like CrewAI, BabyAGI, Microsoft Autogen but I still haven’t seen an open-source standard runtime for onboarding and deploying AI agents to customer clouds aka a home for AI agents. Perhaps this looks like a Kubernetes- or Docker-like container (maybe WASM?) that handles authentication, memory, caching, and offers a runtime for the AI agent to perform its task and an orchestrator model able to coordinate ensembles of AI agents, potentially internal or external. I think much of the microservices architecture stays relevant — these AI agents can access existing APIs and microservices that a company already has (with documentation on how to use them) as well as Tools that connect them to APIs of other SaaS products (systems of record like Salesforce, Netsuite) or productivity / collaboration interfaces (Slack, Gmail, Office Suite). Perhaps this new AI agent runtime runs along a lot of the same rails we’ve already developed - for example, Kubernetes has a ton of the technical scaffolding already for specifying permissions (API keys, security tokens), resource access, orchestration of tasks - you’d have to add tools, maybe vector DB connectors for AI agents. Traditionally manual human tasks like provisioning or permissioning resources have also increasingly become codified and automated (see Infrastructure as Code - IaC - standards like Terraform, OpenTofu, Pulumi) and there’s been so much work done on serverless Lambda functions, it feels like we aren’t that far away from this.
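To make the runtime idea a bit more concrete, here’s a minimal Python sketch of what a declarative “agent manifest” might encode, loosely modeled on a Kubernetes pod spec. Every name, field, and endpoint below is invented for illustration - this is a thought experiment, not any existing standard:

```python
from dataclasses import dataclass, field

@dataclass
class ToolGrant:
    name: str      # e.g. a connector like "salesforce" (hypothetical)
    scopes: list   # e.g. ["opportunities:read"] (hypothetical scope strings)

@dataclass
class AgentManifest:
    """Invented sketch of a declarative spec an agent runtime might consume."""
    agent_id: str
    model_endpoint: str              # e.g. the customer's enterprise LLM endpoint
    tools: list = field(default_factory=list)
    memory_backend: str = "none"     # e.g. a vector DB connector
    max_concurrent_tasks: int = 1

    def is_allowed(self, tool_name: str, scope: str) -> bool:
        # Runtime permission check: may this agent call this tool at this scope?
        return any(g.name == tool_name and scope in g.scopes for g in self.tools)

manifest = AgentManifest(
    agent_id="sdr-agent-01",
    model_endpoint="https://llm.internal.example.com/v1",  # placeholder URL
    tools=[ToolGrant("salesforce", ["opportunities:read"])],
)
print(manifest.is_allowed("salesforce", "opportunities:read"))   # True
print(manifest.is_allowed("salesforce", "opportunities:write"))  # False
```

The point of a shared manifest format like this would be that permissioning, tool access, and resource limits become declarative and auditable - the same reason IaC won for infrastructure.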
There are a number of multi-agent orchestration frameworks moving generally in this direction. I’m pretty surprised how slow large players have been to move on building this standard - they each have bits and pieces of it, but have mostly focused on specs for tool use, data connectors, task routing. I feel like we’re still missing that definitive open-source standard that bundles permissions, memory, tools, runtime for AI agents:
CrewAI - this one started off as an open-source library for defining role-based agents, tasks, tools that agents could use, and end-to-end processes that could delegate tasks to individual agents. They’ve now launched an enterprise cloud offering where you can connect with different model endpoints and other SaaS platforms (Slack, Hubspot, Zapier), but it feels more oriented toward companies interested in building internal tools for their employees.
LangChain / LangGraph - I haven’t followed LangChain too closely since their shift to Runnables and Executables because I found that the documentation began to get a bit inscrutable. The LangGraph launch seemed promising - I thought the approach of making DAGs (directed acyclic graphs) cyclic to enable agent use cases where you want them to loop tasks on an ongoing basis was quite clever. I see them as a leading contender for launching something in the direction of a runtime standard for AI agents given the strength of the community toolchain ecosystem and open source ethos of the company.
AutoGen from Microsoft - open source library for managing multi-agent lifecycles and task delegation. Architecturally, this actually has a lot of the runtime aspects I think are missing in a lot of the other platforms, but it’s relatively barebones. Plays nice with LangGraph, LlamaIndex and some of the other AI agent frameworks - would love to see more work on the tool community, but I’m pretty impressed how forward-thinking this is from a large tech player.
OpenAI has moved pretty slowly here - maybe the closest thing is their tool spec (closer to a JSON schema specification) or function calling in GPTs and Assistants. OpenAI launched an “experimental framework” Swarm 2 months ago for multi-agent orchestration that allows agents to delegate tasks to other agents while sharing the same message context.
Anthropic’s MCP which we talked about earlier - mostly focused on the data syncing tools side. I’m a bit curious about why they took the approach of the client-server architecture for MCP. I think it makes sense coming from Anthropic because it has a very model-centric view of the world where Claude is smart enough to do all the logic / workflow and all you will need is Claude + tools linking to data sources / systems of record. In the real world, I think there could be space for an application layer storing the “system of workflows” logic and this might require more infrastructure (e.g. a runtime, etc) on top of the client-server architecture. The name “Model Context Protocol” seems to reveal this underlying bias toward a very model-centric world view as well. However, it seems like a promising starting point and I love that they are making moves towards supporting the open source community and pushing for a standard in the space.
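For a feel of that client-server shape, here’s a toy, in-process sketch in the spirit of MCP’s JSON-RPC messaging. The resource URI and data are invented, and this compresses a real MCP exchange (transport, initialization, capability negotiation) down to a single request/response for illustration:

```python
import json

def mcp_like_server(request_json: str) -> str:
    """Toy stand-in for an MCP server exposing one read-only resource."""
    req = request_json and json.loads(request_json)
    resources = {"crm://accounts": ["Acme Corp", "Globex"]}  # invented data
    result = None
    if req["method"] == "resources/read":
        result = resources.get(req["params"]["uri"], [])
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

# The "client" (e.g. a model host) asks the server for data it has been
# granted access to:
request = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "method": "resources/read",
    "params": {"uri": "crm://accounts"},
})
response = json.loads(mcp_like_server(request))
print(response["result"])  # ['Acme Corp', 'Globex']
```

Note what’s absent: there is no place in this shape for persistent workflow state or orchestration logic - the model host is assumed to carry all of that, which is exactly the model-centric bias described above.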
AWS Multi-Agent Orchestrator - this is an open-source library that AWS Labs launched without a ton of fanfare (~2.9k stars on GitHub). This has components of what I envisioned and can be run on AWS Lambda but seems a bit more focused on routing and dispatching jobs to agents with less focus on authentication, tool ecosystem, fine-tuning, and runtime. There could be something really interesting here with AWS Bedrock for access to open-source models, fine-tuning for specific use cases, AWS EKS (Elastic Kubernetes Services) for distributed workers, and AWS Lambda for serverless function calling for lightweight, low context jobs. Wouldn’t rule them out.
Cloudflare - haven’t seen them launch something here but Cloudflare is actually really well positioned for this given how much of the internet runs through Cloudflare and their positioning as the security layer of the internet. They have launched a ton of really cool new products for AI bot detection + scraper blocking and analytics to distinguish bot traffic. There were a lot of questions in the early days of the internet about how we were going to manage website crawling for search engines and whether or not Google could scrape your website, which led to the emergence of robots.txt. I think Cloudflare owns the next-gen robots.txt equivalent for AI (maybe llm.txt?) that allows LLMs to surface information from your page in LLM responses. The same technology for bot control / DDoS prevention could manage traffic across different client applications. One thing I’ve been thinking a lot about is how Slack has those shared channels between organizations that both use Slack. Could other companies replicate this model - virtualized shared cloud for data sharing (Snowflake) or bridges between private clouds through API/AI gateways like Kong, VPC access bridges, or some proxy server through Cloudflare? Maybe the AI agent runtime standard ends up being built on top of Cloudflare Workers (similar to OpenInt’s approach?), which with a bit of additional feature development on top would be an interesting starting point for this architecture.
Feels like there are a lot of players kind of adjacent to this AI agent operating system / runtime standard concept and you could feasibly get there from many different starting points: From an API/service mesh gateway offering like Kong or Tetrate? Maybe from an IaC platform like Pulumi? From a lower level on the infrastructure stack like a hyperscaler (AWS Lambda / EKS) or PaaS infrastructure player (Cloudflare / Vercel)? Maybe a new-age devops automation company built on top of OpenTofu/Terraform or a next-gen ServiceNow-like IT ops / IAM company like Lumos converges toward something like this?
Revisiting iPaaS (Integration Platform as a Service) for AI agents: Why number of connectors might not be a big moat anymore and why janky integrations (reverse engineering internal APIs, browser / screen automation) might be here to stay
Authentication + Connectivity Standards
AI agent authentication is really messy right now to say the least. I think this stems from the fact that, from an authentication perspective, you can view AI agent auth on a spectrum from piggybacking off of the username/password of a human employee (“copilot” view of the world) to fully machine authentication (API keys / tokens). We are still in this world view of “copilots” so the AI agents are seen as acting on behalf of humans - but what happens when AI agents can do multiples of the work an individual human could and work 24/7? Will SaaS companies begin to throttle how much work AI agents can perform on behalf of humans using their seat license? Perhaps SaaS companies could come up with a new “seat” type for AI employees, priced at a different rate? Perhaps (more likely) SaaS companies will just restrict seat-based licenses to human users and force external AI agents to integrate via API instead where they can charge additional fees for access or usage, reserving preferential internal API functionality + access to their own copilot offerings.
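As a concrete contrast to piggybacking on a human’s login, here’s a minimal sketch of the machine-auth end of the spectrum: a short-lived, scope-limited token in the spirit of OAuth’s client-credentials flow. The signing scheme and all identifiers are invented for illustration - a real system would use a standard OAuth/JWT library rather than hand-rolling this:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-signing-key"  # placeholder; never hardcode in practice

def issue_token(client_id: str, scopes: list, ttl_s: int = 900) -> str:
    """Mint a short-lived token for a machine client (the AI agent)."""
    payload = json.dumps({"sub": client_id, "scopes": scopes,
                          "exp": time.time() + ttl_s}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def check_token(token: str, needed_scope: str) -> bool:
    """Verify signature, expiry, and that the requested scope was granted."""
    body, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(body)
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    return claims["exp"] > time.time() and needed_scope in claims["scopes"]

token = issue_token("ai-sdr-agent", ["crm:read"])
print(check_token(token, "crm:read"))   # True: granted scope, not expired
print(check_token(token, "crm:write"))  # False: scope never granted
```

The key properties - the agent never holds a human’s password, access expires automatically, and every grant is scoped - are exactly what the seat-license model lacks.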
The level of connectivity we’re talking about, from my perspective, is bi-directional read/write/sync. Luckily, we’ve been doing “connectors” for a while from the early days of Mulesoft to more modern workflow automation tools like Zapier and Workato. We’re seeing a lot of emerging players in and around this space that could move into becoming the authentication / connector standard for AI agents - as well as some older iPaaS (integration platform as a service) and adjacent players that might be well positioned to move into the space:
There are a number of authentication platforms directly focused on AI agents that look promising including Anon, Composio AgentAuth.
Something like Paragon (pre-AI agent iPaaS platform) could move in this direction.
API / service mesh gateway companies like Kong or public API marketplaces could be well positioned depending on how quickly they move. I would have said RapidAPI is not a bad starting point but they got acquired recently by Nokia and had slightly sketchy unofficial APIs. Postman has actually been making a lot of moves from API testing which was mostly development focused to more production focused use cases with the launch of their public API marketplace.
Workflow automation tools like Workato, Zapier or next-gen versions of these like Gumloop and Relay actually have great breadth of coverage.
We’ve been laying the groundwork for years for open source authentication and API connectivity standards through OAuth and OpenAPI. With AI writing integration code, hopefully building and updating connector libraries becomes trivial and AI agent companies can leverage this infrastructure to move faster. However, if the cost to write code continues to go down, the breadth of connectors might erode as a moat for any of these iPaaS for AI agent players and this segment of the market might end up being owned by one of these new AI agent authentication platforms that is able to quickly build out a competing library of integrations. Makes you kind of concerned about businesses like Fivetran where their entire business is maintaining up-to-date connectors for ETL pipelines. Maybe it helps their business in the short-run because they can use AI to maintain them instead of hiring giant teams of engineers? Long-term, depending on how bullish you are on AI coding, you might conclude this will erode most of the margins for these iPaaS players and reduce value-capture at this part of the stack. And then, from there, if AI coding gets even better, perhaps you won’t need this layer at all and it will be trivial for AI agent companies to quickly implement one-off integrations for each new customer they are onboarding and system they are connecting to. I think it’s more likely that breadth of connections stays somewhat of a moat but a “softer moat” because vendors will purposefully bottleneck access to APIs to individual AI agent companies or new iPaaS players to prevent nefarious activity.
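A toy illustration of why connector breadth might erode as a moat: given a machine-readable API description, generating a “connector” is mostly mechanical. The spec below is an invented, heavily simplified stand-in for a real OpenAPI document, and the generated operations just return the request they would make rather than issuing real HTTP calls:

```python
TINY_SPEC = {  # invented, minimal stand-in for a real OpenAPI document
    "paths": {
        "/contacts": {"get": {"operationId": "list_contacts"}},
        "/contacts/{id}": {"get": {"operationId": "get_contact"}},
    }
}

def build_connector(spec: dict) -> dict:
    """Turn each operationId into a callable. Here the callable just describes
    the request it would make; a real generator would emit HTTP client code."""
    ops = {}
    for path, methods in spec["paths"].items():
        for verb, meta in methods.items():
            # Bind verb/path via default args so each closure keeps its own.
            def call(_verb=verb, _path=path, **params):
                return {"method": _verb.upper(), "path": _path.format(**params)}
            ops[meta["operationId"]] = call
    return ops

crm = build_connector(TINY_SPEC)
print(crm["get_contact"](id=42))   # {'method': 'GET', 'path': '/contacts/42'}
print(crm["list_contacts"]())      # {'method': 'GET', 'path': '/contacts'}
```

If an LLM can reliably produce (and maintain) the spec-to-connector step above plus auth handling, the hard part of an integration library shifts from writing connectors to securing sanctioned API access - which is the “softer moat” argument.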
One model where you might still be able to differentiate yourself + capture value at this layer is focusing on industries / functional areas that are sufficiently complex or heavily regulated to warrant their own integration stack or where from a security and risk perspective, the vendors you are trying to connect to are purposefully restricting or tightly controlling API access. In these cases, being one of the sanctioned “aggregators” is valuable and you’ll experience less pricing pressure (e.g. Plaid for fintech, Finch for HR systems, or Nylas for email/contacts/calendar, there hasn’t really been a defining API integration company in healthcare despite efforts around the FHIR standard but maybe Ribbon, Turquoise, HealthGorilla are contenders for different parts of the stack?)
Another area where authentication / connectivity platforms for AI agents could differentiate themselves is by integrating on a deeper level (e.g. internal APIs) or into systems that don’t have public APIs (e.g. screen / browser automation). Unfortunately, these types of integrations get increasingly janky and there is a risk that the platforms you are trying to integrate with eventually block your access. Despite this, I do think there is value to these jankier integration approaches because there are so many legacy systems without public APIs where the legacy vendor has no incentive whatsoever (and likely a huge disincentive) to provide a public API. There are multiple companies working on integration outside of sanctioned external APIs:
(1) Internal APIs - SaaS companies frequently do not surface all functionality in their public API that they allow end users to perform via their GUI. Companies like Baton AI and libraries like Integuru by Taiki focus on using AI to reverse engineer these internal APIs. If this gets more reliable, perhaps an integration platform could gain traction for their broad catalog of “secret” internal APIs. But again - some risk that SaaS platforms put licensing restrictions on their products to prevent AI agents from using them in this way or accessing these internal APIs.
(2) Screen / browser automation - for integrating with products without an API, you can always resort to screen / browser automation techniques like Anthropic computer use, Multion, Skyvern, Orby. Google just launched Project Mariner in beta yesterday which automates browser interactions and reasons over HTML DOM elements. This is the AI successor to Selenium browser automation, or maybe RPA on steroids with fuzzier logic. The LLM is taking in either the underlying HTML or screenshots of the entire window / screen to determine the next step to take to accomplish a given goal (for example, deciding which button in the UI to press next). It’s pretty wild to me that it’s cost-effective enough to do this screen by screen - as the cost of multimodal models continues to decrease, this option should become more viable. Aaron Levie from Box posted a very optimistic Twitter post about Agents using the web, which I agree with because there is a lot of functionality that is not surfaced via API. It’s useful to just piggyback on the interfaces used by humans since it’s an existing interface the AI agents can use. Websites may put up more controls to prevent a million AI agents from hammering their sites (à la robots.txt in the search engine era - which is why I’m so long on Cloudflare). Intellectually, it’s just strange to think about how inefficiently anthropomorphic it is for the AI agents to be using a GUI made for humans (underlying functionality written in code has to be transformed into a GUI made for humans, only to be screenshotted and the pixels interpreted through a multimodal model executed in code, to get the AI agent to figure out which button to press next?). It seems incredibly circuitous. Why go through all that trouble to make it human-interpretable if no human is ever going to look at it? It just seems horrendously inefficient and janky, but I guess if it works *shrug*.
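The circuitous loop described above can be sketched in a few lines: capture the screen, ask a multimodal model for the next UI action, execute it, repeat. The “model” and “screen” here are trivial stubs invented for illustration - a real agent would send actual screenshots or the DOM to a multimodal LLM and drive a real browser:

```python
def fake_screen(state: dict) -> str:
    """Stub for a screenshot/DOM capture of the current UI state."""
    return f"Login page. logged_in={state['logged_in']}"

def fake_model(screenshot: str, goal: str) -> str:
    """Stub for a multimodal LLM proposing the next UI action."""
    return "DONE" if "logged_in=True" in screenshot else "CLICK login_button"

def run_agent(goal: str, max_steps: int = 5) -> list:
    state, actions = {"logged_in": False}, []
    for _ in range(max_steps):          # bounded loop: render -> decide -> act
        action = fake_model(fake_screen(state), goal)
        actions.append(action)
        if action == "DONE":
            break
        if action == "CLICK login_button":
            state["logged_in"] = True   # simulate the click taking effect
    return actions

print(run_agent("log in"))  # ['CLICK login_button', 'DONE']
```

Even in stub form you can see the cost structure: one model call per screen transition, which is why falling multimodal inference prices matter so much for this approach.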
Let’s shift to a more human centric “AI employee” world-view. I find this analogy delightful - how might we line up the current onboarding process for an AI agent to the analogous steps for hiring / onboarding a human employee?
The current process for hiring / onboarding a human employee looks something like this:
(1) Job req
(2) Applicant review
(3) Interview process
(4) Background Check / Hiring / Negotiation
(5) HR Onboarding (Payroll, Benefits, Badge)
(6) IT Onboarding (Laptop, Credentials / Permissioning, Productivity tools - Email, Slack, Teams, Zoom, other applications needed for job)
Some of the earlier steps translate pretty well to traditional software vendor procurement (job req + applicant review => RFP, interview process => proof-of-concept POC, hiring => procurement / contracting). The latter ones, HR and IT onboarding are potentially very interesting areas to explore because they translate less well.
Let’s explore these steps in detail and potential analogies for AI employee hiring and onboarding.
Job req + Applicant Review => RFP: Personally, I see a world where the RFP process is almost completely automated. Vendors would have AI trained on their company’s capabilities to answer RFP questionnaires for them; companies hiring vendors would have RFP requirements and questionnaires generated automatically and an AI agent to review and score RFP responses from potential vendors. Companies like Responsive (formerly RFPIO) and Loopio have been in this space for a while. Personally, I’m a bit surprised more security review companies like Vanta and Drata that do SOC2 and GDPR haven’t moved more quickly into the fully automated RFP response space, since security review is a pretty large part of the vendor selection and procurement process - so they are a third of the way there already. Vanta and Drata could also pre-vet potential vendors as already adhering to certain security standards or help vendors understand the idiosyncratic requirements of procuring companies in their network.
“Job Interviews” for AI Employees: Real-world proof-of-concept sandboxes for systematically benchmarking AI agent performance in your environment & Assessing “Cultural Fit” of an AI workforce: simulating teams of AI agents to see if they vibe
Interview process => proof-of-concept POC: I personally find this one a fun analogy. As the cost of software development continues to fall, there are going to be so many AI agent companies in every single category, it’ll be hard to keep track of them all. For enterprises, one of the challenges might actually end up being vendor comparison / selection - when there are 100 AI SDRs or 100 AI marketers or 100 XYZ AI agent companies to choose from, how do you figure out which one to use?
If you go on their websites, AI agent vendors might have a couple customer case studies so reference-ability will be a huge component of this selection process, but every one of these companies uses slightly different metrics for quantifying their performance (e.g. call deflection rate, average handle time reduction, hallucination rate, etc) and there are any number of “squishy” criteria too (e.g. adherence to brand voice, SOP/script conformity, customer satisfaction, customer empathy) that have very subjective evaluation criteria between companies. In the same way academic benchmarks for model performance are falling short of actual real-world performance, are these metrics falling short of capturing AI agent performance on real-world business tasks? Could we build higher-level benchmarks measuring proficiency of AI agents at various business functions / tasks, and should companies start thinking about building their own custom AI vendor eval criteria? A more systematic approach could be to run potential vendors through a standard proof-of-concept trial or backtesting sandbox where they can evaluate vendors on criteria they care about using real-world cases from their business. For example, generate a sandbox runtime hooked into a toy version of your Salesforce environment with salient examples from the last month (or synthetically generated scenarios) you’d like the AI agent to demonstrate proficiency in.
This is analogous to having a human candidate do a case study during an interview process. The difference is that companies take significant risk on potential human hires and it takes a while to see if a new hire actually delivers at the expected level because humans are non-deterministic. With AI agents, it feels like we should be able to get a better picture of their capabilities and aptitude prior to hiring them by running them through a series of real-world simulations. Systematic onboarding standards unlock systematic evaluation of AI agents.
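A minimal sketch of what such a POC sandbox could look like: replay recorded scenarios against each candidate agent and score them on the same rubric. The scenarios and “vendors” below are invented stand-ins for real historical cases and real vendor endpoints:

```python
SCENARIOS = [  # invented examples standing in for real historical tickets
    {"ticket": "refund request", "expected": "refund"},
    {"ticket": "password reset", "expected": "reset"},
    {"ticket": "angry churn threat", "expected": "escalate"},
]

def vendor_a(ticket: str) -> str:
    """Hypothetical vendor: only recognizes refunds, escalates the rest."""
    return "refund" if "refund" in ticket else "escalate"

def vendor_b(ticket: str) -> str:
    """Hypothetical vendor with broader coverage."""
    if "refund" in ticket:
        return "refund"
    if "reset" in ticket:
        return "reset"
    return "escalate"

def benchmark(agent, scenarios) -> float:
    """Fraction of scenarios where the agent's action matched the rubric."""
    hits = sum(agent(s["ticket"]) == s["expected"] for s in scenarios)
    return hits / len(scenarios)

scores = {name: benchmark(fn, SCENARIOS)
          for name, fn in [("vendor_a", vendor_a), ("vendor_b", vendor_b)]}
print(scores)
```

The same harness extends naturally to the squishier criteria mentioned earlier by swapping the exact-match check for an LLM-as-judge or human rubric - the point is that every vendor runs against the identical scenario set.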
I’d also be excited to see metrics around the “human dimensions” of AI agent teams composed of many external vendors from more complicated simulations. Does AI BDR Ava from 11x work better with AI marketer Kyle or Barbara? Are there teams of AI agents that “get along better” or work better/worse together than other teams? Maybe there will be alliances where certain AI agents from different vendors are designed to work well together out of the box.
Hiring => procurement / contracting: This process of contracting and contract review has also been experiencing more automation for a while. GenAI just dumped gasoline on the fire. All of the contract lifecycle management (CLM) companies (Ironclad, Icertis) are moving in this direction to identify key terms quickly, systematize contract language, speed up the redlining process.
HR and IT onboarding: This is where things start getting a bit interesting and weird. These steps roughly translate to the handoff between the sales team and the customer success teams in the traditional SaaS journey - it’s where product configuration and implementation happens.
HR onboarding steps are probably less relevant - the AI agent probably doesn’t need benefits (until we decide AI employees also deserve rights!) or a physical badge to go anywhere (until AI employees become embodied!) but maybe it does need some form of wallet to accept payment (?) or in the old software vendor view of the world, probably just an accounts payable setup with accounting.
Following the IT onboarding analogy gets you to some really weird questions - does the AI agent get a laptop (maybe a Virtual Machine it can use for tasks that require a computer, if more tasks depend on janky screen/browser automation)? How does it get all the credentials / permissions it needs to do its job and where are those stored / managed? Is this closer to machine identity à la Entro or more human-like SSO / IAM privileges like Okta? Do you give it an email address so a hybrid human-AI workforce can dispatch work to it over email? Do you tie permissions to that email so it also has a Slack account and other seat-based SaaS licenses? Do you pay for a separate Salesforce seat for the AI employee!? I think the answer is probably no - you would probably set up an email for it since that’s an easy, low-cost way to dispatch work to the agent, but you would probably treat it more like a machine after that for the other services. You’d probably have a bot channel for it on Slack for example and give it API tokens for accessing various SaaS products, or have it share credentials with a human employee in cases of the janky screen / browser automation we talked about earlier. Permissioning probably looks closer to present-day M2M identity offerings - API keys or something like Amazon Cognito or AWS Security Token Service setup for temporary credentials.
After you’ve modularized each business function / role into an AI agent workflow, there’s probably still a missing higher-level “business strategy” layer needed to set business direction, make data-driven business decisions, and orchestrate teams of AI employees toward these common goals. Something akin to what a GM or manager does today with the help of a good business analyst (maybe Savant Labs). Ultimately, businesses of the future may have very few employees or be composed of just one person powered by an army of AI employees. Humans still set the creative vision for what they want to do and where they want to go, but execution is orchestrated by the AI army under them - they can override decisions recommended by the AI, but the AI can run on auto-pilot, making informed decisions that take into account all the information it has (market intel, data from previous experiments) to maximize the probability of building a sustainable, profitable business. In this way, AI agents could close the gap between having an idea and seeing it implemented in the real world, and unlock the closest thing to the millennial concept of “manifesting” that exists in this reality.
We’re still a ways from companies orchestrating AI armies across business functions to accomplish business objectives, but I see a line of sight to this future. What is somewhat terrifying: if AI continues to reduce the cost of building software, there will be 100+ AI agents battling for market share in every category, instead of just the 3-5 credible vendors we have today. For AI agents, the big question is what sustainable moats make your product sticky, so it isn’t just a knife fight down to zero operating margins? For enterprises, maybe a big one becomes how you sift through the noise to figure out which of the 100+ vendors actually delivers on your jobs-to-be-done?
Competition will heat up significantly, and it’ll be increasingly challenging for any individual AI agent company to rise above the noise. AI agent vendors will use AI to do outbound sales and write RFP responses at scale. In response, enterprises might need to deploy AI agents to review ever more RFP responses and select from ever more potential vendors. As AI is deployed to speed up contract review and facilitate implementation / product configuration, does it become easier to swap out vendors? Vendors could potentially develop AI migration tools that move data or workflows from a company’s previous system to their system, reducing switching costs even further.
Many people point to fine-tuning on your customer’s data as a moat, but I think, at best, this is a “soft moat” because your customer still owns their data and artifacts, which they could provide to the next vendor to fine-tune on as well (e.g. a customer provides the emails or call transcripts one AI CS agent generated on its behalf to a new AI CS agent offering a similar product for 1/10th the price). Moats for AI agents are probably a topic for a much larger deep-dive at a later point - would welcome thoughts and perspectives on moats you’ve seen become more/less important in this new world.
Ultimately, common infrastructure for AI agent onboarding unlocks the ability to benchmark agents against each other more accurately, which feels important and helpful as the number of competing AI agents in every category multiplies. Large enterprises, model providers, hyperscalers, or other platforms that manage ISV programs could have sufficient leverage and sway to drive adoption of a standard by making conformance a condition of participating in their marketplaces or vendor-selection processes. This transparency and swappability is potentially disadvantageous to AI agent companies, but the streamlined implementation and increased reach could well be worth it for them if it drives discoverability and growth.
If you are thinking about and/or building in AI agent onboarding, authentication, or benchmarking — would love to chat!