Why everyone’s suddenly talking about AI agents

What exactly are AI agents and how are they different from AI assistants?
November 21, 2024

Estimated reading time: 6 minutes

When it comes to the fast-moving AI tools space, it can be difficult to distinguish actual advancements from buzzword-laden marketing spiel.

The latest term to come into vogue is “AI agent”, with companies like Salesforce, HubSpot, Intercom, Microsoft, and OpenAI itself all currently touting the supposed automation benefits of agents.

Is this another memo from the marketing department looking to jump on the AI bandwagon, or a genuine advancement? Let’s dig in.

AI assistants vs AI agents: what’s the difference?

By now, you’re probably very familiar with AI assistants. While OpenAI’s ChatGPT is the best known, similar tools have cropped up everywhere, from search engines and word processors to IDEs and operating systems.

These assistants are typically underpinned by the latest large language models (LLMs) to provide some kind of chatbot or other interface that can respond to natural language questions, requests, and prompts. They can be handy for drafting quick replies, summarizing long threads, and otherwise helping you deal with text quickly.

AI “agents”, on the other hand, are a step up from this, theoretically able to act on their own initiative, or at least with minimal human intervention. According to Julia Winn, an AI product manager at Intuit, “Something doesn’t become an agent until it has some degree of autonomy.” 

In an article for Towards Data Science, she identifies six building blocks of agentic behavior:

  • Perception: How well the AI system can interpret relevant data. 
  • Interactivity: The degree to which the AI system can engage with users, other AI systems, and external services. 
  • Persistence: The ability of the AI system to maintain a memory of the user and their previous interactions.
  • Reactivity: How well the AI system can respond to changes in its environment or incoming data.
  • Proactivity: Whether the AI system can anticipate the needs of the user or offer up suggestions based on its data without being explicitly prompted to.
  • Autonomy: The degree to which the AI system can “operate independently and make decisions within defined parameters.”

As Winn sees it, the best AI assistants are strong on perception, interactivity, and persistence, may have some degree of reactivity, and have limited or no proactivity and autonomy.

For example, ChatGPT can respond to a wide variety of prompts including text documents, images, and audio (high perception), has an easy-to-use chatbot interface and API (high interactivity), and can be configured to remember user details and preferences (high persistence). However, it has limited reactivity and almost no proactivity or autonomy.
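
To make Winn’s six building blocks concrete, here is a minimal Python sketch that records an AI system’s profile across them. The three-point `Level` scale, the `AgencyProfile` class, and the `looks_like_agent` rule are hypothetical: Winn describes the dimensions qualitatively, and the rule simply encodes her point that nothing is an agent without some degree of autonomy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical ordinal scale; Winn does not assign numeric scores.
    NONE = 0
    LIMITED = 1
    HIGH = 2


@dataclass(frozen=True)
class AgencyProfile:
    perception: Level
    interactivity: Level
    persistence: Level
    reactivity: Level
    proactivity: Level
    autonomy: Level

    def looks_like_agent(self) -> bool:
        # "Something doesn't become an agent until it has some degree of
        # autonomy": strong scores elsewhere don't change the answer.
        return self.autonomy > Level.NONE


# ChatGPT as characterized above: strong perception, interactivity, and
# persistence, limited reactivity, and "almost no" proactivity or autonomy
# (rounded down to NONE here).
chatgpt = AgencyProfile(
    perception=Level.HIGH,
    interactivity=Level.HIGH,
    persistence=Level.HIGH,
    reactivity=Level.LIMITED,
    proactivity=Level.NONE,
    autonomy=Level.NONE,
)

print(chatgpt.looks_like_agent())  # False
```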

So, if ChatGPT is a fairly archetypal AI assistant, are there any archetypal AI agents out there?

Why AI agents are poorly defined

The loose definition of what constitutes an AI agent has given marketing departments a lot of wiggle room. “Honestly, I think anything that could be marketed as an agent right now, will be for most companies,” says Winn. “Nothing would surprise me at this point.”

This is in stark contrast to more mature fields, where the line between agent and not-an-agent is much clearer. Winn gives the example of self-driving cars, where there are six widely agreed-upon levels of autonomy, defined by SAE International, ranging from Level 0 (No Automation) to Level 5 (Full Automation). At the moment, robotaxis operate at Level 4 (High Automation), while most cars with advanced driving aids operate at Level 2 (Partial Automation), with a few reaching Level 3 (Conditional Automation). While there are obviously some gray areas and room for debate, there is broad consensus on what constitutes each level.

Since it seems unlikely that the software giants are going to get together and agree on a definition of “agent”, we can at least consider what real agentic behavior looks like so you can decide for yourself.

What can an AI agent actually do?

What separates an AI agent from an AI assistant is its ability to operate without human intervention, but exactly how much autonomy an AI needs before it counts as a true agent is up in the air.

In her article, Winn breaks autonomy down into three dimensions:

  • Resource control is the value and importance of resources that the AI can deploy without human intervention. Is the AI agent able to make payments, schedule meetings, assign staff, and otherwise deploy limited resources on its own? 
  • Impact scope is how consequential the AI’s decisions are for the system or organization as a whole. Can the AI make large purchases, schedule meetings that require international travel, commit staff to long projects, and generally make meaningful decisions that can’t be walked back or undone at no cost?
  • Operational boundaries are the range within which the AI can operate. Can the AI make investment decisions, book venues for meetings before inviting people, hire new staff when needed, and operate across a wide domain, or only in limited areas?

A low-level agent can control some amount of resources and deploy them in meaningful ways within narrow boundaries and tight guardrails. Winn uses the example of a smart irrigation system that can decide when to water your garden based on soil moisture, weather forecasts, and other data sources. 
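
A low-level agent like Winn’s irrigation example can be sketched as a handful of rules wrapped around a hard resource cap. The thresholds, the `Readings` fields, and the `decide_watering` function below are hypothetical, chosen only to show where the narrow boundaries and tight guardrails live: the agent decides on its own whether to water, but can never release more water than its daily budget.

```python
from dataclasses import dataclass

# Hypothetical guardrail values; a real system would tune these per garden.
MOISTURE_THRESHOLD = 0.30     # water only when soil moisture (0-1) is below this
RAIN_SKIP_PROBABILITY = 0.6   # skip watering if rain is at least this likely
MAX_LITERS_PER_DAY = 50.0     # hard cap on the resource the agent controls


@dataclass
class Readings:
    soil_moisture: float      # fraction, 0.0 (dry) to 1.0 (saturated)
    rain_probability: float   # forecast probability of rain in the next 24h
    liters_used_today: float


def decide_watering(r: Readings, requested_liters: float = 20.0) -> float:
    """Return how many liters to release now (0.0 means don't water)."""
    if r.soil_moisture >= MOISTURE_THRESHOLD:
        return 0.0  # soil is wet enough
    if r.rain_probability >= RAIN_SKIP_PROBABILITY:
        return 0.0  # let the forecast rain do the work
    # Never exceed the daily budget, whatever the sensors say.
    remaining = max(0.0, MAX_LITERS_PER_DAY - r.liters_used_today)
    return min(requested_liters, remaining)


print(decide_watering(Readings(0.2, 0.1, 40.0)))  # 10.0
```

The cap is checked last and unconditionally, so even a faulty moisture sensor can only waste a bounded amount of water.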

A mid-level agent can control a meaningful amount of resources and make significant decisions with a potential impact on the whole system within its defined boundaries. Something like an AI scheduling agent that assigns sales representatives based on its evaluation of leads, travel time, and deal size probably fits the bill. 
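
The scheduling example can be sketched in a similar way. This hypothetical `assign` function scores each lead by deal size weighted by the agent’s own estimate of its win probability, then gives the most valuable leads to the free rep with the shortest travel time. The greedy strategy, the field names, and the capacity rule are assumptions for illustration; a production scheduler might use a proper optimization solver instead.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    name: str
    deal_size: float        # expected contract value
    win_probability: float  # the agent's own evaluation of the lead, 0-1


@dataclass
class Rep:
    name: str
    travel_hours: dict[str, float]  # lead name -> hours to reach that lead
    capacity: int                   # max meetings this rep can take
    assigned: list[str] = field(default_factory=list)


def assign(leads: list[Lead], reps: list[Rep]) -> dict[str, str]:
    """Greedily assign the most valuable leads to the nearest free rep."""
    plan: dict[str, str] = {}
    by_value = sorted(leads, key=lambda ld: ld.deal_size * ld.win_probability,
                      reverse=True)
    for lead in by_value:
        free = [r for r in reps if len(r.assigned) < r.capacity]
        if not free:
            break  # out of capacity: leftover leads go to a human
        rep = min(free, key=lambda r: r.travel_hours[lead.name])
        rep.assigned.append(lead.name)
        plan[lead.name] = rep.name
    return plan


leads = [Lead("acme", 100_000, 0.5), Lead("globex", 20_000, 0.9)]
reps = [
    Rep("ana", {"acme": 1.0, "globex": 3.0}, capacity=1),
    Rep("ben", {"acme": 2.0, "globex": 0.5}, capacity=1),
]
print(assign(leads, reps))  # {'acme': 'ana', 'globex': 'ben'}
```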

An advanced agent can control major resources and make significant decisions across broad operational boundaries. Winn gives the example of an AI system that optimizes a tech company’s AI pipeline, including evaluating and selecting which models to deploy on $100 million worth of GPUs.
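
One way to read Winn’s three dimensions against these three levels is as a simple ordinal check. The hypothetical `autonomy_level` function below scores each dimension as narrow, moderate, or broad and labels the agent by its weakest dimension, on the assumption that an agent with large resources but tightly fenced boundaries is still effectively low-level. That rule is an illustration, not Winn’s own method, and the scores given to her three examples are guesses.

```python
from enum import IntEnum


class Scale(IntEnum):
    # Hypothetical ordinal scores for each of Winn's three dimensions.
    NARROW = 1
    MODERATE = 2
    BROAD = 3


LABELS = {Scale.NARROW: "low", Scale.MODERATE: "mid", Scale.BROAD: "advanced"}


def autonomy_level(resource_control: Scale, impact_scope: Scale,
                   operational_boundaries: Scale) -> str:
    """Label an agent low, mid, or advanced by its weakest dimension."""
    weakest = min(resource_control, impact_scope, operational_boundaries)
    return LABELS[weakest]


# Assumed scores for the three examples in the text:
print(autonomy_level(Scale.MODERATE, Scale.MODERATE, Scale.NARROW))  # low (irrigation)
print(autonomy_level(Scale.MODERATE, Scale.BROAD, Scale.MODERATE))   # mid (scheduling)
print(autonomy_level(Scale.BROAD, Scale.BROAD, Scale.BROAD))         # advanced (GPU pipeline)
```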

Within these definitions, it’s possible to see how some of the tools currently marketed as AI agents could be deployed in truly agentic ways. Both Salesforce’s and HubSpot’s agent platforms can be configured to email your customers, clients, or leads on their own. Depending on what you allow them to say in these emails and how far you are prepared to follow through on any commitments they make, they could be employed as low-level or even mid-level agents.
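
One lightweight way to limit what an email agent is allowed to say is to screen each draft before it goes out. The sketch below is a hypothetical keyword filter, not a feature of Salesforce’s or HubSpot’s platforms: drafts that appear to make a commitment, such as a refund or discount, are held for human review, while routine messages are sent automatically. A real deployment would need a far richer policy than a few regular expressions.

```python
import re

# Hypothetical phrases signalling a commitment the business would have to
# honor; tailor this list to your own policies.
COMMITMENT_PATTERNS = [
    r"\brefund",
    r"\bdiscount",
    r"\bguarantee",
    r"\bfree of charge\b",
    r"\b\d+\s?% off\b",
]


def route_draft(body: str) -> str:
    """Return "send" for routine drafts, "review" if a human must approve."""
    for pattern in COMMITMENT_PATTERNS:
        if re.search(pattern, body, flags=re.IGNORECASE):
            return "review"
    return "send"


print(route_draft("Happy to set up a call next week."))  # send
print(route_draft("We can offer you 20% off today."))    # review
```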

How to manage the risks

Still, there are clearly issues with handing off even seemingly inconsequential tasks to AI tools without human oversight. Even relatively simple support bots based on retrieval-augmented generation (RAG) can go rogue.

While Air Canada’s chatbot didn’t attempt to spin up Skynet, it did get the company sued when it gave a passenger incorrect advice about bereavement fares, leaving them out of pocket. A Canadian tribunal ordered the airline to honor the chatbot’s promises.

The lesson? Any company that plans to experiment with giving AI tools a degree of autonomy needs to put rigorous tests and safeguards in place. Make sure that any AI feature that reaches production behaves exactly as you expect in as many situations as possible.
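
As a minimal illustration of pairing a safeguard with tests, the sketch below puts a hypothetical approval gate between an agent and the actions it proposes, then sweeps a grid of ordinary and edge-case inputs (negative amounts, NaN, infinity, unknown action types) to check that nothing risky is ever auto-executed. The `APPROVAL_LIMIT`, the allow-list, and the `gate` function are assumptions for illustration.

```python
import math
from dataclasses import dataclass

APPROVAL_LIMIT = 500.0                  # hypothetical: above this, a human signs off
ALLOWED_KINDS = {"payment", "booking"}  # hypothetical allow-list of action types


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    amount: float  # cost in the company's currency


def gate(action: ProposedAction) -> str:
    """Return "execute", "needs_approval", or "reject" for an agent's proposal."""
    if action.kind not in ALLOWED_KINDS:
        return "reject"
    if math.isnan(action.amount) or action.amount < 0:
        return "reject"
    if action.amount > APPROVAL_LIMIT:
        return "needs_approval"  # also catches math.inf
    return "execute"


def check_gate_invariants() -> None:
    """Sweep edge cases and assert risky actions never auto-execute."""
    amounts = [-1.0, 0.0, 499.99, 500.0, 500.01, 1e6, math.nan, math.inf]
    for kind in ["payment", "booking", "wire_transfer", ""]:
        for amount in amounts:
            if gate(ProposedAction(kind, amount)) == "execute":
                assert kind in ALLOWED_KINDS
                assert 0.0 <= amount <= APPROVAL_LIMIT


check_gate_invariants()  # raises AssertionError if the gate is ever too permissive
```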

Rigorous testing may mean having engineers “red team” the feature, trying their hardest to make the AI go rogue. Once the feature is available to users, keep monitoring its performance. For an agentic tool hooked up to an email API, for example, it would be a good idea to monitor the rate at which it sends emails. If that rate spikes, there is a chance something unexpected is happening.
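
The email-rate check described above could be as simple as a sliding-window counter. In this hypothetical `SendRateMonitor`, each send is timestamped and the monitor reports a spike whenever more than `max_per_window` emails fall inside the last `window_seconds`; both thresholds are placeholder values you would tune against your tool’s normal traffic.

```python
from collections import deque


class SendRateMonitor:
    """Flag a spike when too many emails are sent within a sliding window."""

    def __init__(self, window_seconds: float = 60.0, max_per_window: int = 30):
        # Placeholder thresholds; tune them against normal sending volume.
        self.window_seconds = window_seconds
        self.max_per_window = max_per_window
        self._sends: deque[float] = deque()

    def record_send(self, timestamp: float) -> bool:
        """Record one send at `timestamp` (seconds); return True on a spike."""
        self._sends.append(timestamp)
        # Evict sends that are now older than the window.
        cutoff = timestamp - self.window_seconds
        while self._sends and self._sends[0] <= cutoff:
            self._sends.popleft()
        return len(self._sends) > self.max_per_window


monitor = SendRateMonitor(window_seconds=60.0, max_per_window=3)
print([monitor.record_send(t) for t in (0, 1, 2, 3)])  # [False, False, False, True]
print(monitor.record_send(100))                        # False (old sends expired)
```

In a live system you would feed it timestamps from `time.monotonic()` and wire a `True` result to an alert, or to a switch that pauses the agent until someone has looked.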

In broader terms, engineering leaders also need to provide oversight to ensure that any features and tools align with the company’s long-term goals. While it’s feasible to create AI tools with some measure of autonomy, you can’t yet hold them responsible if things go wrong.