AI Agents have entered enterprises, and they’re making waves

Between 2024 and 2025, enterprises have seen a 5x increase in AI development, and 66% of data, analytics, and IT leaders have invested over $1M in genAI. Now, companies big and small are facing the same ROI challenge: They’ve invested in AI, but have no way to understand its impact. 


The potential for AI transformation is within reach, but most companies aren’t sure if they’re even close to it. 

High-performing AI begins with analytics

Just like any click-based software, your AI tools need analytics. Traditional analytics tools can tell you about system performance: uptime, response times, conversation volumes. But they can't answer the questions that really matter:

  • Is your agentic approach actually faster than traditional workflows?
  • Are users completing more tasks with agents, or fewer?
  • Are AI agents driving engagement, or are they creating frustration?
  • Are users who engage with your agents more likely to return to your software?

Yes, you need to understand if your agents are working at all. But what’s more important is understanding if these agents are actually faster than traditional workflows. 


85% of data, analytics, and IT leaders are under C-suite pressure to quantify genAI ROI, but few have figured out how to measure and validate it effectively. To begin understanding impact, you must know whether agents are speeding up workflows, improving task completion rates, and boosting retention.


As AI tools increase in volume and complexity, IT and Product leaders need to measure and defend their future AI investments. In the early days of AI deployment, enterprises must track the KPIs we’ve listed in this guide. 

But first, the basics

AI agents mean different things to different people. Still, most are building agentic controls or conversational interfaces where users type input and receive helpful output. Here’s how we’re defining AI agents, agentic systems, and generative AI: 

AI Agents: Your digital workers

AI Agents are software entities focused on autonomous goal completion and reasoning. You can engage with AI agents via different interfaces, like:

  • Conversational interfaces, like chat windows, voice assistants, or messaging platforms, where users type or speak naturally. These work well for customer support agents, personal assistants, or any scenario where back-and-forth dialogue makes sense.
  • API-based interfaces that let agents work behind the scenes, triggering actions based on data or events without direct user interaction. For example, a sales agent might automatically update CRM records or send follow-up emails based on prospect behavior.
  • Embedded interfaces that integrate directly into existing software workflows. Imagine an AI writing assistant in your email composer, or a data analyst agent that sits inside your dashboard and responds to questions about charts and metrics.

Agents can perceive their environment, reason about it, and take actions to accomplish specific goals (often with a high degree of independence from humans).

They can also plan, make decisions, adapt based on feedback, and sometimes collaborate with other agents or systems. The key is matching the interface to how users naturally want to accomplish their goals.

Generative AI: Your creative thought partner

Generative AI refers to AI systems that can create novel content, such as text, images, audio, code, or video. Common examples of genAI are tools that generate text, images, and music (like ChatGPT, Claude, and DALL·E).


These systems are trained on large datasets and use statistical or deep learning techniques (like large language models, GANs, or diffusion models) to generate realistic and meaningful new outputs, rather than just analyzing or classifying data.

GenAI virtual assistants will be embedded in 90% of conversational offerings in 2026.


Gartner, Emerging Tech Impact Radar: Generative AI, 2025

Agentic systems: Your cross-functional team

Agentic systems are advanced AI systems built from multiple intelligent agents working together to pursue objectives autonomously. They go beyond individual AI agents by combining perception, reasoning, decision-making, memory, and action at a system-wide level. 


Think of these as automated supply chains, or fleets of coordinated delivery drones. Agentic systems can coordinate complex, multi-agent workflows, learn from ongoing experience, and adapt in real-time to new challenges, often with minimal human oversight.


By 2028, at least 15% of day-to-day decisions will be made autonomously through agentic AI.


Gartner, Top Strategic Technology Trends for 2025: Agentic AI

Now that we’ve covered the different types of AI, here’s how to measure and improve them. 

What AI agent KPIs should you be tracking?

When selecting our top KPIs, we looked at indicators that help both the product teams building agents and the IT teams deploying agents. 


There are two categories of agent KPIs to keep in mind:

  1. Growth: What are users doing in-app before and after interacting with your agents? How does your agent impact downstream user behavior? These might be tied closely to increasing retention, reducing churn, or increasing monthly active users (MAUs).
  2. AI Performance: How users behave with your agents, and how your agents respond. Are agents speeding up workflows? Are users changing how they work across your software? And most importantly: are agents actually faster and more efficient?

AI Growth KPIs to track

KPI 1: Conversations

Conversations are the back-and-forth interactions a human has with AI. Think of each conversation as the collection of prompts a user sends to your AI agent within a specific timeframe. 


While simple, this is the best way to understand whether users engage with your AI agents. Think of this as product or website page views: It’s an important foundational metric, but it becomes richer with context about what happens next.


Why conversation volume matters

Conversations serve as your foundational health metric. This KPI reveals whether people engage with your agent or whether you've built expensive digital tumbleweeds. 

Beyond basic usage, this metric drives three business-critical insights:

  1. Engagement trends: Rising volume typically indicates growing adoption and user satisfaction. Segment by user type (new vs. returning, free vs. paid users) and monitor prompts per user ratios. This reveals whether growth comes from new adoption or deeper engagement from existing users.
  2. Capacity planning: High conversation volume signals the need for infrastructure scaling and budget allocation.
  3. Cost forecasting: Since most AI services charge per API call or token, volume directly impacts operational expenses.
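
As a rough illustration, here's a minimal sketch of how raw conversation logs can feed prompts-per-user and cost estimates. The log structure and per-token prices below are hypothetical placeholders, not real vendor rates:

```python
from collections import Counter

# Hypothetical conversation log: (user_id, prompt_tokens, response_tokens)
conversations = [
    ("u1", 120, 450), ("u1", 80, 300), ("u2", 200, 900), ("u3", 60, 150),
]

# Engagement trend input: prompts per user
prompts_per_user = Counter(user_id for user_id, _, _ in conversations)
avg_prompts_per_user = sum(prompts_per_user.values()) / len(prompts_per_user)

# Cost forecasting: token volume x assumed per-token prices (illustrative only)
PRICE_PER_INPUT_TOKEN = 0.000003   # placeholder rate, not a real vendor price
PRICE_PER_OUTPUT_TOKEN = 0.000015  # placeholder rate, not a real vendor price
estimated_cost = sum(
    p * PRICE_PER_INPUT_TOKEN + r * PRICE_PER_OUTPUT_TOKEN
    for _, p, r in conversations
)

print(f"Conversations: {len(conversations)}")
print(f"Avg prompts per user: {avg_prompts_per_user:.1f}")
print(f"Estimated spend for this sample: ${estimated_cost:.4f}")
```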

What to watch out for: Sudden volume drops can reveal technical issues or user churn. High conversation volume paired with low engagement metrics suggests users are trying your agent once and bouncing, a sign your AI isn't solving real problems.

KPI 2: Visitors

Visitors are the number of unique users interacting with your AI agent within a specific timeframe, typically measured as daily active users (DAU) or monthly active users (MAU).


How to calculate visitors

Count unique user identifiers (logged-in users, device IDs, or session tokens) that interact with your agent. 


Track both DAU and MAU to understand usage patterns and calculate stickiness ratios.
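
For example, a minimal sketch of the DAU/MAU stickiness calculation (the event records and dates are made up for illustration):

```python
from datetime import date

# Hypothetical usage events: (user_id, date of interaction with the agent)
events = [
    ("u1", date(2025, 3, 1)), ("u1", date(2025, 3, 2)), ("u2", date(2025, 3, 2)),
    ("u3", date(2025, 3, 15)), ("u1", date(2025, 3, 20)),
]

target_day = date(2025, 3, 2)
dau = len({u for u, d in events if d == target_day})
mau = len({u for u, d in events if d.year == target_day.year and d.month == target_day.month})

stickiness = dau / mau  # share of monthly users who showed up on this day
print(f"DAU={dau}, MAU={mau}, stickiness={stickiness:.0%}")
```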

Why visitors matter

While conversation volume shows activity, visitors reveal your actual user base size. This metric directly impacts revenue potential, market penetration, and product-market fit (PMF). Unlike web visitors who might just browse, AI visitors represent engaged users actively seeking solutions.


For deep insights, monitor new vs. returning visitor ratios. The average one-month retention rate is 39%, but this varies dramatically by industry and company size.


What to watch out for: A declining visitor count could signal user churn or acquisition problems. On the other hand, high visitor counts with low conversation volume per visitor suggest an activation issue. 


Maybe users are trying your agent, but don’t find it valuable enough for continued use. This is one (of many) ways to understand whether your agents are truly helping or need continued refinement.

KPI 3: Accounts

Accounts are the number of distinct organizational accounts or companies using your AI agent, separate from individual user counts.


How to calculate accounts

Count unique company domains, organization IDs, or billing entities with active AI agent usage within your timeframe.
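
Here's a hedged sketch of that count, assuming you only have user emails and want to approximate accounts by domain (swap in an explicit org or billing ID if you have one):

```python
# Hypothetical user records: (user_id, email)
users_with_agent_activity = [
    ("u1", "alice@acme.com"),
    ("u2", "bob@acme.com"),
    ("u3", "carol@globex.io"),
]

# Derive an account key from the email domain (or use an org/billing ID instead)
accounts = {email.split("@")[-1] for _, email in users_with_agent_activity}
print(f"Active accounts: {len(accounts)}")  # -> 2 (acme.com, globex.io)
```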

Why accounts matter

Individual users come and go, but accounts represent sustainable revenue and organizational adoption. One account might have 50 users today and 200 tomorrow. Accounts also indicate whether you're achieving true enterprise penetration or just departmental experiments. 


Within accounts, look at: 

  • Users per account. How many employees are adopting your AI feature within each organization?
  • Account size (SMB, mid-market, enterprise), industry vertical, or subscription tier. Is your product stickier within small tech startups or large banking institutions? Understanding your user base is the first step to building an intelligent, data-driven roadmap.

What to watch out for: Growing user counts (but flat account numbers) mean you're getting deeper penetration but not wider market adoption. Shrinking accounts with stable users suggest organizational churn that individual metrics might miss. 

KPI 4: Retention rate

Retention rate measures the percentage of users who return to your AI agent after their initial interaction within a specific timeframe (typically one day, one week, or one month).


Here's how to calculate AI Agent retention rate:

Why retention rate matters

Retention reveals whether your AI agent creates genuine value or just satisfies curiosity. High acquisition means nothing if users disappear after one session. 


Retention is especially telling for AI products because users have specific problems to solve. If your agent doesn't deliver, they won't waste time coming back.


Strong retention rates vary by use case, industry, and company, but SaaS retention benchmarks include:

  • One-month retention: 39% (did they find immediate value?)
  • Two-month retention: 33% (is it becoming a habit?)
  • Three-month retention: 30% (true product-market fit indicator)

Track cohort retention curves to understand how different user groups behave over time. Users acquired through organic search typically show higher retention than paid acquisition traffic.
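
A hedged sketch of how a simple cohort retention curve could be assembled (the cohort data below is illustrative):

```python
# Hypothetical: for each cohort, sets of users active in month 0, 1, 2, ...
cohorts = {
    "2025-01": [{"u1", "u2", "u3"}, {"u1", "u2"}, {"u1"}],
    "2025-02": [{"u4", "u5"}, {"u5"}, {"u5"}],
}

for cohort, monthly_active in cohorts.items():
    base = monthly_active[0]
    curve = [len(active & base) / len(base) for active in monthly_active]
    print(cohort, [f"{r:.0%}" for r in curve])
```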


What to watch out for: Retention drop-offs after specific days often reveal onboarding gaps or missing features. If Day 7 retention drops dramatically, users likely hit a capability wall. Poor retention among high-value user segments signals fundamental product issues that growth tactics can't fix.


AI Performance KPIs to track

KPI 5: Unsupported requests

Unsupported requests measure the percentage of user prompts your AI agent cannot handle, doesn't understand, or explicitly states it cannot complete within a given timeframe.


How to calculate unsupported requests:
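
A minimal sketch, assuming you can label prompts your agent declined or visibly failed to handle:

```python
# Hypothetical labels from your conversation logs
total_prompts = 1_200
unsupported_prompts = 90   # agent said it couldn't help, or clearly failed to answer

unsupported_rate = unsupported_prompts / total_prompts * 100
print(f"Unsupported request rate: {unsupported_rate:.1f}%")  # -> 7.5%
```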

Why unsupported requests matter

This metric reveals the gap between user expectations and your agent's capabilities. Unlike traditional error rates that track technical failures, unsupported requests show where your AI hits knowledge or functional boundaries. High unsupported request rates indicate users are asking for things your agent simply can't deliver. 


Conversely, if unsupported requests are suspiciously low for topics your agent shouldn't handle, your AI is probably hallucinating—making answers up instead of admitting it doesn't know. It's time to add guardrails.


This KPI directly impacts user frustration and churn. Nothing kills AI adoption faster than repeated "I can't help with that" responses. Smart teams use unsupported request data to:


  • Prioritize feature development. What capabilities would eliminate the most user friction?
  • Identify training gaps. Are users asking about topics your agent should know but doesn't?
  • Refine user onboarding. Can you better set expectations about what your agent can and cannot do?

What to watch out for: Rising unsupported request rates often signal scope creep because users discover your agent and push its boundaries.


However, this isn’t necessarily a bad thing. While consistently high rates could suggest a mismatch between user needs and agent capabilities, this can also tell you what your roadmap needs to look like and what to prioritize. 


Also, watch for patterns in unsupported requests that reveal blind spots in your AI training.


KPI 6: Rage prompting

Rage prompting identifies conversations where users express frustration. Think: negative sentiment, typing in ALL CAPS, using profanity ($!#*), or repeatedly rephrasing questions because your AI agent isn't delivering satisfactory answers.

How to measure rage prompting

Unlike traditional metrics with hard formulas, rage prompting requires analysis of conversation sentiment and patterns. 


Tools like Pendo Agent Analytics evaluate each conversation against criteria like hostile language, repeated reformulations of the same question, and escalating frustration to flag rage-prompting incidents.
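
If you want a rough in-house proxy before adopting a dedicated tool, a heuristic like the sketch below can flag likely rage conversations. The thresholds, word list, and `looks_like_rage` helper are illustrative assumptions, not Pendo's scoring criteria:

```python
import re

PROFANITY = {"damn", "wtf"}  # illustrative only; use your own list

def looks_like_rage(messages: list[str]) -> bool:
    """Very rough heuristic: ALL-CAPS shouting, profanity, or repeated rephrasings."""
    caps = sum(1 for m in messages if len(m) > 5 and m.upper() == m)
    swears = sum(1 for m in messages for w in re.findall(r"\w+", m.lower()) if w in PROFANITY)
    # Repeated rephrasing: consecutive messages that share most of their words
    rephrased = sum(
        1 for a, b in zip(messages, messages[1:])
        if len(set(a.lower().split()) & set(b.lower().split()))
        >= 0.6 * min(len(a.split()), len(b.split()))
    )
    return caps >= 1 or swears >= 1 or rephrased >= 2

print(looks_like_rage(["How do I export data?", "HOW DO I EXPORT DATA", "For the third time, export??"]))
```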

Why rage prompting matters

Rage prompting is your early warning system for user frustration. When someone starts typing in ALL CAPS or says "For the third time, I need…”, you’re dealing with a case of user rage. This behavior happens when your AI misunderstands requests, provides irrelevant answers, or forces users to play twenty questions to get basic help. 


Unlike other failure metrics, rage prompting captures emotional context. Users might accept one "I don't understand" response, but when they start swearing at your bot, you've created lasting negative impressions that hurt user satisfaction, retention, and perception.


Track rage prompting patterns within agent analytics to identify:


  • Common failure scenarios: What use cases consistently make users lose their cool?
  • Agent comprehension gaps: Are users getting angrier because your AI misses obvious context?
  • Communication breakdowns: Do certain user types struggle more with how your agent "thinks"?

What to watch out for: Rising rage prompting rates signal serious usability problems. Watch for spikes after product updates, because new features might confuse users or break existing workflows.

Also, monitor if rage prompting clusters around specific user segments, suggesting your agent works well for some audiences but terribly for others.


KPI 7: Average latency

Average latency measures the time between a user's submission of a prompt and the AI agent's beginning or completion of its response. Today, time to first token is the more common latency measurement. 


Depending on your implementation and what’s important to you, this can mean: 

  1. Time to first token: When your agent starts streaming responses.
  2. Time to full response completion: When the full response is provided.

How to calculate average latency:

For streaming responses, measure time to first token. For non-streaming responses, measure time to complete response delivery.
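
Here's a hedged sketch of timing both variants around a streaming call; the `stream_response` generator stands in for whatever client library your agent actually uses:

```python
import time

def measure_latency(stream_response):
    """Returns (time_to_first_token, time_to_full_response) in seconds for a token generator."""
    start = time.monotonic()
    first_token_at = None
    for _ in stream_response:          # stream_response yields tokens/chunks as they arrive
        if first_token_at is None:
            first_token_at = time.monotonic()
    end = time.monotonic()
    return (first_token_at - start if first_token_at else None, end - start)

# Example with a fake stream that "thinks" before emitting tokens
def fake_stream():
    time.sleep(0.3)          # hypothetical model delay before the first token
    for token in ["Here", " is", " your", " answer"]:
        time.sleep(0.05)
        yield token

print(measure_latency(fake_stream()))
```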

Why average latency matters

Users have different patience levels, depending on context. Users will often wait longer for a complex research task, but expect near-instant responses for simple lookups. The key is setting proper expectations, creating explainable AI, and delivering consistently.  


Latency directly impacts user perception of your agent's intelligence, but users are becoming more forgiving when you show progress. A "thinking..." indicator or streaming response can make 8 seconds feel faster than a silent 3-second wait.


Monitor latency by query type to spot patterns that hurt retention:


  1. Query complexity correlation: Do complex research questions warrant longer wait times, or are users bouncing because it takes too long?
  2. Peak usage periods: Does your agent slow down during high-traffic times?
  3. Question type analysis: Are users avoiding certain types of questions because they take forever to answer?

What to watch out for

Instead of worrying about latency numbers, focus on user behavior. If users ask pricing questions quickly but never return after slow competitor analysis queries, that's your smoking gun.


In addition, watch for latency variance more than averages. Consistent 4-second responses feel better than responses ranging from 1-10 seconds, even if the average is lower. And if you're hitting 4+ minutes for any query type, you've got bigger problems than measurement. (Although we’re seeing longer latency for “research” mode.)

KPI 8: Interactions to resolve intent

Interactions to resolve intent measures how many back-and-forth exchanges a user needs with your AI agent before they accomplish their goal.

How to calculate interactions to resolve intent

Measuring this requires AI analysis to identify when users accomplish their goals vs. when they give up. 


Agent Analytics evaluates conversation patterns, user satisfaction signals, and task completion indicators to determine successful resolution and counts the interactions that led to it.
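
If you only need a rough proxy, a sketch like the one below counts exchanges up to an explicit resolution signal. The `resolved` labels are hypothetical and would come from your own classification, not Pendo's:

```python
# Hypothetical labeled conversation: each exchange tagged with a resolution signal
conversation = [
    {"user": "Pull Q3 churn by segment", "resolved": False},
    {"user": "No, split it by plan tier", "resolved": False},
    {"user": "Perfect, thanks!", "resolved": True},
]

def interactions_to_resolve(conversation):
    for i, exchange in enumerate(conversation, start=1):
        if exchange["resolved"]:
            return i
    return None  # user gave up; exclude from the average or track separately

print(interactions_to_resolve(conversation))  # -> 3
```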

Why interactions to resolve intent matters

This metric reveals your agent's efficiency at understanding and solving user problems. Ideally, users should get what they need in 1-2 interactions. Higher numbers suggest your agent misunderstands requests, provides incomplete answers, or forces users to clarify obvious context.


Unlike simple conversation length, this KPI focuses on problem resolution. A 10-exchange conversation where the user accomplishes their goal is better than a 3-exchange dead end. 


Track interactions to resolve intent to spot:

  • Ambiguous prompting patterns. Are users struggling to communicate their needs effectively?
  • Agent comprehension gaps. Does your AI consistently miss nuance or context in certain request types?
  • Feature discoverability issues. Are users making multiple attempts to find obvious capabilities?

What to watch out for

Average interactions above 3-4 per resolved intent may signal serious usability problems. Rising trends often indicate your agent is becoming less helpful over time, possibly due to model changes or feature bloat. 


Also, watch for high variance. Some users resolve in 1 interaction while others need 8+, which suggests inconsistent agent performance across different use cases.

Start proving your AI agents are an improvement

The KPIs in this guide are your roadmap to proving that your AI strategy is working. Most companies get stuck because they can't connect the dots between agent interactions and actual business outcomes. 


But with connected product and agent analytics, you can answer the questions your board and executives are asking: Are your agentic workflows actually helping users get value from your software faster, complete tasks more efficiently, and return more often? 


Pendo Agent Analytics connects AI interactions to actual business outcomes.


It's the only solution designed to connect all your software data—AI interactions and traditional UI behavior—so you can truly prove that your agents are improving time to value, retention, and productivity.  


Pendo Agent Analytics reveals the complete user journey: what users try with your AI agent, when they succeed, when they abandon it for traditional workflows, and how both paths compare on speed, efficiency, and outcomes.


Ready to see it in action? Get a demo of Pendo Agent Analytics.