Chapter 1
Between 2024 and 2025, enterprises have seen a 5x increase in AI development, and 66% of data, analytics, and IT leaders have invested over $1M in genAI. Now, companies big and small are facing the same ROI challenge: They’ve invested in AI, but have no way to understand its impact.
The potential for AI transformation is within reach, but most companies aren’t sure if they’re even close to it.
Just like any click-based software, your AI tools need analytics. Traditional analytics tools can tell you about system performance: uptime, response times, conversation volumes. But they can't answer the questions that really matter:
Yes, you need to understand if your agents are working at all. But what’s more important is understanding if these agents are actually faster than traditional workflows.
85% of data, analytics, and IT leaders are under C-suite pressure to quantify genAI ROI, but few have figured out how to measure and validate it effectively. To begin understanding impact, you must know whether agents are speeding up workflows, improving task completion rates, and helping retention.
As AI tools increase in volume and complexity, IT and Product leaders need to measure and defend their future AI investments. In the early days of AI deployment, enterprises must track the KPIs we’ve listed in this guide.
Chapter 2
AI agents mean different things to different people. Still, most teams are building agentic controls or conversational interfaces where users type input and receive helpful output. Here’s how we’re defining AI agents, agentic systems, and generative AI:
AI Agents are software entities focused on autonomous goal completion and reasoning. You can engage with AI agents via different interfaces, like:
Agents can perceive their environment, reason about it, and take actions to accomplish specific goals (often with a high degree of independence from humans).
They can also plan, make decisions, adapt based on feedback, and sometimes collaborate with other agents or systems. The key is matching the interface to how users naturally want to accomplish their goals.
Generative AI refers to AI systems that can create novel content, such as text, images, audio, code, or video. Common examples of genAI are tools that generate images, music, and text (like ChatGPT, Claude, DALL·E, and other LLMs).
These systems are trained on large datasets and use statistical or deep learning techniques (like large language models, GANs, or diffusion models) to generate realistic and meaningful new outputs, rather than just analyzing or classifying data.
GenAI virtual assistants will be embedded in 90% of conversational offerings in 2026.
Gartner, Emerging Tech Impact Radar: Generative AI, 2025
Agentic systems are advanced AI systems built from multiple intelligent agents working together to pursue objectives autonomously. They go beyond individual AI agents by combining perception, reasoning, decision-making, memory, and action at a system-wide level.
Think of these as automated supply chains, or fleets of coordinated delivery drones. Agentic systems can coordinate complex, multi-agent workflows, learn from ongoing experience, and adapt in real-time to new challenges, often with minimal human oversight.
By 2028, at least 15% of day-to-day decisions will be made autonomously through agentic AI.
Gartner, Top Strategic Technology Trends for 2025: Agentic AI
Now that we’ve covered the different types of AI, here’s how to measure and improve them.
When selecting our top KPIs, we looked at indicators that help both the product teams building agents and the IT teams deploying agents.
There are two categories of agent KPIs to keep in mind:
Chapter 3
Conversations are the back-and-forth exchanges a human has with AI. Consider each conversation a collection of prompts a user sends to your AI agent within a specific timeframe.
While simple, this is the best way to understand whether users engage with your AI agents. Think of this as product or website page views: It’s an important foundational metric, but it becomes richer with context about what happens next.
Conversations serve as your foundational health metric, revealing whether people engage with your agent or whether you've built expensive digital tumbleweeds.
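To make that concrete, here's a minimal sketch of how conversation counts might be derived from raw prompt events. It assumes each event carries a user ID and a timestamp, and treats a 30-minute gap in activity as the start of a new conversation; your own session definition (or your analytics tool's) may differ.

```python
from datetime import timedelta

# Assumed inactivity gap: prompts from the same user more than 30 minutes apart
# start a new conversation. Tune this to match your own session definition.
SESSION_GAP = timedelta(minutes=30)

def count_conversations(prompt_events):
    """Count conversations from a list of (user_id, timestamp) prompt events."""
    conversations = 0
    last_seen = {}  # user_id -> timestamp of that user's most recent prompt
    for user_id, ts in sorted(prompt_events, key=lambda e: e[1]):
        if user_id not in last_seen or ts - last_seen[user_id] > SESSION_GAP:
            conversations += 1  # a new back-and-forth begins
        last_seen[user_id] = ts
    return conversations
```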
Beyond basic usage, this metric drives three business-critical insights:
What to watch out for: Sudden volume drops can reveal technical issues or user churn. High conversation volume paired with low engagement metrics suggests users are trying your agent once and bouncing, a sign your AI isn't solving real problems.
Visitors are the number of unique users interacting with your AI agent within a specific timeframe, typically measured as daily active users (DAU) or monthly active users (MAU).
Count unique user identifiers (logged-in users, device IDs, or session tokens) that interact with your agent.
Track both DAU and MAU to understand usage patterns and calculate stickiness ratios.
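As a rough illustration, the sketch below counts unique visitors and computes a DAU/MAU stickiness ratio from a hypothetical interaction log. The event shape and field names are assumptions, not a specific analytics API.

```python
from datetime import timedelta

# Hypothetical interaction log: (user_id, date). In practice the identifier might
# be a logged-in user ID, a device ID, or a session token.
def unique_visitors(events, start, end):
    """Distinct users who interacted with the agent between start and end (inclusive)."""
    return len({uid for uid, day in events if start <= day <= end})

def dau_mau_stickiness(events, month_start, month_end):
    """Average DAU for the month divided by MAU: a rough stickiness ratio."""
    mau = unique_visitors(events, month_start, month_end)
    if mau == 0:
        return 0.0
    num_days = (month_end - month_start).days + 1
    avg_dau = sum(
        unique_visitors(events, month_start + timedelta(days=d), month_start + timedelta(days=d))
        for d in range(num_days)
    ) / num_days
    return avg_dau / mau
```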
While conversation volume shows activity, visitors reveal your actual user base size. This metric directly impacts revenue potential, market penetration, and product-market fit (PMF). Unlike web visitors who might just browse, AI visitors represent engaged users actively seeking solutions.
For deeper insights, monitor new vs. returning visitor ratios. The average one-month retention rate is 39%, but this varies dramatically by industry and company size.
What to watch out for: A declining visitor count could signal user churn or acquisition problems. On the other hand, high visitor counts with low conversation volume per visitor suggest an activation issue.
Maybe users are trying your agent, but don’t find it valuable enough for continued use. This is one (of many) ways to understand whether your agents are truly helping or need continued refinement.
Accounts are the number of distinct organizational accounts or companies using your AI agent, separate from individual user counts.
How to calculate accounts:
Count unique company domains, organization IDs, or billing entities with active AI agent usage within your timeframe.
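As a lightweight example, the sketch below keys accounts on email domains, which assumes your usage records carry a user email; if you have organization IDs or billing entities, substitute those instead.

```python
def count_accounts(usage_records):
    """Distinct organizations with active agent usage, keyed by email domain here.
    Keying on an org ID or billing entity works the same way if you have one."""
    domains = set()
    for record in usage_records:
        email = record.get("email", "")
        if "@" in email:
            domains.add(email.split("@", 1)[1].lower())
    return len(domains)
```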
Individual users come and go, but accounts represent sustainable revenue and organizational adoption. One account might have 50 users today and 200 tomorrow. Accounts also indicate whether you're achieving true enterprise penetration or just departmental experiments.
Within accounts, look at:
What to watch out for: Growing user counts (but flat account numbers) mean you're getting deeper penetration but not wider market adoption. Shrinking accounts with stable users suggest organizational churn that individual metrics might miss.
Retention rate measures the percentage of users who return to your AI agent after their initial interaction within a specific timeframe (typically one day, one week, or one month).
Here's how to calculate AI Agent retention rate:
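The standard cohort calculation is the share of a starting cohort that returns within your chosen window (Day 1, Day 7, or Day 30). A minimal sketch, with placeholder names:

```python
def retention_rate(cohort_user_ids, returning_user_ids):
    """Percentage of a starting cohort that comes back within the chosen window
    (Day 1, Day 7, Day 30, etc.). Both arguments are sets of user IDs."""
    if not cohort_user_ids:
        return 0.0
    retained = cohort_user_ids & returning_user_ids
    return 100.0 * len(retained) / len(cohort_user_ids)
```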
Retention reveals whether your AI agent creates genuine value or just satisfies curiosity. High acquisition means nothing if users disappear after one session.
Retention is especially telling for AI products because users have specific problems to solve. If your agent doesn't deliver, they won't waste time coming back.
Strong retention rates vary by use case, industry, and company, but SaaS retention benchmarks include:
Track cohort retention curves to understand how different user groups behave over time. Users acquired through organic search typically show higher retention than paid acquisition traffic.
What to watch out for: Retention cliffs after specific days often reveal onboarding gaps or missing features. If Day 7 retention drops dramatically, users likely hit a capability wall. Poor retention among high-value user segments signals fundamental product issues that growth tactics can't fix.
Chapter 4
Unsupported requests measure the percentage of user prompts, within a given timeframe, that your AI agent cannot handle, doesn't understand, or explicitly states it cannot complete.
How to calculate unsupported requests:
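A minimal sketch of the calculation, assuming each prompt record carries an 'unsupported' flag set by your own classifier or by detecting explicit refusals:

```python
def unsupported_request_rate(prompts):
    """Share (%) of prompts the agent could not handle in the timeframe.
    Assumes each prompt record carries an 'unsupported' flag, set upstream by a
    classifier or by detecting explicit "I can't help with that" responses."""
    if not prompts:
        return 0.0
    unsupported = sum(1 for p in prompts if p.get("unsupported"))
    return 100.0 * unsupported / len(prompts)
```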
This metric reveals the gap between user expectations and your agent's capabilities. Unlike traditional error rates that track technical failures, unsupported requests show where your AI hits knowledge or functional boundaries. High unsupported request rates indicate users are asking for things your agent simply can't deliver.
Conversely, if unsupported requests are suspiciously low for topics your agent shouldn't handle, your AI is probably hallucinating—making answers up instead of admitting it doesn't know. It's time to add guardrails.
This KPI directly impacts user frustration and churn. Nothing kills AI adoption faster than repeated "I can't help with that" responses. Smart teams use unsupported request data to:
What to watch out for: Rising unsupported request rates often signal scope creep, as users discover your agent and push its boundaries.
However, this isn’t necessarily a bad thing. While consistently high rates could suggest a mismatch between user needs and agent capabilities, this can also tell you what your roadmap needs to look like and what to prioritize.
Also, watch for patterns in unsupported requests that reveal blind spots in your AI training.
Rage prompting identifies conversations where users express frustration. Think: negative sentiment, typing in ALL CAPS, using profanity ($!#*), or repeatedly rephrasing questions because your AI agent isn't delivering satisfactory answers.
Unlike traditional metrics with hard formulas, rage prompting requires analysis of conversation sentiment and patterns.
Tools like Pendo Agent Analytics evaluate each conversation against criteria like hostile language, repeated reformulations of the same question, and escalating frustration to flag rage-prompting incidents.
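If you don't have a purpose-built tool, a rough rule-based stand-in can still surface conversations worth reviewing. The sketch below flags a conversation when multiple frustration signals co-occur; the thresholds and placeholder profanity list are assumptions, and it is a heuristic stand-in rather than how any specific product evaluates rage prompting.

```python
import re

PROFANITY = {"damn", "wtf"}  # placeholder lexicon; real deployments use richer lists

def looks_like_rage(user_messages):
    """Rough rule-based flags for one conversation (a list of user messages).
    A sentiment model or LLM judge would normally replace these heuristics."""
    msgs = [m.strip() for m in user_messages if m.strip()]
    lowered = [m.lower() for m in msgs]
    signals = 0
    # 1. Shouting: a longer message typed entirely in ALL CAPS
    if any(m.isupper() and len(m) > 8 for m in msgs):
        signals += 1
    # 2. Profanity anywhere in the conversation
    words = {w for m in lowered for w in re.findall(r"[a-z']+", m)}
    if words & PROFANITY:
        signals += 1
    # 3. Repeated reformulations: consecutive messages sharing most of their words
    def overlap(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(1, len(wa | wb))
    if any(overlap(a, b) > 0.7 for a, b in zip(lowered, lowered[1:])):
        signals += 1
    return signals >= 2  # flag when at least two frustration signals co-occur
```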
Rage prompting is your early warning system for user frustration. When someone starts typing in ALL CAPS or says "For the third time, I need…", you're dealing with a case of user rage. This behavior happens when your AI misunderstands requests, provides irrelevant answers, or forces users to play twenty questions to get basic help.
Unlike other failure metrics, rage prompting captures emotional context. Users might accept one "I don't understand" response, but when they start swearing at your bot, you've created lasting negative impressions that hurt user satisfaction, retention, and perception.
Track rage prompting patterns within agent analytics to identify:
What to watch out for: Rising rage prompting rates signal serious usability problems. Watch for spikes after product updates, because new features might confuse users or break existing workflows.
Also, monitor if rage prompting clusters around specific user segments, suggesting your agent works well for some audiences but terribly for others.
Average latency measures the time between a user submitting a prompt and your AI agent beginning (or completing) its response. Today, time to first token is the more common latency measurement.
Depending on your implementation and what’s important to you, this can mean:
How to calculate average latency:
For streaming responses, measure time to first token. For non-streaming responses, measure time to complete response delivery.
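The arithmetic itself is simple once you've collected per-response latency samples. A minimal sketch, with a 95th percentile included because (as noted below) variance often matters as much as the average:

```python
def average_latency(latency_seconds):
    """Mean latency. For streaming agents, pass time-to-first-token per response;
    for non-streaming agents, pass full response-delivery times."""
    return sum(latency_seconds) / len(latency_seconds) if latency_seconds else 0.0

def p95_latency(latency_seconds):
    """95th-percentile latency: variance often matters more to users than the mean."""
    if not latency_seconds:
        return 0.0
    ordered = sorted(latency_seconds)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
```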
Users have different patience levels, depending on context. Users will often wait longer for a complex research task, but expect near-instant responses for simple lookups. The key is setting proper expectations, creating explainable AI, and delivering consistently.
Latency directly impacts user perception of your agent's intelligence, but users are becoming more forgiving when you show progress. A "thinking..." indicator or streaming response can make 8 seconds feel faster than a silent 3-second wait.
Monitor latency by query type to spot patterns that hurt retention:
What to watch out for
Instead of worrying about latency numbers, focus on user behavior. If users ask pricing questions quickly but never return after slow competitor analysis queries, that's your smoking gun.
In addition, watch for latency variance more than averages. Consistent 4-second responses feel better than responses ranging from 1 to 10 seconds, even if the average is lower. And if you're hitting 4+ minutes for any query type, you've got bigger problems than measurement. (Although we’re seeing longer latency for “research” mode.)
Interactions to resolve intent counts how many back-and-forth exchanges it takes for a user to get what they need from your agent. Measuring this requires AI analysis to identify when users accomplish their goals vs. when they give up.
Agent Analytics evaluates conversation patterns, user satisfaction signals, and task completion indicators to determine successful resolution and counts the interactions that led to it.
This metric reveals your agent's efficiency at understanding and solving user problems. Ideally, users should get what they need in 1-2 interactions. Higher numbers suggest your agent misunderstands requests, provides incomplete answers, or forces users to clarify obvious context.
Unlike simple conversation length, this KPI focuses on problem resolution. A 10-exchange conversation where the user accomplishes their goal is better than a 3-exchange dead end.
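Once conversations are labeled as resolved or abandoned (here assumed to come from an LLM judge or satisfaction signal), the arithmetic is straightforward. A minimal sketch:

```python
def avg_interactions_to_resolve(conversations):
    """Average number of user turns across conversations judged resolved.
    Each conversation dict is assumed to carry a 'resolved' flag and a list of
    user 'turns'; the labeling itself comes from upstream AI analysis."""
    resolved = [c for c in conversations if c.get("resolved")]
    if not resolved:
        return 0.0
    return sum(len(c["turns"]) for c in resolved) / len(resolved)
```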
Track interactions to resolve intent to spot:
What to watch out for
Average interactions above 3-4 per resolved intent may signal serious usability problems. Rising trends often indicate your agent is becoming less helpful over time, possibly due to model changes or feature bloat.
Also, watch for high variance. Some users resolve in 1 interaction while others need 8+, which suggests inconsistent agent performance across different use cases.
Chapter 5
The KPIs in this guide are your roadmap to proving that your AI strategy is working. Most companies get stuck because they can't connect the dots between agent interactions and actual business outcomes.
But with connected product and agent analytics, you can answer the questions your board and executives are asking: Are your agentic workflows actually helping users realize value from your software faster, complete tasks more efficiently, and return more often?
Pendo Agent Analytics is the only solution designed to connect all your software data, both AI interactions and traditional UI behavior, so you can truly prove that your agents are improving time to value, retention, and productivity.
It reveals the complete user journey: what users try with your AI agent, when they succeed, when they abandon it for traditional workflows, and how both paths compare on speed, efficiency, and outcomes.
Ready to see it in action? Get a demo of Pendo Agent Analytics.