Chapter 1
Between 2024 and 2025, enterprises have seen a 5x increase in AI development, and 66% of data, analytics, and IT leaders have invested over $1M in genAI. Now, companies big and small are facing the same ROI challenge: They’ve invested in AI, but have no way to understand its impact.
The potential for AI transformation is within reach, but most companies aren’t sure if they’re even close to it.
Just like any click-based software, your AI tools need analytics. Traditional analytics tools can tell you about system performance: uptime, response times, conversation volumes. But they can't answer the questions that really matter:
Yes, you need to understand if your agents are working at all. But what’s more important is understanding if these agents are actually faster than traditional workflows.
85% of data, analytics, and IT leaders are under C-suite pressure to quantify genAI ROI, but few have figured out how to measure and validate it effectively. To begin understanding impact, you must know whether agents are speeding up workflows, improving task completion rates, and helping retention.
As AI tools increase in volume and complexity, IT and Product leaders need to measure and defend their future AI investments. In the early days of AI deployment, enterprises must track the KPIs we’ve listed in this guide.
Chapter 2
AI agents mean different things to different people. Still, most teams are building agentic controls or conversational interfaces where users type input and receive helpful output. Here’s how we’re defining AI agents, agentic systems, and generative AI:
AI Agents are software entities focused on autonomous goal completion and reasoning. You can engage with AI agents via different interfaces, like:
Agents can perceive their environment, reason about it, and take actions to accomplish specific goals (often with a high degree of independence from humans).
They can also plan, make decisions, adapt based on feedback, and sometimes collaborate with other agents or systems. The key is matching the interface to how users naturally want to accomplish their goals.
Generative AI refers to AI systems that can create novel content, such as text, images, audio, code, or video. Common examples of genAI are tools that generate images, music, and text (like ChatGPT, Claude, DALL·E, and other LLMs).
These systems are trained on large datasets and use statistical or deep learning techniques (like large language models, GANs, or diffusion models) to generate realistic and meaningful new outputs, rather than just analyzing or classifying data.
GenAI virtual assistants will be embedded in 90% of conversational offerings in 2026.
Gartner, Emerging Tech Impact Radar: Generative AI, 2025
Agentic systems are advanced AI systems built from multiple intelligent agents working together to pursue objectives autonomously. They go beyond individual AI agents by combining perception, reasoning, decision-making, memory, and action at a system-wide level.
Think of these as automated supply chains, or fleets of coordinated delivery drones. Agentic systems can coordinate complex, multi-agent workflows, learn from ongoing experience, and adapt in real-time to new challenges, often with minimal human oversight.
By 2028, at least 15% of day-to-day decisions will be made autonomously through agentic AI.
Gartner, Top Strategic Technology Trends for 2025: Agentic AI
Now that we’ve covered the different types of AI, here’s how to measure and improve them.
When selecting our top KPIs, we looked at indicators that help both the product teams building agents and the IT teams deploying agents.
There are two categories of agent KPIs to keep in mind:
Chapter 3
Conversations are the back-and-forth exchanges a human has with AI. Consider each conversation a collection of prompts a user sends to your AI agent within a specific timeframe.
While simple, this is the best way to understand whether users engage with your AI agents. Think of this as product or website page views: It’s an important foundational metric, but it becomes richer with context about what happens next.
Conversations serve as your foundational health metric, revealing whether people engage with your agent or whether you've built expensive digital tumbleweeds.
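To make that concrete, here's a minimal sketch of how conversation counts might be derived from raw prompt events. It assumes each event carries a user ID and a timestamp, and treats a 30-minute gap in activity as the start of a new conversation; your own session definition (or your analytics tool's) may differ.

```python
from datetime import timedelta

# Assumed inactivity gap: prompts from the same user more than 30 minutes apart
# start a new conversation. Tune this to match your own session definition.
SESSION_GAP = timedelta(minutes=30)

def count_conversations(prompt_events):
    """Count conversations from a list of (user_id, timestamp) prompt events."""
    conversations = 0
    last_seen = {}  # user_id -> timestamp of that user's most recent prompt
    for user_id, ts in sorted(prompt_events, key=lambda e: e[1]):
        if user_id not in last_seen or ts - last_seen[user_id] > SESSION_GAP:
            conversations += 1  # a new back-and-forth begins
        last_seen[user_id] = ts
    return conversations
```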
Beyond basic usage, this metric drives three business-critical insights:
What to watch out for: Sudden volume drops can reveal technical issues or user churn. High conversation volume paired with low engagement metrics suggests users are trying your agent once and bouncing, a sign your AI isn't solving real problems.
Visitors are the number of unique users interacting with your AI agent within a specific timeframe, typically measured as daily active users (DAU) or monthly active users (MAU).
Count unique user identifiers (logged-in users, device IDs, or session tokens) that interact with your agent.
Track both DAU and MAU to understand usage patterns and calculate stickiness ratios.
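As a rough illustration, the sketch below counts unique visitors and computes a DAU/MAU stickiness ratio from a hypothetical interaction log. The event shape and field names are assumptions, not a specific analytics API.

```python
from datetime import timedelta

# Hypothetical interaction log: (user_id, date). In practice the identifier might
# be a logged-in user ID, a device ID, or a session token.
def unique_visitors(events, start, end):
    """Distinct users who interacted with the agent between start and end (inclusive)."""
    return len({uid for uid, day in events if start <= day <= end})

def dau_mau_stickiness(events, month_start, month_end):
    """Average DAU for the month divided by MAU: a rough stickiness ratio."""
    mau = unique_visitors(events, month_start, month_end)
    if mau == 0:
        return 0.0
    num_days = (month_end - month_start).days + 1
    avg_dau = sum(
        unique_visitors(events, month_start + timedelta(days=d), month_start + timedelta(days=d))
        for d in range(num_days)
    ) / num_days
    return avg_dau / mau
```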
While conversation volume shows activity, visitors reveal your actual user base size. This metric directly impacts revenue potential, market penetration, and product-market fit (PMF). Unlike web visitors who might just browse, AI visitors represent engaged users actively seeking solutions.
For deeper insights, monitor new vs. returning visitor ratios. The average one-month retention rate is 39%, but this varies dramatically by industry and company size.
What to watch out for: A declining visitor count could signal user churn or acquisition problems. On the other hand, high visitor counts with low conversation volume per visitor suggest an activation issue.
Maybe users are trying your agent, but don’t find it valuable enough for continued use. This is one (of many) ways to understand whether your agents are truly helping or need continued refinement.
Accounts are the number of distinct organizational accounts or companies using your AI agent, separate from individual user counts.
How to calculate accounts:
Count unique company domains, organization IDs, or billing entities with active AI agent usage within your timeframe.
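As a lightweight example, the sketch below keys accounts on email domains, which assumes your usage records carry a user email; if you have organization IDs or billing entities, substitute those instead.

```python
def count_accounts(usage_records):
    """Distinct organizations with active agent usage, keyed by email domain here.
    Keying on an org ID or billing entity works the same way if you have one."""
    domains = set()
    for record in usage_records:
        email = record.get("email", "")
        if "@" in email:
            domains.add(email.split("@", 1)[1].lower())
    return len(domains)
```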
Individual users come and go, but accounts represent sustainable revenue and organizational adoption. One account might have 50 users today and 200 tomorrow. Accounts also indicate whether you're achieving true enterprise penetration or just departmental experiments.
Within accounts, look at:
What to watch out for: Growing user counts (but flat account numbers) mean you're getting deeper penetration but not wider market adoption. Shrinking accounts with stable users suggest organizational churn that individual metrics might miss.
Retention rate measures the percentage of users who return to your AI agent after their initial interaction within a specific timeframe (typically one day, one week, or one month).
Here's how to calculate AI Agent retention rate:
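The standard cohort calculation is the share of a starting cohort that returns within your chosen window (Day 1, Day 7, or Day 30). A minimal sketch, with placeholder names:

```python
def retention_rate(cohort_user_ids, returning_user_ids):
    """Percentage of a starting cohort that comes back within the chosen window
    (Day 1, Day 7, Day 30, etc.). Both arguments are sets of user IDs."""
    if not cohort_user_ids:
        return 0.0
    retained = cohort_user_ids & returning_user_ids
    return 100.0 * len(retained) / len(cohort_user_ids)
```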
Retention reveals whether your AI agent creates genuine value or just satisfies curiosity. High acquisition means nothing if users disappear after one session.
Retention is especially telling for AI products because users have specific problems to solve. If your agent doesn't deliver, they won't waste time coming back.
Strong retention rates vary by use case, industry, and company, but SaaS retention benchmarks include:
Track cohort retention curves to understand how different user groups behave over time. Users acquired through organic search typically show higher retention than paid acquisition traffic.
What to watch out for: Retention cliffs after specific days often reveal onboarding gaps or missing features. If Day 7 retention drops dramatically, users likely hit a capability wall. Poor retention among high-value user segments signals fundamental product issues that growth tactics can't fix.
Chapter 4
Unsupported requests measure the percentage of user prompts, within a given timeframe, that your AI agent cannot handle, doesn't understand, or explicitly states it cannot complete.
How to calculate unsupported requests:
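A minimal sketch of the calculation, assuming each prompt record carries an 'unsupported' flag set by your own classifier or by detecting explicit refusals:

```python
def unsupported_request_rate(prompts):
    """Share (%) of prompts the agent could not handle in the timeframe.
    Assumes each prompt record carries an 'unsupported' flag, set upstream by a
    classifier or by detecting explicit "I can't help with that" responses."""
    if not prompts:
        return 0.0
    unsupported = sum(1 for p in prompts if p.get("unsupported"))
    return 100.0 * unsupported / len(prompts)
```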
This metric reveals the gap between user expectations and your agent's capabilities. Unlike traditional error rates that track technical failures, unsupported requests show where your AI hits knowledge or functional boundaries. High unsupported request rates indicate users are asking for things your agent simply can't deliver.
Conversely, if unsupported requests are suspiciously low for topics your agent shouldn't handle, your AI is probably hallucinating—making answers up instead of admitting it doesn't know. It's time to add guardrails.
This KPI directly impacts user frustration and churn. Nothing kills AI adoption faster than repeated "I can't help with that" responses. Smart teams use unsupported request data to:
What to watch out for: Rising unsupported request rates often signal scope creep, as users discover your agent and push its boundaries.
However, this isn’t necessarily a bad thing. While consistently high rates could suggest a mismatch between user needs and agent capabilities, this can also tell you what your roadmap needs to look like and what to prioritize.
Also, watch for patterns in unsupported requests that reveal blind spots in your AI training.
Rage prompting identifies conversations where users express frustration. Think: negative sentiment, typing in ALL CAPS, using profanity ($!#*), or repeatedly rephrasing questions because your AI agent isn't delivering satisfactory answers.
Unlike traditional metrics with hard formulas, rage prompting requires analysis of conversation sentiment and patterns.
Tools like Pendo Agent Analytics evaluate each conversation against criteria like hostile language, repeated reformulations of the same question, and escalating frustration to flag rage-prompting incidents.
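If you don't have a purpose-built tool, a rough rule-based stand-in can still surface conversations worth reviewing. The sketch below flags a conversation when multiple frustration signals co-occur; the thresholds and placeholder profanity list are assumptions, and it is a heuristic stand-in rather than how any specific product evaluates rage prompting.

```python
import re

PROFANITY = {"damn", "wtf"}  # placeholder lexicon; real deployments use richer lists

def looks_like_rage(user_messages):
    """Rough rule-based flags for one conversation (a list of user messages).
    A sentiment model or LLM judge would normally replace these heuristics."""
    msgs = [m.strip() for m in user_messages if m.strip()]
    lowered = [m.lower() for m in msgs]
    signals = 0
    # 1. Shouting: a longer message typed entirely in ALL CAPS
    if any(m.isupper() and len(m) > 8 for m in msgs):
        signals += 1
    # 2. Profanity anywhere in the conversation
    words = {w for m in lowered for w in re.findall(r"[a-z']+", m)}
    if words & PROFANITY:
        signals += 1
    # 3. Repeated reformulations: consecutive messages sharing most of their words
    def overlap(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(1, len(wa | wb))
    if any(overlap(a, b) > 0.7 for a, b in zip(lowered, lowered[1:])):
        signals += 1
    return signals >= 2  # flag when at least two frustration signals co-occur
```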
Rage prompting is your early warning system for user frustration. When someone starts typing in ALL CAPS or says "For the third time, I need…", you're dealing with a case of user rage. This behavior happens when your AI misunderstands requests, provides irrelevant answers, or forces users to play twenty questions to get basic help.
Unlike other failure metrics, rage prompting captures emotional context. Users might accept one "I don't understand" response, but when they start swearing at your bot, you've created lasting negative impressions that hurt user satisfaction, retention, and perception.
Track rage prompting patterns within agent analytics to identify:
What to watch out for: Rising rage prompting rates signal serious usability problems. Watch for spikes after product updates, because new features might confuse users or break existing workflows.
Also, monitor if rage prompting clusters around specific user segments, suggesting your agent works well for some audiences but terribly for others.
Average latency measures the time between a user submitting a prompt and your AI agent beginning (or completing) its response. Today, time to first token is the more common latency measurement.
Depending on your implementation and what’s important to you, this can mean:
How to calculate average latency:
For streaming responses, measure time to first token. For non-streaming responses, measure time to complete response delivery.
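The arithmetic itself is simple once you've collected per-response latency samples. A minimal sketch, with a 95th percentile included because (as noted below) variance often matters as much as the average:

```python
def average_latency(latency_seconds):
    """Mean latency. For streaming agents, pass time-to-first-token per response;
    for non-streaming agents, pass full response-delivery times."""
    return sum(latency_seconds) / len(latency_seconds) if latency_seconds else 0.0

def p95_latency(latency_seconds):
    """95th-percentile latency: variance often matters more to users than the mean."""
    if not latency_seconds:
        return 0.0
    ordered = sorted(latency_seconds)
    return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
```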
Users have different patience levels, depending on context. Users will often wait longer for a complex research task, but expect near-instant responses for simple lookups. The key is setting proper expectations, creating explainable AI, and delivering consistently.
Latency directly impacts user perception of your agent's intelligence, but users are becoming more forgiving when you show progress. A "thinking..." indicator or streaming response can make 8 seconds feel faster than a silent 3-second wait.
Monitor latency by query type to spot patterns that hurt retention:
What to watch out for
Instead of worrying about latency numbers, focus on user behavior. If users ask pricing questions quickly but never return after slow competitor analysis queries, that's your smoking gun.
In addition, watch for latency variance more than averages. Consistent 4-second responses feel better than responses ranging from 1 to 10 seconds, even if the average is lower. And if you're hitting 4+ minutes for any query type, you've got bigger problems than measurement. (Although we’re seeing longer latency for “research” mode.)
Interactions to resolve intent counts how many back-and-forth exchanges it takes for a user to get what they need from your agent. Measuring this requires AI analysis to identify when users accomplish their goals vs. when they give up.
Agent Analytics evaluates conversation patterns, user satisfaction signals, and task completion indicators to determine successful resolution and counts the interactions that led to it.
This metric reveals your agent's efficiency at understanding and solving user problems. Ideally, users should get what they need in 1-2 interactions. Higher numbers suggest your agent misunderstands requests, provides incomplete answers, or forces users to clarify obvious context.
Unlike simple conversation length, this KPI focuses on problem resolution. A 10-exchange conversation where the user accomplishes their goal is better than a 3-exchange dead end.
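Once conversations are labeled as resolved or abandoned (here assumed to come from an LLM judge or satisfaction signal), the arithmetic is straightforward. A minimal sketch:

```python
def avg_interactions_to_resolve(conversations):
    """Average number of user turns across conversations judged resolved.
    Each conversation dict is assumed to carry a 'resolved' flag and a list of
    user 'turns'; the labeling itself comes from upstream AI analysis."""
    resolved = [c for c in conversations if c.get("resolved")]
    if not resolved:
        return 0.0
    return sum(len(c["turns"]) for c in resolved) / len(resolved)
```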
Track interactions to resolve intent to spot:
What to watch out for
Average interactions above 3-4 per resolved intent may signal serious usability problems. Rising trends often indicate your agent is becoming less helpful over time, possibly due to model changes or feature bloat.
Also, watch for high variance. Some users resolve in 1 interaction while others need 8+, which suggests inconsistent agent performance across different use cases.
Chapter 5
The KPIs in this guide are your roadmap to proving that your AI strategy is working. Most companies get stuck because they can't connect the dots between agent interactions and actual business outcomes.
But with connected product and agent analytics, you can answer the questions your board and executives are asking: Are your agentic workflows actually helping users realize value from your software faster, complete tasks more efficiently, and return more often?
Pendo Agent Analytics is the only solution designed to connect all your software data, both AI interactions and traditional UI behavior, so you can truly prove that your agents are improving time to value, retention, and productivity.
It reveals the complete user journey: what users try with your AI agent, when they succeed, when they abandon it for traditional workflows, and how both paths compare on speed, efficiency, and outcomes.
Ready to see it in action? Get a demo of Pendo Agent Analytics.