Three requirements for in-product AI agents
- Harshal

What good looks like for accuracy, usability, and observability.
From my recent work building, launching, and tightening AI agents, the same three gaps show up again and again. The model is "fine," but outputs fail rubrics, the UI violates expectations people picked up from ChatGPT and coding agents, or debugging stops at prompts and tool calls when the real question is what changed in the product for that user.
An effective in-product AI agent needs three things:
Accuracy
Usability
Observability
I'll go into each of the three below. I wrote this for product teams building AI agents into their software products.
You need 3 minutes to read this.

1 - Accuracy: match the task bar users already hold
Users turn to the AI agent for a task they would otherwise have done themselves, so they expect it to do the task about as well as they could. From community research, I've seen that people accept agents that get 70 to 80% of the way there, even if not 100%. A concrete miss: you ask the agent to add a switch node in a workflow, and it configures the node like a loop node or uses the wrong settings inside it.
Accuracy guardrails are AI or software checks on an agent's output, for example ensuring that a JSONata formula is valid before the output is shown to the user.
The primary way to improve accuracy is to map failure modes, run evals against a rubric, and add guardrails so bad outputs do not reach the user. I wrote about defining AI evals in this post on AI Product Management.
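As a sketch, a guardrail can be a plain software check that runs on the agent's output before the user ever sees it, with the failure message fed back for a retry. The example below uses JSON validity as a simple stand-in for JSONata validation, and `call_agent` is a hypothetical function standing in for your model call; neither name comes from a real library.

```python
import json


def json_guardrail(raw_output: str) -> tuple[bool, str]:
    """Return (ok, message). Reject output that is not valid JSON
    before it reaches the user."""
    try:
        json.loads(raw_output)
        return True, "valid"
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err.msg} at position {err.pos}"


def run_with_guardrail(call_agent, prompt: str, max_retries: int = 2) -> str:
    """Call the (hypothetical) agent, re-prompting with the guardrail's
    error message until the output passes or retries run out."""
    feedback = ""
    for _ in range(max_retries + 1):
        output = call_agent(prompt + feedback)
        ok, message = json_guardrail(output)
        if ok:
            return output
        feedback = f"\n\nYour last output failed a check: {message}. Fix it."
    raise ValueError("agent output failed guardrail after retries")
```

The design choice is that the guardrail sits between the agent and the user: a failed check triggers a silent retry, and only a passing output is ever rendered.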
2 - Usability: behave like a surface, not a black box
Usability means the agent behaves like a surface users control, not a black box. Even an accurate agent is a new surface, the way graphical UIs and command lines were once new. Users form expectations from similar products: ChatGPT and coding agents (Claude, Cursor, Lovable). They expect parity: the agent sees what they see and changes what they can change.
One usability failure: the user asks a question, the agent diagnoses the issue and also changes the product, explains nothing, and offers no undo. The edits might be right; the problem is still usability.
When agents miss that bar, the user has to restate context that is already on screen, repair side effects the agent did not notice, or walk back changes they did not mean to make. The product stops reading as a delegate and starts reading as a brittle workflow.
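One way to meet that bar is a propose-then-apply pattern: the agent returns a described change, the user confirms it, and every applied change lands on an undo stack. A minimal sketch with hypothetical names (`ProposedChange`, `ChangeLog`), not any product's real API:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedChange:
    description: str           # shown to the user before anything happens
    apply: Callable[[], None]  # performs the edit in the product
    undo: Callable[[], None]   # reverses it


@dataclass
class ChangeLog:
    applied: list = field(default_factory=list)

    def confirm(self, change: ProposedChange, user_approved: bool) -> bool:
        """Apply only after explicit approval; keep an undo trail."""
        if not user_approved:
            return False
        change.apply()
        self.applied.append(change)
        return True

    def undo_last(self) -> bool:
        """Walk back the most recent applied change, if any."""
        if not self.applied:
            return False
        self.applied.pop().undo()
        return True
```

The point is behavioral, not architectural: nothing changes silently, every change carries a user-readable description, and undo is one call away.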
I wrote the details about this here: Day One User Expectations From AI Copilots in 2026.
3 - Observability: reconstruct user-visible before and after
Most AI observability tools capture:
user text in
model text out
tool calls between steps
But I propose that's not sufficient observability for an AI agent that acts inside software on the user's behalf. You, as the product team, need to be able to see what the user sees. For example, imagine a user says, "Can you fix this?" To evaluate whether the agent did the right thing for that input, you need to know which page of your product the user was on and which resources they had open or selected.
Define "before and after" as three artifacts you can inspect later for the same agent action:
a user-visible snapshot of the surface the agent touched (for example a screenshot or equivalent render capture)
a structured record of that surface when you need to diff or search it (for example page structure or application state for the relevant region, not the whole page by default)
the tool or API calls the agent issued, with inputs and outputs, so you can separate model reasoning from side effects
Reconstruct what the user saw and what changed in the product for them.
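Those three artifacts can be captured as one record per agent action. A minimal sketch, with hypothetical field names; a real system would add user, session, and timestamp fields, and store the snapshots out of band:

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str      # tool or API the agent invoked
    inputs: dict
    outputs: dict


@dataclass
class AgentActionRecord:
    action_id: str
    snapshot_before: str   # e.g. path/URL of a screenshot or render capture
    snapshot_after: str
    state_before: dict     # structured state of the touched region only
    state_after: dict
    tool_calls: list = field(default_factory=list)

    def state_diff(self) -> dict:
        """Keys whose values changed, as (before, after) pairs,
        so reviewers can search and diff later."""
        keys = set(self.state_before) | set(self.state_after)
        return {
            k: (self.state_before.get(k), self.state_after.get(k))
            for k in keys
            if self.state_before.get(k) != self.state_after.get(k)
        }
```

With a record like this, "did the agent do the right thing?" becomes answerable after the fact: look at the snapshots, diff the structured state, and separate model reasoning from side effects via the tool calls.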
I wrote about the need for bespoke observability in What AI Observability Needs to Capture.

What I am deprioritizing
Other criteria still apply to AI agents, but I rank them below accuracy, usability, and observability.
Cost. Many users run on promotional credits, and vendors often subsidize agent access, so price is a weak differentiator for in-product AI agents today. That said, here's a counter-point for AI coding agents.
Speed. Prefer an agent that takes a few minutes, does the right work, and allows a quick undo instead of a five-second answer that misses the mark.
Privacy, security, and compliance. Product teams still need to meet a baseline here, but many choose to ship value to an early segment first and invest in stronger controls later.