01 · Amazon · Case Study

Working Backwards AI

An AI-powered product thinking assistant that challenges assumptions, surfaces insights, and helps PMs make better decisions early.

My Role: Product Design · UX/UI · UX Research
Team: PM · AI Science · Engineering
Target Users: 8K+ Amazon PMs
Timeline: 2025 Q1 Exploration → Q4 Production Launch

01 — Context

Preventing customer experience risks before launch.

The Customer Experience Risk Service (CXRS) team focuses on identifying and preventing customer experience risks before products launch. I joined as the design lead, and over four quarters took the product from exploration to production launch.

My impact

  • Reframed the problem space
  • Led qualitative research and AI UX exploration
  • Translated insights into product direction
  • Drove cross-functional alignment and execution

Working Backwards AI welcome screen on a laptop, with Ideate, Write, and Review modes

02 — The Problem

Customer issues were discovered only after launch.

Many customer issues surface only after launch — when complaints reach Customer Service. At Amazon's scale, even small product decisions can impact millions of customers.

  • 315M active Amazon customers
  • 438M customer contacts handled by customer service associates (CSAs) per year
  • 652M self-service complaints per year
  • $1.28B annual customer service cost

Based on Amazon's 2024 operational metrics.

03 — Reframing the Challenge

How might we

Help Amazon product teams anticipate and prevent customer complaints before product launch?

Our hypothesis: AI can simulate customer reactions before launch.

  • Identify customer experience risks before launch
  • Reduce post-launch complaints

04 — First Attempt

Piper: a customer service AI module.

The first concept, "Piper," was a Customer Service AI module — a pre-mortem tool that scanned product documents and flagged customer-service risks.

Early Piper concept — a customer service risk identification module

A quick feasibility test with an AI Science demo and a UX prototype surfaced a hard truth: users treated Piper as a downstream tool, but preventing customer issues requires upstream decisions. Risks were identified too late to influence product direction.

What we learned

The module surfaced risks — but at the wrong stage of the product lifecycle.

05 — The Pivot

From risk detection to early decision support.

Using PM jobs-to-be-done analysis, persona work, and storytelling, I made the case to leadership for repositioning the product — away from downstream optimization, toward helping PMs make customer-centered decisions earlier and influence product direction proactively.

Before: Piper · risk detection

Downstream optimization: flag customer-service risks after the idea is already locked.

After: Working Backwards AI (WBAI) · early decision support

Help PMs make customer-centered decisions earlier and shape product direction proactively.

Pivot framing — moving from downstream risk detection to early decision support

06 — Research

Listening to how PMs actually work.

I analyzed recurring pain points from the PM Slack channel and ran a jobs-to-be-done ideation workshop, then validated the direction through user sessions. Decision-making under uncertainty emerged as the main challenge for most PMs.

"This is most useful when I'm still shaping the idea. Once the PRFAQ is mostly locked, it's already too late."
"I probably won't use it word for word, but at least it takes the cognitive load off, like, where do I even begin?"
"Seeing these customer service risks is helpful, but I'm not sure what I'm supposed to do after this."
"It identified an account administrator and IT support manager. When we were scoping this, I didn't even consider this persona."

Key insight

PMs lack a reliable way to challenge early product ideas. As a result, gaps and flawed assumptions are often discovered too late.

07 — Proposal 1.0

Explicit multi-agent collaboration.

I partnered with the PM to define the initial roadmap (a conversational AI interface, a writing coach, and specialized AI agents), prioritized by user impact, decision value, and engineering complexity. Proposal 1.0 made the multi-agent structure explicit, favoring clarity and trust: a clear mental model of expertise, transparent AI reasoning, reduced hallucination risk, and easier debugging and evaluation.

Proposal 1.0 — explicit multi-agent collaboration interface

What early testing revealed

Issue 01: AI mental model mismatch

PMs didn't understand when or why to choose different agents.

Issue 02: High cognitive load

Conversation and key insights were mixed together in one chat stream.

For the MVP, I reduced the complexity of the AI experience with simpler interaction patterns, guided onboarding, and a lower learning curve. I also shifted the focus from conversation to artifacts, introducing a dual-panel workspace that separates ideation from structured outputs and automatically generates PRD drafts from AI conversations.

08 — Final Proposal

Three design shifts in Proposal 2.0.

Task-guided conversation

Prompt suggestions based on PM work stages lower cognitive load, keep the decision flow uninterrupted, and let the system guide the thinking.

Task-guided conversation entry with prompt suggestions based on PM work stages

Workspace canvas

A dedicated canvas separates thinking from chatting, makes progress and iteration tangible, and keeps decision-critical content stable and visible.

Workspace canvas separating conversation from the structured document artifact

Inline customer & expert commentary

Different AI experts review the document and surface CX, tech, and customer insights directly on the artifact — multiple perspectives without fragmentation, interactive comment threads, and clear ownership and resolution.

Inline customer and expert commentary on the document artifact

09 — Tradeoffs & System Thinking

When engineering reality forces a UX tradeoff.

Mid-build, the architecture moved from an explicit multi-agent structure to a super-prompt orchestration model, which compressed each expert's reasoning into a single response. The new model was faster and cheaper to run, but it reduced transparency and user control over AI outputs. To recover trust, I introduced two guardrails:

Guardrail 01: AI Thinking Indicator

A visible "thinking" state that manages expectations during generation.

Guardrail 02: Version History

Automatic snapshots let users safely experiment and roll back if the AI output goes off track.

Trust guardrails — AI thinking indicator and version history with automatic snapshots

10 — Scaling the System

An AI component sub-library.

As WBAI evolved into an AI-native experience, standard components were no longer sufficient. I initiated an AI-specific component sub-library to support scalable, consistent, and explainable interactions — including inline comments, threaded discussions, and AI states.

AI component sub-library — inline comments, threads, and AI states
Before and after — the experience rebuilt on the AI component library

11 — Signals of Impact

Adoption, decision impact, and satisfaction.

  • 1,000+ Amazon PMs onboarded in the first month after launch
  • 76% of users said WBAI helped them improve their product idea
  • 53.5 Net Promoter Score, rated "Great"
  • 92.9% Customer Satisfaction Score, rated "Excellent"

Data as of January 27, 2026 — first six weeks after initial release.

"I had a review of the PR-FAQ today with my director and I saved around 50–75% of the time due to the tool."
"It stays true to the 5 Customer Questions to help formulate the true problem, and keeps me from jumping to a solution."

12 — Reflections

If I were to do it again.

Reframing the core problem

The biggest impact came from reframing WBAI from a tool that surfaced customer service (CS) risks into a decision-support platform. WBAI is no longer an experiment in AI-assisted writing; it is a foundation for scalable, decision-centered AI support across PM workflows.

Intelligence requires restraint

The stronger the AI becomes, the more structured the UX must be. Users don't need to see the complexity — they need clarity.

Improve transparency of AI outputs

I would invest more in features that help users understand and evaluate AI outputs, not just interact with them. Thinking indicators show the system is working — but users still need clarity on why the AI generated a particular suggestion.