Agent Playground

Section 1 — Hero

AI Product Management · Zero to One · Agent Configuration · Enterprise SaaS · Cortx

Agent Playground: where enterprise AI agents are built, tested, and trusted

Before Agent Playground existed, configuring an AI agent at an enterprise meant filing a request and waiting for an engineer. I built the product from zero — a complete agent assembly and testing environment that lets both technical and non-technical users build production-ready agents themselves.

Role

PM Associate · Cortx

Product

Cortx — AI Control Plane for Enterprises

Timeline

Two Quarters

Tools

FigJam · Figma · Miro · ADO

Section 2 — Stats

0 → 1

Built entirely from scratch — no prior product, no reference implementation

Self-serve

Enterprise clients configuring and deploying agents without Cortx team involvement

Section 3 — The Problem

The problem

Agents existed. Nobody could build them independently.

Cortx had AI agents. Enterprise clients wanted to use them. The gap was everything in between — no environment to interact with an agent, understand its behaviour, adjust its configuration, or validate it before production.

Two completely different users, the business team and the developer, shared the same root problem: no place to build, test, and trust an agent before it goes live.

Engineering as the only configuration path

Every agent configuration went through an engineering ticket. No self-serve layer existed for non-technical users.

Agents were black boxes to business users

Business teams could describe the outcome they wanted but had no visibility into how the agent was configured or why it behaved the way it did.

No sandbox for developers

Testing happened in production. Every iteration carried real risk and every mistake had real consequences.

No iteration without a new ticket

Every change — however small — required filing a request, waiting for a sprint, and reviewing the output. The feedback loop was measured in days.

Section 4 — The Product

The product

Not just a chat window — a complete agent assembly environment

An agent's behaviour is a function of everything that goes into it. A playground that only shows chat output isn't a playground; it's a demo. I defined Agent Playground as a five-layer assembly environment.

🧠

Models — the intelligence layer

Users select and switch between the underlying AI models powering the agent. Different models have different strengths — response quality, speed, cost, context window. The playground makes that comparison possible without an engineering ticket.

Why it matters: changing the model changes everything downstream. This is where the fundamental capability of the agent is set.

Section 5 — Two Users

Two users, one product

The non-technical builder and the power user

Most enterprise tools are built for one type of user and quietly ignore the other. I defined Agent Playground to serve two distinct users on the same underlying framework — different entry points, not different products.

The non-technical builder

Ops leads · Business analysts · PMs

They know what they want the agent to do but have never thought about models, tools, or knowledge sources as separate configurable layers. They needed progressive disclosure — the most important controls first, advanced configuration available but not imposed.

Design principle: every setting should explain itself — plain language, not technical jargon buried in a tooltip.

The power user

Developers · Solution architects · Technical PMs

They know what they're doing. They don't need guidance through every layer; they want direct access to configuration, fast. For them I defined an advanced mode that bypasses the guided flow and lets them work directly on the assembly environment.

Design principle: same product, different front door. No feature gates — just two entry points into the same builder.

The two-mode decision wasn't just a UX call. Enterprise tools that only serve power users leave most of the organisation behind. Tools that only serve non-technical users frustrate the people who need depth. Both groups existed at every enterprise client — the right answer was both.

Section 6 — Freedom vs Guardrails

The central tension

Freedom vs. guardrails

The hardest product decision I made touched every other decision: how much control do you give users, and where do you put the limits? Neither extreme was right.

From full freedom to full guardrails: No limits · Mostly free · My framework · Restricted · Full control

The framework I built

Freedom within validated boundaries

Users can configure every layer — but the system validates each configuration before it can be saved. Invalid combinations are flagged before they cause problems. Guardrails that prevent bad outcomes are different from guardrails that prevent exploration.
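One way to picture "validated boundaries" is a save path that runs checks first and returns plain-language problems instead of persisting an invalid configuration. The sketch below is illustrative only and is not Cortx's implementation: the `AgentConfig` fields reuse the three layer names this case study mentions (models, tools, knowledge sources), while the model names, the capability table, and the specific rules are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    # Hypothetical shape: one field per layer named in the case study.
    model: str
    tools: list[str] = field(default_factory=list)
    knowledge_sources: list[str] = field(default_factory=list)


# Hypothetical catalogue of models and which of them can call tools.
KNOWN_MODELS = {"model-large", "model-small"}
TOOL_CAPABLE_MODELS = {"model-large"}


def validate(config: AgentConfig) -> list[str]:
    """Return plain-language problems; an empty list means the config is valid."""
    problems = []
    if config.model not in KNOWN_MODELS:
        problems.append(f"Unknown model '{config.model}'.")
    elif config.tools and config.model not in TOOL_CAPABLE_MODELS:
        problems.append(
            f"Model '{config.model}' cannot call tools; "
            "remove the tools or switch models."
        )
    if len(set(config.knowledge_sources)) != len(config.knowledge_sources):
        problems.append("A knowledge source is attached more than once.")
    return problems


def save(config: AgentConfig, store: list[AgentConfig]) -> list[str]:
    """Persist the config only when validation passes; otherwise flag problems."""
    problems = validate(config)
    if not problems:
        store.append(config)
    return problems
```

Nothing here restricts which layers a user may touch. The checks only block combinations that cannot work, which is the distinction between guardrails against bad outcomes and guardrails against exploration.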

Section 7 — Key Decisions

Key decisions

The calls that shaped the product

01 — Scope

Build a full assembly environment, not just a chat interface

The minimal version was a chat window: simple, shippable, defensible. But a chat window shows you only the output of an agent, not the system producing it. If a user sees something they don't like, they have no way to understand why it happened or what to change.

Why: The value of a playground is the ability to experiment — and you can't experiment with something you can't configure. The assembly environment was harder to build, but it was the only version that delivered on the name.

02 — Versioning

Automatic version history, not manual saves

I could have made version saving an explicit action — a button the user clicks. Simpler to implement, cleaner interface. But non-technical users don't think in versions. They configure, test, adjust, configure again. If saving requires a deliberate action, most users never do it.

Why: Automatic versioning means every state is preserved without the user having to think about it. The safety net actually gets used — and it's what makes experimentation feel safe.
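As a rough illustration of what "every state is preserved" can mean in practice, the sketch below records a new version on every change instead of waiting for an explicit save. The `VersionedConfig` class, its method names, and the dictionary-based configuration are assumptions made for this example, not a description of how Agent Playground stores versions.

```python
import copy


class VersionedConfig:
    """Keeps every configuration state; there is no explicit save action."""

    def __init__(self, initial: dict):
        self._history = [copy.deepcopy(initial)]

    @property
    def current(self) -> dict:
        # Return a copy so callers cannot mutate stored history.
        return copy.deepcopy(self._history[-1])

    def update(self, **changes) -> int:
        """Apply changes and record the result as a new version automatically."""
        new_state = {**self._history[-1], **changes}
        if new_state != self._history[-1]:  # skip edits that change nothing
            self._history.append(copy.deepcopy(new_state))
        return len(self._history) - 1  # index of the current version

    def restore(self, version: int) -> None:
        """Roll back by appending the old state, so no history is lost."""
        self._history.append(copy.deepcopy(self._history[version]))

    def version_count(self) -> int:
        return len(self._history)
```

Restoring appends rather than truncates, so going back to an earlier state is itself a reversible step.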

03 — Error states

Agent failures as learning moments, not error messages

When an agent produces a bad output, the system could respond two ways: show an error, or show an explanation. An error tells the user something went wrong. An explanation tells them what went wrong and what to change.

Why: The playground's entire value is iteration. An error message stops iteration. An explanation accelerates it. Every unexpected output should be a data point, not a dead end.
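The difference between an error and an explanation can be sketched as a lookup that always pairs what went wrong with what to change. The failure codes, messages, and the `explain` function below are hypothetical and exist only to illustrate the idea; the case study does not describe how the product generates its explanations.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    what_happened: str
    suggested_change: str


# Hypothetical failure codes mapped to plain-language guidance.
EXPLANATIONS = {
    "no_knowledge_match": Explanation(
        what_happened="The agent found nothing relevant in its knowledge sources.",
        suggested_change="Attach a source that covers this topic, or rephrase the question.",
    ),
    "tool_call_failed": Explanation(
        what_happened="A tool the agent tried to use did not respond.",
        suggested_change="Check the tool's connection settings, then rerun the test.",
    ),
}


def explain(failure_code: str) -> Explanation:
    """Always return something actionable, even for unrecognised failures."""
    return EXPLANATIONS.get(
        failure_code,
        Explanation(
            what_happened=f"The agent failed with an unrecognised code: {failure_code}.",
            suggested_change="Compare this run with the last version that worked.",
        ),
    )
```

Even the fallback points the user at a next step, so an unexpected output still feeds the next iteration.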

Section 8 — What I Didn't Build

What I didn't build

The deliberate cuts

Good product decisions aren't just about what you ship. They're about what you choose not to — and being clear-eyed about why.

Every possible configuration option at launch

There were requests for advanced options beyond the five core layers. I scoped v1 to what worked reliably for the most common use cases. Shipping everything would have added complexity without proportional user value — most users would never touch 80% of it in the first month.

AI-assigned configurations without user confirmation

There was a conversation about letting the AI auto-configure the agent from a natural language description, skipping manual layer configuration. I said no. If users don't consciously configure each layer, they don't understand why the agent behaves the way it does, and they can't debug it when it misbehaves.

A separate product for developers

Some stakeholders pushed for a separate developer-focused interface. I pushed back — two codebases, two user bases, a positioning problem. Same product, different front door was the right answer.

Section 9 — Impact

Impact

What changed

Agent Playground didn't produce a single headline metric. What it produced was a shift in how enterprise clients related to their AI agents — from passive consumers of engineering output to active builders of their own tools.

⚡

Self-serve deployment

Enterprise clients configuring and deploying agents without Cortx team involvement for the first time.

🔄

Faster iteration

Changes that took days through engineering sprints were happening in hours inside the playground.

🛡️

Fewer production failures

Agents were tested before deployment, and unexpected production behaviour dropped significantly at launch.

The outcome I'm most proud of: we didn't just save time. We eliminated an entire category of coordination overhead that no one had named — the invisible dependency between wanting an agent and being able to build one. Enterprise teams are now shipping faster not because they work harder, but because the system works for them.

Section 10 — Learnings

Learnings

Zero-to-one products require you to define the problem before the product. Discovery work — understanding what both user types actually needed — was the most important work I did. Everything else followed from it.

The freedom vs. guardrails tension never fully resolves. The right balance shifts as users get more sophisticated. Building boundaries that are adjustable — not fixed — is something I'd prioritise earlier next time.

Designing for non-technical users in a technical product is a navigation challenge, not a simplification challenge. Hiding complexity produces a product that frustrates experts. Making it navigable produces one that grows with the user.

Automatic version history was the right call and I'd make it earlier. It almost got deferred from v1. It ended up being one of the most-used features at launch — because it's the safety net that makes experimentation feel safe.

Let's Connect!

Building and scaling products—from 0 to 1, and from 1 to 10. Let’s create what’s next.