As teams race to ship AI models, prompts, and infrastructure, the user’s frame of reference is often ignored. Without a clear mental model, even the most capable AI feels unpredictable and hard to trust.
In this blog, I'll cover what a mental model is, how to partition your thinking, research methods for discovering the right mental model, practical designer responsibilities, common designer pitfalls, and the deliverables designers should produce.
What is a Mental Model?
A mental model is the picture users already have in their heads of how something should work, based on past experience. It answers questions like:
- What this thing is
- What I can do with it
- What will happen if I click/tap this
Users don’t read manuals. They predict behavior from memory.
Consider an example:
🧠 User’s mental model: “Trash = deleted files.”
💻 Design reality: Drag file to Trash → file is removed
This feels natural because it matches their mental model.
1. Core idea (mental model)
Designers create the user-facing thought partner, not a black-box toy, by:
mapping user goals → appropriate data → clear prompts → predictable UI + safety → measurable outcomes.
Put another way: designers translate human intent into reliable AI behavior, then translate that behavior back into understandable UI & controls.
2. High-level layers (how to partition thinking)
User Goals & Context — What people want to accomplish, when, and why.
Data & Knowledge — What the system must know to be useful and truthful.
Model Interaction — Prompting, RAG, tuning, chain-of-thought, multi-step agents.
UX Surface — Input patterns, controls, feedback, error states, explainability.
Safety & Guardrails — Constraints, verification, human-in-the-loop.
Evaluation & Iteration — Metrics, logs, A/B tests, retraining cycles.
Design work maps to these layers. Each layer informs concrete artifacts (flows, copy, prototypes, tests).
3. Designer checklist (quick reference)
- Defined primary user job-to-be-done (JTBD) and success metric.
- Key user scenarios prioritized.
- Data sources inventoried & mapped to scenarios (internal docs, user data, web).
- Decision: Prompt-only vs RAG vs Fine-tune.
- Input controls & affordances designed (upload, prompt templates, examples).
- Explainability UI (source attribution, confidence, highlight evidence).
- Safety flows for hallucinations/errors (undo, verify, escalate).
- Monitoring plan (quality metrics, cost, latency, user satisfaction).
- Prototype & usability test plan with real prompts/data.
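The "Prompt-only vs RAG vs Fine-tune" decision in the checklist can be captured as a simple heuristic. This is an illustrative sketch, not a standard rule; the function name, inputs, and branching are assumptions to adapt to your product's constraints:

```python
# Hypothetical heuristic for the Prompt-only vs RAG vs Fine-tune decision.
# The inputs and ordering are illustrative, not a standard.

def choose_approach(needs_private_data: bool,
                    data_changes_often: bool,
                    needs_custom_style_or_format: bool) -> str:
    """Pick the simplest approach that satisfies the scenario."""
    if needs_private_data or data_changes_often:
        # Knowledge the base model lacks (or that goes stale) -> retrieve it.
        return "RAG"
    if needs_custom_style_or_format:
        # Stable style/format requirements across many requests -> fine-tune.
        return "Fine-tune"
    # Otherwise careful prompting is usually enough.
    return "Prompt-only"

print(choose_approach(needs_private_data=True,
                      data_changes_often=False,
                      needs_custom_style_or_format=False))  # RAG
```

The ordering encodes a useful default: prefer the cheapest option (prompting), reach for retrieval when the problem is knowledge, and fine-tune only when the problem is behavior.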
4. How to discover the right mental model (research)
Contextual interviews: observe users accomplishing the task now (tools, workarounds).
Diary/capture studies: collect real prompts/inputs people would use.
Prompt mining: harvest real queries from logs (support, search, chat).
Card sorting/tree testing: understand users’ conceptual categories for retrieval/data labeling.
Paper prototype + think-aloud: test prompt templates and controls before engineering.
Goal: learn user language, expectations about accuracy, and tolerance for error.
5. UX Patterns & Components for Gen-AI
Input
Conversational prompt box with examples & templates (e.g., “Summarize this doc in 3 bullets”).
Smart composer: suggest rewrites, expand/shorten toggles.
File upload + drag/drop for context (PDF, DOCX).
Prompt presets (task-specific starting points).
Output
Structured output modes: bullets, TL;DR, step-by-step, code block.
Evidence strip/citation panel: show text snippets, file names, timestamps.
Confidence indicator: “Likely correct / Needs verification.”
Editable draft: allow the user to edit the AI’s output before use.
Controls & Safety
“Verify” button: run a fact-check or source lookup.
Undo/revert and history.
Human review queue for high-risk outputs (legal, medical, finance).
Rate/feedback controls (thumbs up/down + “why”).
Privacy toggle: exclude sensitive documents from model training.
Onboarding
Progressive disclosure: start with simple prompts, show examples, then advanced controls.
Microcopy to set expectations: “This assistant drafts suggestions — always verify facts.”
6. Handling hallucinations & correctness (designer responsibilities)
Surface traceable evidence: always link outputs to the sources your system used.
Offer certainty signals and why the model answered that way (short trace explanation).
For high-risk domains, require explicit human signoff before action.
Provide escape routes: “I don’t know”, “I might be wrong — check source”, or fallback to human agent.
Design patterns: evidence-first UI, inline citations, conservative phrasing when confidence is low, and clear “not confident” states.
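These patterns are easier to apply consistently when the response object itself carries its evidence. A minimal sketch, assuming your backend returns sources and a confidence score; the names (`AIResponse`, `Citation`, `needs_verification`) are illustrative, not a real API:

```python
# Sketch of an "evidence-first" response object. All names are
# illustrative assumptions, not part of any real library.

from dataclasses import dataclass, field

@dataclass
class Citation:
    source_name: str   # e.g. file name or KB article title
    snippet: str       # the text the model actually used
    timestamp: str = ""

@dataclass
class AIResponse:
    draft: str
    citations: list[Citation] = field(default_factory=list)
    confidence: float = 0.0  # 0.0 - 1.0

    def needs_verification(self, threshold: float = 0.5) -> bool:
        # Drive the "Unverified" badge and human-review flows from this.
        return self.confidence < threshold or not self.citations

resp = AIResponse(draft="Q3 churn rose 4%.", citations=[], confidence=0.3)
print(resp.needs_verification())  # True: low confidence and no sources
```

A structure like this makes the "not confident" state a first-class property the UI can render, rather than something bolted on after the fact.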
7. Prompt & interaction design (practical)
Use example-based prompts in UI (show a highlighted example that can be edited).
Let users choose style and length (e.g., Formal / Casual; 50 / 150 / 300 words).
Provide stepwise interactions for complex tasks (break tasks into multiple short prompts).
For multi-turn tasks, provide memory controls: “Remember this for future” toggle and “forget” button.
Example prompt template for designers to provide in UI:
“Summarize this 10-page report in 5 bullets focused on product risks. Use simple language suitable for a VP. Cite any sentences you used.”
Design the UI so that this template can be prefilled, edited, or chosen.
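Prefilling a template like this can be simple placeholder substitution. A sketch using Python's `string.Template`; the placeholder names and UI values are assumptions:

```python
# Sketch: prefill an editable prompt template from UI controls.
# Placeholder names are illustrative assumptions.

from string import Template

SUMMARY_TEMPLATE = Template(
    "Summarize this $length report in $bullet_count bullets focused on "
    "$focus. Use simple language suitable for a $audience. "
    "Cite any sentences you used."
)

# Values chosen in the UI (dropdowns / presets) fill the template;
# the user can still edit the resulting text before sending.
prompt = SUMMARY_TEMPLATE.substitute(
    length="10-page",
    bullet_count="5",
    focus="product risks",
    audience="VP",
)
print(prompt)
```

Keeping the template and its slots explicit also makes it easy to hand the prompt library to engineering as part of the spec.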
8. Prototyping & testing workflow (from lo-fi to prod)
Paper/whiteboard: map user flow + error states.
Clickable Figma: simulate prompt + response cycles, show evidence panel and controls.
Wizard-of-Oz tests: human behind the curtain replying like the AI to validate UX before integration.
Synthetic dataset + sandbox LLM: run realistic prompts to test edge cases.
Beta with telemetry: collect prompts, outputs, user edits, and feedback.
Iterate: update prompts, add data to RAG, or retrain the model.
Wizard-of-Oz is especially effective: designers can prototype AI behavior without model limitations and test acceptance.
9. Designer deliverables (what to hand to engineers & PMs)
JTBD & prioritized scenarios (happy path + top failure modes).
Prompt library: canonical prompts, variants, and expected response format.
Data map: sources, access method, metadata, privacy flags.
Interaction specs: input states, output components, evidence panel, error states.
Onboarding & microcopy: expectations, examples, safety notices.
Monitoring spec: success metrics, instrumentation points (what logs to collect).
Human-in-loop flows: when and how to escalate.
Make prompts and acceptance criteria explicit — these are part of the spec.
10. Metrics & evaluation (what designers track)
Quality metrics
Task success rate (did the user complete JTBD).
User edit rate (how often users edit AI output).
Hallucination rate (false claims per output).
Time-to-complete task (compared to human baseline).
Experience metrics
Net Promoter Score/satisfaction on AI responses.
Prompt rephrase rate (how often users retry).
Retention/engagement lift.
Operational metrics
Latency (perceived slowness).
Cost per request (important for prompt design).
Escalation rate to humans.
Define thresholds and design UX behaviors for breaches (e.g., slow → show loading skeleton + “still working” message).
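The quality metrics above can be computed straight from interaction logs. A minimal sketch, assuming a hypothetical log schema (dicts with `task_completed`, `edited`, and `hallucinated` flags); map the fields to your own telemetry:

```python
# Illustrative computation of quality metrics from interaction logs.
# The log schema is an assumption, not a standard format.

def summarize_metrics(logs: list[dict]) -> dict:
    n = len(logs)
    return {
        "task_success_rate": sum(l["task_completed"] for l in logs) / n,
        "user_edit_rate": sum(l["edited"] for l in logs) / n,
        "hallucination_rate": sum(l["hallucinated"] for l in logs) / n,
    }

logs = [
    {"task_completed": True,  "edited": True,  "hallucinated": False},
    {"task_completed": True,  "edited": False, "hallucinated": False},
    {"task_completed": False, "edited": True,  "hallucinated": True},
    {"task_completed": True,  "edited": False, "hallucinated": False},
]
m = summarize_metrics(logs)
print(m)  # e.g. hallucination_rate 0.25 breaches a 0.1 threshold
```

Once these numbers exist, the thresholds and the UX behavior on breach become concrete, testable requirements instead of vague goals.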
11. Examples of UX flows (concrete)
A. Customer-support summarizer
Flow:
Upload chat transcript.
Choose “Issue summary” template (3 bullets + action items).
System performs RAG on internal KB + transcript.
AI returns a draft with citations and confidence.
Agent edits, marks resolution code, and sends reply.
Designer choices:
Show citation snippets inline.
Provide quick-edit inline (click to edit bullet).
If confidence < 50%, flag the draft for manager review.
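The review-routing rule above is worth making explicit in the spec. A tiny sketch: the 50% threshold comes from the flow, while the function and destination names are illustrative assumptions:

```python
# Sketch of the confidence-based routing rule. The 0.5 threshold comes
# from the flow above; names are illustrative.

def route_draft(confidence: float, threshold: float = 0.5) -> str:
    """Send low-confidence drafts to a manager instead of straight to the agent."""
    return "manager_review" if confidence < threshold else "agent_edit"

print(route_draft(0.42))  # manager_review
```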
B. Marketing brief generator
Flow:
User selects product, tone, and length.
System pulls product spec + recent reviews (RAG).
Output: headline options + 3 ad copies.
Users rate alternatives; the chosen copy is exported to the CMS.
Designer choices:
Provide A/B variants visually.
Add “re-generate with same tone but shorter” micro-action.
Maintain an audit trail of source materials for compliance purposes.
12. Accessibility & inclusion
Ensure keyboard + screen reader-friendly prompt/editor.
Provide text alternatives for generated images.
Ensure generated content avoids bias; provide bias-checking UI if needed.
Support multiple languages or set the locale in the prompt UI.
13. Team roles & collaboration
Designers: user flows, prompt UX, copy, evidence UI, safety flows.
PMs: prioritize scenarios, success metrics, and compliance.
ML engineers: model selection, RAG, vector DBs.
Data engineers: data pipelines, metadata, privacy.
QA: prompt testing, edge-case suites.
Legal / Compliance: content policies, data use agreements.
Designers must own the user experience contract and be part of iteration loops with ML & data teams.
14. Practical prompt testing protocol for designers (mini-lab)
Collect 50 real prompts from users.
For each, run baseline model and log outputs + metadata.
Label outputs: ✅ correct, ⚠️ partially correct (edits needed), ❌ incorrect/hallucinated.
Tally edit types (factual, tone, length, formatting).
Prioritize fixes: adjust prompt → add RAG context → add post-processing → human review.
Re-run the test and measure the change in edit rate/hallucination rate.
Document results and link to product decisions.
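The tally and prioritization steps of the mini-lab can be sketched in a few lines, assuming each run is logged as `(prompt, label, edit_types)`; the log structure and label strings are illustrative:

```python
# Sketch of the mini-lab tally step. Labels mirror the protocol above:
# "correct", "partial" (edits needed), "incorrect". Data is illustrative.

from collections import Counter

runs = [
    ("summarize report A", "correct",   []),
    ("summarize report B", "partial",   ["tone", "length"]),
    ("draft reply C",      "incorrect", ["factual"]),
    ("draft reply D",      "partial",   ["factual", "formatting"]),
]

labels = Counter(label for _, label, _ in runs)
edit_types = Counter(t for _, _, types in runs for t in types)

edit_rate = (labels["partial"] + labels["incorrect"]) / len(runs)
print(edit_types)  # "factual" dominates -> prioritize adding RAG context
print(edit_rate)   # 0.75 -> re-run after fixes and compare
```

Tallying edit types is what turns the lab into a prioritization tool: frequent factual edits point to retrieval fixes, while tone and length edits point to prompt or template changes.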
15. Common designer pitfalls & how to avoid them
Pitfall: Designing UI before validating prompts.
Fix: Prototype dialogue using Wizard-of-Oz first.
Pitfall: Hiding uncertainty — showing AI as fully authoritative.
Fix: Surface uncertainty & sources; force verification for critical outputs.
Pitfall: Overloading users with AI options.
Fix: Start with sane defaults and progressive disclosure.
Pitfall: Treating AI like a single feature instead of a system.
Fix: Design for continuous feedback, monitoring, and improvement loops.
16. Example microcopy & UX text
On response header: “Draft generated — review and edit. Sources: 3 (click to view).”
Low confidence badge: “Unverified” with tooltip: “This answer may be incomplete — check sources.”
Undo: “Revert to original.”
Feedback button: “Was this helpful? 👍 / 👎 — Tell us why.”
Good microcopy sets expectations and guides verification behavior.