Image Generation Benchmark

Admin tool for comparing AI models, prompts, and hyperparameters through the production merch pipeline

What is the Benchmark?

The Image Generation Benchmark is an admin-only tool inside Merch Admin for running controlled experiments on merch AI art generation. Each benchmark session sweeps a cartesian product of:

selfies × prompts × models × quality × input fidelity
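
The sweep can be pictured as nested loops over the five axes. The following is a minimal sketch only: `SweepInput`, `GenerationSpec`, `expandSweep`, and `countGenerations` are hypothetical names for illustration and are not taken from `create-session.ts`.

```typescript
// Hypothetical shapes for illustration; the real create-session.ts types may differ.
interface SweepInput {
  selfies: string[];
  prompts: string[];
  models: string[];
  qualities: string[];
  inputFidelities: string[];
}

interface GenerationSpec {
  selfie: string;
  prompt: string;
  model: string;
  quality: string;
  inputFidelity: string;
}

// Expand the five axes into one spec per combination.
function expandSweep(input: SweepInput): GenerationSpec[] {
  const specs: GenerationSpec[] = [];
  for (const selfie of input.selfies) {
    for (const prompt of input.prompts) {
      for (const model of input.models) {
        for (const quality of input.qualities) {
          for (const inputFidelity of input.inputFidelities) {
            specs.push({ selfie, prompt, model, quality, inputFidelity });
          }
        }
      }
    }
  }
  return specs;
}

// The total is simply the product of the axis sizes.
function countGenerations(input: SweepInput): number {
  return (
    input.selfies.length *
    input.prompts.length *
    input.models.length *
    input.qualities.length *
    input.inputFidelities.length
  );
}
```

Because the total grows multiplicatively, adding a second quality level or a second prompt doubles the number of generations and the cost of the session.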

Every generation runs through the same production pipeline as live merch:

AI generation — gateway model call
Overlay composite — center-crop + template overlay
Background mask — optional removal using the same modes as production render

Results are stored in PostgreSQL and uploaded to S3 so admins can compare outputs side-by-side, like/dislike individual cells, and rerun past sessions.
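
To make the center-crop part of the overlay composite step concrete, here is a small sketch of how a centered crop rectangle can be computed. It is an illustration under assumptions: `centerCropRect` and `Rect` are hypothetical names, and the target dimensions and rounding used by the production pipeline are not documented here.

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Largest rectangle with the target aspect ratio, centered in the source image.
function centerCropRect(srcW: number, srcH: number, targetW: number, targetH: number): Rect {
  const targetRatio = targetW / targetH;
  const srcRatio = srcW / srcH;
  if (srcRatio > targetRatio) {
    // Source is wider than the target: trim the left and right edges.
    const width = Math.round(srcH * targetRatio);
    return { left: Math.floor((srcW - width) / 2), top: 0, width, height: srcH };
  }
  // Source is taller than (or equal to) the target: trim the top and bottom edges.
  const height = Math.round(srcW / targetRatio);
  return { left: 0, top: Math.floor((srcH - height) / 2), width: srcW, height };
}
```

For a 1536×1024 generation cropped to a square target, this yields a 1024×1024 region starting 256 pixels from the left edge.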

Where to Access

| Environment | URL |
| --- | --- |
| Local dev | `http://localhost:3004/admin/merch/benchmark` |
| Production | Merch Admin → Benchmark tab in the top nav |

The Benchmark tab is admin-only. Supplier users are routed to /orders and cannot access it.

Routes:

| Page | Path |
| --- | --- |
| New Benchmark (setup) | `/benchmark` |
| Run history | `/benchmark/results` |
| Session detail | `/benchmark/results/:id` |

Architecture

apps/zooly-merch-admin/ (Vite SPA — port 3004)
  Benchmark tab mounts pages from @zooly/merch-benchmark-client
  BenchmarkPathsProvider base="/benchmark"

packages/merch-benchmark/client/ (@zooly/merch-benchmark-client)
  dashboard-page.tsx       Setup form + sticky Run Benchmark bar
  results-page.tsx         Session list
  session-detail-page.tsx  Comparison grid, config, rerun
  components/              Image upload, selfie groups modal, results grid

packages/merch-benchmark/srv/ (@zooly/merch-benchmark-srv)
  create-session.ts        Cartesian product → DB rows
  run-generation.ts        Full pipeline: generate → overlay → mask → S3
  selfies.ts               Western / Japan selfie corpus helpers

packages/merch/img-gen/ (@zooly/merch-img-gen)
  generateAIArt, downloadAndProcessAIImage, resolveBackgroundMask, applyMaskToImage
  Shared with production /api/merch/ai/generate and render steps

apps/zooly-app/app/api/merch/admin/benchmark/
  sessions/                GET list, POST create
  sessions/[id]/           GET detail, PATCH rename
  generate/                POST run one generation
  generations/[id]/like/   PATCH like/dislike
  selfies/                 GET selfie corpus

packages/db/
  schema/merchBenchmarkTables.ts
  access/merch/merch-benchmark.ts

Auth

Same pattern as the rest of merch admin:

Client calls /api/merch/admin/* endpoints (GET, POST, and PATCH) with credentials: "include"
Server routes use requireAdmin() from @zooly/merch-admin-srv
Unauthenticated users are redirected to zooly-auth SSO (VITE_AUTH_URL)
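
The client side of this pattern might look like the sketch below. `adminRequest`, `FetchLike`, and the minimal request/response types are hypothetical and exist only so the example type-checks without DOM typings; the actual client helper and its handling of the SSO redirect are not shown in this document.

```typescript
// Minimal fetch-like types so the sketch type-checks without DOM typings.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

interface MinimalInit {
  method: string;
  credentials: "include";
  headers?: Record<string, string>;
  body?: string;
}

type FetchLike = (url: string, init: MinimalInit) => Promise<MinimalResponse>;

// Hypothetical helper: sends the session cookie on every admin API call.
async function adminRequest(
  fetchImpl: FetchLike,
  baseUrl: string,
  method: string,
  path: string,
  body?: unknown
): Promise<unknown> {
  const init: MinimalInit = { method, credentials: "include" };
  if (body !== undefined) {
    init.headers = { "Content-Type": "application/json" };
    init.body = JSON.stringify(body);
  }
  const res = await fetchImpl(baseUrl + "/api/merch/admin" + path, init);
  if (!res.ok) {
    throw new Error("Admin API " + method + " " + path + " failed with status " + res.status);
  }
  return res.json();
}
```

In the browser, `fetchImpl` could be `(url, init) => fetch(url, init)`, and `baseUrl` would presumably come from `VITE_APP_URL`. Injecting the fetch function also keeps the helper easy to test.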

Client-side generation loop

After creating a session, the dashboard fires a client-side job queue (run-loop.ts) that POSTs to /api/merch/admin/benchmark/generate with bounded concurrency. The session detail page polls every 3 seconds while status is running.
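
A bounded-concurrency queue of this kind can be sketched as a fixed number of "lanes" that each pull the next job until none remain. This is an illustration, not the contents of `run-loop.ts`: `runWithConcurrency` is a hypothetical name, and swallowing per-job errors so the sweep keeps going is an assumption.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runWithConcurrency<T>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<void>
): Promise<void> {
  let next = 0;

  // Each lane claims the next unprocessed item until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const item = items[next];
      next += 1;
      try {
        await worker(item);
      } catch {
        // Assumption: one failed generation should not stop the rest of the sweep.
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
}
```

In the benchmark, each item would be a generation ID and the worker would POST `{ "generationId": "..." }` to `/api/merch/admin/benchmark/generate`, with `limit` set from the session's concurrency setting.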

Setup Page

Session name

Auto-generated timestamp name; editable. Required to run.

Selfies

Two corpora:

| Set | Count | Notes |
| --- | --- | --- |
| Western | 110 | Default dry-run pool |
| Japan | 442 | Larger corpus |

Selection modes:

| Mode | Behavior |
| --- | --- |
| all | Every selfie in the corpus |
| range | Inclusive index range (e.g. 1–5) |
| random | N random selfies at run time |
| group | Selfies from a saved test group; can mix Western, Japan, and uploaded selfies |
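
The first three modes can be sketched as a single selection function. This is a hedged illustration: `Selection` and `selectSelfies` are hypothetical names, range indices are assumed to be 1-based, and the `group` mode is left out because it depends on saved group data.

```typescript
type Selection =
  | { mode: "all" }
  | { mode: "range"; from: number; to: number } // assumed 1-based, inclusive
  | { mode: "random"; count: number };

// Pick selfies from a corpus. `rng` is injectable so tests can be deterministic.
function selectSelfies<T>(corpus: T[], sel: Selection, rng: () => number = Math.random): T[] {
  switch (sel.mode) {
    case "all":
      return corpus.slice();
    case "range":
      // With 1-based inclusive indices, 1-5 maps to array slots 0..4.
      return corpus.slice(Math.max(0, sel.from - 1), sel.to);
    case "random": {
      // Partial Fisher-Yates shuffle: the first `n` slots become a uniform sample.
      const pool = corpus.slice();
      const n = Math.max(0, Math.min(sel.count, pool.length));
      for (let i = 0; i < n; i++) {
        const j = i + Math.floor(rng() * (pool.length - i));
        const tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
      }
      return pool.slice(0, n);
    }
  }
}
```

Sampling without replacement keeps `random` from picking the same selfie twice, so the row count in the results grid matches the requested N.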

Selfie groups are shared with the Simulation tab. Use Manage groups to create, edit, or delete groups from the full pool (Western + Japan + uploaded selfies). The group editor also supports uploading new selfies — they are stored on S3 and become available to every group.

Prompts

One or more named prompt variants. At least one prompt with non-empty text is required.

Template, overlay, and mask

Drag-and-drop image upload fields (or paste a URL). These map to production design assets:

Template — AI template image passed to the model
Overlay — composited on cropped art after generation
Background removal mode — none, edge-only, flood-fill+edge, or mask-image
Mask image — required when mode is mask-image

Models and hyperparameters

Models come from GET /api/merch/admin/models (same catalog as production). Only active models appear.

| Setting | Options |
| --- | --- |
| Quality | `low`, `medium`, `high` |
| Input fidelity | `low`, `high` |
| Concurrency | Parallel generation jobs (default 5) |

Run summary bar

A sticky bar at the bottom shows total generations, estimated cost, and a Run Benchmark button. The button is disabled with a hint when required fields are missing.
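
The disabled-button check can be thought of as a function that returns the list of unmet requirements stated on this page. The sketch below is an assumption-laden illustration: `SetupForm` and `missingFields` are hypothetical, the mode strings other than `none` are guessed from their display names, and the real form may enforce more rules (for example, at least one selfie or model).

```typescript
// Hypothetical form shape; mode strings other than "none" are assumed from the UI labels.
interface SetupForm {
  sessionName: string;
  prompts: { name: string; text: string }[];
  bgRemovalMode: "none" | "edge-only" | "flood-fill+edge" | "mask-image";
  bgMaskImageUrl: string | null;
}

// Returns human-readable hints; an empty list means the run button can be enabled.
function missingFields(form: SetupForm): string[] {
  const hints: string[] = [];
  if (form.sessionName.trim() === "") {
    hints.push("Session name is required");
  }
  if (!form.prompts.some((p) => p.text.trim() !== "")) {
    hints.push("At least one prompt with non-empty text is required");
  }
  if (form.bgRemovalMode === "mask-image" && !form.bgMaskImageUrl) {
    hints.push("Mask image is required when mode is mask-image");
  }
  return hints;
}
```

Returning hints rather than a boolean lets the bar show why the button is disabled.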

Results

Run history (`/benchmark/results`)

Lists past sessions with name, date, status, and models used. New Benchmark returns to the setup page.

Session detail (`/benchmark/results/:id`)

Header

Click session name to rename
Status badge: running, completed, or failed
Rerun with these settings — opens setup pre-filled from this session (?from=sessionId)

Stats

Progress, failed count, cost, average duration. Progress bar while running.
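
These figures can all be derived from the generation rows. The sketch below shows one plausible derivation; `GenerationMetrics`, `SessionStats`, and `summarize` are hypothetical names, and counting failed generations as finished for the progress value is an assumption.

```typescript
interface GenerationMetrics {
  status: "pending" | "running" | "completed" | "failed";
  durationMs: number | null;
  costPerImage: number | null;
}

interface SessionStats {
  total: number;
  completed: number;
  failed: number;
  progress: number; // 0..1, assumed to count failed generations as finished
  totalCost: number;
  averageDurationMs: number | null;
}

function summarize(generations: GenerationMetrics[]): SessionStats {
  const completed = generations.filter((g) => g.status === "completed");
  const failed = generations.filter((g) => g.status === "failed").length;
  let totalCost = 0;
  let durationSum = 0;
  let durationCount = 0;
  // Cost and duration are taken from completed generations only.
  for (const g of completed) {
    if (g.costPerImage !== null) totalCost += g.costPerImage;
    if (g.durationMs !== null) {
      durationSum += g.durationMs;
      durationCount += 1;
    }
  }
  const total = generations.length;
  return {
    total,
    completed: completed.length,
    failed,
    progress: total === 0 ? 0 : (completed.length + failed) / total,
    totalCost,
    averageDurationMs: durationCount === 0 ? null : durationSum / durationCount,
  };
}
```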

Configuration

Expandable section showing prompts (read-only textarea, max ~500px height), template/overlay/mask thumbnails, and run parameters.

Results grid

Rows = selfies; columns = model / prompt / quality / fidelity (selectable via column field dropdown)
Show masked toggle — switches between post-overlay (artUrl) and post-mask (maskedArtUrl) images
Column filters — checkboxes to show/hide specific column values
Click an image for fullscreen view
Thumbs up / down on each completed cell (persisted via API)
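
The grid layout amounts to pivoting the flat list of generations by selfie and by the selected column field. The following sketch shows one way to do that; `Cell`, `Grid`, `buildGrid`, and `pickImage` are hypothetical names, and only `artUrl` and `maskedArtUrl` come from this page.

```typescript
type ColumnField = "model" | "prompt" | "quality" | "fidelity";

interface Cell {
  selfieIndex: number;
  model: string;
  prompt: string;
  quality: string;
  fidelity: string;
  artUrl: string | null;
  maskedArtUrl: string | null;
}

interface GridRow {
  selfieIndex: number;
  // Several cells can share a column value because the other axes still vary.
  cells: Record<string, Cell[]>;
}

interface Grid {
  columns: string[];
  rows: GridRow[];
}

// Pivot generations into rows (selfies) and columns (values of the chosen field).
function buildGrid(cells: Cell[], field: ColumnField): Grid {
  const columns: string[] = [];
  const rows: GridRow[] = [];
  for (const cell of cells) {
    const column = cell[field];
    if (columns.indexOf(column) === -1) columns.push(column);
    let row = rows.find((r) => r.selfieIndex === cell.selfieIndex);
    if (!row) {
      row = { selfieIndex: cell.selfieIndex, cells: {} };
      rows.push(row);
    }
    const bucket = row.cells[column] || (row.cells[column] = []);
    bucket.push(cell);
  }
  rows.sort((a, b) => a.selfieIndex - b.selfieIndex);
  return { columns, rows };
}

// The "Show masked" toggle switches which stored image a cell displays.
function pickImage(cell: Cell, showMasked: boolean): string | null {
  return showMasked ? cell.maskedArtUrl : cell.artUrl;
}
```

Column filters then reduce to dropping entries from `columns` before rendering.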

Database Schema

Defined in packages/db/src/schema/merchBenchmarkTables.ts. Migration: 0128_merch_benchmark_tables.sql.

`merch_benchmark_session`

One row per benchmark run.

| Column | Purpose |
| --- | --- |
| `prompts` | JSON array of `{ name, text }` |
| `template_image_url`, `overlay_url`, `bg_mask_image_url` | Asset URLs |
| `bg_removal_mode` | Background removal mode |
| `selfie_test_set` | `western` or `japan` |
| `total_generations`, `completed_generations`, `failed_generations` | Counters |
| `status` | `running` \| `completed` \| `failed` |
| `estimated_cost`, `actual_cost` | Cost tracking |

`merch_benchmark_generation`

One row per (selfie, prompt, model, quality, fidelity) combination.

| Column | Purpose |
| --- | --- |
| `selfie_url`, `selfie_index` | Input selfie |
| `model_endpoint`, `model_display_name` | From model catalog |
| `prompt`, `prompt_name` | Prompt text and variant name |
| `params` | `{ quality, inputFidelity, selfieFilter }` |
| `art_url` | Post-overlay result (S3) |
| `masked_art_url` | Post-mask result (S3) |
| `liked` | `true` / `false` / `null` human review |
| `status` | `pending` \| `running` \| `completed` \| `failed` |
| `duration_ms`, `cost_per_image`, `error` | Metrics |
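
The database columns are snake_case (`art_url`, `masked_art_url`), while the results grid refers to camelCase fields (`artUrl`, `maskedArtUrl`). The sketch below illustrates that naming relationship with hypothetical helpers `snakeToCamel` and `toCamelRow`; the real access layer most likely gets this mapping from its ORM schema definition rather than converting keys at run time.

```typescript
// Convert a snake_case column name to its camelCase field name.
function snakeToCamel(column: string): string {
  return column.replace(/_([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Rename every key of a raw row; values are passed through untouched.
function toCamelRow(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  Object.keys(row).forEach((key) => {
    out[snakeToCamel(key)] = row[key];
  });
  return out;
}
```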

Access layer: packages/db/src/access/merch/merch-benchmark.ts

API Reference

All routes under apps/zooly-app/app/api/merch/admin/benchmark/. All require admin auth + CORS.

GET /api/merch/admin/benchmark/sessions

List all sessions with aggregated model names.

POST /api/merch/admin/benchmark/sessions

Create a session and bulk-insert pending generation rows.

Body (key fields):

{
  "name": "Benchmark 2026-06-10",
  "prompts": [{ "name": "Default", "text": "..." }],
  "selfies": [{ "index": 1, "url": "https://..." }],
  "models": [{ "endpointId": "gpt-image", "displayName": "GPT" }],
  "templateImageUrl": null,
  "overlayUrl": null,
  "bgRemovalMode": "none",
  "bgMaskImageUrl": null,
  "concurrency": 5,
  "selfieTestSet": "western",
  "qualities": ["low"],
  "inputFidelities": ["high"],
  "selfieFilter": null
}

GET /api/merch/admin/benchmark/sessions/[id]

Session + all generations.

PATCH /api/merch/admin/benchmark/sessions/[id]

Rename session: { "name": "..." }

POST /api/merch/admin/benchmark/generate

Run one generation: { "generationId": "..." }. maxDuration = 300.

PATCH /api/merch/admin/benchmark/generations/[id]/like

Set review: { "liked": true | false | null }
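
Because `liked` is tri-state, a handler has to distinguish an explicit `null` (clear the review) from a missing key. The sketch below shows one way to validate the body; `parseLikeBody` is a hypothetical helper and not the route's actual validation code.

```typescript
type Liked = boolean | null;

// Accept only { liked: true | false | null }; anything else is rejected.
function parseLikeBody(body: unknown): { liked: Liked } {
  if (typeof body !== "object" || body === null || !("liked" in body)) {
    throw new Error('Body must be an object with a "liked" key');
  }
  const liked = (body as { liked: unknown }).liked;
  if (liked === null) {
    return { liked: null }; // explicit null clears the review
  }
  if (typeof liked === "boolean") {
    return { liked };
  }
  throw new Error('"liked" must be true, false, or null');
}
```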

GET /api/merch/admin/benchmark/selfies?set=western|japan

Returns the selfie corpus for the chosen test set.


### Environment

Merch admin reads `VITE_APP_URL` and `VITE_AUTH_URL` (defaults to `localhost:3004` and `localhost:3003` in dev).

Ensure `ALLOWED_DOMAINS_CORS` in `apps/zooly-app/.env.local` includes `http://localhost:3004`.

Run migration `0128_merch_benchmark_tables` against your local database before first use.



---

## Related Documentation

- [Merch Overview](/docs/merch/overview) — system components
- [Architecture](/docs/merch/architecture) — stack and package layout
- [Database Schema](/docs/merch/database-schema) — all merch tables
- [Environment Setup](/docs/merch/environment-setup) — local dev prerequisites
