Server-side retry cron, lifecycle tracking, and admin Generation dashboard (ZLY-1099)
AI Generation Retry & Recovery (ZLY-1099) closes the gap where a shopper leaves their email and walks away while AI art generation runs server-side. Today generation is synchronous — the client POSTs /ai/generate and waits up to 300 seconds. On failure the only foreground recovery is a manual page reload, and there is no server-side retry or persisted failure state. If the delayed-email path fails, no "your design is ready" email is ever sent and the shopper silently never receives their artwork.
This feature adds:
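- A server-side retry cron that re-kicks failed generations every 5 minutes, up to 3 attempts.
- Lifecycle tracking on merch_session: status, attempt count, last attempt time, and last error.
- An admin Generation dashboard in zooly-stats with KPI cards, filters, and manual Regenerate / Resend email actions.

| Surface | URL / path | Who |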
|---|---|---|
| Generation dashboard | zooly-stats → Merch → Generation (/merch/generation) | Admin only |
| Cron endpoint | GET /api/merch/cron/generation-retry on zooly-app | Bearer CRON_SECRET (Vercel cron) |
| Manual regenerate | POST /api/merch/admin/generation/{sessionId}/retry on zooly-app | Merch admin |
| Manual resend email | POST /api/merch/admin/generation/{sessionId}/resend on zooly-app | Merch admin |
Local dev defaults:
| App | Port |
|---|---|
| zooly-stats | 3010 |
| zooly-app | 3000 (API default 3004 in stats client env) |
| Merch SPA | 3008 |
Foreground shoppers who stay on the page can tap Retry (page reload). The delayed-email shopper has no recovery path.
The create page still drives generation synchronously — POST /api/merch/ai/generate (up to 300s) while polling GET /api/merch/session/{id}/generation-status. Client orchestration lives in create-page.tsx; the poll/race helper is packages/merch/client/src/lib/ai-generation.ts (STALE_IN_PROGRESS_MS = 5 min, matching the server lock).
ZLY-1099 does not replace this foreground path — it adds server-side retry for sessions that fail while the shopper is gone.
| Question | Decision | Rationale |
|---|---|---|
| Which failures does the cron retry? | Primary target: delayed-email shoppers who left before art completed | Highest missed-revenue case; foreground users already self-serve via reload |
| How does the cron invoke generation? | Reconcile kick — POST /ai/generate with CRON_SECRET | Reuses full path including ready email; matches offers crons |
| Where is status stored? | merch_session columns (not analytics sidecar) | Ticket requirement; simpler cron + dashboard queries |
| Manual dashboard actions? | Regenerate + resend ready email | Fulfillment push already exists via supplier assign |
| Retry cadence | Every 5 min, max 3 attempts | Per ticket; then failed → dashboard |
Implementation note: the impl plan originally scoped the DB candidate query to delayed_email IS NOT NULL. The shipped query is broader — any session with tracked status, no ai_art_key, and a design — still filtered to personalized designs + open campaigns in resolveRetryTargets. Foreground failures without a delayed email can therefore enter the cron set if they have a tracked generation_status.
Attempt counter: markGenerationAttemptStarted runs on every /ai/generate call, including foreground shopper retries. All attempts count toward the 3-try cron cap (not gated to delayed_email only).
| Status | Meaning |
|---|---|
| pending | Generation expected but not yet running (e.g. after admin reset) |
| in_progress | A generation attempt is running (lock window: 5 min) |
| succeeded | Art produced (ai_art_key set); session removed from cron set |
| failed | Last attempt failed; may still be retried until attempt cap |
/ai/generate writes the lifecycle:
- markGenerationAttemptStarted (increments generation_attempts, sets in_progress)
- markGenerationSucceeded
- markGenerationFailed (records error message, does not reset attempts)

The cron reuses the existing generate path, which already calls sendMerchDelayedReadyEmail when delayed_email is set and art was missing at the start of the run. A successful retry therefore sends the same email the shopper would have received on first success.
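The lifecycle writes described above can be modelled as pure state transitions. This is a minimal in-memory sketch, not the shipped @zooly/db code: the GenerationState shape, the pure-function style, and clearing the error on success are assumptions. Only the function names, the counter behaviour, the reset semantics, and the 1000-character error cap come from this page.

```typescript
// Hypothetical in-memory model of the generation_* columns on merch_session.
type GenerationStatus = "pending" | "in_progress" | "succeeded" | "failed";

interface GenerationState {
  status: GenerationStatus | null; // null = session never hit /ai/generate
  attempts: number;
  lastAttemptAt: Date | null;
  error: string | null;
}

const MAX_ERROR_LENGTH = 1000; // generation_error holds at most 1000 chars

// Every /ai/generate call counts as an attempt, cron or foreground.
function markGenerationAttemptStarted(s: GenerationState, now: Date): GenerationState {
  return { ...s, status: "in_progress", attempts: s.attempts + 1, lastAttemptAt: now };
}

// Assumption: a success clears any previously recorded error.
function markGenerationSucceeded(s: GenerationState): GenerationState {
  return { ...s, status: "succeeded", error: null };
}

// Records the (truncated) message but leaves the attempt counter untouched.
function markGenerationFailed(s: GenerationState, message: string): GenerationState {
  return { ...s, status: "failed", error: message.slice(0, MAX_ERROR_LENGTH) };
}

// Admin Regenerate: zero the counter and return the row to pending.
function resetGenerationAttempts(s: GenerationState): GenerationState {
  return { ...s, status: "pending", attempts: 0, error: null };
}
```

Because markGenerationFailed never resets the counter, three failed calls in a row exhaust the cron cap regardless of who made them; only the admin reset makes the session eligible again.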
Two layers prevent duplicate concurrent generation for the same session:
| Layer | Mechanism | Stale after |
|---|---|---|
| Analytics lock | merch_session_analytics.generation_started_at set, generation_completed_at null | 5 min → next request proceeds |
| Session status | merch_session.generation_status = in_progress | Cron treats as stale after 5 min (STALE_LOCK_MS) |
If /ai/generate is called while analytics shows an in-flight run younger than 5 min, it returns 409 { inProgress: true }. The cron and admin retry treat 409 as healthy (generation already running), not an error.
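The lock check and the 409 handling can be sketched as two small functions. This is an illustration under assumptions: the AnalyticsLock shape, the function names, and the KickOutcome labels are hypothetical; only the two timestamps, the 5-minute window, and the "409 is healthy" rule are taken from the text above.

```typescript
const LOCK_WINDOW_MS = 5 * 60 * 1000; // 5 min, per the lock table above

// Hypothetical view of the two merch_session_analytics timestamps.
interface AnalyticsLock {
  generationStartedAt: Date | null;
  generationCompletedAt: Date | null;
}

// True when a run started less than 5 min ago and has not completed yet.
// Older unfinished runs are treated as stale, so the next request proceeds.
function isGenerationInFlight(lock: AnalyticsLock, now: Date): boolean {
  if (lock.generationStartedAt === null || lock.generationCompletedAt !== null) {
    return false;
  }
  return now.getTime() - lock.generationStartedAt.getTime() < LOCK_WINDOW_MS;
}

// How a caller (cron or admin retry) might classify the generate response.
type KickOutcome = "kicked" | "already_running" | "error";

function classifyKickStatus(httpStatus: number): KickOutcome {
  if (httpStatus === 409) return "already_running"; // healthy, not an error
  return httpStatus >= 200 && httpStatus < 300 ? "kicked" : "error";
}
```

Treating 409 as its own outcome keeps it out of the errors[] list while still making it visible in logs or counters.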
On successful generation, /ai/generate may send the "your design is ready" email via sendMerchDelayedReadyEmail (@zooly/merch-fulfillment-srv).
Send conditions (all must hold):
- delayed_email is set on the session (re-fetched after generation so a PATCH during the run is picked up)
- Art was missing at the start of the run (!session.aiArtKey at entry), which prevents duplicate sends when a second kick runs after art already exists
- NEXT_PUBLIC_MERCH_URL (or NEXT_PUBLIC_APP_URL/merch) is configured

Link target: {merchHost}/{talentSlug}/{campaignSlug}/create?s={sessionId}
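The three send conditions and the link target can be expressed as a predicate plus a URL builder. This is a sketch, not the code in @zooly/merch-fulfillment-srv: ReadyEmailInput, shouldSendReadyEmail, and buildResultUrl are hypothetical names, and trimming a trailing slash from the host and URL-encoding the session id are assumptions.

```typescript
// Hypothetical input gathered by the generate route before sending.
interface ReadyEmailInput {
  delayedEmail: string | null; // re-fetched after generation
  hadArtAtEntry: boolean;      // session.aiArtKey was already set when the run started
  merchHost: string | null;    // NEXT_PUBLIC_MERCH_URL, or NEXT_PUBLIC_APP_URL + "/merch"
}

// All three conditions must hold for the ready email to go out.
function shouldSendReadyEmail(input: ReadyEmailInput): boolean {
  return Boolean(input.delayedEmail) && !input.hadArtAtEntry && Boolean(input.merchHost);
}

// Builds {merchHost}/{talentSlug}/{campaignSlug}/create?s={sessionId}.
function buildResultUrl(
  merchHost: string,
  talentSlug: string,
  campaignSlug: string,
  sessionId: string,
): string {
  const host = merchHost.replace(/\/+$/, ""); // assumption: tolerate a trailing slash
  return `${host}/${talentSlug}/${campaignSlug}/create?s=${encodeURIComponent(sessionId)}`;
}
```

For example, `buildResultUrl("https://merch.example.com/", "ada", "spring", "abc-123")` yields `https://merch.example.com/ada/spring/create?s=abc-123` (the host and slugs here are made up).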
i18n keys (default-merch-strings.ts): email_delayed_subject, email_delayed_header, email_delayed_body, email_delayed_cta — overridden per campaign via createTranslator(campaign.i18n, defaultMerchStrings).
Manual Resend in the dashboard repeats this send for sessions that already have ai_art_key + delayed_email.
apps/zooly-app/
  app/api/merch/cron/generation-retry/route.ts                  5-min cron (thin router)
  app/api/merch/ai/generate/route.ts                            Lifecycle writes + generation
  app/api/merch/admin/generation/[sessionId]/retry/route.ts     Manual regenerate
  app/api/merch/admin/generation/[sessionId]/resend/route.ts    Manual resend email
  vercel.json                                                   schedule: */5 * * * *
packages/merch/srv/ (@zooly/merch-srv)
  generation-retry.ts                                           resolveRetryTargets()
packages/db/ (@zooly/db)
  access/merch/merch-session.ts                                 listRetryableGenerations, mark*, resetGenerationAttempts
  schema/merchEnums.ts                                          merch_generation_status enum
  schema/merchTables.ts                                         generation_* columns on merch_session
apps/zooly-stats/
  app/merch/generation/                                         Admin dashboard (read-only DB)
  app/api/admin/merch/generation/route.ts                       List + KPI counts API
zooly-stats uses a read-only production DB connection. The dashboard reads session rows via stats APIs and writes manual actions through cross-origin zooly-app admin routes (credentials: include + CORS).
The cron does not duplicate generation logic. It lists candidates, then POSTs /api/merch/ai/generate server-to-server with Authorization: Bearer $CRON_SECRET, the same pattern as the offers crons (image-candidates/reconcile). The benefit is that a retry runs the full generate path, including the lifecycle writes and the ready email.
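A reconcile kick is essentially one authenticated POST per candidate. The sketch below only builds the request so it stays easy to test; it is not the shipped cron code. The KickRequest type, the appBaseUrl parameter, and the `{ sessionId }` JSON body are assumptions, since this page does not specify the generate route's request body.

```typescript
// Hypothetical description of a server-to-server kick request.
interface KickRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Assumption: the generate route accepts a JSON body with a sessionId field.
function buildKickRequest(appBaseUrl: string, sessionId: string, cronSecret: string): KickRequest {
  return {
    url: `${appBaseUrl}/api/merch/ai/generate`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${cronSecret}`, // same secret Vercel sends to the cron route
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ sessionId }),
    },
  };
}
```

The result could be passed straight to `fetch(req.url, req.init)`; keeping request construction separate from the network call means the cron loop itself only needs to count outcomes.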
Route: GET /api/merch/cron/generation-retry
Auth: Authorization: Bearer $CRON_SECRET
Schedule: every 5 minutes (*/5 * * * * in apps/zooly-app/vercel.json)
Constants:
| Constant | Value | Purpose |
|---|---|---|
| RETRY_AFTER_MS | 5 min | Minimum gap between attempts |
| STALE_LOCK_MS | 5 min | in_progress older than this → treat as dead |
| MAX_ATTEMPTS | 3 | Cron stops retrying after this |
| BATCH_LIMIT | 50 | Max sessions per tick |
| maxDuration | 300s | Matches generate route timeout |
Candidate query (listRetryableGenerations):
- ai_art_key IS NULL: art never completed
- design_id IS NOT NULL: needs a design to generate
- generation_attempts < maxAttempts
- generation_status is failed, pending, or stale in_progress
- Last attempt is older than retryAfterMs (or never recorded)

Service filter (resolveRetryTargets):

- Skips branded designs (designMode === "branded"): no AI, template image only
- Skips closed campaigns (isMerchCampaignOpen)

Response: { pickedUp, kicked, errors[] }
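The candidate query and the service filter can be read as two predicates applied in sequence. The sketch below restates them over in-memory rows; it is not the SQL in listRetryableGenerations. The row shapes, the function names, and the choice of inclusive comparisons at exactly 5 minutes are assumptions.

```typescript
const RETRY_AFTER_MS = 5 * 60 * 1000;
const STALE_LOCK_MS = 5 * 60 * 1000;
const MAX_ATTEMPTS = 3;

// Hypothetical projection of the merch_session columns the query reads.
interface SessionRow {
  aiArtKey: string | null;
  designId: string | null;
  generationStatus: "pending" | "in_progress" | "succeeded" | "failed" | null;
  generationAttempts: number;
  generationLastAttemptAt: Date | null;
}

// DB-level eligibility, mirroring the candidate query conditions.
function isRetryable(row: SessionRow, now: Date): boolean {
  if (row.aiArtKey !== null || row.designId === null) return false;
  if (row.generationAttempts >= MAX_ATTEMPTS) return false;
  // A missing timestamp counts as "never recorded", i.e. infinitely old.
  const ageMs =
    row.generationLastAttemptAt === null
      ? Infinity
      : now.getTime() - row.generationLastAttemptAt.getTime();
  const statusOk =
    row.generationStatus === "failed" ||
    row.generationStatus === "pending" ||
    (row.generationStatus === "in_progress" && ageMs >= STALE_LOCK_MS);
  return statusOk && ageMs >= RETRY_AFTER_MS;
}

// Service-level filter: personalized designs on open campaigns only.
function passesServiceFilter(target: { designMode: string; campaignOpen: boolean }): boolean {
  return target.designMode !== "branded" && target.campaignOpen;
}
```

Since RETRY_AFTER_MS and STALE_LOCK_MS are both 5 minutes, a stale in_progress row always satisfies the retry gap as well; the two checks would only diverge if the constants were tuned separately.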
Env:
| Variable | Purpose |
|---|---|
| CRON_SECRET | Bearer token for cron auth |
| NEXT_PUBLIC_MERCH_URL | Base URL for ready-email / resend links |
| NEXT_PUBLIC_MERCH_API_URL | Stats client base for cross-origin admin actions (default http://localhost:3004) |
| ALLOWED_DOMAINS_CORS | Must include the zooly-stats origin (e.g. http://localhost:3010) so dashboard Regenerate/Resend calls succeed |
Local test:
curl -H "Authorization: Bearer $CRON_SECRET" \
http://localhost:3000/api/merch/cron/generation-retry
Path: /merch/generation in zooly-stats (linked from Merch → Stats nav tabs).
Access: cookie auth via getVerifiedUserInfo; unauthenticated users redirect to SSO with returnTo; non-admins redirect to /denied.
Default filter: Failed. Only sessions with generation_status IS NOT NULL appear (a session that never hit /ai/generate is invisible here). List capped at 200 rows, ordered by generation_last_attempt_at DESC.
The KPI cards show counts across all tracked sessions; the status filter does not affect them.
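The difference between the filtered list and the unfiltered KPI counts can be sketched over in-memory rows. This is illustrative only: the TrackedRow shape and function names are hypothetical, and the assumption that there is one count per lifecycle status is not confirmed by this page. The default Failed filter, the 200-row cap, and the descending order by last attempt are from the dashboard description.

```typescript
type GenerationStatus = "pending" | "in_progress" | "succeeded" | "failed";

// Hypothetical row shape returned by the stats list API.
interface TrackedRow {
  id: string;
  generationStatus: GenerationStatus | null; // null = never hit /ai/generate
  generationLastAttemptAt: Date | null;
}

const LIST_CAP = 200;

// KPI counts ignore the status filter: they cover every tracked session.
function kpiCounts(rows: TrackedRow[]): Record<GenerationStatus, number> {
  const counts: Record<GenerationStatus, number> = {
    pending: 0,
    in_progress: 0,
    succeeded: 0,
    failed: 0,
  };
  for (const row of rows) {
    if (row.generationStatus !== null) counts[row.generationStatus] += 1;
  }
  return counts;
}

// Table rows: filtered by status (default Failed), newest attempt first, capped.
function listForDashboard(rows: TrackedRow[], filter: GenerationStatus = "failed"): TrackedRow[] {
  return rows
    .filter((row) => row.generationStatus === filter) // also drops untracked (null) rows
    .sort(
      (a, b) =>
        (b.generationLastAttemptAt?.getTime() ?? 0) - (a.generationLastAttemptAt?.getTime() ?? 0),
    )
    .slice(0, LIST_CAP);
}
```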
| Column | Source |
|---|---|
| Store | Campaign talent_name / slug |
| Email | merch_session.delayed_email |
| Status | generation_status pill |
| Attempts | generation_attempts |
| Last attempt | generation_last_attempt_at |
| Error | generation_error (truncated) |
Filters: status and store (campaign).
| Action | API | When enabled |
|---|---|---|
| Regenerate | POST .../retry | Always (resets attempt counter, re-kicks generate with forceRegenerate: true) |
| Resend email | POST .../resend | Only when ai_art_key and delayed_email are both set |
Both routes require requireAdmin() from @zooly/merch-admin-srv.
Resend builds the result URL as {merchHost}/{talentSlug}/{campaignSlug}/create?s={sessionId} and calls sendMerchDelayedReadyEmail with campaign i18n.
merch_generation_status enum

pending · in_progress · succeeded · failed

merch_session (new columns)

| Column | Type | Purpose |
|---|---|---|
| generation_status | enum | Current lifecycle state |
| generation_attempts | integer (default 0) | Total attempts (cron + foreground) |
| generation_last_attempt_at | timestamptz | Last kick timestamp |
| generation_error | text | Last failure message (max 1000 chars) |
Index: merch_session_generation_status_idx on generation_status.
Migration: packages/db/drizzle/0125_merch_session_generation_status.sql (idempotent).
merch_session_analytics still tracks generation_started_at / generation_completed_at for funnel analytics. ZLY-1099 adds session-level status for cron eligibility and admin triage — the ticket explicitly asked to track generation on the session.
Schema check:
set -a && source .env.local
psql "$DATABASE_URL" -c "\d merch_session" | grep generation
# expect: generation_status, generation_attempts, generation_last_attempt_at, generation_error
Seed a failed generation:
UPDATE merch_session
SET delayed_email = 'you@example.com',
ai_art_key = NULL,
generation_status = 'failed',
generation_attempts = 1,
generation_last_attempt_at = now() - interval '10 minutes'
WHERE id = '<session-with-personalized-design>';
Trigger cron:
curl -s -H "Authorization: Bearer $CRON_SECRET" \
http://localhost:3004/api/merch/cron/generation-retry | jq
# expect: { "pickedUp": >=1, "kicked": >=1, "errors": [] }
After success: generation_status = succeeded, ai_art_key set, ready email sent. After 3 failures the row stays failed and is no longer picked up until an admin clicks Regenerate (resets attempts).
Dashboard: open http://localhost:3010/merch/generation with an admin cookie. Confirm KPI cards, status/store filters, and manual actions. Verify CORS if Regenerate/Resend fail cross-origin.
- merch_session_generation is not used for failure recovery; the cron targets the session's primary generation.
- Fulfillment push is not part of this feature; it already exists via supplier assign (PATCH /orders/{id}/items/{itemId}/supplier).

See also: Abandoned Cart Recovery for the parallel hourly cart cron pattern.