The Likeness Search system follows a layered architecture optimized for Vercel's serverless environment:
┌─────────────────────────────────────────────┐
│ API Layer (Next.js Route Handlers) │
│ apps/zooly-app/app/api/indexing/ │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Likeness Search Package │
│ @zooly/likeness-search │
│ packages/likeness-search/src/ │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Utility Packages │
│ @zooly/util-elevenlabs, @zooly/util-srv │
│ @zooly/social-scraper │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Database Access Layer │
│ packages/db/src/access/ │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ Database Schema (Drizzle ORM) │
│ packages/db/src/schema/ │
└─────────────────────────────────────────────┘
Location: packages/db/src/schema/
Six core tables support the search system:
- likeness_assets - Stores uploaded images and voice samples with AI-extracted tags
- likeness_search - SQL search index with 30+ enum fields for exact filtering
- likeness_search_vector - pgvector embeddings for semantic similarity search
- likeness_need_indexing_queue - Event queue driving the indexing pipeline
- account_social_links - Social media links and platform-specific follower counts
- scrapes - Social media scraping results and retry tracking

See Database Schema for complete table definitions.
Location: packages/db/src/access/
Provides type-safe access functions following the project's access pattern:
- likenessNeedIndexingQueue.ts - Event CRUD and status management
- likenessAssets.ts - Asset CRUD and tag updates
- likenessSearch.ts - SQL search index upserts and queries
- likenessSearchVector.ts - Vector embedding operations
- baseRequirementsChecker.ts - Base requirements validation
- dataSufficiencyChecker.ts - Data sufficiency evaluation
- accountSocialLinks.ts - Social link management
- scrapes.ts - Scraping result storage

Key Principle: Tables are never exposed directly. All database access goes through access functions, ensuring consistent filtering, type safety, and data integrity.
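The access-function principle can be illustrated with a minimal sketch. It assumes an in-memory array standing in for the real Drizzle table; `createQueueAccess`, `addEvent`, `getPendingEvents`, `setStatus`, and the `PENDING`/`COMPLETED` status names are hypothetical and not taken from the actual access layer.

```typescript
// Hypothetical sketch: the "table" (an in-memory array standing in for a
// Drizzle table) is private to the factory; callers only see typed functions.
type QueueStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED"; // assumed status names

interface QueueEvent {
  id: number;
  accountId: string;
  status: QueueStatus;
  createdAt: Date;
}

function createQueueAccess() {
  const rows: QueueEvent[] = []; // never handed out directly
  let nextId = 1;

  return {
    // Insert a new event; returns a copy so callers cannot mutate the row.
    addEvent(accountId: string): QueueEvent {
      const event: QueueEvent = { id: nextId++, accountId, status: "PENDING", createdAt: new Date() };
      rows.push(event);
      return { ...event };
    },
    // The status filter lives here, so every caller gets the same semantics.
    getPendingEvents(limit: number): QueueEvent[] {
      return rows
        .filter((r) => r.status === "PENDING")
        .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime() || a.id - b.id)
        .slice(0, limit)
        .map((r) => ({ ...r }));
    },
    // Returns false when the id does not exist instead of throwing.
    setStatus(id: number, status: QueueStatus): boolean {
      const row = rows.find((r) => r.id === id);
      if (!row) return false;
      row.status = status;
      return true;
    },
  };
}
```

Because the rows are only reachable through these functions, a filtering rule only has to be written once.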
Location: packages/likeness-search/src/
Package: @zooly/likeness-search
A dedicated package containing core business logic organized by responsibility:
- indexingDaemon.ts - Main daemon loop that processes the queue
- processEvent.ts - Processes individual queue events
- upsertToIndex.ts - Upserts account data to SQL and vector indexes
- aggregateAccountTags.ts - Aggregates tags from multiple sources
- generateTagsFromImage.ts - AI image tag extraction (Gemini vision)
- generateTagsFromVoice.ts - AI voice tag extraction (Gemini audio)
- generateVoiceSampleText.ts - AI demo script generation
- triggerSubProcess.ts - Fire-and-forget API triggers
- searchLikeness.ts - Main search function (SQL with vector fallback)
- vectorFallbackSearch.ts - Vector similarity search
- formatSearchResults.ts - Result enrichment and formatting
- apiRetryHandler.ts - Error classification and retry logic
- validateAudioUrl.ts - Audio URL validation
- schemas/likenessTagsSchema.ts - Zod schemas for tag validation

The system leverages shared utility packages:
@zooly/util-elevenlabs
Location: packages/util-elevenlabs/src/

- createVoiceSample.ts - Complete voice sample creation workflow (ElevenLabs voice clone + TTS generation)
- voice-management.ts - Voice management functions (create, update, delete, list)
- elevenlabs-service.ts - Low-level ElevenLabs API operations

@zooly/util-srv
Location: packages/util-srv/src/

- generateEmbedding.ts - OpenAI embedding generation for vector search

@zooly/social-scraper
Location: packages/social-scraper/src/
A dedicated package for social media scraping operations:
- processSocialScraping.ts - Main orchestration function that scrapes all social links for an account
- parseSocialUrl.ts - URL normalization and platform detection
- scrapers/ - Platform-specific scraper functions:
  - instagram.ts - Instagram profile scraping
  - tiktok.ts - TikTok profile scraping
  - twitter.ts - Twitter/X profile scraping
  - youtube.ts - YouTube channel scraping
  - linkedin.ts - LinkedIn profile scraping
  - index.ts - Scraper registry (platform → function mapping)
  - types.ts - Type definitions for scraping results

Purpose: Collects follower counts, profile images, and metadata from social media platforms to enhance search indexing and populate missing account data.
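URL normalization and platform detection, as attributed to parseSocialUrl.ts above, might look roughly like the following. This is an illustrative sketch only: `detectPlatform`, the `Platform` type, and the host table are hypothetical names, and the real parser may handle more URL shapes.

```typescript
// Hypothetical sketch of platform detection plus URL normalization.
type Platform = "instagram" | "tiktok" | "twitter" | "youtube" | "linkedin";

// Assumed host table; the real package may recognize more hosts.
const HOST_TO_PLATFORM: Record<string, Platform> = {
  "instagram.com": "instagram",
  "tiktok.com": "tiktok",
  "twitter.com": "twitter",
  "x.com": "twitter",
  "youtube.com": "youtube",
  "youtu.be": "youtube",
  "linkedin.com": "linkedin",
};

function detectPlatform(rawUrl: string): { platform: Platform; normalizedUrl: string } | null {
  // Optional scheme, optional "www."/"m." prefix, then host, then path.
  // Query strings and fragments are dropped.
  const match = rawUrl.trim().match(/^(?:https?:\/\/)?(?:www\.|m\.)?([^\/?#]+)(\/[^?#]*)?/i);
  if (!match) return null;

  const host = match[1].toLowerCase();
  const platform = HOST_TO_PLATFORM[host];
  if (!platform) return null;

  // Strip trailing slashes so equivalent URLs compare equal.
  const path = (match[2] ?? "").replace(/\/+$/, "");
  return { platform, normalizedUrl: `https://${host}${path}` };
}
```

A registry like the one in scrapers/index.ts could then use the returned `platform` value to select the scraper function.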
Integration with Indexing Pipeline:
Trigger Points: Social scraping is triggered when:
Process Flow:
- triggerSocialScraping() (fire-and-forget)
- /api/indexing/scrape-social receives the request
- processSocialScraping() which:
  - account_social_links table
  - account_social_links.followersCount per platform
  - scrapes table for retry tracking
  - likeness_assets entries from scraped avatars

Data Integration:

- aggregateAccountTags)
- likeness_assets entries for AI tag extraction
- SOCIAL_DATA queue event to trigger re-indexing

Error Handling:
Location: apps/zooly-app/app/api/indexing/
Next.js route handlers that expose the system:
- process-queue/route.ts - Cron endpoint (GET) - Protected by CRON_SECRET
- generate-tags/route.ts - AI tag generation (POST) - Protected by CRON_SECRET
  - generateTagsFromImage() and generateTagsFromVoice() from @zooly/likeness-search
  - createVoiceSample() from @zooly/util-elevenlabs
- scrape-social/route.ts - Social scraping (POST) - Protected by CRON_SECRET
  - processSocialScraping() from @zooly/social-scraper
  - account_social_links table
  - likeness_assets entries
  - SOCIAL_DATA queue event for re-indexing
- search/route.ts - Public search (GET/POST) - No authentication required
  - searchLikeness() from @zooly/likeness-search

The system integrates with several external services:
- text-embedding-3-small for vector embeddings (1536 dimensions)
- gemini-2.5-flash for image and audio analysis
- @zooly/social-scraper)

Decision: Use both SQL and vector indexes
Rationale:
Implementation: SQL search runs first. If it returns 0 results (and offset=0), vector search is used as fallback.
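The fallback rule above can be sketched as a small function. The signatures here are assumptions for illustration: `SearchFn`, `searchWithFallback`, and the `source` field are hypothetical, and the real searchLikeness() may take different parameters.

```typescript
// Hypothetical sketch of "SQL first, vector only as a first-page fallback".
interface SearchResult {
  accountId: string;
  score?: number;
}

type SearchFn = (query: string, limit: number, offset: number) => Promise<SearchResult[]>;

async function searchWithFallback(
  sqlSearch: SearchFn,
  vectorSearch: SearchFn,
  query: string,
  limit = 20,
  offset = 0,
): Promise<{ source: "sql" | "vector"; results: SearchResult[] }> {
  const sqlResults = await sqlSearch(query, limit, offset);

  // Fall back only when the first page is empty. An empty later page
  // (offset > 0) just means the SQL results ran out.
  if (sqlResults.length > 0 || offset !== 0) {
    return { source: "sql", results: sqlResults };
  }

  const vectorResults = await vectorSearch(query, limit, 0);
  return { source: "vector", results: vectorResults };
}
```

Restricting the fallback to `offset === 0` keeps pagination consistent: a client paging through SQL results never receives vector results partway through.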
Decision: Event-driven queue architecture
Rationale:
Implementation: Events are added to the queue, the daemon processes them via cron, and sub-processes run as separate API calls.
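One cron-triggered daemon run could look roughly like this. It is a sketch under stated assumptions: `DaemonDeps`, `runDaemonOnce`, and `markFailed` are hypothetical names, and only the default batch size of 250 comes from the environment variable table in this document.

```typescript
// Hypothetical sketch of a single daemon run over a bounded batch.
interface QueuedEvent {
  id: number;
  accountId: string;
}

interface DaemonDeps {
  fetchPending(limit: number): Promise<QueuedEvent[]>;
  processEvent(event: QueuedEvent): Promise<void>;
  markFailed(id: number, reason: string): Promise<void>;
}

async function runDaemonOnce(
  deps: DaemonDeps,
  batchSize = 250, // documented default of INDEXING_DAEMON_BATCH_SIZE
): Promise<{ processed: number; failed: number }> {
  const events = await deps.fetchPending(batchSize);
  let processed = 0;
  let failed = 0;

  for (const event of events) {
    try {
      await deps.processEvent(event);
      processed++;
    } catch (err) {
      // One bad event must not abort the rest of the batch.
      failed++;
      await deps.markFailed(event.id, err instanceof Error ? err.message : String(err));
    }
  }
  return { processed, failed };
}
```

Bounding each run by a batch size keeps a single invocation within the serverless execution limit; the next cron tick picks up whatever remains.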
Decision: Sub-processes triggered via non-blocking fetch() calls
Rationale:
Implementation: triggerSubProcess.ts calls fetch() without awaiting it; each sub-process marks its event as completed when it finishes.
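A fire-and-forget trigger might be sketched as follows. The function name `triggerSubProcessSketch`, its parameters, and the injected `FetchLike` type are hypothetical. The Bearer header is an assumption based on CRON_SECRET being described in this document as a bearer token, with the base URL taken from NEXT_PUBLIC_APP_URL.

```typescript
// Hypothetical sketch: start the request, do not wait for the response.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

function triggerSubProcessSketch(
  fetchImpl: FetchLike, // injected so the sketch can be exercised without a network
  baseUrl: string, // e.g. the value of NEXT_PUBLIC_APP_URL
  path: string, // e.g. "/api/indexing/scrape-social"
  cronSecret: string,
  payload: Record<string, unknown>,
): void {
  const url = `${baseUrl.replace(/\/+$/, "")}${path}`;

  // Intentionally not awaited: the caller returns immediately. The catch
  // handler only prevents an unhandled rejection; the sub-process itself
  // is responsible for updating the queue event.
  void fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cronSecret}`,
    },
    body: JSON.stringify(payload),
  }).catch((err) => {
    console.error(`sub-process trigger failed for ${path}`, err);
  });
}
```

In application code, `fetchImpl` would be the global `fetch`; the function returns `void` so callers cannot accidentally await the sub-process.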
Decision: Everything keyed by accountId (not userId)
Rationale:
Implementation: All tables reference account.id, and all access functions take an accountId parameter.
Decision: Keep vector embeddings in separate table from SQL index
Rationale:
Implementation: likeness_search_vector table stores accountId, content (text), and embedding (vector).
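To show what a similarity lookup over such rows computes, here is an in-memory sketch. It assumes cosine similarity, which is an assumption: this document does not state which pgvector distance operator is used, and in the real system the ranking would run inside PostgreSQL. `VectorRow`, `cosineSimilarity`, and `nearestAccounts` are hypothetical names.

```typescript
// Hypothetical in-memory stand-in for a vector similarity query.
interface VectorRow {
  accountId: string;
  content: string; // the text that was embedded
  embedding: number[]; // 1536 numbers in the real table; shorter here
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank rows by similarity to the query embedding and keep the top k.
function nearestAccounts(rows: VectorRow[], query: number[], k: number): { accountId: string; score: number }[] {
  return rows
    .map((row) => ({ accountId: row.accountId, score: cosineSimilarity(row.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```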
Decision: Index accounts even with incomplete data if they have at least one image
Rationale:
Implementation: After data collection options are exhausted, an account with at least one image asset is indexed with whatever data is available.
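The best-effort rule can be written as a small decision function. This is a sketch: `AccountDataState`, `decideIndexing`, and the three outcome labels are hypothetical, and the real checks (baseRequirementsChecker.ts, dataSufficiencyChecker.ts) are likely more detailed.

```typescript
// Hypothetical sketch of the best-effort indexing decision.
interface AccountDataState {
  imageAssetCount: number;
  dataSufficient: boolean; // result of a data sufficiency check
  collectionOptionsExhausted: boolean; // nothing left to scrape or tag
}

type IndexDecision = "INDEX" | "COLLECT_MORE" | "SKIP";

function decideIndexing(state: AccountDataState): IndexDecision {
  if (state.dataSufficient) return "INDEX";
  if (!state.collectionOptionsExhausted) return "COLLECT_MORE";
  // Incomplete data and nothing left to collect: index anyway if there
  // is at least one image, otherwise leave the account out of the index.
  return state.imageAssetCount >= 1 ? "INDEX" : "SKIP";
}
```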
Decision: Track retry attempts per asset and per social link
Rationale:
Implementation: likenessAssets.tagAttemptCount and scrapes.attemptCount track individual retries.
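Per-record counters make it possible to retry one failing asset or link without retrying its siblings. A minimal sketch, assuming a cap of three attempts (the source names the counters but not the limit); `RetryTracked`, `recordAttempt`, and `needsRetry` are hypothetical names.

```typescript
// Assumed cap; the document does not state the actual limit.
const MAX_ATTEMPTS = 3;

interface RetryTracked {
  id: string;
  attemptCount: number; // plays the role of tagAttemptCount / attemptCount
  done: boolean;
}

// Returns an updated copy; every attempt counts, successful or not.
function recordAttempt(record: RetryTracked, succeeded: boolean): RetryTracked {
  return { ...record, attemptCount: record.attemptCount + 1, done: succeeded };
}

// A record is retried only while it is unfinished and under the cap.
function needsRetry(record: RetryTracked, maxAttempts = MAX_ATTEMPTS): boolean {
  return !record.done && record.attemptCount < maxAttempts;
}
```

With per-record counters, an asset that has used up its attempts drops out of the retry set while the account's other assets continue to be processed.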
Decision: Social scraping runs as a separate, asynchronous sub-process via fire-and-forget API calls
Rationale:
Implementation:
- triggerSocialScraping() makes non-blocking fetch() call to /api/indexing/scrape-social
- IN_PROGRESS while scraping runs
- SOCIAL_DATA event on completion to trigger re-indexing
- scrapes table

The system requires the following environment variables:
| Variable | Purpose |
|---|---|
| CRON_SECRET | Bearer token for cron/internal API authentication |
| INDEXING_DAEMON_BATCH_SIZE | Max events per daemon run (default: 250) |
| NEXT_PUBLIC_APP_URL | Base URL for fire-and-forget API calls |
| OPENAI_API_KEY | OpenAI API key for embeddings |
| ELEVEN_LABS_API_KEY | ElevenLabs API key for voice cloning |
| AWS_BUCKET_NAME | S3 bucket for voice samples |
| AWS_REGION | AWS region for S3 |
| AWS_ACCESS_KEY_ID | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| NEXT_PUBLIC_AWS_BUCKET_URL | Public S3 bucket URL prefix |
| DATABASE_URL | PostgreSQL connection URL |
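A startup check for these variables could be sketched as below. The helper names `findMissingEnv` and `parseBatchSize` are hypothetical. Treating every variable except INDEXING_DAEMON_BATCH_SIZE as required is an assumption based on the table listing a default only for that one.

```typescript
// Variables assumed to be mandatory (no default listed in the table).
const REQUIRED_ENV_VARS = [
  "CRON_SECRET",
  "NEXT_PUBLIC_APP_URL",
  "OPENAI_API_KEY",
  "ELEVEN_LABS_API_KEY",
  "AWS_BUCKET_NAME",
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "NEXT_PUBLIC_AWS_BUCKET_URL",
  "DATABASE_URL",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names that are unset or blank, in table order.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Falls back to the documented default when the value is absent or not
// a positive integer.
function parseBatchSize(raw: string | undefined, fallback = 250): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

In a Node.js process, `findMissingEnv(process.env)` would be called once at startup, and `parseBatchSize(process.env.INDEXING_DAEMON_BATCH_SIZE)` wherever the daemon reads its batch size.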
- (status, createdAt) and (accountId, status)
- accountId

The system implements multi-level error handling:
See Error Handling for detailed retry logic.