Search Flow

Detailed search process and algorithms

Overview

The search system provides a dual-index approach: fast SQL filtering for exact matches, with semantic vector search as a fallback for fuzzy queries. This ensures buyers can find relevant talent even with incomplete or imprecise search criteria.

Search Architecture

flowchart TD
    A[Buyer Submits Query] --> B{Query Format}
    B -->|Brief Text| C[AI Extract Filters]
    B -->|Direct Filters| D[Use Filters Directly]
    C --> D
    D --> E[SQL Search]
    E --> F{Results Found?}
    F -->|Yes| G[Format Results]
    F -->|No and First Page| H[Vector Fallback Search]
    H --> I[Generate Query Embedding]
    I --> J[Vector Similarity Search]
    J --> K[Format Results]
    G --> L[Return to Buyer]
    K --> L

Search API

Endpoint

Location: apps/zooly-app/app/api/indexing/search/route.ts

GET /api/indexing/search?name_query=<query>&limit=<number>&offset=<number>

POST /api/indexing/search

{
  "filters": {
    "category": "MODELS",
    "gender": "FEMALE",
    "hairColor": "BLONDE",
    "minFollowers": 100000
  },
  "limit": 20,
  "offset": 0,
  "orderBy": "numberOfFollowers",
  "orderDirection": "desc"
}
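
As a rough illustration of how a client might assemble these two requests, the sketch below builds the GET URL and the POST arguments as plain data. The helper names (buildNameSearchUrl, buildFilterSearchRequest), the SearchRequestBody shape, and the default limit of 20 are assumptions made for this example, not part of the documented API.

```typescript
// Hypothetical request shape mirroring the JSON body shown above.
interface SearchRequestBody {
  filters: Record<string, string | number | boolean>;
  limit?: number;
  offset?: number;
  orderBy?: "numberOfFollowers" | "birthYear" | "engagementRate";
  orderDirection?: "asc" | "desc";
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const SEARCH_PATH = "/api/indexing/search";

// Build the GET URL for a name-only search. Defaults are assumed values.
function buildNameSearchUrl(nameQuery: string, limit = 20, offset = 0): string {
  const query = [
    `name_query=${encodeURIComponent(nameQuery)}`,
    `limit=${limit}`,
    `offset=${offset}`,
  ].join("&");
  return `${SEARCH_PATH}?${query}`;
}

// Build the arguments for a structured-filter POST request.
function buildFilterSearchRequest(body: SearchRequestBody): PreparedRequest {
  return {
    url: SEARCH_PATH,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The prepared request can then be passed to fetch(req.url, req.init) in a browser or in a recent Node.js runtime.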

Search Filters

Type: SearchFilters (from packages/types/src/types/LikenessSearch.ts)

Supports filtering by:

  • Category & Basic: category, gender, country, targetAudience, name
  • Social Data: minFollowers, maxFollowers, primaryPlatform, isVerified, contentNiche, minEngagementRate, maxEngagementRate
  • Physical Data: minAge, maxAge, minHeight, maxHeight, minWeight, maxWeight, bodyType
  • Visual Data: hairColor, hairLength, hairType, eyeColor, skinColor, ethnicity, faceShape, facialHair, wearGlasses, hasTattoos, hasPiercings
  • Voice Data: voicePitch, voiceTone, accent, primaryLanguage
  • Asset Filters: hasVoiceSample, hasImageAsset

Function: searchLikeness(filters, options)

Location: packages/likeness-search/src/searchLikeness.ts

Search Implementation

Function: searchByFilters(filters, limit, offset)

Location: packages/db/src/access/likenessSearch.ts

Without Asset Filters

When hasVoiceSample and hasImageAsset are not specified:

  • Uses Drizzle query builder with dynamic WHERE clause
  • Filters on enum fields using exact matches
  • Filters on numeric fields using range queries (>=, <=)
  • Filters on boolean fields using exact matches
  • Filters on name using ILIKE pattern matching
  • Ordered by numberOfFollowers desc by default
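
The dynamic WHERE clause can be pictured with a small stand-alone builder. This is not the Drizzle code itself: it is a plain parameterized-SQL sketch of the same idea, and the FiltersSubset type, the buildWhere helper, and the snake_case column names are all assumptions made for illustration.

```typescript
// Hypothetical subset of SearchFilters; the real type lives in packages/types.
interface FiltersSubset {
  category?: string;
  gender?: string;
  minFollowers?: number;
  maxFollowers?: number;
  isVerified?: boolean;
  name?: string;
}

interface WhereClause {
  sql: string;       // conditions joined with AND, using $1, $2, ... placeholders
  params: unknown[]; // values bound to the placeholders, in order
}

// Only filters that are actually set contribute a condition.
// Column names are assumed for this sketch.
function buildWhere(filters: FiltersSubset): WhereClause {
  const conditions: string[] = [];
  const params: unknown[] = [];
  const add = (column: string, op: string, value: unknown): void => {
    params.push(value);
    conditions.push(`${column} ${op} $${params.length}`);
  };

  // Enum fields: exact match.
  if (filters.category !== undefined) add("category", "=", filters.category);
  if (filters.gender !== undefined) add("gender", "=", filters.gender);
  // Numeric fields: range queries.
  if (filters.minFollowers !== undefined) add("number_of_followers", ">=", filters.minFollowers);
  if (filters.maxFollowers !== undefined) add("number_of_followers", "<=", filters.maxFollowers);
  // Boolean fields: exact match.
  if (filters.isVerified !== undefined) add("is_verified", "=", filters.isVerified);
  // Name: case-insensitive partial match.
  if (filters.name !== undefined) add("display_name", "ILIKE", `%${filters.name}%`);

  return { sql: conditions.length > 0 ? conditions.join(" AND ") : "TRUE", params };
}
```

Keeping values in a separate params array, rather than interpolating them into the SQL string, is what keeps user input out of the query text.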

With Asset Filters

When hasVoiceSample or hasImageAsset are specified:

  • Uses raw SQL with parameterized queries for JOINs
  • JOINs to likeness_assets table for image assets
  • JOINs to likeness_assets table for voice assets (with voiceSampleUrl IS NOT NULL check)
  • JOINs to ip_terms table to verify VoiceOver terms are approved
  • JOINs to account table for name filter and slug validation
  • Special case: When name filter is present, other voice characteristic filters are ignored (name search takes priority)

Ordering

Results can be ordered by:

  • numberOfFollowers - Total follower count (descending by default)
  • birthYear - Age-based ordering
  • engagementRate - Engagement rate ordering

Direction: asc or desc (default: desc)
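
One way to apply these defaults is to resolve the requested ordering against a fixed allow-list before it reaches the query. The resolveOrdering helper below is hypothetical, and silently falling back to the defaults for unknown values is an assumption of this sketch; the real endpoint may reject them instead.

```typescript
type OrderBy = "numberOfFollowers" | "birthYear" | "engagementRate";
type OrderDirection = "asc" | "desc";

// The three orderable fields listed above.
const ORDERABLE_FIELDS: readonly OrderBy[] = ["numberOfFollowers", "birthYear", "engagementRate"];

// Resolve user-supplied ordering to a known field and direction.
// Unknown or missing values fall back to numberOfFollowers / desc (assumed behavior).
function resolveOrdering(orderBy?: string, direction?: string): { orderBy: OrderBy; direction: OrderDirection } {
  const field = ORDERABLE_FIELDS.find((candidate) => candidate === orderBy) ?? "numberOfFollowers";
  const resolvedDirection: OrderDirection = direction === "asc" ? "asc" : "desc";
  return { orderBy: field, direction: resolvedDirection };
}
```

Restricting the field to an allow-list also means the column name never comes directly from user input.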

Function: vectorFallbackSearch(filters, options)

Location: packages/likeness-search/src/vectorFallbackSearch.ts

When Used

Vector search is used as fallback when:

  • SQL search returns 0 results
  • AND offset === 0 (only on first page, not for pagination)

Exception: Name filter searches never fall back to vector search (return empty instead).
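
The fallback rule reduces to a small predicate. The function name shouldUseVectorFallback and its parameter shape are hypothetical; only the three conditions come from the rules above.

```typescript
// Decide whether the vector fallback should run after the SQL search.
function shouldUseVectorFallback(
  sqlResultCount: number,
  offset: number,
  filters: { name?: string },
): boolean {
  // Name searches never fall back; they return an empty result instead.
  if (filters.name !== undefined && filters.name !== "") return false;
  // Fall back only when SQL found nothing on the first page.
  return sqlResultCount === 0 && offset === 0;
}
```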

Process

  1. Convert Filters to Text: filtersToDescription(filters) creates natural language description
    • Example: "category: models, gender: female, hairColor: blonde, minFollowers: 100000"
  2. Generate Query Embedding: generateEmbedding(queryText) uses OpenAI text-embedding-3-small
    • Returns 1536-dimensional vector
  3. Choose Search Method: Based on asset filters:
    • Both hasImageAsset and hasVoiceSample → Intersect results from both searches
    • Only hasVoiceSample → vectorSearchWithVoiceAssets()
    • Only hasImageAsset → vectorSearchWithImageAssets()
    • Neither → vectorSearch()
  4. Vector Similarity Search: Uses pgvector cosine distance (<=> operator)
    • Max distance threshold: 0.8 (configurable)
    • Returns top K results ordered by similarity
  5. Fetch Search Data: Gets corresponding likenessSearch rows for vector results
  6. Format Results: Enriches with account and asset data
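
Steps 1 and 3 can be sketched as follows. The body of filtersToDescription is a guess that reproduces the example string above (it lowercases string values and joins "key: value" pairs); chooseVectorMethod and intersectByAccount are hypothetical helpers, and treating an asset flag as "specified" only when it is true is an assumption.

```typescript
type FilterValue = string | number | boolean;

// Step 1 (sketch): render each defined filter as "key: value".
function filtersToDescription(filters: Record<string, FilterValue | undefined>): string {
  return Object.entries(filters)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}: ${typeof value === "string" ? value.toLowerCase() : String(value)}`)
    .join(", ");
}

type VectorMethod = "intersect" | "voice" | "image" | "basic";

// Step 3 (sketch): pick the search variant from the asset filters.
function chooseVectorMethod(filters: { hasImageAsset?: boolean; hasVoiceSample?: boolean }): VectorMethod {
  if (filters.hasImageAsset && filters.hasVoiceSample) return "intersect";
  if (filters.hasVoiceSample) return "voice";
  if (filters.hasImageAsset) return "image";
  return "basic";
}

// For the "intersect" case: keep rows from the first list whose account
// also appears in the second list, preserving the first list's order.
function intersectByAccount<T extends { accountId: string }>(first: T[], second: T[]): T[] {
  const idsInSecond = new Set(second.map((row) => row.accountId));
  return first.filter((row) => idsInSecond.has(row.accountId));
}
```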

Vector Search Functions

Location: packages/db/src/access/likenessSearchVector.ts

  • vectorSearch(queryEmbedding, topK, maxDistance) - Basic vector search
  • vectorSearchWithImageAssets(queryEmbedding, opts) - Filters accounts with image assets
  • vectorSearchWithVoiceAssets(queryEmbedding, opts) - Filters accounts with voice samples (requires approved VoiceOver term)

Distance Calculation

PostgreSQL pgvector uses cosine distance:

  • Distance: embedding <=> queryEmbedding (lower = more similar)
  • Similarity Score: 1 - distance (higher = more similar)
  • Max Distance: 0.8 means similarity score ≥ 0.2
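
The relationship between distance, similarity score, and the threshold can be checked with a few lines of arithmetic. This sketch recomputes cosine distance in plain TypeScript for illustration only (in the real system the database computes it); the function names are hypothetical.

```typescript
// Cosine distance = 1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Similarity score as reported in search results.
function similarityScore(distance: number): number {
  return 1 - distance;
}

// A result is kept when its distance does not exceed the threshold.
function withinThreshold(distance: number, maxDistance = 0.8): boolean {
  return distance <= maxDistance;
}
```

Two identical vectors have distance 0 (score 1), orthogonal vectors have distance 1 (score 0), and a 0.8 cut-off therefore excludes orthogonal and opposing vectors.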

Result Formatting

Function: formatSearchResults(likenessSearchResults)

Location: packages/likeness-search/src/formatSearchResults.ts

Enrichment Process

  1. Fetch Account Data: Gets account records for all result accountIds
  2. Fetch Assets: Gets likenessAssets for all result accountIds
  3. Format Each Result:
    • Account info: displayName, imageUrl, slug
    • Profile image: account.imageUrl or fallback /img/no-img.svg
    • Images array: All IMAGE type assets with contentUrl
    • Voice sample URL: First VOICE asset with voiceSampleUrl
    • Follower count: Formatted string (e.g., "1.2M", "500K")
    • Score: Similarity score (only for vector search results)
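
Two of the formatting rules above are simple enough to sketch directly. The helper names are hypothetical and the exact rounding behavior of the real formatter is not documented, so the one-decimal rounding here is an assumption chosen to reproduce the "1.2M" and "500K" examples.

```typescript
// Format a follower count as a short string such as "1.2M" or "500K".
function formatFollowerCount(count: number): string {
  // One decimal place, dropping a trailing ".0" (500.0K becomes 500K).
  const withSuffix = (value: number, suffix: string): string =>
    `${value.toFixed(1).replace(/\.0$/, "")}${suffix}`;

  if (count >= 1000000) return withSuffix(count / 1000000, "M");
  if (count >= 1000) return withSuffix(count / 1000, "K");
  return String(count);
}

// Profile image: the account image if present, otherwise the documented fallback.
function resolveProfileImage(imageUrl: string | null | undefined): string {
  return imageUrl ?? "/img/no-img.svg";
}
```

Counts that round up across a suffix boundary (just under one million, for example) are not specially handled in this sketch.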

Result Type

Type: FormattedSearchResult (from packages/types/src/types/LikenessSearch.ts)

{
  accountId: string;
  likenessSearch: LikenessSearch;
  account?: {
    displayName: string;
    imageUrl: string | null;
    slug: string;
  };
  id: string;
  displayName: string;
  profileImage: string;
  images: string[];
  followersCount: string;
  slug: string;
  score?: number;        // Only for vector search
  voiceSampleUrl?: string;
}

Search Flow Diagram

sequenceDiagram
    participant Buyer
    participant API as Search API
    participant Search as Search Function
    participant SQL as SQL Search
    participant Vector as Vector Search
    participant Format as Format Results
    participant DB as Database

    Buyer->>API: POST /api/indexing/search
    API->>Search: searchLikeness(filters, options)
    Search->>SQL: searchByFilters(filters, limit, offset)
    SQL->>DB: Query likeness_search table
    DB-->>SQL: Results
    alt Results Found
        SQL-->>Search: Return Results
        Search->>Format: formatSearchResults(results)
        Format->>DB: Fetch accounts and assets
        DB-->>Format: Account and asset data
        Format-->>Search: Formatted Results
        Search-->>API: Return Results
    else No Results + First Page
        SQL-->>Search: Empty Results
        Search->>Vector: vectorFallbackSearch(filters, options)
        Vector->>Vector: Generate Query Embedding
        Vector->>DB: Vector Similarity Search
        DB-->>Vector: Vector Results
        Vector->>DB: Fetch likenessSearch data
        DB-->>Vector: Search Data
        Vector->>Format: formatSearchResults(vectorResults)
        Format->>DB: Fetch accounts and assets
        DB-->>Format: Account and asset data
        Format-->>Vector: Formatted Results
        Vector-->>Search: Return Results
        Search-->>API: Return Results
    end
    API-->>Buyer: JSON Response

Filter Processing

Name Query Processing

When a GET request includes ?name_query=<query>:

  • Used for direct name search only - searches for accounts matching the name pattern
  • Converts to filter: filters.name = query
  • Uses ILIKE pattern matching on account.displayName (case-insensitive partial match)
  • Note: This is NOT a general query parameter - it only searches by name
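
A minimal sketch of this conversion is shown below. The helper names are hypothetical, and escaping the LIKE wildcards (%, _ and backslash) in the user's input is an assumption about sensible handling, not something the source confirms.

```typescript
// Turn the raw name_query value into a filter object; blank input yields no filter.
function nameQueryToFilters(nameQuery: string | null | undefined): { name?: string } {
  const trimmed = (nameQuery ?? "").trim();
  return trimmed === "" ? {} : { name: trimmed };
}

// Build an ILIKE pattern for a partial match. Wildcard characters in the
// input are escaped so they match literally (assumed behavior).
function toILikePattern(name: string): string {
  const escaped = name.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `%${escaped}%`;
}
```

With this pattern, a query of "ann" matches "Ann Lee" and "Joanna" alike, because ILIKE is case-insensitive and the surrounding % signs allow any prefix or suffix.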

AI-Based Filter Extraction (Client-Side)

For advanced search with natural language queries:

  1. Client-side processing: The client calls /api/generate-structured-data with a natural language description
  2. AI extraction: Uses Google Gemini (gemini-2.5-flash-lite) to extract structured filters from the text
  3. Filter application: The extracted filters are sent to the POST /api/indexing/search endpoint as pre-processed SearchFilters

Example Flow:

User Input: "Looking for a tall blonde female model in the US with a warm British accent"

Client calls: POST /api/generate-structured-data

Gemini extracts: { category: "MODELS", gender: "FEMALE", hairColor: "BLONDE", country: "USA", voiceTone: "WARM", accent: "BRITISH_RP" }

Client sends: POST /api/indexing/search { filters: { ...extracted filters } }

Search executes with structured filters

Filter Validation

Filters are validated against enum values:

  • Invalid enum values are ignored (no error thrown)
  • Numeric ranges are validated (min < max)
  • Boolean filters are converted to proper types
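
The three rules can be illustrated on a small subset of filters. Everything named here is hypothetical: the RawFilters and CleanFilters types, the sanitizeFilters helper, and the two-value category list (taken from the examples in this document, not the full enum). Dropping both bounds of an invalid range is also an assumption; the source only says ranges are validated.

```typescript
interface RawFilters {
  category?: unknown;
  minFollowers?: unknown;
  maxFollowers?: unknown;
  isVerified?: unknown;
}

interface CleanFilters {
  category?: string;
  minFollowers?: number;
  maxFollowers?: number;
  isVerified?: boolean;
}

// Illustrative subset of the category enum.
const CATEGORY_VALUES: readonly string[] = ["MODELS", "ACTORS"];

function toNumber(value: unknown): number | undefined {
  const parsed = typeof value === "string" && value.trim() !== "" ? Number(value) : value;
  return typeof parsed === "number" && Number.isFinite(parsed) ? parsed : undefined;
}

function toBoolean(value: unknown): boolean | undefined {
  if (typeof value === "boolean") return value;
  if (value === "true") return true;
  if (value === "false") return false;
  return undefined;
}

function sanitizeFilters(raw: RawFilters): CleanFilters {
  const clean: CleanFilters = {};
  // Invalid enum values are ignored rather than rejected.
  if (typeof raw.category === "string" && CATEGORY_VALUES.includes(raw.category)) {
    clean.category = raw.category;
  }
  // A range is kept only when min < max (or when just one bound is given).
  const min = toNumber(raw.minFollowers);
  const max = toNumber(raw.maxFollowers);
  if (min === undefined || max === undefined || min < max) {
    if (min !== undefined) clean.minFollowers = min;
    if (max !== undefined) clean.maxFollowers = max;
  }
  // Boolean-like inputs are coerced to real booleans.
  const verified = toBoolean(raw.isVerified);
  if (verified !== undefined) clean.isVerified = verified;
  return clean;
}
```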

Name Filter Special Handling

When name filter is present:

  • Uses ILIKE pattern matching on account.displayName
  • For voice searches, other voice characteristic filters are ignored
  • Name search takes priority over the other voice filters; the match itself is still a partial, case-insensitive one

Performance Optimizations

SQL Search Optimizations

  • Indexes: Queue table has indexes on (status, createdAt) and (accountId, status)
  • Unique Constraint: Search index has unique constraint on accountId for fast lookups
  • Pagination: Limit/offset support prevents loading all results

Vector Search Optimizations

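  • Index: pgvector supports HNSW and IVFFlat indexes for similarity search; these must be created explicitly, and without one a query falls back to an exact sequential scan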
  • Top K: Limits results to top K most similar (default: 20)
  • Distance Threshold: Max distance of 0.8 filters out very dissimilar results
  • Only on First Page: Vector search only runs when offset === 0 to avoid expensive operations on pagination

Result Formatting Optimizations

  • Batch Fetching: Accounts and assets fetched in parallel using Promise.all()
  • Map Lookups: Uses Map for O(1) lookups when matching results to accounts/assets
  • Single Query: All accounts fetched in one query, all assets in another
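
A minimal sketch of the batch-and-join pattern follows. The Account, Asset and Enriched shapes are simplified stand-ins for the real types, and the two fetcher parameters are hypothetical placeholders for the database access functions.

```typescript
interface Account { id: string; displayName: string }
interface Asset { accountId: string; contentUrl: string }
interface Enriched { accountId: string; displayName: string; images: string[] }

// Join step: index accounts and assets by account id once, then resolve
// each result with O(1) Map lookups instead of scanning arrays.
function joinResults(accountIds: string[], accounts: Account[], assets: Asset[]): Enriched[] {
  const accountById = new Map(accounts.map((a): [string, Account] => [a.id, a]));
  const imagesByAccount = new Map<string, string[]>();
  for (const asset of assets) {
    const list = imagesByAccount.get(asset.accountId) ?? [];
    list.push(asset.contentUrl);
    imagesByAccount.set(asset.accountId, list);
  }
  return accountIds.map((id) => ({
    accountId: id,
    displayName: accountById.get(id)?.displayName ?? "",
    images: imagesByAccount.get(id) ?? [],
  }));
}

// Fetch step: one query for all accounts and one for all assets, run in parallel.
async function enrichResults(
  accountIds: string[],
  fetchAccounts: (ids: string[]) => Promise<Account[]>,
  fetchAssets: (ids: string[]) => Promise<Asset[]>,
): Promise<Enriched[]> {
  const [accounts, assets] = await Promise.all([fetchAccounts(accountIds), fetchAssets(accountIds)]);
  return joinResults(accountIds, accounts, assets);
}
```

Separating the pure join from the fetching keeps the lookup logic easy to test without a database.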

Search Examples

Example 1: Exact Match Search

Request:

{
  "filters": {
    "category": "MODELS",
    "gender": "FEMALE",
    "hairColor": "BLONDE",
    "minFollowers": 100000
  },
  "limit": 20
}

Process:

  1. SQL search filters by exact enum matches
  2. Returns accounts matching all criteria
  3. Results ordered by numberOfFollowers desc

Example 2: Fuzzy Search with Vector Fallback

Request:

{
  "filters": {
    "category": "ACTORS",
    "hairColor": "BROWN",
    "eyeColor": "GREEN"
  },
  "limit": 20
}

Process:

  1. SQL search finds no exact matches
  2. Vector fallback generates embedding for query
  3. Vector search finds semantically similar accounts
  4. Results include similarity scores

Example 3: Voice Search with Asset Filters

Request:

{
  "filters": {
    "hasVoiceSample": true,
    "accent": "BRITISH_RP",
    "voiceTone": "WARM"
  },
  "limit": 10
}

Process:

  1. SQL search JOINs to likeness_assets and ip_terms
  2. Filters for accounts with voice samples and approved VoiceOver terms
  3. Filters by accent and voice tone
  4. Returns matching voice talent

Search Result Quality

SQL Search Quality

  • Precision: High - exact enum matches ensure precise results
  • Recall: Medium - may miss accounts with similar but not exact characteristics
  • Speed: Very Fast - indexed enum fields enable fast queries

Vector Search Quality

  • Precision: Medium - semantic similarity may include less relevant results
  • Recall: High - finds accounts even with incomplete or imprecise queries
  • Speed: Fast - pgvector indexes enable efficient similarity search

Combined Strategy

  • Best of Both: SQL for precision, vector for recall
  • Fallback Only: Vector search only used when SQL returns nothing
  • Quality Maintained: Both methods use same data source, ensuring consistency