Search Flow

Detailed search process and algorithms

Overview

The search system uses a two-stage approach: fast SQL filtering for exact matches, with semantic vector search as a fallback when exact filtering returns nothing. This helps buyers find relevant talent even when their search criteria are incomplete or imprecise.

Search Architecture

flowchart TD
    A[Buyer Submits Query] --> B{Query Format}
    B -->|Brief Text| C[AI Extract Filters]
    B -->|Direct Filters| D[Use Filters Directly]
    C --> D
    D --> E[SQL Search]
    E --> F{Results Found?}
    F -->|Yes| G[Format Results]
    F -->|No and First Page| H[Vector Fallback Search]
    H --> I[Generate Query Embedding]
    I --> J[Vector Similarity Search]
    J --> K[Format Results]
    G --> L[Return to Buyer]
    K --> L

Search API

Endpoint

Location: apps/zooly-app/app/api/indexing/search/route.ts

GET /api/indexing/search?name_query=<query>&limit=<number>&offset=<number>

POST /api/indexing/search

{
  "filters": {
    "category": "MODELS",
    "gender": "FEMALE",
    "hairColor": "BLONDE",
    "minFollowers": 100000
  },
  "limit": 20,
  "offset": 0,
  "orderBy": "numberOfFollowers",
  "orderDirection": "desc"
}
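
As a rough illustration of the two request forms above, here is a minimal client-side sketch. The helper names (buildNameSearchUrl, buildFilterSearchRequest) and the SearchRequestBody type are hypothetical; only the endpoint path, the query parameters, and the body fields are taken from this page.

```typescript
// Hypothetical client helpers. Only the path, query parameters, and body
// fields come from the documentation; everything else is illustrative.
type RequestOrderDirection = "asc" | "desc";

interface SearchRequestBody {
  filters: Record<string, string | number | boolean>;
  limit?: number;
  offset?: number;
  orderBy?: string;
  orderDirection?: RequestOrderDirection;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const SEARCH_PATH = "/api/indexing/search";

// GET form: /api/indexing/search?name_query=...&limit=...&offset=...
function buildNameSearchUrl(nameQuery: string, limit: number, offset: number): string {
  const params = new URLSearchParams({
    name_query: nameQuery,
    limit: String(limit),
    offset: String(offset),
  });
  return `${SEARCH_PATH}?${params.toString()}`;
}

// POST form: returns the arguments that could be passed to fetch(url, init).
function buildFilterSearchRequest(body: SearchRequestBody): PreparedRequest {
  return {
    url: SEARCH_PATH,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The prepared request can be passed straight to fetch(request.url, request.init); URLSearchParams takes care of encoding the name query.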

Search Filters

Type: SearchFilters (from packages/types/src/types/LikenessSearch.ts)

Supports filtering by:

Category & Basic: category, gender, country, targetAudience, name
Social Data: minFollowers, maxFollowers, primaryPlatform, isVerified, contentNiche, minEngagementRate, maxEngagementRate
Physical Data: minAge, maxAge, minHeight, maxHeight, minWeight, maxWeight, bodyType
Visual Data: hairColor, hairLength, hairType, eyeColor, skinColor, ethnicity, faceShape, facialHair, wearGlasses, hasTattoos, hasPiercings
Voice Data: voicePitch, voiceTone, accent, primaryLanguage
Asset Filters: hasVoiceSample, hasImageAsset

SQL Search

Function: searchLikeness(filters, options)

Location: packages/likeness-search/src/searchLikeness.ts

Search Implementation

Function: searchByFilters(filters, limit, offset)

Location: packages/db/src/access/likenessSearch.ts

Without Asset Filters

When neither hasVoiceSample nor hasImageAsset is specified:

Uses Drizzle query builder with dynamic WHERE clause
Filters on enum fields using exact matches
Filters on numeric fields using range queries (>=, <=)
Filters on boolean fields using exact matches
Filters on name using ILIKE pattern matching
Ordered by numberOfFollowers desc by default
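
The dynamic WHERE clause described above can be sketched without any dependencies. This is not the Drizzle-based implementation: it builds a plain parameterized SQL fragment instead, and the column names (hair_color, number_of_followers, and so on) are assumptions made for illustration.

```typescript
type WhereValue = string | number | boolean;

interface WhereClause {
  sql: string;          // e.g. "WHERE category = $1 AND ..."
  params: WhereValue[]; // positional parameters for $1, $2, ...
}

// Assumed filter-to-column mappings; the real schema may differ.
const EXACT_MATCH_COLUMNS = new Map<string, string>([
  ["category", "category"],
  ["gender", "gender"],
  ["hairColor", "hair_color"],
  ["isVerified", "is_verified"],
]);
const LOWER_BOUND_COLUMNS = new Map<string, string>([
  ["minFollowers", "number_of_followers"],
]);
const UPPER_BOUND_COLUMNS = new Map<string, string>([
  ["maxFollowers", "number_of_followers"],
]);

function buildWhereClause(filters: Record<string, WhereValue | undefined>): WhereClause {
  const conditions: string[] = [];
  const params: WhereValue[] = [];

  // Appends one condition and its positional parameter.
  const add = (column: string, operator: string, value: WhereValue): void => {
    params.push(value);
    conditions.push(`${column} ${operator} $${params.length}`);
  };

  for (const [key, value] of Object.entries(filters)) {
    if (value === undefined) continue; // unspecified filters add no condition
    const exact = EXACT_MATCH_COLUMNS.get(key);
    const lower = LOWER_BOUND_COLUMNS.get(key);
    const upper = UPPER_BOUND_COLUMNS.get(key);
    if (exact !== undefined) add(exact, "=", value);       // enum and boolean fields
    else if (lower !== undefined) add(lower, ">=", value); // min* range filters
    else if (upper !== undefined) add(upper, "<=", value); // max* range filters
  }

  const sql = conditions.length > 0 ? `WHERE ${conditions.join(" AND ")}` : "";
  return { sql, params };
}
```

Only filters that are actually present contribute a condition, so an empty filter object produces no WHERE clause at all.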

With Asset Filters

When hasVoiceSample or hasImageAsset is specified:

Uses raw SQL with parameterized queries for JOINs
JOINs to likeness_assets table for image assets
JOINs to likeness_assets table for voice assets (with voiceSampleUrl IS NOT NULL check)
JOINs to ip_terms table to verify VoiceOver terms are approved
JOINs to account table for name filter and slug validation
Special case: When name filter is present, other voice characteristic filters are ignored (name search takes priority)

Ordering

Results can be ordered by:

numberOfFollowers - Total follower count (descending by default)
birthYear - Age-based ordering
engagementRate - Engagement rate ordering

Direction: asc or desc (default: desc)
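
A small sketch of how the ordering options might be resolved before querying. The function name resolveOrdering is hypothetical, and falling back to the defaults for unrecognized values is an assumption; this page only states the allowed fields and the default direction.

```typescript
const ORDERABLE_FIELDS = ["numberOfFollowers", "birthYear", "engagementRate"] as const;
type OrderField = (typeof ORDERABLE_FIELDS)[number];
type SortDirection = "asc" | "desc";

interface ResolvedOrdering {
  field: OrderField;
  direction: SortDirection;
}

// Restricts ordering to the documented fields so that arbitrary input
// never reaches the ORDER BY clause.
function resolveOrdering(orderBy?: string, orderDirection?: string): ResolvedOrdering {
  // Assumption: unknown or missing fields fall back to numberOfFollowers.
  const field = ORDERABLE_FIELDS.find((candidate) => candidate === orderBy) ?? "numberOfFollowers";
  // Anything other than an explicit "asc" is treated as the default "desc".
  const direction: SortDirection = orderDirection === "asc" ? "asc" : "desc";
  return { field, direction };
}
```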

Vector Fallback Search

Function: vectorFallbackSearch(filters, options)

Location: packages/likeness-search/src/vectorFallbackSearch.ts

When Used

Vector search is used as fallback when:

SQL search returns 0 results
AND offset === 0 (only on first page, not for pagination)

Exception: Name filter searches never fall back to vector search (return empty instead).
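
The fallback conditions above reduce to a small predicate. The function and parameter names below are hypothetical; the logic follows the three rules stated in this section.

```typescript
interface FallbackDecisionInput {
  sqlResultCount: number; // number of rows returned by the SQL search
  offset: number;         // pagination offset of the current request
  hasNameFilter: boolean; // true when filters.name is set
}

function shouldUseVectorFallback(input: FallbackDecisionInput): boolean {
  // Name searches never fall back; they return an empty result instead.
  if (input.hasNameFilter) return false;
  // Fall back only when SQL found nothing AND this is the first page.
  return input.sqlResultCount === 0 && input.offset === 0;
}
```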

Process

Convert Filters to Text: filtersToDescription(filters) creates natural language description
- Example: "category: models, gender: female, hairColor: blonde, minFollowers: 100000"
Generate Query Embedding: generateEmbedding(queryText) uses OpenAI text-embedding-3-small
- Returns 1536-dimensional vector
Choose Search Method: Based on asset filters:
- Both hasImageAsset and hasVoiceSample → Intersect results from both searches
- Only hasVoiceSample → vectorSearchWithVoiceAssets()
- Only hasImageAsset → vectorSearchWithImageAssets()
- Neither → vectorSearch()
Vector Similarity Search: Uses pgvector cosine distance (<=> operator)
- Max distance threshold: 0.8 (configurable)
- Returns top K results ordered by similarity
Fetch Search Data: Gets corresponding likenessSearch rows for vector results
Format Results: Enriches with account and asset data
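
The first and third steps of this process can be sketched as follows. describeFilters only approximates filtersToDescription: lowercasing enum values is inferred from the example string above, and the real function may format values differently. chooseVectorSearchMethod and intersectHits are hypothetical helpers, and treating only a true asset flag as "specified" is an assumption.

```typescript
type DescribableValue = string | number | boolean | undefined;

// Turns filters into a "key: value, key: value" description for embedding.
function describeFilters(filters: Record<string, DescribableValue>): string {
  return Object.entries(filters)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => {
      const text = typeof value === "string" ? value.toLowerCase() : String(value);
      return `${key}: ${text}`;
    })
    .join(", ");
}

type VectorSearchMethod =
  | "intersect"
  | "vectorSearchWithVoiceAssets"
  | "vectorSearchWithImageAssets"
  | "vectorSearch";

// Picks the search variant from the asset filters.
function chooseVectorSearchMethod(filters: {
  hasImageAsset?: boolean;
  hasVoiceSample?: boolean;
}): VectorSearchMethod {
  if (filters.hasImageAsset && filters.hasVoiceSample) return "intersect";
  if (filters.hasVoiceSample) return "vectorSearchWithVoiceAssets";
  if (filters.hasImageAsset) return "vectorSearchWithImageAssets";
  return "vectorSearch";
}

interface VectorHit {
  accountId: string;
  distance: number;
}

// Keeps the hits from `first` whose account also appears in `second`,
// preserving the order of `first`.
function intersectHits(first: VectorHit[], second: VectorHit[]): VectorHit[] {
  const secondIds = new Set(second.map((hit) => hit.accountId));
  return first.filter((hit) => secondIds.has(hit.accountId));
}
```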

Vector Search Functions

Location: packages/db/src/access/likenessSearchVector.ts

vectorSearch(queryEmbedding, topK, maxDistance) - Basic vector search
vectorSearchWithImageAssets(queryEmbedding, opts) - Filters accounts with image assets
vectorSearchWithVoiceAssets(queryEmbedding, opts) - Filters accounts with voice samples (requires approved VoiceOver term)

Distance Calculation

PostgreSQL pgvector uses cosine distance:

Distance: embedding <=> queryEmbedding (lower = more similar)
Similarity Score: 1 - distance (higher = more similar)
Max Distance: 0.8 means similarity score ≥ 0.2
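
To make the relationship between distance, similarity score, and threshold concrete, the sketch below computes cosine distance in plain TypeScript. In the real system the database evaluates the <=> operator; these helper names are hypothetical and exist only to illustrate the arithmetic.

```typescript
// Cosine distance = 1 - cosine similarity, which is what pgvector's <=> computes.
// Note: a zero-length (all-zero) vector yields NaN, since its norm is 0.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Similarity score as defined on this page: 1 - distance.
function similarityScore(distance: number): number {
  return 1 - distance;
}

// A result is kept when its distance does not exceed the threshold
// (0.8 by default, i.e. similarity score >= 0.2).
function isWithinMaxDistance(distance: number, maxDistance = 0.8): boolean {
  return distance <= maxDistance;
}
```

Identical directions give distance 0 (score 1), orthogonal vectors give distance 1 (score 0), and opposite vectors give distance 2 (score -1), so the 0.8 threshold excludes orthogonal and opposing embeddings.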

Result Formatting

Function: formatSearchResults(likenessSearchResults)

Location: packages/likeness-search/src/formatSearchResults.ts

Enrichment Process

Fetch Account Data: Gets account records for all result accountIds
Fetch Assets: Gets likenessAssets for all result accountIds
Format Each Result:
- Account info: displayName, imageUrl, slug
- Profile image: account.imageUrl or fallback /img/no-img.svg
- Images array: All IMAGE type assets with contentUrl
- Voice sample URL: First VOICE asset with voiceSampleUrl
- Follower count: Formatted string (e.g., "1.2M", "500K")
- Score: Similarity score (only for vector search results)
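
Two of the formatting rules above, sketched as small helpers. The names are hypothetical, and the exact rounding behaviour of the real follower formatter is not documented here; this version keeps one decimal place and drops a trailing ".0".

```typescript
// Formats a follower count as a short string such as "1.2M" or "500K".
function formatFollowerCount(count: number): string {
  // One decimal place, without a trailing ".0" (500.0 becomes "500").
  const trim = (value: number): string => value.toFixed(1).replace(/\.0$/, "");
  if (count >= 1_000_000) return `${trim(count / 1_000_000)}M`;
  if (count >= 1_000) return `${trim(count / 1_000)}K`;
  return String(count);
}

// Uses the account image when present, otherwise the documented fallback.
function resolveProfileImage(imageUrl: string | null | undefined): string {
  return imageUrl || "/img/no-img.svg";
}
```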

Result Type

Type: FormattedSearchResult (from packages/types/src/types/LikenessSearch.ts)

{
  accountId: string;
  likenessSearch: LikenessSearch;
  account?: {
    displayName: string;
    imageUrl: string | null;
    slug: string;
  };
  id: string;
  displayName: string;
  profileImage: string;
  images: string[];
  followersCount: string;
  slug: string;
  score?: number;        // Only for vector search
  voiceSampleUrl?: string;
}

Search Flow Diagram

sequenceDiagram
    participant Buyer
    participant API as Search API
    participant Search as Search Function
    participant SQL as SQL Search
    participant Vector as Vector Search
    participant Format as Format Results
    participant DB as Database

    Buyer->>API: POST /api/indexing/search
    API->>Search: searchLikeness(filters, options)
    Search->>SQL: searchByFilters(filters, limit, offset)
    SQL->>DB: Query likeness_search table
    DB-->>SQL: Results

    alt Results Found
        SQL-->>Search: Return Results
        Search->>Format: formatSearchResults(results)
        Format->>DB: Fetch accounts and assets
        DB-->>Format: Account and asset data
        Format-->>Search: Formatted Results
        Search-->>API: Return Results
    else No Results + First Page
        SQL-->>Search: Empty Results
        Search->>Vector: vectorFallbackSearch(filters, options)
        Vector->>Vector: Generate Query Embedding
        Vector->>DB: Vector Similarity Search
        DB-->>Vector: Vector Results
        Vector->>DB: Fetch likenessSearch data
        DB-->>Vector: Search Data
        Vector->>Format: formatSearchResults(vectorResults)
        Format->>DB: Fetch accounts and assets
        DB-->>Format: Account and asset data
        Format-->>Vector: Formatted Results
        Vector-->>Search: Return Results
        Search-->>API: Return Results
    end

    API-->>Buyer: JSON Response

Filter Processing

Name Query Processing

When a GET request includes ?name_query=<query>:

Used for direct name search only - searches for accounts matching the name pattern
Converts to filter: filters.name = query
Uses ILIKE pattern matching on account.displayName (case-insensitive partial match)
Note: This is NOT a general query parameter - it only searches by name
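
A minimal sketch of turning a name query into an ILIKE pattern for a partial, case-insensitive match. Whether the actual implementation trims input or escapes wildcard characters is not stated on this page; both are assumptions here, and toILikePattern is a hypothetical name.

```typescript
// Wraps the query in % wildcards for a partial match. Backslash, % and _
// in the user input are escaped so they match literally (backslash is
// PostgreSQL's default LIKE escape character).
function toILikePattern(nameQuery: string): string {
  const escaped = nameQuery.trim().replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `%${escaped}%`;
}
```

The resulting pattern is meant to be bound as a query parameter, for example displayName ILIKE $1, rather than concatenated into the SQL text.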

AI-Based Filter Extraction (Client-Side)

For advanced search with natural language queries:

Client-side processing: The client calls /api/generate-structured-data with a natural language description
AI extraction: Uses Google Gemini (gemini-2.5-flash-lite) to extract structured filters from the text
Filter application: The extracted filters are sent to the POST /api/indexing/search endpoint as pre-processed SearchFilters

Example Flow:

User Input: "Looking for a tall blonde female model in the US with a warm British accent"
  ↓
Client calls: POST /api/generate-structured-data
  ↓
Gemini extracts: { category: "MODELS", gender: "FEMALE", hairColor: "BLONDE", country: "USA", voiceTone: "WARM", accent: "BRITISH_RP" }
  ↓
Client sends: POST /api/indexing/search { filters: { ...extracted filters } }
  ↓
Search executes with structured filters
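
The two-step client flow above might look roughly like this. The request body sent to /api/generate-structured-data ({ text }) and the assumption that it responds with the filters object directly are both guesses; the PostJson transport is injected so the sketch stays independent of any particular HTTP client.

```typescript
// Hypothetical transport: posts a JSON body and resolves with the parsed response.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function searchFromBrief(brief: string, limit: number, postJson: PostJson): Promise<unknown> {
  // Step 1: ask the extraction endpoint for structured filters.
  // Assumption: the body is { text } and the response is the filters object.
  const filters = await postJson("/api/generate-structured-data", { text: brief });

  // Step 2: send the extracted filters to the search endpoint, first page.
  return postJson("/api/indexing/search", { filters, limit, offset: 0 });
}
```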

Filter Validation

Filters are validated against enum values:

Invalid enum values are ignored (no error thrown)
Numeric ranges are validated (min < max)
Boolean filters are converted to proper types
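
The validation rules above can be sketched as a single sanitizing pass. The rule table is passed in by the caller because the real enum definitions are not listed on this page; sanitizeFilters and FieldRule are hypothetical names, and dropping both bounds of an invalid range (rather than raising an error) is an assumption.

```typescript
type FieldRule =
  | { kind: "enum"; values: readonly string[] }
  | { kind: "number" }
  | { kind: "boolean" };

type CleanFilterValue = string | number | boolean;

function sanitizeFilters(
  raw: Record<string, unknown>,
  rules: Record<string, FieldRule>,
): Record<string, CleanFilterValue> {
  const clean: Record<string, CleanFilterValue> = {};

  for (const [key, rule] of Object.entries(rules)) {
    const value = raw[key];
    if (value === undefined || value === null) continue;

    if (rule.kind === "enum") {
      // Invalid enum values are silently ignored, no error is thrown.
      if (typeof value === "string" && rule.values.includes(value)) clean[key] = value;
    } else if (rule.kind === "number") {
      const parsed =
        typeof value === "number" ? value
        : typeof value === "string" && value.trim() !== "" ? Number(value)
        : NaN;
      if (Number.isFinite(parsed)) clean[key] = parsed;
    } else {
      // Booleans may arrive as real booleans or as "true"/"false" strings.
      if (value === true || value === "true") clean[key] = true;
      else if (value === false || value === "false") clean[key] = false;
    }
  }

  // Range check: a minX/maxX pair is valid only when min < max.
  // Assumption: an invalid pair is dropped entirely.
  for (const key of Object.keys(clean)) {
    if (!key.startsWith("min")) continue;
    const maxKey = `max${key.slice(3)}`;
    const lower = clean[key];
    const upper = clean[maxKey];
    if (typeof lower === "number" && typeof upper === "number" && lower >= upper) {
      delete clean[key];
      delete clean[maxKey];
    }
  }
  return clean;
}
```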

Name Filter Special Handling

When name filter is present:

Uses ILIKE pattern matching on account.displayName
For voice searches, other voice characteristic filters are ignored
Name search takes priority for exact matching

Performance Optimizations

SQL Search Optimizations

Indexes: Queue table has indexes on (status, createdAt) and (accountId, status)
Unique Constraint: Search index has unique constraint on accountId for fast lookups
Pagination: Limit/offset support prevents loading all results

Vector Search Optimizations

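Index: pgvector supports approximate nearest-neighbor indexes (HNSW or IVFFlat) for similarity search; these must be created explicitly, since without one pgvector performs an exact sequential scan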
Top K: Limits results to top K most similar (default: 20)
Distance Threshold: Max distance of 0.8 filters out very dissimilar results
Only on First Page: Vector search only runs when offset === 0 to avoid expensive operations on pagination

Result Formatting Optimizations

Batch Fetching: Accounts and assets fetched in parallel using Promise.all()
Map Lookups: Uses Map for O(1) lookups when matching results to accounts/assets
Single Query: All accounts fetched in one query, all assets in another
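
The batch-fetch pattern above, sketched with injected fetchers so it runs without a database. The row shapes and the loadEnrichmentData name are hypothetical; the parts taken from this page are the two parallel queries and the Map-based lookups.

```typescript
interface EnrichAccount {
  id: string;
  displayName: string;
}

interface EnrichAsset {
  accountId: string;
  type: "IMAGE" | "VOICE";
  contentUrl: string;
}

interface EnrichmentData {
  accountsById: Map<string, EnrichAccount>;
  assetsByAccountId: Map<string, EnrichAsset[]>;
}

async function loadEnrichmentData(
  accountIds: string[],
  fetchAccounts: (ids: string[]) => Promise<EnrichAccount[]>,
  fetchAssets: (ids: string[]) => Promise<EnrichAsset[]>,
): Promise<EnrichmentData> {
  // One query for all accounts and one for all assets, run in parallel.
  const [accounts, assets] = await Promise.all([
    fetchAccounts(accountIds),
    fetchAssets(accountIds),
  ]);

  // Index by id so each result can be matched in O(1).
  const accountsById = new Map<string, EnrichAccount>();
  for (const account of accounts) accountsById.set(account.id, account);

  // Group assets per account, preserving fetch order within each group.
  const assetsByAccountId = new Map<string, EnrichAsset[]>();
  for (const asset of assets) {
    const group = assetsByAccountId.get(asset.accountId) ?? [];
    group.push(asset);
    assetsByAccountId.set(asset.accountId, group);
  }

  return { accountsById, assetsByAccountId };
}
```

Regardless of how many results are being formatted, this issues exactly two queries, which avoids a per-result lookup.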

Search Examples

Example 1: Exact Filter Search

Request:

{
  "filters": {
    "category": "MODELS",
    "gender": "FEMALE",
    "hairColor": "BLONDE",
    "minFollowers": 100000
  },
  "limit": 20
}

Process:

SQL search filters by exact enum matches
Returns accounts matching all criteria
Results ordered by numberOfFollowers desc

Example 2: Fuzzy Search with Vector Fallback

Request:

{
  "filters": {
    "category": "ACTORS",
    "hairColor": "BROWN",
    "eyeColor": "GREEN"
  },
  "limit": 20
}

Process:

SQL search finds no exact matches
Vector fallback generates embedding for query
Vector search finds semantically similar accounts
Results include similarity scores

Example 3: Voice Search

Request:

{
  "filters": {
    "hasVoiceSample": true,
    "accent": "BRITISH_RP",
    "voiceTone": "WARM"
  },
  "limit": 10
}

Process:

SQL search JOINs to likeness_assets and ip_terms
Filters for accounts with voice samples and approved VoiceOver terms
Filters by accent and voice tone
Returns matching voice talent

Search Result Quality

SQL Search Quality

Precision: High - exact enum matches ensure precise results
Recall: Medium - may miss accounts with similar but not exact characteristics
Speed: Very Fast - indexed enum fields enable fast queries

Vector Search Quality

Precision: Medium - semantic similarity may include less relevant results
Recall: High - finds accounts even with incomplete or imprecise queries
Speed: Fast - pgvector indexes enable efficient similarity search

Combined Strategy

Best of Both: SQL for precision, vector for recall
Fallback Only: Vector search only used when SQL returns nothing
Quality Maintained: Both methods use the same data source, ensuring consistency
