Social Scraper Package

Social media scraping functionality for multiple platforms

Overview

@zooly/social-scraper scrapes social media profiles across multiple platforms and extracts follower counts, avatars, and other profile information.

Package Details

  • Package Name: @zooly/social-scraper
  • Location: packages/social-scraper
  • Type: Social media scraping service

Key Features

  • Multi-Platform Support: Instagram, TikTok, Twitter, YouTube, and LinkedIn
  • Profile Data Extraction: Extracts follower counts, avatars, and profile information
  • Rate Limit Handling: Built-in rate limit detection and retry logic
  • Exponential Backoff: Automatic backoff for failed scraping attempts
  • Scrape Management: Tracks scrape attempts and prevents duplicate scraping
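The rate limit handling and exponential backoff listed above can be sketched as a small retry wrapper. This is an illustration, not the package's code: `RateLimitError`, `withRateLimitRetry`, and the default delays are hypothetical. The sketch assumes a scraper signals rate limiting by throwing a dedicated error type. Only that error is retried. The delay doubles after each failed attempt, is capped, and is overridden by a server-provided retry hint when one exists.

```typescript
// Hypothetical error a scraper could throw when a platform rate-limits it (e.g. HTTP 429).
class RateLimitError extends Error {
  readonly retryAfterMs: number | undefined;
  constructor(retryAfterMs?: number) {
    super("rate limited");
    this.name = "RateLimitError";
    this.retryAfterMs = retryAfterMs;
    // Keep `instanceof` working when compiled to older JS targets.
    Object.setPrototypeOf(this, RateLimitError.prototype);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `fn` only on RateLimitError. The delay is base, 2*base, 4*base, ...
// capped at maxDelayMs. A retryAfterMs hint on the error takes precedence.
// All defaults are illustrative assumptions.
async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
  maxDelayMs = 60_000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-rate-limit errors and the final failed attempt propagate unchanged.
      if (!(err instanceof RateLimitError) || attempt >= maxAttempts) throw err;
      const backoff = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(err.retryAfterMs ?? backoff);
    }
  }
}
```

Other errors are rethrown immediately. Retrying a parse failure or a missing profile would only waste requests against the rate limit.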

Supported Platforms

  • Instagram - Profile scraping with follower counts and avatars
  • TikTok - Profile data extraction
  • Twitter - Profile information scraping
  • YouTube - Channel data extraction
  • LinkedIn - Professional profile scraping
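A fixed platform list like this usually implies a lookup from a URL's hostname to a platform. The sketch below assumes that approach. `Platform`, `HOST_TO_PLATFORM`, and `parseSocialUrlSketch` are hypothetical names, and the host list and handle extraction are simplified guesses, not the package's actual parsing rules.

```typescript
// Hypothetical platform identifiers; the package's real type may differ.
type Platform = "instagram" | "tiktok" | "twitter" | "youtube" | "linkedin";

// Assumed hostname -> platform table. A Map avoids prototype keys like "constructor".
const HOST_TO_PLATFORM = new Map<string, Platform>([
  ["instagram.com", "instagram"],
  ["tiktok.com", "tiktok"],
  ["twitter.com", "twitter"],
  ["x.com", "twitter"],
  ["youtube.com", "youtube"],
  ["linkedin.com", "linkedin"],
]);

interface ParsedSocialUrl {
  platform: Platform;
  handle: string; // last path segment with any leading "@" removed
}

function parseSocialUrlSketch(raw: string): ParsedSocialUrl | null {
  let url: URL;
  try {
    // Accept bare "instagram.com/name" input by defaulting to https.
    url = new URL(/^https?:\/\//i.test(raw) ? raw : `https://${raw}`);
  } catch {
    return null; // not a parseable URL
  }
  // Strip a leading "www." or "m." before the lookup.
  const host = url.hostname.toLowerCase().replace(/^(www\.|m\.)/, "");
  const platform = HOST_TO_PLATFORM.get(host);
  if (!platform) return null; // unsupported platform
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return null; // no profile path
  const handle = segments[segments.length - 1].replace(/^@/, "");
  return handle ? { platform, handle } : null;
}
```

Real URLs come in more shapes than this handles (YouTube `/channel/` IDs, LinkedIn company pages, query strings on shared links), so a production parser likely validates per platform.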

Core Functionality

Main Operations

  • processSocialScraping - Main orchestration function that scrapes all social links for an account
  • parseSocialUrl - Parse and validate social media URLs
  • getScraperForPlatform - Get the appropriate scraper function for a platform
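One plausible way `processSocialScraping` and `getScraperForPlatform` fit together is a registry of scrapers keyed by platform plus a loop over an account's links. Everything below is hypothetical: the `ProfileData` and `SocialLink` types, the stub scrapers (which return fixed values instead of calling Scrapfly), and the choice to record per-link errors instead of aborting.

```typescript
type Platform = "instagram" | "tiktok" | "twitter" | "youtube" | "linkedin";

// Hypothetical result shape returned by every platform scraper.
interface ProfileData {
  followers: number | null;
  avatarUrl: string | null;
}

type Scraper = (url: string) => Promise<ProfileData>;

// Hypothetical registry. These stubs stand in for real scrapers that fetch and parse pages.
const scrapers: Partial<Record<Platform, Scraper>> = {
  instagram: async () => ({ followers: 1200, avatarUrl: "https://cdn.example/ig.jpg" }),
  youtube: async () => ({ followers: 300, avatarUrl: null }),
};

function getScraperForPlatformSketch(platform: Platform): Scraper | undefined {
  return scrapers[platform];
}

interface SocialLink {
  platform: Platform;
  url: string;
}

// Scrapes each link in turn. A failing or unsupported platform is recorded as an
// Error for that link and does not stop the remaining links.
async function processLinksSketch(links: SocialLink[]): Promise<Map<string, ProfileData | Error>> {
  const results = new Map<string, ProfileData | Error>();
  for (const link of links) {
    const scraper = getScraperForPlatformSketch(link.platform);
    if (!scraper) {
      results.set(link.url, new Error(`no scraper for ${link.platform}`));
      continue;
    }
    try {
      results.set(link.url, await scraper(link.url));
    } catch (err) {
      results.set(link.url, err instanceof Error ? err : new Error(String(err)));
    }
  }
  return results;
}
```

Processing links one at a time is a conservative choice when platforms rate-limit aggressively. A real implementation might instead run different platforms in parallel.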

Scraping Features

  • Automatic retry with exponential backoff
  • Rate limit detection and handling
  • Skip recently scraped profiles
  • Maximum attempt limits (default: 5 attempts)
  • Best avatar selection from multiple sources
  • Follower count aggregation
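This document does not say how the "best" avatar is chosen or how follower counts are aggregated. The sketch below assumes "best" means the highest known resolution and aggregation means a total across platforms. `AvatarCandidate`, `pickBestAvatar`, and `aggregateFollowers` are illustrative names.

```typescript
// Hypothetical avatar candidate. `width` is the pixel width when the scraper knows it.
interface AvatarCandidate {
  url: string;
  width?: number;
}

// Assumed rule: pick the widest known avatar. If no widths are known, keep the first candidate.
function pickBestAvatar(candidates: AvatarCandidate[]): string | null {
  if (candidates.length === 0) return null;
  let best = candidates[0];
  for (const candidate of candidates) {
    if ((candidate.width ?? 0) > (best.width ?? 0)) best = candidate;
  }
  return best.url;
}

// Assumed rule: sum the counts that were scraped. null means "unknown" and adds nothing.
function aggregateFollowers(counts: Array<number | null>): number {
  return counts.reduce<number>((sum, n) => (n === null ? sum : sum + n), 0);
}
```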

Dependencies

  • @zooly/app-db - Database access layer
  • @zooly/types - Shared types
  • @zooly/util - Shared utilities
  • @zooly/util-srv - Server-side utilities
  • scrapfly-sdk - Scrapfly SDK for web scraping

Usage

This package scrapes the social media profiles of accounts registered in the system. For each account, it processes every associated social link, extracts the relevant profile information, and stores it in the database for use in likeness search and other features.

Scrape Management

The package controls when and how often each profile is scraped:

  • Tracks scrape attempts and timestamps
  • Prevents duplicate scraping within a time window
  • Implements exponential backoff for retries
  • Handles rate limit errors gracefully
  • Aggregates results from multiple platforms
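The rules above can be combined into a single decision function that runs before each scrape. This is a sketch under stated assumptions. The `ScrapeState` fields, the 24-hour freshness window, and the 1-hour base backoff are hypothetical. Only the limit of 5 attempts comes from this document.

```typescript
// Hypothetical per-profile scrape state; field names are assumptions.
interface ScrapeState {
  lastScrapedAt: Date | null; // last successful scrape
  lastAttemptAt: Date | null; // last attempt, successful or not
  failedAttempts: number; // consecutive failures since the last success
}

const MAX_ATTEMPTS = 5; // documented default attempt limit
const FRESHNESS_WINDOW_MS = 24 * 60 * 60 * 1000; // assumed "recently scraped" window
const BASE_BACKOFF_MS = 60 * 60 * 1000; // assumed 1 hour, doubled per failure

type ScrapeDecision =
  | { scrape: true }
  | { scrape: false; reason: "fresh" | "max-attempts" | "backoff"; retryAt?: Date };

function shouldScrape(state: ScrapeState, now: Date = new Date()): ScrapeDecision {
  // Skip profiles scraped successfully within the freshness window.
  if (state.lastScrapedAt && now.getTime() - state.lastScrapedAt.getTime() < FRESHNESS_WINDOW_MS) {
    return { scrape: false, reason: "fresh" };
  }
  // Give up after too many consecutive failures.
  if (state.failedAttempts >= MAX_ATTEMPTS) {
    return { scrape: false, reason: "max-attempts" };
  }
  // Exponential backoff: wait base * 2^(failures - 1) after the last failed attempt.
  if (state.failedAttempts > 0 && state.lastAttemptAt) {
    const waitMs = BASE_BACKOFF_MS * 2 ** (state.failedAttempts - 1);
    const retryAt = new Date(state.lastAttemptAt.getTime() + waitMs);
    if (now.getTime() < retryAt.getTime()) return { scrape: false, reason: "backoff", retryAt };
  }
  return { scrape: true };
}
```

The freshness check comes first, so a recently scraped profile is skipped regardless of its failure count. Returning a reason instead of a bare boolean makes skipped scrapes easy to log.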