ilmlv/proxy-scraper

Library for scraping free proxy lists and validating proxy capabilities

Maintainers

Package info

github.com/IlmLV/proxy-scraper

pkg:composer/ilmlv/proxy-scraper

Statistics

Installs: 8

Dependents: 0

Suggesters: 0

Stars: 1

Open Issues: 0

v0.1-beta.6 2026-06-06 19:01 UTC

README

Proxy Scraper

CI Latest version Total downloads PHP version License

Proxy Scraper and Validator

Scrape free proxy lists from 39 live, tested sources and individually validate proxy capabilities — anonymity level, latency, HTTP(S) method support and more. Supports http / https / socks4 / socks5 proxies.

Warning

Free public proxies are highly not recommended for sensitive data transfer.

Why this library

  • Batteries included — 39 proxy sources work out of the box, each parser unit-tested and probed live against its real endpoint in CI; add your own in a few lines.
  • Cron-aware polling — each source declares a SCHEDULE; scheduled() only hits providers that are due.
  • Deep validation — anonymity (elite / anonymous / exposed), IP country & organisation, per-method latency, request-header leakage and multi-domain reachability.
  • Resilient — a failing source never aborts the batch; its exception is captured and exposed via errors().
  • Modern & type-safe — PHP 8.1+, declare(strict_types=1), PSR-4, and a fully mockable Symfony HTTP client.

Contents

Installation

Recommended installation method is via composer:

composer require ilmlv/proxy-scraper

Usage

The snippets below assume Composer's autoloader is already loaded (require __DIR__ . '/vendor/autoload.php';). dump() comes from symfony/var-dumper — swap it for print_r()/var_dump() if you prefer.

Scrape all sources

use IlmLV\ProxyScraper\LoadProxies;

$proxies = LoadProxies::init()->all();

foreach ($proxies->get() as $proxy) {
    echo $proxy . PHP_EOL;
}

dump($proxies->stats());

Scrape a single source

use IlmLV\ProxyScraper\LoadProxies;
use IlmLV\ProxyScraper\Sources\FreeProxyListNet;

$proxies = LoadProxies::init()->only(FreeProxyListNet::class);

foreach ($proxies->get() as $proxy) {
    echo $proxy . PHP_EOL;
}

Scrape only sources that are due

Each source declares a cron SCHEDULE; scheduled() runs just the ones due right now — handy for a frequently-polling cron job that should not hammer every provider on every tick.

use IlmLV\ProxyScraper\LoadProxies;

$proxies = LoadProxies::init()->scheduled();

foreach ($proxies->get() as $proxy) {
    echo $proxy . PHP_EOL;
}

Inspect per-source statistics

use IlmLV\ProxyScraper\LoadProxies;

$proxies = LoadProxies::init()->all();

foreach ($proxies->stats() as $source => $results) {
    echo $source . ' => ' . json_encode($results) . PHP_EOL;
}

Configure a source

Extra config keys are appended to the source URL as query parameters, so you can tune sources that accept them (e.g. pubproxy.com). You can also supply your own Symfony HTTP client.

use IlmLV\ProxyScraper\LoadProxies;
use IlmLV\ProxyScraper\Sources\PubProxyCom;
use Symfony\Component\HttpClient\HttpClient;

$scraperConfig = [
    PubProxyCom::class => [
        // 'api'  => 'xxx',
        'country' => 'US',
        'https'   => true,
        'level'   => 'elite',
    ],
];

$httpClient = HttpClient::create(['timeout' => 30]);

$proxies = LoadProxies::init($scraperConfig, $httpClient)
    ->only(PubProxyCom::class);

dump($proxies->stats());

Handle scraper errors

A source that fails (network error, bad response, misconfiguration) does not throw — the exception is captured and exposed via errors(), keyed by the source class.

use IlmLV\ProxyScraper\LoadProxies;
use IlmLV\ProxyScraper\Sources\PubProxyCom;

$scraperConfig = [
    PubProxyCom::class => [
        'api'            => 'wrong_key',
        'level'          => 'wrong_level',
        'wrongParameter' => 'wrong_value',
    ],
];

$proxies = LoadProxies::init($scraperConfig)->only(PubProxyCom::class);

foreach ($proxies->errors() as $scraper => $exception) {
    echo $scraper . ' => ' . $exception->getMessage() . PHP_EOL;
}

Proxy scraper sources

Currently implemented proxy sources:

Feel free to request more sources.

Proxy scrapers

Keep in mind that there is prepared multiple types of scraping libraries that can be used to simplify creation of your own source scrapers. Currently supported source data types:

Define a custom source

Extend one of the scraper base types, point $url at the resource, and hand the class to only()/add() — there is no need to register it in LoadProxies:

use IlmLV\ProxyScraper\LoadProxies;
use IlmLV\ProxyScraper\ScraperInterface;
use IlmLV\ProxyScraper\Scrapers\JsonScraper;

class CustomGimmeProxy extends JsonScraper implements ScraperInterface
{
    protected string $url = 'https://gimmeproxy.com/api/getProxy';
}

$proxies = LoadProxies::init()->only(CustomGimmeProxy::class);

foreach ($proxies->get() as $proxy) {
    echo $proxy . PHP_EOL;
}

Proxy validation

This library can also be used for proxy capability validation:

  • anonymity level:
    • elite (no origin IP exposure and no proxy relates headers),
    • anonymous (has proxy related headers),
    • exposed (has origin IP exposure)
  • if proxy server IP matches server by whom request is performed
  • HTTPS request support
  • various request methods: GET, POST, PUT, OPTIONS, HEAD, DELETE, PATCH
  • huge amount of request headers if they are not modified by proxy - tested in each request method
  • multiple public domains (amazon.com, craigslist.org, example.com, google.com, ss.com)
  • average latency calculation

Validate a proxy

ProxyValidation accepts either a proxy string or a Proxy entity:

use IlmLV\ProxyScraper\Validations\ProxyValidation;

$validation = new ProxyValidation('http://1.1.1.1:80');

dump($validation);

Scrape and validate

Validate every proxy a source returns:

use IlmLV\ProxyScraper\LoadProxies;
use IlmLV\ProxyScraper\Sources\FreeProxyListNet;
use IlmLV\ProxyScraper\Validations\ProxyValidation;

$proxies = LoadProxies::init()->only(FreeProxyListNet::class);

foreach ($proxies->get() as $proxy) {
    $validation = new ProxyValidation($proxy);

    dump(json_decode(json_encode($validation)));
}

The validation result looks like:

{
  "valid": true,
  "anonymityLevel": "elite",
  "ip": {
    "valid": true,
    "countryIsoCode": "NL",
    "organisation": "NForce Entertainment B.V."
  },
  "http": {
    "latency": 0.54314708709717,
    "get": {
      "valid": true,
      "latency": 0.19053816795349,
      "headers": {
        "A-IM": true,
        "Accept": true,
        "Accept-Charset": true,
        "Accept-Encoding": true,
        "Accept-Language": true,
        "Accept-Datetime": true,
        "Access-Control-Request-Method": true,
        "Access-Control-Request-Headers": true,
        "Authorization": true,
        "Cache-Control": true,
        "Connection": true,
        "Cookie": true,
        "Date": true,
        "Expect": true,
        "Forwarded": true,
        "From": true,
        "If-Modified-Since": true,
        "If-None-Match": true,
        "If-Range": true,
        "Max-Forwards": true,
        "Origin": true,
        "Pragma": true,
        "Range": true,
        "Referer": true,
        "TE": true,
        "User-Agent": true,
        "Upgrade": true,
        "Via": true,
        "Warning": true,
        "DNT": true,
        "X-Requested-With": true,
        "X-CSRF-Token": true,
        "X-Real-Ip": true,
        "X-Proxy-Id": true,
        "X-Forwarded": true,
        "X-Forwarded-For": true,
        "Forwarded-For": true,
        "Forwarded-For-Ip": true,
        "Client-Ip": true,
        "X-Client-Ip": true
      }
    },
    "post": {
      "valid": false,
      "latency": null,
      "error": {
        "message": "Connection to proxy closed for \"http://whoami.serviss.it/?format=json\".",
        "file": "/proxy-scraper/vendor/symfony/http-client/Chunk/ErrorChunk.php",
        "line": "56"
      },
      "headers": {}
    },
    "put": {
      "valid": true,
      "latency": 2.1179740428925,
      "headers": {...}
    },
    "options": {
      "valid": true,
      "latency": 1.0257298946381,
      "headers": {...}
    },
    "head": {
      "valid": true,
      "latency": 1.9323780536652,
      "headers": {...}
    },
    "delete": {
      "valid": true,
      "latency": 0.52144622802734,
      "headers": {...}
    },
    "patch": {
      "valid": true,
      "latency": 0.42012906074524,
      "headers": {...}
    }
  },
  "https": {
    "latency": 0.54314708709717,
    "get": {
      "valid": true,
      "latency": 0.19053816795349,
      "headers": {...}
    },
    "post": {
      "valid": false,
      "latency": null,
      "error": {
        "message": "Connection to proxy closed for \"https://whoami.serviss.it/?format=json\".",
        "file": "/proxy-scraper/vendor/symfony/http-client/Chunk/ErrorChunk.php",
        "line": "56"
      },
      "headers": []
    },
    "put": {
      "valid": true,
      "latency": 2.1179740428925,
      "headers": {...}
    },
    "options": {
      "valid": true,
      "latency": 1.0257298946381,
      "headers": {...}
    },
    "head": {
      "valid": true,
      "latency": 1.9323780536652,
      "headers": {...}
    },
    "delete": {
      "valid": true,
      "latency": 0.52144622802734,
      "headers": {...}
    },
    "patch": {
      "valid": true,
      "latency": 0.42012906074524,
      "headers": {...}
    }
  },
  "domains": {
    "amazon.com": {
      "valid": true,
      "latency": 1.7253589630127
    },
    "craigslist.org": {
      "valid": true,
      "latency": 4.507395029068
    },
    "example.com": {
      "valid": true,
      "latency": 0.4618821144104
    },
    "google.com": {
      "valid": false,
      "latency": 0.41366505622864
    },
    "ss.com": {
      "valid": true,
      "latency": 0.44051098823547
    }
  },
  "validatedAt": {
    "date": "2022-12-12 23:09:03.938495",
    "timezone_type": 3,
    "timezone": "Europe/Riga"
  }
}

Testing

The library ships with a PHPUnit test suite split into two suites:

  • unit — fully offline and deterministic. Every HTTP call is mocked with Symfony's MockHttpClient, so the entities, scraper base classes, all proxy sources and the validation subsystem are tested without touching the network.
  • live — hits the real provider endpoints and asserts each source is still reachable and returns proxies. It doubles as a dead-provider monitor and is kept out of the gating pipeline.
composer install

composer test            # unit suite (offline)
composer test:coverage   # unit suite + text coverage report (needs pcov or xdebug)
composer test:live       # live suite (requires network)
composer analyse         # PHPStan static analysis (level 10, offline)

Continuous integration runs on GitHub Actions (see .github/workflows/ci.yml): the unit suite runs across PHP 8.1–8.5 with code-coverage reporting, PHPStan static analysis runs as a separate gating job, and the live suite runs as a separate, non-blocking job on a weekly schedule (and on demand) so dead or drifting providers are caught early.

TODO:

  • Add capability to add custom domain validations
  • Reduce dependencies
  • Improve documentation
  • Tighten argument strict conditions
  • Add more proxy sources
  • Create functional tests
  • Monitor test coverage
  • Expand PHP compatibility (CI now covers PHP 8.1–8.5)