kosuha606 / html-uni-parser
Uni parser for sites
Type:composer-pugin
pkg:composer/kosuha606/html-uni-parser
Requires
- php: >=7.0.0
- beberlei/assert: dev-master
- zendframework/zend-dom: ^2.7@dev
Requires (Dev)
- phpunit/phpunit: 6.5
- seregazhuk/php-watcher: dev-master
README
A universal HTML parser that can parse any kind of HTML page.
Installation
To install this plugin, use Composer:
$ composer require kosuha606/html-uni-parser
Usage
There are four ways to parse HTML: parseCatalog(), parseSearch(), parseCard(), and parseGenerator().
Example:
$results = HtmlUniParser::create([
    'pageUrl' => 'http://example.com',
    'xpathOnCard' => [
        'h1' => '//h1',
        'description' => 'HTML//p'
    ]
])->parseCard();
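To make the role of xpathOnCard more concrete, here is a minimal standalone sketch that uses PHP's built-in DOM extension instead of this library. It is not how HtmlUniParser is implemented: the helper name `extractByXpath` and the sample HTML are made up for illustration, and the sketch uses plain XPath queries only. Each key in the map becomes a key in the result, holding the text of the first node matched by its query.

```php
<?php
// Hypothetical helper, NOT part of html-uni-parser: evaluates a
// key => XPath map against an HTML string using the built-in DOM extension.
function extractByXpath(string $html, array $xpathMap): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpathMap as $key => $query) {
        $nodes = $xpath->query($query);
        // Keep the trimmed text of the first match, or null when nothing matches.
        $result[$key] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

// Sample markup invented for this sketch.
$html = '<html><body><h1>Example Domain</h1><p>Illustrative text.</p></body></html>';
$card = extractByXpath($html, ['h1' => '//h1', 'description' => '//p']);
```

The result array has the same keys as the query map, which mirrors the behaviour the README describes for xpathOnCard.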
Examples
For more examples, see the examples/ directory.
Description of configurable properties
| Property | Description | 
|---|---|
| catalogUrl | URL of the catalog page to parse; used by parseCatalog() | 
| searchUrl | URL used to search the target site; used by parseSearch() | 
| pageUrl | URL of a single page to parse; used by parseCard() | 
| urlGenerator | Callback that generates the links to parse; used by parseGenerator() | 
| encoding | Encoding of the target site | 
| siteBaseUrl | Base URL used when processing links after parsing | 
| resultLimit | Maximum number of results to return | 
| sleepAfterRequest | Number of seconds to sleep after each request | 
| goIntoCard | Whether to follow each link into its card page when parsing catalog links | 
| xpathItem | XPath query that selects the items in a list | 
| xpathLink | XPath query that selects the link inside a parsed item | 
| xpathOnCard | Array of XPath queries; each key becomes a key in the result array | 
| typeMech | Type of parsing mechanism, for example: wget, curl, phantomjs, filegetcontents | 
| forceOuterHtml | Force the parser to use outer HTML for XPath queries | 
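How xpathItem, xpathLink, siteBaseUrl, and resultLimit could work together can be sketched with PHP's built-in DOM extension. This is an illustration under assumptions, not the library's code: the function `collectLinks`, the sample markup, the `.//a/@href` form of the link query, and the simple prefixing of relative links with the base URL are all hypothetical.

```php
<?php
// Hypothetical sketch, NOT the library's implementation.
function collectLinks(
    string $html,
    string $xpathItem,
    string $xpathLink,
    string $siteBaseUrl,
    int $resultLimit
): array {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($xpathItem) as $item) {
        if (count($links) >= $resultLimit) {
            break; // stop once the limit is reached
        }
        // Evaluate the link query relative to the current item node.
        $nodes = $xpath->query($xpathLink, $item);
        if ($nodes === false || $nodes->length === 0) {
            continue;
        }
        $href = trim($nodes->item(0)->nodeValue);
        // Assumption: relative links are made absolute by prefixing the base URL.
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($siteBaseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    return $links;
}

// Sample markup invented for this sketch.
$html = '<ul>'
    . '<li class="item"><a href="/product/1">One</a></li>'
    . '<li class="item"><a href="http://example.com/product/2">Two</a></li>'
    . '<li class="item"><a href="/product/3">Three</a></li>'
    . '</ul>';
$links = collectLinks($html, '//li[@class="item"]', './/a/@href', 'http://example.com', 2);
```

With a limit of 2, only the first two items are collected, and the relative link is expanded against the base URL.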
Available methods
| Method | Description | 
|---|---|
| parseCatalog | Parses the catalog links and then each linked page; returns the results as an array of parsed links | 
| parseSearch | Takes a search query string as its argument, builds the search link, and then behaves like parseCatalog | 
| parseCard | Parses a single page of the site | 
| parseGenerator | Parses the links generated by the urlGenerator callback | 
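Following the pattern of the usage example, a catalog run might be configured roughly as below. Only the property names come from this README; every value (URLs, XPath queries, the limit, the delay) and every value type, such as a boolean for goIntoCard, is an assumption. The call itself is left as a comment so the snippet runs without the package installed.

```php
<?php
// Placeholder configuration: the keys are the documented properties,
// the values and their types are assumptions made for illustration.
$config = [
    'catalogUrl'        => 'http://example.com/catalog',
    'siteBaseUrl'       => 'http://example.com',
    'xpathItem'         => '//div[@class="item"]',
    'xpathLink'         => '//a',
    'goIntoCard'        => true,
    'xpathOnCard'       => ['h1' => '//h1'],
    'resultLimit'       => 10,
    'sleepAfterRequest' => 1,
    'typeMech'          => 'curl',
];

// With the package installed and the HtmlUniParser class imported:
// $results = HtmlUniParser::create($config)->parseCatalog();
```

The examples/ directory remains the authoritative source for the exact value formats each property expects.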
Run tests
To run tests you can use this command:
./vendor/bin/phpunit