arc / html
Ariadne Component Library: html writer and parser Component
pkg:composer/arc/html
Requires (Dev)
- phpunit/phpunit: 9.*
arc/html
This component provides a unified HTML parser and writer. The writer lets you generate readable, correct HTML directly in code, without templates. The parser is a wrapper around both DOMDocument and SimpleXML.
Both the parser and the writer also work on HTML fragments. The parser makes sure that its output is identical to its input. When you convert a node to a string, \arc\html returns the full HTML string, including tags. If you don't want that, access the `nodeValue` property to get the underlying SimpleXMLElement.
Finally, the parser lets you use basic CSS selectors to find elements in the HTML.
```php
use \arc\html as h;

$htmlString = h::doctype()
    . h::html(
        h::head(
            h::title('Example site')
        ),
        h::body(
            ['class' => 'homepage'],
            h::h1('An example site')
        )
    );
```

```php
$html = \arc\html::parse($htmlString);
$title = $html->head->title->nodeValue; // SimpleXMLElement 'Example site'
$titleTag = $html->head->title;         // <title>Example site</title>
```
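The README does not show how the writer builds its output. As a rough illustration only, here is a hypothetical helper `el()` (not part of arc\html) that mirrors the call shape used above: an optional leading attribute array, followed by children. Plain strings are escaped as text, while nested elements are inserted as-is. It assumes PHP 8.0 or later.

```php
<?php
// Hypothetical sketch, NOT arc\html's implementation.
// Wraps already-built HTML so it is not escaped a second time.
final class Node
{
    public function __construct(private string $html) {}
    public function __toString(): string { return $this->html; }
}

// Hypothetical helper: el($tag, [attributes], ...children)
function el(string $tag, ...$args): Node
{
    // An array as the first argument is treated as the attribute list.
    $attrs = ($args !== [] && is_array($args[0])) ? array_shift($args) : [];
    $out = '<' . $tag;
    foreach ($attrs as $name => $value) {
        // Escape attribute values so quotes and ampersands stay valid.
        $out .= ' ' . $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES) . '"';
    }
    $out .= '>';
    foreach ($args as $child) {
        // Nested nodes are already HTML; plain strings are text and get escaped.
        $out .= $child instanceof Node
            ? (string) $child
            : htmlspecialchars((string) $child, ENT_QUOTES);
    }
    return new Node($out . '</' . $tag . '>');
}

echo el('body', ['class' => 'homepage'], el('h1', 'Fish & Chips')), "\n";
// <body class="homepage"><h1>Fish &amp; Chips</h1></body>
```

Distinguishing nested elements from text by type is what keeps this kind of writer "self-validating": text can never accidentally inject markup.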
CSS selectors
```php
$title = current($html->find('title'));
```
The find() method always returns an array, which may be empty. By using current() you get the first element found, or false if nothing was found.
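Because PHP's current() returns false for an empty array, a strict `=== null` check will not detect a missing element. A minimal sketch of the behaviour, plus a hypothetical helper `firstOrNull()` (not part of arc\html) for callers who prefer null; it assumes PHP 8.0 or later for the `mixed` return type.

```php
<?php
// current() on an empty array returns false, not null.
$found = [];
var_dump(current($found)); // bool(false)

// Hypothetical helper: first element of an array, or null if it is empty.
function firstOrNull(array $items): mixed
{
    // reset() rewinds the (local copy of the) array and returns its first value.
    return $items === [] ? null : reset($items);
}

var_dump(firstOrNull([]));         // NULL
var_dump(firstOrNull(['a', 'b'])); // string(1) "a"
```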
The following CSS selectors are supported:
- `tag1 tag2`
  This matches `tag2` when it is a descendant of `tag1`.
- `tag1 > tag2`
  This matches `tag2` when it is a direct child of `tag1`.
- `tag:first-child`
  This matches `tag` only if it is the first child of its parent.
- `tag1 + tag2`
  This matches `tag2` only if it is immediately preceded by `tag1`.
- `tag1 ~ tag2`
  This matches `tag2` only if it has a preceding sibling `tag1`.
- `tag[attr]`
  This matches `tag` if it has the attribute `attr`.
- `tag[attr="foo"]`
  This matches `tag` if it has the attribute `attr` with the value `foo` in its value list.
- `tag#id`
  This matches any `tag` with the id `id`.
- `#id`
  This matches any element with the id `id`.
- `tag.class-name`
  This matches any `tag` with the class `class-name`.
- `.class-name`
  This matches any element with the class `class-name`.
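To make the selector semantics above concrete, here is a sketch that expresses most of them as XPath and runs them with PHP's built-in DOMXPath. The translations are our own, not taken from arc\html, and the library may implement find() differently. The `[attr="foo"]` form is left out because its exact matching rule is not specified here.

```php
<?php
// Sample document to run the queries against.
$html = '<html><body class="homepage">'
    . '<ul id="menu"><li class="item first">One</li><li class="item">Two</li></ul>'
    . '<h1>Title</h1><p>Intro</p><p>More</p>'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// CSS selector => roughly equivalent XPath expression (our own translation).
$selectors = [
    'ul li'          => '//ul//li',                                 // descendant
    'ul > li'        => '//ul/li',                                  // direct child
    'li:first-child' => '//li[not(preceding-sibling::*)]',          // no earlier sibling
    'h1 + p'         => '//h1/following-sibling::*[1][self::p]',    // next sibling is a p
    'h1 ~ p'         => '//h1/following-sibling::p',                // any later sibling p
    'ul[id]'         => '//ul[@id]',                                // has attribute
    '#menu'          => '//*[@id="menu"]',                          // id match
    // Class match: pad with spaces so "item" does not match "items".
    '.item'          => '//*[contains(concat(" ", normalize-space(@class), " "), " item ")]',
];

foreach ($selectors as $css => $expr) {
    printf("%-16s %d\n", $css, $xpath->query($expr)->length);
}
```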
SimpleXML
The parsed HTML behaves almost identically to a SimpleXMLElement, with the exceptions noted above, so you can access attributes the same way SimpleXMLElement allows:
```php
$class = $html->html->body['class'];
$class = $html->html->body->attributes()['class'];
```
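For comparison, this is how the same attribute access looks with plain SimpleXML (no arc\html involved). Note that SimpleXMLElement::attributes() takes an optional namespace or prefix, not an attribute name; you index into the result to get a single attribute.

```php
<?php
$body = simplexml_load_string('<body class="homepage" data-version="2"><h1>Hi</h1></body>');

// Array-style access on the element itself.
$class = (string) $body['class'];

// attributes() returns all attributes in the default namespace.
$attrs = $body->attributes();
$version = (string) $attrs['data-version'];

echo $class, ' ', $version, "\n"; // homepage 2
```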
You can walk through the node tree:
```php
$title = $html->html->head->title;
```
Every method and property of SimpleXMLElement is also available on \arc\html parsed data.
DOMElement
In addition to SimpleXMLElement methods, you can also call any method and most properties available in DOMElement.
```php
$class = $html->html->body->getAttribute('class');
$title = current($html->getElementsByTagName('title'));
```
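Without a wrapper like arc\html, reaching a DOM-only method from SimpleXML means converting the node by hand. This plain-PHP sketch shows that round trip with dom_import_simplexml() and simplexml_import_dom(); both views share the same underlying node, so nothing is copied.

```php
<?php
$body = simplexml_load_string('<body class="homepage"><h1>Hi</h1></body>');

// DOMElement view of the same underlying node.
$domBody = dom_import_simplexml($body);
$class = $domBody->getAttribute('class');               // DOM-only method
$h1 = $domBody->getElementsByTagName('h1')->item(0);    // DOMNodeList, not an array

// And back to SimpleXML again.
$sxH1 = simplexml_import_dom($h1);
echo $class, ' ', $sxH1, "\n"; // homepage Hi
```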
Parsing fragments
The arc\html parser also accepts partial HTML content. It doesn't require a single root element.
```php
$htmlString = <<<EOF
<li>
    <a href="anitem/">An item</a>
</li>
<li>
    <a href="anotheritem/">Another item</a>
</li>
EOF;
$html = \arc\html::parse($htmlString);
$links = $html->find('a');
```
And when you convert the html back to a string, it will still be a partial HTML fragment.
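For contrast, a common workaround with plain DOMDocument is to wrap the fragment in a temporary container before calling loadHTML(). libxml still builds a full document around it (serializing with saveHTML() would normally add `<html>` and `<body>`), which is the kind of addition the README says arc\html avoids. This is a sketch of that workaround, not of arc\html's internals.

```php
<?php
$fragment = '<li><a href="anitem/">An item</a></li>'
    . '<li><a href="anotheritem/">Another item</a></li>';

$doc = new DOMDocument();
// Wrap the multi-root fragment so the parser sees a single container element.
$doc->loadHTML('<div>' . $fragment . '</div>');

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```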
If the fragment has a single root element other than <html>, you still need to reference that element by name to access it:
```php
$htmlString = <<<EOF
<ul>
    <li>
        <a href="anitem/">An item</a>
    </li>
    <li>
        <a href="anotheritem/">Another item</a>
    </li>
</ul>
EOF;
$html = \arc\html::parse($htmlString);
$ul = $html->ul;
```
Why use this instead of DOMDocument or SimpleXML?
arc\html::parse differs from them in the following ways:
- When converted to string, it returns the original HTML, without additions you didn't make.
- You can use it with partial HTML fragments.
- No need to remember to call importNode() before appendChild() or insertBefore().
- No need to switch between SimpleXML and DOMDocument just to reach a method that only the other API provides.
- When returning a list of elements, you always get a simple Array, not a magic NodeList.
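The first and last points are easiest to see in plain DOM code. This sketch (no arc\html involved) shows that moving a node between documents requires importNode(); appending it directly typically throws a DOMException ("Wrong Document Error") in the classic DOM extension. It also shows that getElementsByTagName() returns a DOMNodeList, which must be converted before array functions like count() and current() can be used on it.

```php
<?php
$source = new DOMDocument();
$source->loadXML('<list><item>One</item><item>Two</item></list>');

$target = new DOMDocument();
$target->loadXML('<root/>');

$item = $source->getElementsByTagName('item')->item(0);
// Appending $item to $target directly would fail: it belongs to $source.
$copy = $target->importNode($item, true); // deep copy owned by $target
$target->documentElement->appendChild($copy);

echo $target->saveXML($target->documentElement), "\n"; // <root><item>One</item></root>

// DOMNodeList is Traversable, not an array; convert it first.
$items = iterator_to_array($source->getElementsByTagName('item'));
echo count($items), ' ', current($items)->textContent, "\n"; // 2 One
```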
In addition, arc\html doubles as a simple way to generate valid, indented HTML with readable, self-validating code.