NAME
html/parser - HTML5 parser entry points.
SYNOPSIS
from html/parser import HTML, HTMLParser;
let doc := HTML.parse("<!doctype html><title>Example</title>");
let fragment := HTML.parse_fragment("<tr><td>x", context: "table");
let parser := new HTMLParser();
parser.parse("<p>Reusable</p>");
let errors := parser.errors();
DESCRIPTION
This module is the main public entry point for the pure ZuzuScript HTML parser. It parses full documents and context-sensitive fragments into the DOM-like classes from html/dom.
The parser implements document and fragment tree construction to the extent covered by its focused test suites and its claimed html5lib tree-construction support level. This includes document setup, in-body recovery, active formatting reconstruction, adoption-agency recovery, forms, buttons, void elements, plaintext, table foster parenting, select recovery, template content, framesets, SVG/MathML namespaces, adjusted foreign names, foreign attributes, integration points, and foreign CDATA sections.
HTML parse errors are collected by default. Pass strict: true to throw after parsing if any parse errors were recorded. Strict mode does not change recovery behaviour; it only turns a non-empty parse-error list into an exception.
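The snippet below contrasts the default and strict modes. It uses only calls and syntax shown in SYNOPSIS. A missing doctype is a parse error under the HTML tree-construction rules, so the lenient parse should still build a recovered tree and record at least one error. The final strict call is expected to throw after parsing. How the exception is caught is not covered here, so the snippet deliberately ends with the throwing call.

```zuzuscript
from html/parser import HTML, HTMLParser;
let parser := new HTMLParser();
parser.parse("<p>Missing doctype");
let doc := parser.document();
let errors := parser.errors();
let strictDoc := HTML.parse("<p>Missing doctype", strict: true);
```

Because strict mode only inspects the error list after recovery, doc and the tree that strict parsing would have produced have the same structure.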
The scripting option defaults to false. It affects noscript tokenization and html5lib scripting-mode variants. It does not execute scripts or support script-driven DOM mutation during parsing.
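As an illustration of what the scripting option changes, the sketch below parses the same markup twice. According to the HTML specification, with scripting disabled a noscript start tag in the body creates an ordinary element, so the p becomes a child element. With scripting enabled, the noscript contents are tokenized as raw text, so noscript should contain a single text node. The expected trees follow the specification and are not verified output from this module.

```zuzuscript
from html/parser import HTML;
let markup := "<!doctype html><body><noscript><p>Fallback</p></noscript>";
let withoutScripting := HTML.parse(markup);
let withScripting := HTML.parse(markup, scripting: true);
```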
HTML.load and HTML.dump exist as public methods, but both currently throw a "not implemented" error. Until they are implemented, handle file reading and writing explicitly with std/io in application code, pass the content to HTML.parse, and serialize results with toHTML.
EXPORTS
Parser Facade
HTML
    Static parser facade.
HTML.parse(String html, ... options) -> HTMLDocument
    Parse a full HTML document and return an HTMLDocument. Options: strict and scripting.
HTML.parse_string(String html, ... options) -> HTMLDocument
    Alias for HTML.parse.
HTML.parse_fragment(String html, ... options) -> HTMLDocumentFragment
    Parse an HTML fragment and return an HTMLDocumentFragment. context defaults to div. It may be a tag-name string, the special strings svg or math, or an HTMLElement/HTMLTemplateElement, including elements created with createElementNS for SVG or MathML contexts. Options: context, strict, and scripting.
HTML.load(Path path, ... options) -> HTMLDocument
    Not implemented. This method currently throws "html/parser: load is not implemented yet."
HTML.dump(Path path, HTMLDocument|HTMLNode value, Bool pretty?)
    Not implemented. This method currently throws "html/parser: dump is not implemented yet."
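The context argument decides which tree-construction rules apply to a fragment, and the sketch below uses it in three ways. With the default div context, the HTML in-body rules ignore stray tr and td start tags as parse errors, so plain should contain only the text x. Compare this with the table context in SYNOPSIS. With context "svg" or "math", the content is processed as foreign content, so circle and mi should be created in the SVG and MathML namespaces respectively. These expectations come from the HTML specification's fragment-parsing algorithm and are not observed output.

```zuzuscript
from html/parser import HTML;
let plain := HTML.parse_fragment("<tr><td>x");
let shape := HTML.parse_fragment("<circle r='5'/>", context: "svg");
let formula := HTML.parse_fragment("<mi>x</mi>", context: "math");
```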
Parser Class
HTMLParser
    Reusable parser object with instance methods parse, parse_string, parse_fragment, load, dump, document, errors, and parseErrors. document() returns the last parsed full document, or the staging document from the last fragment parse. errors() returns a copy of the most recent parse-error list. parseErrors() is an alias for errors().
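The following sketch reuses a single HTMLParser for a document parse and then a fragment parse, calling only the instance methods listed above. After the fragment parse, document() returns the staging document rather than the earlier full document. The errors() result reflects only the most recent parse, and parseErrors() returns the same list.

```zuzuscript
from html/parser import HTMLParser;
let parser := new HTMLParser();
parser.parse("<!doctype html><title>First</title><p>One</p>");
let first := parser.document();
parser.parse_fragment("<li>Two", context: "ul");
let staging := parser.document();
let errors := parser.errors();
let sameErrors := parser.parseErrors();
```

Because errors() returns a copy, first and the error list can be kept after the parser is reused without being overwritten by later parses.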
Re-exported DOM Classes
HTMLDocument, HTMLDocumentFragment, HTMLElement, HTMLTemplateElement, HTMLNode, HTMLText, HTMLComment, HTMLDoctype
    DOM classes from html/dom. They are re-exported so code which imports html/parser can inspect or type-check parse results without a second import.
DOMNode, DOMDocument, DOMElement, DOMText, DOMComment
    DOM-compatible aliases from html/dom.
HTML_NAMESPACE_URI, SVG_NAMESPACE_URI, MATHML_NAMESPACE_URI, XLINK_NAMESPACE_URI, XML_NAMESPACE_URI, XMLNS_NAMESPACE_URI
    Namespace URI constants from html/dom.
Re-exported Tokenizer Classes
HTMLInputStream, HTMLTokenizer, HTMLToken, HTMLParseError, HTMLNamedCharacterReferences
    Tokenizer-layer classes re-exported from html/tokenizer. These are available for focused tokenizer tests and tree-builder integration.
Re-exported Tree-Builder Classes
HTMLTreeBuilder, HTMLTreeConstructionResult, HTMLTreeTestSerializer
    Tree-builder classes re-exported from html/treebuilder. Most users should prefer HTML or HTMLParser. These classes are available for diagnostics, tests, and tooling that needs direct access to the tree-construction layer.
COPYRIGHT AND LICENCE
html/parser is copyright Toby Inkster.
It is free software; you may redistribute it and/or modify it under the terms of either the Artistic License 1.0 or the GNU General Public License version 2.