modules/html/parser.zzm

NAME

html/parser - HTML5 parser entry points.

SYNOPSIS

  from html/parser import HTML, HTMLParser;

  let doc := HTML.parse("<!doctype html><title>Example</title>");
  let fragment := HTML.parse_fragment("<tr><td>x", context: "table");

  let parser := new HTMLParser();
  parser.parse("<p>Reusable</p>");
  let errors := parser.errors();

DESCRIPTION

This module is the main public entry point for the pure ZuzuScript HTML parser. It parses full documents and context-sensitive fragments into the DOM-like classes from html/dom.

The parser implements the document and fragment tree-building behaviour exercised by the focused test suites and by the claimed level of html5lib tree-construction support. This covers document setup, in-body recovery, active formatting reconstruction, adoption-agency recovery, forms, buttons, void elements, plaintext, table foster parenting, select recovery, template content, framesets, SVG/MathML namespaces, adjusted foreign names, foreign attributes, integration points, and foreign CDATA sections.

HTML parse errors are collected by default. Pass strict: true to throw after parsing if any parse errors were recorded. Strict mode does not change recovery behaviour; it only turns a non-empty parse-error list into an exception.
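
As a minimal sketch that uses only the call forms shown in the SYNOPSIS, the two modes might look like this. The input strings are illustrative, and passing strict as a named option in the same style as context is an assumption based on the description above:

  from html/parser import HTML, HTMLParser;

  let parser := new HTMLParser();
  parser.parse("<p>Unclosed <b>bold");
  let errors := parser.errors();

  let doc := HTML.parse("<!doctype html><title>Example</title>", strict: true);

The first parse recovers and leaves any recorded parse errors available from errors(). The second builds the same kind of tree, but throws after parsing if any parse errors were recorded.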

The scripting option defaults to false. It affects noscript tokenization and html5lib scripting-mode variants. It does not execute scripts or support script-driven DOM mutation during parsing.
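
A sketch of enabling the option, assuming scripting is passed as a named option in the same way as context in the SYNOPSIS. The markup is illustrative only:

  from html/parser import HTML;

  let doc := HTML.parse("<!doctype html><noscript><p>Fallback</p></noscript>", scripting: true);

Only the handling of the noscript content changes; no script is run in either mode.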

HTML.load and HTML.dump are public methods, but both currently throw an explicit "not implemented" error. Until they are implemented, use HTML.parse and toHTML together with explicit std/io file handling in application code.

EXPORTS

Parser Facade

HTML
Static parser facade.
- HTML.parse(String html, ... options) -> HTMLDocument
  Parse a full HTML document and return an HTMLDocument. Options: strict and scripting.
- HTML.parse_string(String html, ... options) -> HTMLDocument
  Alias for HTML.parse.
- HTML.parse_fragment(String html, ... options) -> HTMLDocumentFragment
  Parse an HTML fragment and return an HTMLDocumentFragment. The context option defaults to div. It may be a tag-name string, one of the special strings svg or math, or an HTMLElement/HTMLTemplateElement, including elements created with createElementNS for SVG or MathML contexts. Options: context, strict, and scripting.
- HTML.load(Path path, ... options) -> HTMLDocument
  Not implemented. This method currently throws "html/parser: load is not implemented yet".
- HTML.dump(Path path, HTMLDocument|HTMLNode value, Bool pretty?)
  Not implemented. This method currently throws "html/parser: dump is not implemented yet".
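
The fragment contexts described for HTML.parse_fragment can be sketched as follows. This uses only the call form from the SYNOPSIS; the input strings are illustrative, and the svg line assumes the special string is passed exactly like a tag name:

  from html/parser import HTML;

  let para := HTML.parse_fragment("<p>x</p>");
  let rows := HTML.parse_fragment("<tr><td>x", context: "table");
  let shape := HTML.parse_fragment("<circle></circle>", context: "svg");

The first call relies on the default div context. The other two select the insertion rules that apply inside a table element and inside SVG content respectively.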

Parser Class

HTMLParser
Reusable parser object with instance methods parse, parse_string, parse_fragment, load, dump, document, errors, and parseErrors.
- document()
  Returns the last parsed full document, or the staging document from the last fragment parse.
- errors()
  Returns a copy of the most recent parse-error list.
- parseErrors()
  Alias for errors().
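
A sketch of reusing one parser object across two parses. It assumes the instance method parse_fragment accepts the same context option as HTML.parse_fragment; the input strings are illustrative:

  from html/parser import HTMLParser;

  let parser := new HTMLParser();
  parser.parse("<!doctype html><p>First</p>");
  let doc := parser.document();

  parser.parse_fragment("<li>Second</li>", context: "ul");
  let staging := parser.document();
  let errors := parser.parseErrors();

After the second call, document() refers to the staging document of the fragment parse and parseErrors() reflects that most recent parse, so keep a reference to the earlier document if it is still needed.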

Re-exported DOM Classes

HTMLDocument, HTMLDocumentFragment, HTMLElement,
HTMLTemplateElement, HTMLNode, HTMLText, HTMLComment, HTMLDoctype

DOM classes from html/dom. They are re-exported so code which imports html/parser can inspect or type-check parse results without a second import.

DOMNode, DOMDocument, DOMElement, DOMText,
DOMComment

DOM-compatible aliases from html/dom.

HTML_NAMESPACE_URI, SVG_NAMESPACE_URI, MATHML_NAMESPACE_URI,
XLINK_NAMESPACE_URI, XML_NAMESPACE_URI, XMLNS_NAMESPACE_URI

Namespace URI constants from html/dom.

Re-exported Tokenizer Classes

HTMLInputStream, HTMLTokenizer, HTMLToken,
HTMLParseError, HTMLNamedCharacterReferences

Tokenizer-layer classes re-exported from html/tokenizer. These are available for focused tokenizer tests and tree-builder integration.

Re-exported Tree-Builder Classes

HTMLTreeBuilder, HTMLTreeConstructionResult,
HTMLTreeTestSerializer

Tree-builder classes re-exported from html/treebuilder. Most users should prefer HTML or HTMLParser; these classes are available for diagnostics, tests, and tooling that needs direct access to the tree-construction layer.

COPYRIGHT AND LICENCE

html/parser is copyright Toby Inkster.

It is free software; you may redistribute it and/or modify it under the terms of either the Artistic License 1.0 or the GNU General Public License version 2.