modules/html/parser.zzm

html-0.0.2 documentation

Package

Name:        html
Version:     0.0.2
Uploaded:    2026-06-12 23:25:02
Repository:  https://github.com/tobyink/zuzu-html

NAME

html/parser - HTML5 parser entry points.

SYNOPSIS

  from html/parser import HTML, HTMLParser;

  let doc := HTML.parse("<!doctype html><title>Example</title>");
  let fragment := HTML.parse_fragment("<tr><td>x", context: "table");

  let parser := new HTMLParser();
  parser.parse("<p>Reusable</p>");
  let errors := parser.errors();

DESCRIPTION

This module is the main public entry point for the pure ZuzuScript HTML parser. It parses full documents and context-sensitive fragments into the DOM-like classes from html/dom.

The parser implements the document and fragment tree-building behaviour that is covered by the package's focused test suites and by its claimed html5lib tree-construction support level. This includes document setup, in-body recovery, active formatting reconstruction, adoption-agency recovery, forms, buttons, void elements, plaintext, table foster parenting, select recovery, template content, framesets, SVG/MathML namespaces, adjusted foreign names, foreign attributes, integration points, and foreign CDATA sections.

HTML parse errors are collected by default. Pass strict: true to throw after parsing if any parse errors were recorded. Strict mode does not change recovery behaviour; it only turns a non-empty parse-error list into an exception.
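
As a minimal sketch of the difference, the two calls below parse the same input with and without strict mode. It uses only the call forms shown in the SYNOPSIS. The input deliberately omits the doctype, which the HTML standard treats as a parse error; that this parser records it is an assumption. The variable names are illustrative.

  from html/parser import HTML;

  let lenient := HTML.parse("<p>No doctype</p>");
  let checked := HTML.parse("<p>No doctype</p>", strict: true);

If a parse error is recorded for this input, the first call still returns a recovered HTMLDocument, while the second builds the same tree and then throws instead of returning it.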

The scripting option defaults to false. It affects noscript tokenization and html5lib scripting-mode variants. It does not execute scripts or support script-driven DOM mutation during parsing.
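
A small sketch of toggling the option, assuming it is passed as a named argument in the same way as context in the SYNOPSIS. The sample markup and variable names are illustrative only.

  from html/parser import HTML;

  let markup := "<!doctype html><noscript><p>Fallback</p></noscript>";
  let without_scripting := HTML.parse(markup);
  let with_scripting := HTML.parse(markup, scripting: true);

Under the HTML standard's rules, the content of noscript is tokenized as markup when scripting is off and as raw text when scripting is on, so the two documents are expected to differ only inside the noscript element. Neither call runs any script.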

HTML.load and HTML.dump are public methods, but both are currently unimplemented and throw a clear error when called. Until they are implemented, use HTML.parse and toHTML with explicit std/io file handling in application code.

EXPORTS

Parser Facade

  • HTML

    Static parser facade.

    • HTML.parse(String html, ... options) -> HTMLDocument

      Parse a full HTML document and return an HTMLDocument. Options: strict and scripting.

    • HTML.parse_string(String html, ... options) -> HTMLDocument

      Alias for HTML.parse.

    • HTML.parse_fragment(String html, ... options) -> HTMLDocumentFragment

      Parse an HTML fragment and return an HTMLDocumentFragment. context defaults to div. It may be a tag-name string, the special strings svg or math, or an HTMLElement/HTMLTemplateElement, including elements created with createElementNS for SVG or MathML contexts. Options: context, strict, and scripting.

    • HTML.load(Path path, ... options) -> HTMLDocument

      Not implemented. This method currently throws the error "html/parser: load is not implemented yet".

    • HTML.dump(Path path, HTMLDocument|HTMLNode value, Bool pretty?)

      Not implemented. This method currently throws the error "html/parser: dump is not implemented yet".
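
The fragment contexts described for HTML.parse_fragment can be sketched as follows. The calls reuse the named-option form from the SYNOPSIS; the sample markup and variable names are illustrative, and the exact tree each call produces is not shown here.

  from html/parser import HTML;

  let in_div := HTML.parse_fragment("<p>One<p>Two");
  let in_row := HTML.parse_fragment("<td>a<td>b", context: "tr");
  let in_svg := HTML.parse_fragment("<circle></circle>", context: "svg");

The first call relies on the default div context. The second supplies a table-row context so that the td start tags are handled as cells rather than under the default context's rules. The third uses the special svg string so that the fragment is parsed in an SVG context.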

Parser Class

  • HTMLParser

    Reusable parser object with instance methods parse, parse_string, parse_fragment, load, dump, document, errors, and parseErrors.

    • document() returns the last parsed full document, or the staging document from the last fragment parse.

    • errors() returns a copy of the most recent parse-error list.

    • parseErrors() is an alias for errors().
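
A short sketch of reusing one parser across several inputs, built from the SYNOPSIS forms. It assumes the instance methods accept the same named options as the HTML facade; the variable names are illustrative.

  from html/parser import HTMLParser;

  let parser := new HTMLParser();

  let page := parser.parse("<!doctype html><p>One</p>");
  let page_errors := parser.errors();

  let items := parser.parse_fragment("<li>Two", context: "ul");
  let item_errors := parser.errors();
  let staging := parser.document();

Because errors() reports only the most recent parse, page_errors is captured before the fragment parse runs. After the fragment parse, document() returns the staging document for that fragment rather than the earlier full document.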

Re-exported DOM Classes

  • HTMLDocument, HTMLDocumentFragment, HTMLElement, HTMLTemplateElement, HTMLNode, HTMLText, HTMLComment, HTMLDoctype

    DOM classes from html/dom. They are re-exported so code which imports html/parser can inspect or type-check parse results without a second import.

  • DOMNode, DOMDocument, DOMElement, DOMText, DOMComment

    DOM-compatible aliases from html/dom.

  • HTML_NAMESPACE_URI, SVG_NAMESPACE_URI, MATHML_NAMESPACE_URI, XLINK_NAMESPACE_URI, XML_NAMESPACE_URI, XMLNS_NAMESPACE_URI

    Namespace URI constants from html/dom.

Re-exported Tokenizer Classes

  • HTMLInputStream, HTMLTokenizer, HTMLToken, HTMLParseError, HTMLNamedCharacterReferences

    Tokenizer-layer classes re-exported from html/tokenizer. These are available for focused tokenizer tests and tree-builder integration.

Re-exported Tree-Builder Classes

  • HTMLTreeBuilder, HTMLTreeConstructionResult, HTMLTreeTestSerializer

    Tree-builder classes re-exported from html/treebuilder. Most users should prefer HTML or HTMLParser; these classes are available for diagnostics, tests, and tooling that needs direct access to the tree-construction layer.

COPYRIGHT AND LICENCE

html/parser is copyright Toby Inkster.

It is free software; you may redistribute it and/or modify it under the terms of either the Artistic License 1.0 or the GNU General Public License version 2.