

xml - XML navigation


include "xml.m";

xml := load Xml Xml->PATH;
Parser, Item, Locator, Attributes, Mark: import xml;

init:   fn(): string;
open: fn(f: string, warning: chan of (Locator, string),
                preelem: string): (ref Parser, string);
fopen: fn(iob: ref Bufio->Iobuf, f: string, warning: chan of (Locator, string),
                preelem: string): (ref Parser, string);

Parser: adt {
    fileoffset: int;

    next:       fn(p: self ref Parser): ref Item;
    down:       fn(p: self ref Parser);
    up:         fn(p: self ref Parser);
    mark:       fn(p: self ref Parser): ref Mark;
    atmark:     fn(p: self ref Parser, m: ref Mark): int;
    goto:       fn(p: self ref Parser, m: ref Mark);
    str2mark:   fn(p: self ref Parser, s: string): ref Mark;
};

Item: adt {
    fileoffset: int;
    pick {
    Tag =>
        name:   string;
        attrs:  Attributes;
    Text =>
        ch:     string;
        ws1:    int;
        ws2:    int;
    Process =>
        target: string;
        data:   string;
    Doctype =>
        name:   string;
        public: int;
        params: list of string;
    Stylesheet =>
        attrs:  Attributes;
    Error =>
        loc:    Locator;
        msg:    string;
    }
};

Locator: adt {
    line:       int;
    systemid:   string;
    publicid:   string;
};

Attribute: adt {
    name:       string;
    value:      string;
};

Attributes: adt {
    all:        fn(a: self Attributes): list of Attribute;
    get:        fn(a: self Attributes, name: string): string;
};

Mark: adt {
    offset:     int;
    str:        fn(m: self ref Mark): string;
};


Xml provides an interface for navigating XML files (``documents''). Once loaded, the module must first be initialised by calling init. A new parser instance is created by calling open(f, warning, preelem), which opens the file f for parsing as an XML document, or fopen(iob, f, warning, preelem), which does the same for an already open Iobuf (the string f is used in diagnostics). Both functions return a tuple (p, err). If there is an error opening the document, p is nil and err contains a description of the error; otherwise p can be used to examine the contents of the document. If warning is not nil, non-fatal errors encountered when parsing are sent on this channel; a separate process is needed to receive them. Each error is represented by a tuple (loc, msg), containing the location loc and the description msg of the error encountered. One XML tag, preelem, may be marked for special treatment by the XML parser: within this tag all white space is passed through as-is.
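The opening sequence described above might be sketched as follows. This is an illustration only, not code from the library: the module name Xmlopen, the file name doc.xml, the preelem value "pre" and the helper warnings are all assumptions made for the example.

```limbo
implement Xmlopen;

include "sys.m";
	sys: Sys;
include "draw.m";
include "bufio.m";	# xml.m refers to Bufio->Iobuf
include "xml.m";
	xml: Xml;
	Parser, Locator: import xml;

Xmlopen: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	xml = load Xml Xml->PATH;
	if(xml == nil){
		sys->fprint(sys->fildes(2), "cannot load %s: %r\n", Xml->PATH);
		raise "fail:load";
	}
	# the module must be initialised before any other call
	if((e := xml->init()) != nil){
		sys->fprint(sys->fildes(2), "xml init: %s\n", e);
		raise "fail:init";
	}

	# non-fatal errors arrive on this channel, so a reader must be running
	warning := chan of (Locator, string);
	spawn warnings(warning);

	# "doc.xml" and "pre" are hypothetical values chosen for the example
	(p, err) := xml->open("doc.xml", warning, "pre");
	if(p == nil){
		sys->fprint(sys->fildes(2), "open: %s\n", err);
		raise "fail:open";
	}
	if(p.next() == nil)
		sys->print("doc.xml has no items at the top level\n");
}

# Hypothetical helper: report each warning on standard error.
# It blocks on the channel until the process is killed.
warnings(c: chan of (Locator, string))
{
	for(;;){
		(loc, msg) := <-c;
		sys->fprint(sys->fildes(2), "%s:%d: warning: %s\n", loc.systemid, loc.line, msg);
	}
}
```

Passing nil instead of a channel avoids the need for the extra process, at the cost of losing the non-fatal diagnostics.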

Once an XML document has been opened, the following Parser methods may be used to examine the items contained within:

p.next()
    An XML document is represented by a tree-structure. Next returns the next item in the document at the current level of the tree within the current parent element. If there are no more such items, it returns nil.
p.down()
    Down descends into the element that has just been returned by next, which should be a Tag item. Subsequent items returned by next will be those within that tag.
p.up()
    Up moves up one level in the XML tree.
p.mark()
    Mark returns a mark that can be used to return later to the current position in the document. The underlying file must be seekable for this to work.
p.goto(m)
    Goes back to a previously marked position, m, in the document.
p.atmark(m)
    Atmark returns non-zero if the current position in the document is the same as that marked by m. The current tree level is ignored in the comparison.
p.str2mark(s)
    Str2mark turns a string as created by Mark.str back into a mark as returned by Parser.mark.
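A minimal sketch of how next, down and up might combine into a depth-first walk that prints the name of every element, indented by depth. The module name Xmlwalk, the function walk and the file name doc.xml are assumptions made for the example.

```limbo
implement Xmlwalk;

include "sys.m";
	sys: Sys;
include "draw.m";
include "bufio.m";
include "xml.m";
	xml: Xml;
	Parser, Item: import xml;

Xmlwalk: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	xml = load Xml Xml->PATH;
	if(xml == nil || xml->init() != nil)
		raise "fail:xml";
	# nil warning channel: non-fatal errors are not reported
	(p, err) := xml->open("doc.xml", nil, nil);
	if(p == nil){
		sys->fprint(sys->fildes(2), "open: %s\n", err);
		raise "fail:open";
	}
	walk(p, "");
}

# Hypothetical helper: visit every item at the current level,
# descending into each Tag before moving on to its next sibling.
walk(p: ref Parser, indent: string)
{
	while((item := p.next()) != nil){
		pick i := item {
		Tag =>
			sys->print("%s%s\n", indent, i.name);
			p.down();	# next now yields the children of this tag
			walk(p, indent + "  ");
			p.up();		# back to the level of the tag itself
		Error =>
			sys->fprint(sys->fildes(2), "%s:%d: %s\n", i.loc.systemid, i.loc.line, i.msg);
			return;
		* =>
			;	# text, processing instructions and so on are skipped
		}
	}
}
```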

Various species of items live in XML documents; they are encapsulated in the Item adt. This contains one member in common to all its subtypes: fileoffset, the position in the XML document of the start of the item. The various kinds of item are as follows:

Item.Tag
    A generic XML tag. Name names the tag, and attrs holds its attributes, if any.
Item.Text
    Text represents inline text in the XML document. With the exception of text inside the tag named by preelem in open, any runs of white space are compressed to a single space, and white space at the start or end of the text is elided. Ch contains the resulting text; ws1 and ws2 are non-zero if there was originally white space at the start or end of the text respectively.
Item.Process
    Process represents an XML document processing directive. Target is the processing instruction's target, and data holds the rest of the text inside the directive. XML stylesheet directives are recognised directly and have their own item type.
Item.Doctype
    Doctype should only occur at the start of an XML document, and represents the type of the XML document.
Item.Stylesheet
    Stylesheet represents an XML stylesheet processing request. The data of the processing request is parsed as per the RFC into attribute-value pairs.
Item.Error
    If an unrecoverable error occurs processing the document, an Error item is returned holding the location (loc), and description (msg) of the error. This will be the last item returned by the parser.

The attribute-value pairs in Tag and Stylesheet items are held in an Attributes adt, say a. A.all() yields a list holding all the attributes; a.get(name) yields the value of the attribute name.
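As a sketch of the two Attributes calls, the following lists the attributes of each top-level element and then looks one up by name. The module name Xmlattrs, the file name doc.xml and the attribute name "id" are assumptions made for the example. The text above does not say what get yields for an absent attribute, so the value is printed without being interpreted.

```limbo
implement Xmlattrs;

include "sys.m";
	sys: Sys;
include "draw.m";
include "bufio.m";
include "xml.m";
	xml: Xml;
	Parser, Item, Attributes: import xml;

Xmlattrs: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	xml = load Xml Xml->PATH;
	if(xml == nil || xml->init() != nil)
		raise "fail:xml";
	(p, err) := xml->open("doc.xml", nil, nil);
	if(p == nil){
		sys->fprint(sys->fildes(2), "open: %s\n", err);
		raise "fail:open";
	}
	while((item := p.next()) != nil){
		pick i := item {
		Tag =>
			sys->print("%s\n", i.name);
			# all() gives every attribute as a list of Attribute
			for(l := i.attrs.all(); l != nil; l = tl l)
				sys->print("\t%s=%s\n", (hd l).name, (hd l).value);
			# get() looks up one attribute by name; "id" is hypothetical
			sys->print("\tid is '%s'\n", i.attrs.get("id"));
		* =>
			;
		}
	}
}
```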

The location returned when an error is reported is held inside a Locator adt, which holds the line number on which the error occurred, the ``system id'' of the document (in this implementation, its file name), and the ``public id'' of the document (not currently used).

A Mark m may be converted to a string with m.str(); this enables marks to be written out to external storage, to index a large XML document, for example. Note that if the XML document changes, any stored marks will no longer be valid.
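The round trip through a string might look like the sketch below, which saves the starting position, reads to the end of the top level, and then returns. The module name Xmlmark and the file name doc.xml are assumptions made for the example. The check for a nil result from str2mark is defensive: the text above does not say how a malformed string is handled.

```limbo
implement Xmlmark;

include "sys.m";
	sys: Sys;
include "draw.m";
include "bufio.m";
include "xml.m";
	xml: Xml;
	Parser, Mark: import xml;

Xmlmark: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	xml = load Xml Xml->PATH;
	if(xml == nil || xml->init() != nil)
		raise "fail:xml";
	# the underlying file must be seekable for marks to work
	(p, err) := xml->open("doc.xml", nil, nil);
	if(p == nil){
		sys->fprint(sys->fildes(2), "open: %s\n", err);
		raise "fail:open";
	}

	m := p.mark();
	saved := m.str();	# this string could be written to an external index

	# consume the remaining items at the current level
	while(p.next() != nil)
		;

	# later: rebuild the mark from its string form and return to it
	back := p.str2mark(saved);
	if(back == nil)
		raise "fail:mark";
	p.goto(back);
	if(p.atmark(m))
		sys->print("back at the saved position\n");
}
```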




``Extensible Markup Language (XML) 1.0 (Second Edition)'', http://www.w3.org/TR/REC-xml

``Navigating Large XML Documents on Small Devices'' in Volume 2.


XML's definition makes it tricky to handle leading and trailing white space efficiently; ws1 and ws2 in Item.Text are the current compromise.

XML(2) Rev:  Tue Mar 31 02:42:39 GMT 2015