Parse tree

The parse-ebnf parse tree is represented using the PT class:

class parse_ebnf.PT

A parse tree for EBNF.

Contains the following variables:

  • root, the root node of the tree, an instance of Root;

  • count, the number of nodes in the tree;

  • height, the height of the tree;

  • maxDegree, the maximum number of children that a single node has in the tree.

And the following functions:

  • unparse, pass a write function(a function that works just like the write function for files) to write the text that the parse tree was created from.

  • write, use with a write function to dump a textual representation of the parse tree, mainly meant for debugging.

For parsing check the parsing section.

Each node in the parse tree inherits from the Node class:

class parse_ebnf.nodes.Node(startLine=0, startColumn=0, endLine=0, endColumn=0)

Base class of all PT nodes.

PT nodes differ only in what nodes are their parent and which nodes are their children, see the reference.

This node type, along with Leaf and Primary, are not instantiated in the PT, they serve as abstract base classes.

This class in particular is the base for all other node types. The intent is to use isinstance to resolve a node’s exact type.

All nodes contain the following variables:

  • parent – a Node instance that acts as the parent for this node. It is None only for Root.

  • children – a list of Node instances that act as this node’s children. Every node in this list has its parent set to this node. Empty only for instances of Leaf.

  • depth – an integer denoting how deep this node is in the tree. The root is defined as being at depth 0, it’s children at depth 1, their children at depth 2, etc.

  • startLine – the line where the text that this node is comprised of starts, inclusively. Counting starts from 1, and is incremented each time a newline is encountered in the input;

  • startColumn – the column where the text that this node is comprised of starts, inclusively. Counting starts at 1, and in incremented every character. Each newline resets the counter to 0;

  • endLine – like startLine except that this is where the text ends, exclusively;

  • endColumn – like startColumn except that this is where the text ends, exclusively.

The following statements are always true:

  • startLine >= 0

  • endLine >= 0

  • startColumn >= 0

  • startLine <= endLine

  • if startColumn > endColumn and startLine == endLine: str(node) == ‘’ else: endColumn >= 0

Note

The end coordinates also take into account child nodes. The children’s text is also counted as the parent’s text.

The Node class along with a few other node classes do not appear in the tree; they act as abstract base classes:

class parse_ebnf.nodes.Leaf(data='')

Base class of all leaf nodes.

This node is a base class for all leaf nodes, nodes whose children list is empty.

Note

Only leafs nodes contain text data in the tree.

This node has the following variables:

  • Variables inherited from Node;

  • data – the text content of the node, a string.

class parse_ebnf.nodes.Primary(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding what a term parses.

This class is mainly meant to tag other nodes that may be primaries for a Term.

The current list of primaries is:

Parse trees and all nodes implement the repr and str functions. Taking repr() of a node or parse tree gives a debug string that describes the structure of a parse tree or node. Taking str() of a node or parse tree gives the yield of said node or parse tree.

str() applied to a parse tree is equivalent to applying it to its root. On the other hand, repr() is not equivalent between a parse tree and its root, the parse tree adds an extra line.

Partial trees

If an error occurs during parsing, a partial tree is created. A partial tree is valid for the most part, with the exception of partial nodes. Partial nodes have valid parent and child links, however their coordinates, child specification and data are not guaranteed to be valid. This means that a partial parse tree as a whole is valid up to the point of failure, anything beyond that is undefined.

Partial trees and nodes operate under the following rules:

  • A partial tree has a partial root;

  • A partial node has a partial last child, all other children are valid;

See the example for how to get and handle partial trees.

Node types

This section contains a reference of all node types found in the PT.

Each node type contains:

  • A short description of what it represents;

  • A list of possible parent nodes;

  • A child node list specification.

The parent list and child node specification are taken directly from the tests. The child nodes specification uses functions whose meaning is described in the notation section.

The nodes are ordered in roughly the same way as they appear in the parse tree, going from the root to the leafs.

Notation

The types, order and count of child nodes for each node type is documented using a list of node types and functions called children. These functions enclose a node type and modify how many times, if it at all it appears. Some functions set what data a Leaf node may hold.

Child nodes appear in the same order as in the child node specification. For example say the node A had a child spec of:

children = [ Identifier, Space, Identifier ]

That means that any instance of A is guaranteed to have a child list of length 3 that first contains an Identifier as the first element, followed by a Space and finally another Identifier instance as the last element.

A number of functions modify the number of times a child node appears. For example the child specification of Product:

children = [ Identifier, maybe(Space), literal("="), maybe(Space), DefinitionList, maybe(Space), literal([";", "."]) ]

Means that any one Product instance may have between 4 and 7 child nodes. Of those, the first is always an Identifier. A single Space instance might follow, or it might not. Then a Literal with the data = always follows, with an optional Space. A single DefinitionList comes after and another optional Space follows. The last node is guaranteed to be a Literal that has its data set either to . or ;.

As another example, the child spec of Comment:

children = [ literal("(*"), zero_or_more(either(Comment, Text)), literal("*)") ]

All comments start with a Literal with the data (*, followed by any number, including zero, or either Comment or Text instances. Finally the last node is always a Literal with the data *). The length of the children list can be anywhere from 2 to any finite number.

As a final example, an empty child specification:

children = [ ]

This means that the node has no children, a unique characteristic of Leaf nodes.

The parents list gives the possible nodes types that may be a parent of a node. For example:

parents = [ Root, Product, Comment ]

Means that the node’s parent may either be a Root instance, a Product instance or a Comment instance.

Function reference

tests.tree_structure.literal(s)

A singe Literal instance is expected.

The data that the Literal holds is either one of the arguments.

tests.tree_structure.either(*args)

Only one of the arguments are expected.

The arguments may be node types or other functions. In any case, only one of them may follow in the child list.

tests.tree_structure.zero_or_more(*args)

Any finite number or zero of the arguments are expected.

Any number of node types and functions may be specified. In all cases, they are expected to follow in the child list in the order they are specified.

For example:

zero_or_more(either(Comment, Product), Space)

Means that either a Comment or Product followed by a Space are expected to occur any number of times in the child list.

tests.tree_structure.maybe(*args)

The arguments may or may not occur.

The arguments may be node types or other functions. In all cases, either all of the arguments occur in the child list in the specified order or they do not occur at all.

For example:

maybe(Space, zero_or_more(Term, Product), Space)

Means that a single sequence of a Space followed by any number of Term, Product pairs followed by a single Space may occur in the child list, or not at all.

Node reference

The PT always begins with the Root node:

class parse_ebnf.nodes.Root(startLine=0, startColumn=0, endLine=0, endColumn=0)

The root PT node.

Parent type

parents = None

Children

children = [ zero_or_more(either(Comment, Product, Space)) ]
class parse_ebnf.nodes.Comment(startLine=0, startColumn=0, endLine=0, endColumn=0)

Nodes holding EBNF comments.

Comments in EBNF are enclosed by (*, *) pairs. Nesting is allowed.

Parent type

parents = [ Root, Comment ]

Children

children = [ literal("(*"), zero_or_more(either(Comment, Text)), literal("*)") ]
class parse_ebnf.nodes.Product(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding a product.

A product is a grammar rule, those of the form:

something = another | third, 'a';

This node has the following variables:

  • Those inherited from Node;

  • lhs, the left hand side of the rule, the first Identifier of the children;

  • rhs, the right had side of the rule, the first DefinitionList of the children.

Parent type

parents = [ Root ]

Children

children = [ Identifier, maybe(Space), literal("="), maybe(Space), DefinitionList, maybe(Space), literal([";", "."]) ]
class parse_ebnf.nodes.DefinitionList(startLine=0, startColumn=0, endLine=0, endColumn=0)

Node containing a list of definitions.

Definitions are the lists of concatenations. Definitions are placed in-between alterations symbols. This node can be seen as holding all alternatives of a rule.

Parent type

parents = [ Product, Repeat, Option, Group ]

Children

children = [ zero_or_more(maybe(Space), Definition, maybe(Space), literal(["|", "/", "!"])), maybe(Space), maybe(Definition), maybe(Space) ]
class parse_ebnf.nodes.Definition(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding a definition.

A definition is the list of concatenations on the right hand side of a rule. This node can be seen as having a sequence that a rule needs to match.

Parent type

parents = [ DefinitionList ]

Children

children = [ maybe(Space), zero_or_more(Term, maybe(Space), literal(","), maybe(Space)), maybe(Term, maybe(literal(","))), maybe(Space) ]
class parse_ebnf.nodes.Term(startLine=0, startColumn=0, endLine=0, endColumn=0)

Node holding a single term.

Terms are values that a rule can take, they may be either:

  • Terminals;

  • Non-terminals;

  • Groups.

This node has the following variables:

  • Those inherited from Node;

  • repetition – a Repetition, the one in the children list;

  • primary – what is to be parsed, the one in the children list, always an instance of Primary.

  • exception – an Exception, the one in the children list.

Parent type

parents = [ Definition ]

Children

children = [ maybe(Space), maybe(Repetition), maybe(Space), Primary, maybe(Space), maybe(literal("-"), Exception) ]
class parse_ebnf.nodes.Exception(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding the exception to a term.

Exceptions are written after a term’s primary, prefixed with a -. They’re terms themselves, except that they don’t have exceptions and repetitions.

This node has the following variables:

  • Those inherited from Node;

  • primary – what is to be excluded from parsing, the one in the children list, always an instance of Primary.

Parent type

parents = [ Term ]

Children

children = [ maybe(Space), Primary, maybe(Space) ]
class parse_ebnf.nodes.Repetition(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding how many times at most a term may be repeated.

Parent type

parents = [ Term ]

Children

children = [ Number, maybe(Space), literal("*") ]
class parse_ebnf.nodes.Terminal(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding a terminal.

Terminals are sequences of characters that are enclosed by either:

  • ';

  • ";

  • \`.

Parent type

parents = [ Term, Exception ]

Children

children = [ literal(['"', "'", '`']), Text, literal(['"', "'", '`']) ]
class parse_ebnf.nodes.Repeat(lit=None)

A node holding a repeatable group.

A group enclosed by either:

  • { or (/;

  • } or /).

May be repeated any number of times, including none.

This node has the following variables:

  • Those inherited from Node;

  • lit – the first literal node in the children list.

Parent type

parents = [ Term, Exception ]

Children

children = [ literal(["{", "(/"]), maybe(Space), DefinitionList, literal(["}", "/)"]) ]
class parse_ebnf.nodes.Option(lit=None)

A node holding an optional group.

A group enclosed by either:

  • [ or (:;

  • ] or :).

May occur either once, or not at all.

This node has the following variables:

  • Those inherited from Node;

  • lit – the first literal node in the children list.

Parent type

parents = [ Term, Exception ]

Children

children = [ literal(["[", "(:"]), maybe(Space), DefinitionList, literal(["]", ":)"]) ]
class parse_ebnf.nodes.Group(lit=None)

A node holding a group.

A group enclosed by ( and ) can be used to make what are essentially inline non-terminals.

This node has the following variables:

  • Those inherited from Node;

  • lit – the first literal node in the children list.

Parent type

parents = [ Term, Exception ]

Children

children = [ literal("("), maybe(Space), DefinitionList, literal(")") ]
class parse_ebnf.nodes.Special(startLine=0, startColumn=0, endLine=0, endColumn=0)

A node holding a special sequence.

A sequence of text enclosed by ? is considered a special sequence. The EBNF spec leaves this as room for defining extensions to the language.

Parent type

parents = [ Term, Exception ]

Children

children = [ literal("?"), Text, literal("?") ]
class parse_ebnf.nodes.Identifier(data='')

Node holding an identifier.

Identifiers are alphanumeric string that do not start with a number. They do not contain trailing or leading whitespace, however they may contain whitespace in the middle.

Parent type

parents = [ Product, Definition, Term, Exception ]

Children

children = [ ]
class parse_ebnf.nodes.EmptyString(data='')

A node that represents the empty string.

EBNF does not have a symbol for the empty string, instead it is represented by another empty string.

The data instance variable is always an empty string.

Parent type

parents = [ Term, Exception ]

Children

children = [ ]
class parse_ebnf.nodes.Text(data='')

Node holding text.

Parent type

parents = [ Comment, Terminal, Special ]

Children

children = [ ]
class parse_ebnf.nodes.Space(data='')

Node holding whitespace.

Parent type

parents = [ Root, Product, DefinitionList, Definition, Term, Repetition, Exception, Repeat, Option, Group ]

Children

children = [ ]
class parse_ebnf.nodes.Literal(data='')

Node holding one or more characters.

The actual character sequence depends on the parent. The parent nodes have documentation pertaining to the exact sequence.

Parent type

parents = [ Comment, Product, DefinitionList, Definition, Term, Repetition, Terminal, Repeat, Option, Group, Special ]

Children

children = [ ]
class parse_ebnf.nodes.Number(data='')

Node holding a positive integer.

The number is stored as a string in the data variable. Use the instance function to_int to convert it to an integer.