manakai index data structures

[19] This page describes how source positions of nodes, attributes, tokens, and errors are stored and handled in manakai's DOM/HTML related Perl modules.

DocumentIndex

[2] A DocumentIndex is a positive integer used to identify a source data stream.

[3] The special value -1 indicates that the source data stream is unknown.

CharacterIndex

[4] A CharacterIndex is a non-negative integer representing the location of a character in the given source character stream. It is the offset from the beginning of the character string.

[5] It is unclear whether "an unknown character index" is necessary or not at the time of writing. If necessary, the value -1 will be used.

LineNumber

[43] A LineNumber is a positive integer representing the line in the given source character stream in which the character appears.

[47] The LineNumber of the first line in the source character stream is 1.

[44] It is unclear whether "an unknown line number" is necessary or not. If necessary, the value -1 will be used.

ColumnNumber

[45] A ColumnNumber is a non-negative integer representing the column in the line in the given source character stream at which the character appears.

[46] The ColumnNumber of the first character in the line is 1.

[48] The ColumnNumber of the newline can be 0 (a ColumnNumber in the next line) or equal to the number of the characters in the previous line (a ColumnNumber in the previous line).

[49] It is unclear whether "an unknown column number" is necessary or not. If necessary, the value -1 will be used. Note that it might be possible to use -1 for the CR of a CRLF sequence...

IndexedStringSegment

[6] An IndexedStringSegment is an array reference containing a character string, a DocumentIndex, and a CharacterIndex.

[7] The pair of the DocumentIndex and the CharacterIndex identifies the source location of the first character of the character string.

[8] An IndexedStringSegment represents the character string.

IndexedString

[1] An IndexedString is an array reference containing zero or more IndexedStringSegments. It represents the string formed by concatenating the strings represented by those IndexedStringSegments, in order.
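
As an illustration only, the two structures above can be modelled as nested lists. The sketch below uses Python lists in place of Perl array references; the sample values and the helper name indexed_string_to_text are hypothetical and are not part of manakai.

```python
# Python lists stand in for Perl array references in this sketch.
# Each segment is [character string, DocumentIndex, CharacterIndex].
indexed_string = [
    ["Hello, ", 1, 0],   # starts at offset 0 of the document with index 1
    ["world", 2, 15],    # starts at offset 15 of the document with index 2
    ["!", -1, 0],        # source unknown
]


def indexed_string_to_text(indexed):
    """Concatenate the character strings of the segments, in order."""
    return "".join(segment[0] for segment in indexed)


print(indexed_string_to_text(indexed_string))  # prints "Hello, world!"
```

An empty list is a valid IndexedString in this model and represents the empty string.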

IndexIndexMappingSegment

[22] An IndexIndexMappingSegment is an array reference whose items are a CharacterIndex, a DocumentIndex, and a CharacterIndex.

[23] It represents that the character at the first CharacterIndex corresponds to the character at the last CharacterIndex in the document identified by the DocumentIndex.

IndexIndexMapping

[24] An IndexIndexMapping is an array reference whose items are zero or more IndexIndexMappingSegments.

[25] The first CharacterIndex in an earlier IndexIndexMappingSegment in the IndexIndexMapping must be less than or equal to the first CharacterIndex of a later IndexIndexMappingSegment.

[35] It defines a mapping from the CharacterIndex of a character to a pair of DocumentIndex and CharacterIndex.

[29] The relevant segment of a mapping for a CharacterIndex is the result of these steps:

  1. [32] Let result be a segment (0, -1, 0).
  2. [30] For each segment in the mapping, in order:
    1. [31] If the first CharacterIndex of the segment is greater than the given CharacterIndex, return result and abort these steps.
    2. [33] Otherwise, set result to the segment.
  3. [34] Return result.

[26] The mapped (DocumentIndex, CharacterIndex) pair of a character obtained by an IndexIndexMapping is the pair of DocumentIndex and the last CharacterIndex of the relevant segment of the IndexIndexMapping for the CharacterIndex identifying the character.

[28] Note that if more than one IndexIndexMappingSegment has the same first CharacterIndex, only the last of them takes effect; the earlier ones are ignored.
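
The lookup in [29] and [26] can be sketched as follows. This is a minimal illustration with Python lists standing in for Perl array references; the function names relevant_segment and mapped_pair and the sample mapping are hypothetical. It follows the text literally, so the pair is taken from the relevant segment unchanged.

```python
def relevant_segment(mapping, char_index):
    """Return the relevant segment of an IndexIndexMapping, per [29]."""
    result = [0, -1, 0]          # step 1: the default segment (0, -1, 0)
    for segment in mapping:      # step 2: walk the segments in order
        if segment[0] > char_index:
            return result        # step 2.1: first segment past the index
        result = segment         # step 2.2: remember the latest candidate
    return result                # step 3


def mapped_pair(mapping, char_index):
    """Return the mapped (DocumentIndex, CharacterIndex) pair, per [26]."""
    _first, document_index, last_char_index = relevant_segment(mapping, char_index)
    return (document_index, last_char_index)


# Hypothetical mapping; two segments share the first CharacterIndex 5.
mapping = [[0, 1, 10], [5, 2, 0], [5, 3, 7], [9, 1, 40]]

print(mapped_pair(mapping, 3))    # (1, 10)
print(mapped_pair(mapping, 5))    # (3, 7): the earlier [5, 2, 0] is ignored
print(mapped_pair(mapping, 100))  # (1, 40)
print(mapped_pair([], 0))         # (-1, 0): no segment, unknown document
```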

IndexLCMappingSegment

[36] An IndexLCMappingSegment is an array reference whose items are a CharacterIndex, a LineNumber, and a ColumnNumber.

[37] It represents that the character at the CharacterIndex has LineNumber and ColumnNumber.

IndexLCMapping

[38] An IndexLCMapping is an array reference whose items are zero or more IndexLCMappingSegments.

[39] The CharacterIndex in an earlier IndexLCMappingSegment in the IndexLCMapping must be less than or equal to the CharacterIndex of a later IndexLCMappingSegment.

[40] It defines a mapping from the CharacterIndex of a character to a pair of LineNumber and ColumnNumber.

[41] The mapped (LineNumber, ColumnNumber) pair of a character obtained by an IndexLCMapping is the pair of LineNumber and the ColumnNumber of the relevant segment of the IndexLCMapping for the CharacterIndex identifying the character.

[42] Note that if more than one IndexLCMappingSegment has the same CharacterIndex, only the last of them takes effect; the earlier ones are ignored.
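
A possible lookup for an IndexLCMapping is sketched below, using a binary search over the sorted CharacterIndexes. Python lists stand in for Perl array references; the function name mapped_line_column and the sample data are hypothetical. Returning None when no segment applies is an assumption of this sketch, since this page does not define a default segment for this mapping. The pair is returned from the segment unchanged, as [41] is worded.

```python
import bisect


def mapped_line_column(lc_mapping, char_index):
    """Return (LineNumber, ColumnNumber) from the last segment whose
    CharacterIndex is less than or equal to char_index, or None."""
    keys = [segment[0] for segment in lc_mapping]
    # bisect_right lands after any run of equal keys, so the later of
    # two segments with the same CharacterIndex is the one selected.
    position = bisect.bisect_right(keys, char_index)
    if position == 0:
        return None  # assumption: no segment covers this index
    _char_index, line, column = lc_mapping[position - 1]
    return (line, column)


# Hypothetical mapping: lines start at character offsets 0, 6 and 12.
lc_map = [[0, 1, 1], [6, 2, 1], [12, 3, 1]]

print(mapped_line_column(lc_map, 0))   # (1, 1)
print(mapped_line_column(lc_map, 8))   # (2, 1)
print(mapped_line_column(lc_map, 12))  # (3, 1)
```

The binary search relies on the ordering requirement in [39]; a linear scan like the one in [29] would give the same result.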

DocumentIndexData

[50] A DocumentIndexData is a hash reference. It contains properties of a document.

[51] If the key map is present, its value must be an IndexIndexMapping. The first CharacterIndexes of its IndexIndexMappingSegments are interpreted as indexes in this document.

[52] If the key lc_map is present, its value must be an IndexLCMapping. The CharacterIndexes of its IndexLCMappingSegments are interpreted as indexes in this document.

[27] If the key url is present, its value must be a character string. It is the URL of the document, if known. It must be an absolute URL. However, the consumer of the value must not assume that it is an absolute URL.

[55] Applications can use other keys for their own purposes. They should make their best effort to avoid key conflicts with future additions and with other applications.

DocumentIndexDataSet

[53] A DocumentIndexDataSet is an array reference whose items are DocumentIndexData or undef.

[54] If the i-th item in the array reference is a DocumentIndexData, it is the DocumentIndexData for the document whose DocumentIndex is i. If the i-th item in the array reference is undef, or there is no i-th item in the array reference, DocumentIndex i is not assigned to any document.
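
For illustration, a DocumentIndexDataSet and a lookup by DocumentIndex might look like the sketch below. Python lists, dicts and None stand in for Perl array references, hash references and undef; the helper name get_document_data, the URL and the sample mappings are hypothetical.

```python
# Item i describes the document whose DocumentIndex is i.
data_set = [
    None,                                          # not assigned
    {"url": "https://example.com/a.html",          # hypothetical URL
     "lc_map": [[0, 1, 1], [6, 2, 1]]},
    None,                                          # not assigned
    {"map": [[0, 1, 20]]},                         # maps back into document 1
]


def get_document_data(data_set, document_index):
    """Return the DocumentIndexData for document_index, or None if that
    DocumentIndex is not assigned to any document."""
    # Negative values (including -1, "unknown") must not wrap around to
    # the end of the list, as Python indexing would otherwise do.
    if document_index < 0 or document_index >= len(data_set):
        return None
    return data_set[document_index]


print(get_document_data(data_set, 1)["url"])  # https://example.com/a.html
print(get_document_data(data_set, 2))         # None
print(get_document_data(data_set, -1))        # None
```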

Web IDL Perl binding extensions

[9] If the type of an argument is IndexedStringSegment, run these steps:

  1. Let value be the specified value.
  2. If value is not an array reference, throw a TypeError and abort these steps.
  3. Let new value be a new array reference.
  4. Push ToString(value->[0]) to new value.
  5. Push ToNumber(value->[1]) to new value.
  6. Push ToNumber(value->[2]) to new value.
  7. Return new value.

[10] If the type of an argument is IndexedString, run these steps:

  1. Let value be the specified value.
  2. If value is not an array reference, throw a TypeError and abort these steps.
  3. If there is an item in value which is not an array reference, throw a TypeError and abort these steps.
  4. Let new value be a new array reference.
  5. For each item in value, in order,
    1. Apply the steps in [9] to the item and push the result into new value.
  6. Return new value.
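
The two conversions in [9] and [10] can be sketched as follows. This is an approximation only: Python lists stand in for Perl array references, and the built-ins str and int are used as stand-ins for ToString and ToNumber, whose exact behaviour in the Perl binding is not reproduced here (for example, int raises an error on a non-numeric string). The function names are hypothetical.

```python
def to_indexed_string_segment(value):
    """Convert an argument to an IndexedStringSegment, following [9]."""
    if not isinstance(value, list):
        raise TypeError("IndexedStringSegment must be an array reference")
    # Steps 3 to 6: build a fresh array from the converted items.
    return [str(value[0]), int(value[1]), int(value[2])]


def to_indexed_string(value):
    """Convert an argument to an IndexedString, following [10]."""
    if not isinstance(value, list):
        raise TypeError("IndexedString must be an array reference")
    if any(not isinstance(item, list) for item in value):
        raise TypeError("IndexedString items must be array references")
    # Step 5: convert each item and collect the results in order.
    return [to_indexed_string_segment(item) for item in value]


print(to_indexed_string([["ab", "1", 2.0], [42, 3, 0]]))
# [['ab', 1, 2], ['42', 3, 0]]
```

Because a new array is always built, later changes to the caller's array do not affect the converted value.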

DOM Node extensions

[11] Each Node and Attr has an associated source document index and an associated source character index. Initially, they are unset.

partial interface Node {
  IndexedStringSegment manakaiGetSourceLocation ();
  void manakaiSetSourceLocation (IndexedStringSegment? new);
};
partial interface Attr {
  IndexedStringSegment manakaiGetSourceLocation ();
  void manakaiSetSourceLocation (IndexedStringSegment? new);
};

[12] The manakaiGetSourceLocation method MUST run these steps:

  1. If the source document index of the context object is set,
    1. Return a new IndexedStringSegment whose character string is the empty string, DocumentIndex is the source document index of the context object, and CharacterIndex is the source character index of the context object.
  2. Otherwise, if the context object is a CharacterData, Attr, or AttributeDefinition and its data or value consists of one or more IndexedStringSegments,
    1. Let segment be the first IndexedStringSegment of the data or value of the context object.
    2. Return a new IndexedStringSegment whose character string is the empty string, DocumentIndex is the DocumentIndex of segment, and CharacterIndex is the CharacterIndex of segment.
  3. Otherwise, return a new IndexedStringSegment whose character string is the empty string, DocumentIndex is -1, and CharacterIndex is 0.

[13] The manakaiSetSourceLocation method MUST run these steps:

  1. If the argument is null, unset the source document index and the source character index of the context object.
  2. Otherwise, set the source document index of the context object to the DocumentIndex of the argument and the source character index of the context object to the CharacterIndex of the argument.
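
The steps in [12] and [13] can be sketched with a small stand-in class. This is not the manakai implementation: the class name SourceLocated, its attribute names and the snake_case method names are hypothetical, Python lists stand in for Perl array references, and None models both null and the unset state.

```python
class SourceLocated:
    """Minimal stand-in for a Node or Attr with a source location."""

    def __init__(self, indexed_data=None):
        self.source_di = None   # source document index, initially unset
        self.source_ci = None   # source character index, initially unset
        # IndexedString holding the data or value; None for objects that
        # are not CharacterData, Attr, or AttributeDefinition.
        self.indexed_data = indexed_data

    def manakai_set_source_location(self, new):
        if new is None:                      # [13] step 1: unset both
            self.source_di = None
            self.source_ci = None
        else:                                # [13] step 2: copy the indexes
            _text, self.source_di, self.source_ci = new

    def manakai_get_source_location(self):
        if self.source_di is not None:       # [12] step 1: explicit location
            return ["", self.source_di, self.source_ci]
        if self.indexed_data:                # [12] step 2: first segment
            _text, di, ci = self.indexed_data[0]
            return ["", di, ci]
        return ["", -1, 0]                   # [12] step 3: unknown


text = SourceLocated([["abc", 2, 7], ["def", 2, 20]])
print(text.manakai_get_source_location())   # ['', 2, 7]
text.manakai_set_source_location(["", 5, 1])
print(text.manakai_get_source_location())   # ['', 5, 1]
```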

[14] The data of a CharacterData node, the value of an AttributeDefinition node, and the value of an Attr object are internally stored as IndexedString (or equivalent).

[15] When they are to be modified by a method that is not aware of IndexedString, the modification MUST be performed in a way that does not make the DocumentIndex and CharacterIndex values contained in the IndexedString incorrect. When it is not desired to preserve the DocumentIndex and CharacterIndex of affected IndexedStringSegments, or when a DOMString is inserted, their DocumentIndex and CharacterIndex MUST be set to -1 and 0, respectively.
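
As one example of an index-preserving modification, the sketch below extracts a substring from an IndexedString. It assumes that consecutive characters of a segment are consecutive in the source, so that a piece starting k characters into a segment has its CharacterIndex advanced by k; that reading is an assumption of this sketch, as is the function name substr_indexed. Python lists stand in for Perl array references.

```python
def substr_indexed(indexed, start, length):
    """Return the IndexedString for `length` characters from `start`."""
    result = []
    end = start + length
    position = 0  # offset of the current segment in the whole string
    for text, document_index, char_index in indexed:
        seg_start, seg_end = position, position + len(text)
        position = seg_end
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo >= hi:
            continue  # this segment does not overlap the requested range
        offset = lo - seg_start
        piece = text[offset:hi - seg_start]
        if document_index == -1:
            result.append([piece, -1, 0])   # unknown stays (-1, 0)
        else:
            result.append([piece, document_index, char_index + offset])
    return result


indexed = [["Hello", 1, 10], [" ", -1, 0], ["world", 2, 0]]
print(substr_indexed(indexed, 3, 5))
# [['lo', 1, 13], [' ', -1, 0], ['wo', 2, 0]]
```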

partial interface Node {
  IndexedString? manakaiGetIndexedString ();
  void manakaiAppendIndexedString (IndexedString s);
};
partial interface Attr {
  IndexedString? manakaiGetIndexedString ();
  void manakaiAppendIndexedString (IndexedString s);
};

[16] The manakaiGetIndexedString method MUST return a new IndexedString which represents the same string as the textContent attribute of the context object. DocumentIndex and CharacterIndex contained in it MUST be set to appropriate values.

[17] The manakaiAppendIndexedString method MUST act as if the manakaiAppendText method is invoked with the character string of the IndexedString argument. In addition, DocumentIndex and CharacterIndex contained in the IndexedString argument MUST be used to update the data or value of relevant objects.
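
A rough model of how [16] and [17] interact with [15] for a single text-like object is sketched below. The class TextLike and its method names are hypothetical, Python lists stand in for Perl array references, and manakaiAppendText is approximated by a plain string append. Children and other node types are not modelled.

```python
class TextLike:
    """Stand-in for a CharacterData node storing its data as an IndexedString."""

    def __init__(self):
        self.data = []  # internal IndexedString, per [14]

    def append_text(self, text):
        # A plain DOMString carries no source location, so per [15] it is
        # stored with DocumentIndex -1 and CharacterIndex 0.
        if text != "":
            self.data.append([text, -1, 0])

    def append_indexed_string(self, indexed):
        # Same visible effect as appending the character string, but the
        # DocumentIndex and CharacterIndex of each segment are kept.
        for text, document_index, char_index in indexed:
            self.data.append([text, document_index, char_index])

    def get_indexed_string(self):
        # Return a new IndexedString so callers cannot alter internal state.
        return [list(segment) for segment in self.data]

    @property
    def text_content(self):
        return "".join(segment[0] for segment in self.data)


node = TextLike()
node.append_indexed_string([["foo", 1, 4], ["bar", 1, 30]])
node.append_text("!")
print(node.text_content)          # foobar!
print(node.get_indexed_string())  # [['foo', 1, 4], ['bar', 1, 30], ['!', -1, 0]]
```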

partial interface Element {
  IndexedString? manakaiGetAttributeIndexedStringNS (DOMString? namespace, DOMString localName);
  void manakaiSetAttributeIndexedStringNS (DOMString? namespace, DOMString name, IndexedString value);
};

[20] The manakaiGetAttributeIndexedStringNS method MUST act as if the getAttributeNS method is invoked, except the method MUST return a new IndexedString instead of a DOMString.

[21] The manakaiSetAttributeIndexedStringNS method MUST act as if the setAttributeNS method is invoked with the same namespace and name arguments and with a value argument that is a DOMString equivalent to the character string of the IndexedString argument. In addition, DocumentIndex and CharacterIndex contained in the IndexedString argument MUST be used to set the value of the attribute.

[18] Exactly how DocumentIndex and CharacterIndex are handled by the various methods defined here or by the DOM Standard is a quality of implementation issue.