Segmented Name Strings

[1] The Segmented Name Strings specification defines several syntaxes representing names and their segment boundaries.

[12] This specification depends on the Infra Standard. The terms code point, ASCII lower alpha, list, and is empty are defined by the Infra Standard.

Contents

  1. General definition
  2. East Asian names
  3. Hiragana names
  4. Romaji names
  5. License
  6. Notes

General definition

[2] A segmented name string is a space-separated list of one or more substrings, representing a name and implied boundaries within it.

[5] Formally, a segmented name string of type T is a segment of type T, followed by zero or more pairs, each consisting of a separator of type T followed by a segment of type T.

[3] A segment of type T is a string of one or more non-space characters that is not itself a punctuation of type T. Additional restrictions might apply depending on T.

[4] A punctuation of type T is one of the items of a list, referred to as the punctuation list for T.

[6] A separator of type T is either a space, or a space followed by a punctuation of type T followed by a space. A separator that is a single space represents an implied boundary, e.g. a hyphenation point. A separator that contains a punctuation represents an explicit boundary whose type is denoted by that punctuation.

[7] A space is a U+0020 SPACE character.

[8] A character is a code point.
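
The general definition above can be sketched in Python as follows. This is only an illustrative sketch, not part of the specification: the function name parse_segmented_name and its return shape are assumptions made for this example, and it checks only the type-independent rules (any T-dependent restrictions on segments are not enforced).

```python
from typing import List, Optional, Sequence, Tuple

SPACE = "\u0020"  # the only character treated as a space


def parse_segmented_name(
    s: str, punctuation_list: Sequence[str] = ()
) -> Optional[Tuple[List[str], List[Optional[str]]]]:
    """Split s into segments and the boundaries between them.

    Returns (segments, boundaries), or None if s does not conform.
    boundaries[i] sits between segments[i] and segments[i + 1]: None for an
    implied boundary, or the punctuation string for an explicit boundary.
    Pass punctuation_list as a list or tuple of strings (hypothetical API).
    """
    tokens = s.split(SPACE)
    if "" in tokens:
        # Empty input, a leading or trailing space, or two adjacent spaces.
        return None

    segments: List[str] = []
    boundaries: List[Optional[str]] = []
    pending: Optional[str] = None  # punctuation seen since the last segment

    for token in tokens:
        if token in punctuation_list:
            # A punctuation must follow a segment, and only one may appear
            # inside a single separator.
            if not segments or pending is not None:
                return None
            pending = token
        else:
            if segments:
                boundaries.append(pending)
            pending = None
            segments.append(token)

    if pending is not None:
        return None  # the string ends with a punctuation, not a segment
    return segments, boundaries
```

For example, with an empty punctuation list, parse_segmented_name("a b c") returns (["a", "b", "c"], [None, None]); with the punctuation list ["-"], parse_segmented_name("a - b", ["-"]) returns (["a", "b"], ["-"]).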

East Asian names

[9] A segmented East Asian name string is a type of segmented name string.

[10] A segment of segmented East Asian name string is a string of one or more characters from:

[11] The punctuation list for segmented East Asian name strings is empty.

Hiragana names

[13] A segmented Hiragana name string is a type of segmented name string.

[14] A segment of segmented Hiragana name string is a string of one or more characters from:

Need more formal definition.

[15] The punctuation list for segmented Hiragana name strings is empty.

Romaji names

[16] A segmented Romaji name string is a type of segmented name string.

[17] A segment of segmented Romaji name string is a string of one or more characters from:

Need more formal definition.

[18] The punctuation list for segmented Romaji name strings is a list of two items: - and '.
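
As a rough illustration of how this punctuation list interacts with the general syntax, the following Python sketch builds a regular expression from a punctuation list and tests a few strings. It is an assumption-laden sketch, not part of the specification: the names build_pattern and conforms are hypothetical, the two punctuation items are taken literally as written above, the sample strings are made up, and the Romaji-specific restriction on segment characters is not checked because it is not yet formally defined.

```python
import re
from typing import Sequence

ROMAJI_PUNCTUATION = ["-", "'"]  # taken literally from the list above


def build_pattern(punctuation_list: Sequence[str]) -> "re.Pattern[str]":
    """Compile the type-independent syntax for a given punctuation list."""
    punct = "|".join(re.escape(p) for p in punctuation_list)
    if punct:
        # A segment is a run of non-space characters that is not exactly
        # one of the punctuation strings.
        segment = rf"(?!(?:{punct})(?: |\Z))[^ ]+"
        # A separator is a space, optionally carrying one punctuation.
        separator = rf" (?:(?:{punct}) )?"
    else:
        # Empty punctuation list: only implied boundaries are possible.
        segment = r"[^ ]+"
        separator = r" "
    return re.compile(rf"{segment}(?:{separator}{segment})*")


def conforms(s: str, punctuation_list: Sequence[str]) -> bool:
    """Return True if s matches the general syntax for the given list."""
    return build_pattern(punctuation_list).fullmatch(s) is not None


# Hypothetical sample strings, chosen only to exercise the syntax.
print(conforms("shin ' ichi", ROMAJI_PUNCTUATION))  # True: explicit boundary
print(conforms("ya ma da", ROMAJI_PUNCTUATION))     # True: implied boundaries
print(conforms("shin '", ROMAJI_PUNCTUATION))       # False: ends in punctuation
print(conforms("a - - b", ROMAJI_PUNCTUATION))      # False: two punctuations
```

With an empty punctuation list, as for East Asian and Hiragana names, the same helper accepts only space-separated segments, so a lone - between spaces would be treated as an ordinary segment there.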

License

[19] Per CC0 https://creativecommons.org/publicdomain/zero/1.0/, to the extent possible under law, the author of this specification has waived all copyright and related or neighboring rights to this specification.

Notes