Segmented Name Strings

[1] The Segmented Name Strings specification defines several syntaxes representing names and their segment boundaries.

[12] This specification depends on the Infra Standard. The terms code point, ASCII lower alpha, list, and is empty are defined by the Infra Standard.

General definition

[2] A segmented name string is a space-separated list of one or more substrings, representing a name and implied boundaries within it.

[5] Formally, a segmented name string of type T is a segment of type T, followed by zero or more pairs, each consisting of a separator of type T followed by a segment of type T.

[3] A segment of type T is a string of one or more non-space characters that is not a punctuation of type T. Additional restrictions might apply, depending on T.

[4] A punctuation of type T is an item of the list referred to as the punctuation list for T.

[6] A separator of type T is either a space, or a space followed by a punctuation of type T followed by a space. A separator that is a space represents an implied boundary, such as a hyphenation point. A separator that contains a punctuation represents an explicit boundary whose type is denoted by the punctuation.

[7] A space is a U+0020 SPACE character.

[8] A character is a code point.
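
The general definition above can be sketched in code. The following is a minimal, non-normative sketch in Python; the function name parse_segmented_name, its parameters, and its return shape are illustrative assumptions and are not defined by this specification. It checks only the general grammar and applies no T-dependent restriction on segments.

```python
from typing import List, Optional, Tuple

SPACE = "\u0020"  # a space is U+0020 SPACE


def parse_segmented_name(
    s: str, punctuation: List[str]
) -> Optional[Tuple[List[str], List[Optional[str]]]]:
    """Parse s as a segmented name string of a type whose punctuation
    list is `punctuation` (hypothetical helper, not part of the spec).

    Returns (segments, boundaries), or None if s does not match the
    general grammar. boundaries[i] lies between segments[i] and
    segments[i + 1]: None for an implied boundary (a single space),
    or the punctuation for an explicit boundary.
    """
    tokens = s.split(SPACE)
    # An empty token means a leading, trailing, or doubled space,
    # or an empty input string.
    if any(t == "" for t in tokens):
        return None

    segments: List[str] = []
    boundaries: List[Optional[str]] = []
    pending: Optional[str] = None  # punctuation seen since the last segment
    expect_segment = True          # True at the start and after a punctuation

    for t in tokens:
        if t in punctuation:
            if expect_segment:
                return None  # punctuation at the start, or two in a row
            pending = t
            expect_segment = True
        else:
            if segments:
                boundaries.append(pending)
            segments.append(t)
            pending = None
            expect_segment = False

    if expect_segment:
        return None  # the string ends with a punctuation
    return segments, boundaries
```

For instance, under these assumptions parse_segmented_name("a - b c", ["-"]) yields the segments a, b, c with one explicit boundary followed by one implied boundary, while a string with a doubled space is rejected.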

East Asian names

[9] A segmented East Asian name string is a type of segmented name string.

[10] A segment of segmented East Asian name string is a string of one or more characters from:

[11] The punctuation list for segmented East Asian name string is empty.

Hiragana names

[13] A segmented Hiragana name string is a type of segmented name string.

[14] A segment of segmented Hiragana name string is a string of one or more characters from:

Need more formal definition.

[15] The punctuation list for segmented Hiragana name string is empty.

Romaji names

[16] A segmented Romaji name string is a type of segmented name string.

[17] A segment of segmented Romaji name string is a string of one or more characters from:

Need more formal definition.

[18] The punctuation list for segmented Romaji name string is a list of two items: - and '.
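
As a non-normative illustration of how the Romaji punctuation list interacts with the general separator rules, the following Python sketch labels each part of an input string as a segment, an implied boundary, or an explicit boundary. The function name describe_romaji, the labels, and the sample input are illustrative assumptions. Because the set of characters allowed in a Romaji segment is not yet formally defined here, the sketch does not check segment contents.

```python
from typing import List, Tuple

# Punctuation list for segmented Romaji name string, per paragraph [18].
ROMAJI_PUNCTUATION = ["-", "'"]


def describe_romaji(s: str) -> List[Tuple[str, str]]:
    """Label the parts of a segmented Romaji name string.

    Hypothetical helper: returns a list of (kind, text) pairs where kind
    is "segment", "implied" (a single-space separator), or "explicit"
    (a separator that contains a punctuation). Raises ValueError when
    the general grammar is violated.
    """
    tokens = s.split("\u0020")
    if "" in tokens:
        raise ValueError("leading, trailing, or doubled space")

    parts: List[Tuple[str, str]] = []
    prev_is_segment = False
    for t in tokens:
        if t in ROMAJI_PUNCTUATION:
            if not prev_is_segment:
                raise ValueError("a punctuation must follow a segment")
            parts.append(("explicit", t))
            prev_is_segment = False
        else:
            if prev_is_segment:
                # Two adjacent segments are joined by a bare space.
                parts.append(("implied", " "))
            parts.append(("segment", t))
            prev_is_segment = True

    if not prev_is_segment:
        raise ValueError("the string must end with a segment")
    return parts
```

With the made-up input "ken ' ichi ro", this sketch reports an explicit boundary of type ' between ken and ichi, and an implied boundary between ichi and ro.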

License

[19] Per CC0 https://creativecommons.org/publicdomain/zero/1.0/, to the extent possible under law, the author of this specification has waived all copyright and related or neighboring rights to this specification.

Notes