titlecased canonicalized UTF-8

仕様書

[1] RFC 5051 - i;unicode-casemap - Simple Unicode Collation Algorithm (2014-02-02 02:15:24 +09:00 版) http://tools.ietf.org/html/rfc5051

定義

[2] i;unicode-casemap は、titlecase 化正準化 UTF-8 について i;octet と同じように等価性、順序、部分文字列の各演算を定義するものです >>1。

[3] titlecase 化正準化 UTF-8 (titlecased canonicalized UTF-8) とは、入力が既知の charset で UTF-8 に変換できるなら UTF-8 に変換した上で次の操作を行い、そうでないなら元のままとしたものです >>1。 UTF-8 に変換できた文字列については、 UnicodeData.txt に基づき次のように処理します。

[4] titlecase が定義されていれば、それに置き換える
[5] 分解が定義されていれば、それに置き換える
[6] これらを可能な限り繰り返す

Unicode の版

[8] 特定の Unicodeの版を参照しているのではなく、任意あるいは最新の版を参照しているようです。

一覧

[10] 対応関係の一覧は >>9 にあります。

[9] Character mapping "rfc5051:titlecase-canonical" (2014-04-01 12:05:23 +09:00 版) http://chars.suikawiki.org/map/rfc5051%3Atitlecase-canonical

応用

[12] Lemonade

メモ

[7] 意図的かどうかは不明ですが、 Unicode の case folding や NFKC の定義を参照せずに、 UnicodeData.txt の値を直接参照しています。そのため、 SpecialCasing.txt で定義されるような複雑な大文字と小文字の関係は無視されますし、ハングル音節の分解は行われません。

[11] RFC 5255 - Internet Message Access Protocol Internationalization (2016-01-10 21:18:21 +09:00 版) https://tools.ietf.org/html/rfc5255

A server that advertises this extension MUST implement the
i;unicode-casemap comparator, as defined in [UCM]. It MAY implement
other comparators from the IANA registry established by [RFC4790].
See also Section 4.5 of this document.
A server that advertises this extension SHOULD use i;unicode-casemap
as the default comparator. (Note that i;unicode-casemap is the
default comparator for I18NLEVEL=1, but not necessarily the default
for I18NLEVEL=2.) The selection of the default comparator MAY be
adjustable by the server administrator, and MAY be sensitive to the
current user. Once the IMAP connection enters authenticated state,
the default comparator MUST remain static for the remainder of that
connection.