代理組 (surrogate pair)

Surrogate pair (Unicode)

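[10] A surrogate pair (サロゲートペア) is a pair of 16-bit code units that UTF-16 uses together to represent a single Unicode code point. UTF-16 uses surrogate pairs for code points in the supplementary planes, U+10000 through U+10FFFF. The first unit (the high surrogate) lies in the range D800–DBFF, and the second (the low surrogate) lies in DC00–DFFF.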
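The arithmetic behind a surrogate pair is short enough to show directly. What follows is a minimal Python sketch. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and chosen only for illustration.

The sketch works in three steps:

- It subtracts 0x10000 to get a 20-bit value.
- It puts the top 10 bits into a high surrogate (D800–DBFF).
- It puts the bottom 10 bits into a low surrogate (DC00–DFFF).

It then cross-checks the result against Python's built-in `utf-16-be` codec.

```python
from __future__ import annotations


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.

    Hypothetical helper for illustration; not a library API.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    v = cp - 0x10000              # 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into one code point (hypothetical helper)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600  # U+1F600 GRINNING FACE
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
    assert from_surrogate_pair(high, low) == cp
    # Cross-check against Python's own big-endian UTF-16 encoder.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

For U+1F600 this gives D83D DE00. The sketch works on plain integers rather than `str` to keep the code-unit level visible, because Python's `str` is a sequence of code points, not UTF-16 code units.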

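[11] The code points that correspond to these 16-bit surrogate code units, U+D800 through U+DFFF, are not used as characters, regardless of the encoding form. In UTF-8 or UTF-32, for example, a sequence that encodes one of them is ill-formed.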
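Surrogate code points are not valid on their own. Software that receives arbitrary sequences of 16-bit units therefore has to decide what to do with an unpaired surrogate. One common policy is to replace it with U+FFFD, and this is the policy described by the WebSocket change listed under History ([2], [20]).

The sketch below assumes the input is a list of UTF-16 code units given as integers. `is_scalar_value` and `decode_utf16_lossy` are hypothetical helper names, not a library API.

```python
from __future__ import annotations

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(cp: int) -> bool:
    """True for U+D800..U+DFFF, the range reserved for surrogate code units."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    """Unicode scalar value: any code point except the surrogate range."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def decode_utf16_lossy(units: list[int]) -> list[int]:
    """Decode UTF-16 code units into scalar values, replacing unpaired surrogates with U+FFFD.

    Hypothetical helper for illustration; a strict decoder would raise instead.
    """
    out: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate immediately followed by a low surrogate: combine them.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif is_surrogate(u):
            # Lone high surrogate, or a low surrogate with no high surrogate before it.
            out.append(REPLACEMENT)
            i += 1
        else:
            # Ordinary BMP code unit: it is already a scalar value.
            out.append(u)
            i += 1
    return out
```

Here `decode_utf16_lossy([0x61, 0xD83D, 0xDE00, 0xD83D, 0x62])` returns `[0x61, 0x1F600, 0xFFFD, 0x62]`. The first D83D is followed by a low surrogate and is combined with it. The second D83D has no low surrogate after it, so it becomes U+FFFD.

Python's own UTF-8 codec takes the strict route: `"\ud800".encode("utf-8")` raises `UnicodeEncodeError`.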

Encodings

[12] Surrogate pairs have been used in the following encodings:

History

[1] I'm not a Klingon: UTF-16, UTF-8 & UTF-32 update to conform with Unicode 5.0's security concerns <http://blogs.msdn.com/shawnste/archive/2007/07/23/utf-16-utf-8-utf-32-update-to-conform-with-unicode-5-0-s-security-concerns.aspx>

[2] Web Applications 1.0 r7084 Make WebSocket silently convert isolated surrogated to U+FFFD rather than throwing an exception. This will result in data corruption when a user types in astral-plane characters that get truncated by naiive script half-way through, rather than crashing the application. <http://html5.org/tools/web-apps-tracker?from=7083&to=7084>

[3] Notifications API: minor change (Anne van Kesteren) <http://lists.w3.org/Archives/Public/public-web-notification/2012Nov/0010.html>

[4] IRC logs: freenode / #whatwg / 20130915 <http://krijnhoetmer.nl/irc-logs/whatwg/20130915#l-241>

[5] IRC logs: freenode / #whatwg / 20101111 <http://krijnhoetmer.nl/irc-logs/whatwg/20101111#l-579>

[6] Web Applications 1.0 r6184 Try to clean up the stuff about Unicode characters. <http://html5.org/tools/web-apps-tracker?from=6183&to=6184>

[7] IRC logs: freenode / #whatwg / 20140329 <http://krijnhoetmer.nl/irc-logs/whatwg/20140329>

[8] IRC logs: freenode / #whatwg / 20140516 <http://krijnhoetmer.nl/irc-logs/whatwg/20140516>

[9] [CSSWG] Minutes Seoul F2F 2014-05-19 Part V: Counter Styles, CSS Formatting for Books, Font Load Events, Future F2F Meetings, CSS Syntax - Unpaired Surrogates, MQ Listener (Dael Jackson) <http://lists.w3.org/Archives/Public/www-style/2014Jun/0060.html>

[13] Define JavaScript string and scalar value string (annevk) <https://github.com/whatwg/infra/commit/f1be763cfba23d2fc780b35403074c599e69616e>

[14] [c] (2) Disallow surrogates in the input stream; make the syntax sect… (Hixie) <https://github.com/whatwg/html/commit/6dfaa1a826fae1dd50695710498434d201e543f6>

[15] [] (0) Catch unpaired surrogates before trying to convert them to UTF-8. (Hixie) <https://github.com/whatwg/html/commit/53f640d41e2aadfde9cada86d3046d5912ecc818>

[16] [ct] (2) Make surrogates in UTF-8 and character references turn into … (Hixie) <https://github.com/whatwg/html/commit/6db21943d024e774d2aa52573981c130767034e9>

[17] [t] (0) Remove the requirement that the parser deal with raw surrogat… (Hixie) <https://github.com/whatwg/html/commit/3accfd8a1893d91cb3cdbae62b6d8980e456dda6>

[18] [giow] (0) Fix the UTF-8 decoder error handling to handle a few error… (Hixie) <https://github.com/whatwg/html/commit/74e3b6cb761ee8a79b3a1a44d029c128fd0a201f>

[19] [giow] (0) Unpaired surrogates should throw an exception in close, li… (Hixie) <https://github.com/whatwg/html/commit/226e15ebd3d557a67bedcfc043e165d24e4182c1>

[20] [giow] (1) Make WebSocket silently convert isolated surrogated to U+F… (Hixie) <https://github.com/whatwg/html/commit/a817b04f4c262645ef996a5176b4a3f0a3a11928>

[22] [cssom] Add IDL `CSSOMString`, typedef of either USVString or DOMString (SimonSapin) <https://github.com/w3c/csswg-drafts/commit/830ae19ffd9a6fa6eb60aa21549d334cb18fb706>

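[23] For well over a decade (heh), the stock reply from supporters of surrogate pairs (that is, of UTF-16) to their critics has been: "Handling surrogate pairs is simpler than handling combining characters, and you have to handle combining characters anyway, so it's no big deal."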

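[24] Mixing up the character-encoding layer and the character-processing layer strikes me as a 20th-century software development technique. (It is the same kind of argument as "in Shift_JIS, half-width versus full-width matches the byte count, so display handling is easy.")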

[25] [css-syntax] Remove 'code point' and 'surrogate code point' in favor … (tabatkins) <https://github.com/w3c/csswg-drafts/commit/320a990184a331057a56a17cdf627fee81bdc5d3>