Trailing 1

[2] HZ-GB-23121, X-ISO-10646-UCS-4-21431, and X-ISO-10646-UCS-4-34121 are fictitious character encoding names. They arose when HZ-GB-2312, X-ISO-10646-UCS-4-2143, and X-ISO-10646-UCS-4-3412 were carelessly copy-pasted in the documentation of low-quality software.

[4] The source of the problem (though it is not itself to blame) is the documentation of a piece of software called juniversalchardet.

[5] It originally read as follows. >>3

Chinese
HZ-GB-2312¹
Unicode
UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412¹ / X-ISO-10646-UCS-4-2143¹
¹ Currently not supported by Java

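[6] That is, each of these three character encoding names was immediately followed by a "¹" marking a footnote. The marker was written with the HTML sup element. If the text is copied without that markup, the marker becomes an ordinary digit 1 attached to the end of the name.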
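The effect of dropping the sup markup can be reproduced with a few lines of Python. This is only an illustrative sketch: the exact HTML of the original page is not quoted here, so the `fragment` string is an assumed reconstruction, and `TextOnly` and `strip_tags` are hypothetical helper names, not part of any of the projects discussed.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop every tag, as a naive copy-paste would."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Text inside <sup> is kept, but the fact that it was a superscript is lost.
        self.parts.append(data)


def strip_tags(fragment: str) -> str:
    """Return the plain text of an HTML fragment with all markup removed."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


# Assumed reconstruction of the original markup (not quoted from the real page).
fragment = "HZ-GB-2312<sup>1</sup>"
print(strip_tags(fragment))  # HZ-GB-23121
```

Once the markup is gone, nothing distinguishes the footnote marker from the last digit of the name.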

[3] Google Code Archive - Long-term storage for Google Code Project Hosting., 2025-05-18T03:54:50.000Z https://code.google.com/archive/p/juniversalchardet/

[7] Many pieces of software derive, directly or indirectly, from juniversalchardet. When they were derived, its documentation was reused as well. Most derivatives write the character encoding names correctly, but some mishandled them.

[8]

Release notes for Gerrit 2.1.2, 2015-12-22T02:18:05.000Z, 2025-05-18T04:01:15.750Z

https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.1.2.html

Improved character set detection Gerrit now uses the Mozilla character set detection algorithm when trying to determine what charset was used to write a text file. For UTF-8 or ISO-8859-1/ASCII users, there should be no difference over prior releases. With this change, the server can now also automatically recognize source files encoded in:
a. Chinese (ISO-2022-CN, BIG5, EUC-TW, GB18030, HZ-GB-23121)
g.Unicode (UTF-8, UTF-16BE / UTF-16LE, UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431)

uchardet / uchardet · GitLab, 2025-05-18T04:25:04.000Z

https://gitlab.freedesktop.org/uchardet/uchardet

UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431

HZ-GB-2312

[9]

GitHub - PyYoshi/uchardet: uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. Returned encoding names are iconv-compatible., 2025-05-18T04:03:37.000Z

https://github.com/PyYoshi/uchardet

UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431

HZ-GB-2312

[10]

GitHub - PyYoshi/cChardet: universal character encoding detector, 2025-05-18T04:04:23.000Z

https://github.com/PyYoshi/cChardet

UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431

HZ-GB-2312

[11]

GitHub - CharsetDetector/UTF-unknown: Character set detector build in C# - .NET 5+, .NET Core 2+, .NET standard 1+ & .NET 4+, 2025-05-18T04:05:30.000Z

https://github.com/CharsetDetector/UTF-unknown

Encodings with BOM: utf-7, utf-8, utf-16be/utf-16le, utf-32be/utf-32le, X-ISO-10646-UCS-4-34121/X-ISO-10646-UCS-4-21431, gb18030.

Chinese iso-2022-cn, big5, euc-tw, gb18030, hz-gb-2312

Remarks: For some aliases of encoding not available: cp949, iso-2022-cn, euc-tw, iso-8859-10, iso-8859-16, viscii, X-ISO-10646-UCS-4-34121/X-ISO-10646-UCS-4-21431.

[12] Microsoft PowerPoint - MLTP_2010_9_6 - MLTP_Manual.pdf, 2018-10-30T03:13:27.000Z, 2025-05-18T04:13:39.428Z https://www.tufs.ac.jp/ts/personal/corpuskun/pdf/2018/MLTP_Manual.pdf
[13] Microsoft PowerPoint - MLTP_2010_9_6 - MLTP.pdf, 2010-11-05T12:43:40.000Z, 2025-05-18T04:14:14.427Z http://textdata.web.fc2.com/MLTP.pdf

HZ‐GB‐23121

UTF‐8, UTF‐16BE/16LE, UTF‐32BE/32LE/ X‐ISO‐10646‐UCS‐4‐34121/4‐21431

[14] All of the usages confirmed so far appear only in documentation; the actual code uses the correct names.

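[15] In other words, the developers carelessly copy-pasted the text without really understanding it, and as a result the code and the documentation have diverged.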
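A divergence of this kind could be caught mechanically. The following is a minimal sketch, assuming a project keeps a list of the names its code really uses; `KNOWN` here is filled with names from the juniversalchardet list quoted above, and `find_trailing_one_names` is a hypothetical helper, not something any of the cited projects ship.

```python
import re

# Names taken from the juniversalchardet list quoted in this article.
KNOWN = {
    "HZ-GB-2312",
    "UTF-32BE",
    "UTF-32LE",
    "X-ISO-10646-UCS-4-3412",
    "X-ISO-10646-UCS-4-2143",
}

# A token is a run of ASCII letters, digits and hyphens.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def find_trailing_one_names(text, known=KNOWN):
    """Return tokens that are unknown but become a known name once a final '1' is dropped."""
    upper = {name.upper() for name in known}
    hits = []
    for token in TOKEN.findall(text):
        t = token.upper()
        if t not in upper and t.endswith("1") and t[:-1] in upper:
            hits.append(token)
    return hits


doc = "Chinese (ISO-2022-CN, BIG5, EUC-TW, GB18030, HZ-GB-23121)"
print(find_trailing_one_names(doc))  # ['HZ-GB-23121']
```

The sketch only matches ASCII hyphens, so a document that typesets the names with a different hyphen character, or abbreviates them, would need its text normalized first.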
