UTF-8-encoded byte sequences and strings

UTF-8-encoded byte sequences

[9] Perl can treat a byte sequence as a string. Doing so involves an implicit type conversion equivalent to isomorphic decoding. utf8::upgrade
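Isomorphic decoding maps each byte 0xNN to the code point U+00NN. Perl's documentation describes utf8::upgrade as converting a string's internal representation in place from octets to UTF-8, without changing the logical character sequence. The minimal C++ sketch below shows that mapping and the UTF-8 form the resulting code points take. The function names are hypothetical, and Perl's real internals (such as the UTF-8 flag) are not modelled.

```cpp
#include <cassert>
#include <string>

// Isomorphic decoding: byte 0xNN becomes code point U+00NN.
std::u32string isomorphic_decode(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out += static_cast<char32_t>(b);
    return out;
}

// UTF-8 form of code points U+0000-U+00FF (1 or 2 bytes each), i.e. what
// an "upgraded" buffer holding the decoded characters would contain.
std::string utf8_from_latin1_code_points(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        assert(cp <= 0xFF);  // isomorphic decoding never yields more
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the two bytes `A` and 0xE9 decode to U+0041 U+00E9, and their UTF-8 form is the three bytes 41 C3 A9.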

[1] One way to handle byte sequences in JSON is to treat each byte as the corresponding Latin1 character. This operation is equivalent to isomorphic decoding.
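The sketch below shows the JSON approach. It assumes a hypothetical helper `bytes_to_json_string`, which writes each byte as the JSON character with the same code point and escapes anything outside printable ASCII as `\u00XX`. A conforming JSON parser then returns a string whose code points equal the original bytes, and isomorphic encoding recovers the bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Write each byte as the JSON character with the same code point (U+00NN).
// Quote and backslash get their short escapes; control bytes and bytes
// >= 0x7F become JSON unicode escapes (backslash, 'u', four hex digits),
// so the output is plain ASCII.
std::string bytes_to_json_string(const std::string& bytes) {
    std::string out = "\"";
    for (unsigned char b : bytes) {
        if (b == '"') {
            out += "\\\"";
        } else if (b == '\\') {
            out += "\\\\";
        } else if (b < 0x20 || b >= 0x7F) {
            char buf[7];  // six-character escape plus the terminating NUL
            std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(b));
            out += buf;
        } else {
            out += static_cast<char>(b);
        }
    }
    out += '"';
    return out;
}
```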

UTF-8-encoded Latin1

[4] GitHub - grantm/encoding-fixlatin: CPAN module: Fixes Latin-1 and CP1252 characters in UTF8 data, 2025-06-25T08:12:05.000Z https://github.com/grantm/encoding-fixlatin
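The following is a simplified sketch of the kind of repair described in [4]. It assumes the damaged input is mostly valid UTF-8 with stray Latin1 bytes mixed in: well-formed UTF-8 sequences are kept, and every other byte is re-encoded as the Latin1 character with the same value. The names `utf8_seq_len` and `fix_latin1_bytes` are hypothetical. This is not Encoding::FixLatin's own code, and it ignores CP1252, whose bytes 0x80-0x9F map to different characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of a well-formed UTF-8 sequence starting at s[i], or 0 if there is none.
std::size_t utf8_seq_len(const std::string& s, std::size_t i) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) return 1;
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;  // reject overlong forms
        if (b0 == 0xED) hi = 0x9F;  // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;  // reject overlong forms
        if (b0 == 0xF4) hi = 0x8F;  // stay within U+10FFFF
    } else {
        return 0;
    }
    if (i + len > s.size()) return 0;
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        unsigned char want_lo = (k == 1) ? lo : 0x80;
        unsigned char want_hi = (k == 1) ? hi : 0xBF;
        if (b < want_lo || b > want_hi) return 0;
    }
    return len;
}

// Keep valid UTF-8 as-is; re-encode every other byte as the Latin1 character
// with the same value (always U+0080-U+00FF here, since ASCII is valid UTF-8).
std::string fix_latin1_bytes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t n = utf8_seq_len(in, i);
        if (n > 0) {
            out.append(in, i, n);
            i += n;
        } else {
            unsigned char b = static_cast<unsigned char>(in[i]);
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
            ++i;
        }
    }
    return out;
}
```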

UTF-8-encoded TIS-620

[2] thaiconv | Lyndon Hill, Lyndon Hill, 2025-04-16T20:26:33.000Z, 2025-08-02T09:22:11.073Z https://www.lyndonhill.com/projects/thaiconv.html

Cross coded UTF-8
TIS-620 that has been converted to UTF-8 Latin1 (0xA0-0xF0). For example, the Thai character that has the value 160 in TIS-620 may have the Latin representation é, this character gets converted to the Unicode for é. This mode is likely to be converted correctly only if the cross coding and decoding occur in the same locality.
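Cross-coding of this kind can be undone when the Latin1 route is known. TIS-620 bytes 0xA1-0xDA and 0xDF-0xFB correspond to U+0E01-U+0E3A and U+0E3F-U+0E5B, a fixed offset of 0x0D60. The minimal sketch below assumes that every U+00A1-U+00FB character in the input came from a TIS-620 byte. The function names are hypothetical. Genuine Latin1 letters such as é would also be converted to Thai.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// TIS-620 bytes 0xA1-0xDA and 0xDF-0xFB are the Thai characters
// U+0E01-U+0E3A and U+0E3F-U+0E5B, i.e. byte value + 0x0D60.
bool is_tis620_thai_byte(unsigned v) {
    return (v >= 0xA1 && v <= 0xDA) || (v >= 0xDF && v <= 0xFB);
}

// Input: UTF-8 text in which each TIS-620 byte was taken as the Latin1
// character with the same value. Output: UTF-8 with real Thai code points.
std::string repair_cross_coded_tis620(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in.size()) {
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            unsigned v = ((b0 & 0x03u) << 6) | (b1 & 0x3Fu);  // U+0080-U+00FF
            if ((b1 & 0xC0) == 0x80 && is_tis620_thai_byte(v)) {
                unsigned thai = v + 0x0D60;  // e.g. U+00A1 -> U+0E01
                out += static_cast<char>(0xE0 | (thai >> 12));
                out += static_cast<char>(0x80 | ((thai >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (thai & 0x3F));
                ++i;  // skip the continuation byte just consumed
                continue;
            }
        }
        out += static_cast<char>(b0);
    }
    return out;
}
```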

UTF-8-encoded EUC-JP

[23] ¥«¥¿¥í¥°Æâ¸¡º÷¡Ã¶õÄ´À½ÉÊ¥«¥¿¥í¥°¡Ã¥À¥¤¥¥ó¹©¶È³ô¼°²ñ¼Ò¡Ã, 2025-11-23T08:22:31.000Z https://web.archive.org/web/20251123062132id_/https://ec.daikinaircon.com/cgi-bin/ecatalog/fulltextsearch.cgi?order=new&kwd=%3F%3F%3F%3Fp&old=&pg=70

[24] >>23 EUC-JP that has been UTF-8 encoded
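To reverse UTF-8-encoded EUC-JP, decode the UTF-8 and write each code point U+0000-U+00FF back out as a single byte (isomorphic encoding). The resulting bytes are the original EUC-JP. For example, カタ is A5 AB A5 BF in EUC-JP, which becomes ¥«¥¿ when each byte is read as Latin1. That matches the start of the title in >>23. The sketch below uses a hypothetical function name and rejects input containing anything other than ASCII and two-byte sequences for U+0080-U+00FF.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Reverse of "isomorphic decode, then UTF-8 encode": each U+0000-U+00FF
// becomes one byte again. Returns nullopt if the input contains a code point
// above U+00FF or an ill-formed sequence.
std::optional<std::string> isomorphic_encode_from_utf8(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {
            out.push_back(static_cast<char>(b0));
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            unsigned char b1 = static_cast<unsigned char>(in[++i]);
            out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
        } else {
            return std::nullopt;
        }
    }
    return out;
}
```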

`UTF8CP1252`

[6] >>5

`UTF8UTF8`

[5] compact_enc_det/compact_enc_det/compact_enc_det.cc at master · google/compact_enc_det · GitHub, 2025-11-24T04:01:07.000Z https://github.com/google/compact_enc_det/blob/master/compact_enc_det/compact_enc_det.cc#L50

[205] compact_enc_det/util/encodings/encodings.pb.h at master · google/compact_enc_det · GitHub, 2025-05-20T15:13:42.000Z https://github.com/google/compact_enc_det/blob/master/util/encodings/encodings.pb.h#L150

  // Some external vendors make the common input error of
  // converting MSFT_CP1252 to UTF8 *twice*. No output conversion needed.
  UTF8UTF8             = 63,
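In the UTF8UTF8 case, UTF-8 bytes are misread as windows-1252, and the resulting characters are UTF-8 encoded again. For example, ’ (E2 80 99) turns into â€™. The minimal sketch below covers both directions, with a few assumptions:

- The function names are hypothetical.
- The five bytes that windows-1252 leaves undefined map to the C1 controls of the same value, as in the WHATWG Encoding Standard.
- The undo step accepts its result only if that result is itself decodable as UTF-8.

This is not the code of compact_enc_det or Encode::DoubleEncodedUTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// windows-1252 bytes 0x80-0x9F. The undefined bytes (0x81, 0x8D, 0x8F, 0x90,
// 0x9D) map to the C1 control with the same value, as in the WHATWG Encoding
// Standard; all other bytes map to the code point with the same value.
constexpr char32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

char32_t cp1252_to_unicode(unsigned char b) {
    return (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : static_cast<char32_t>(b);
}

std::optional<unsigned char> unicode_to_cp1252(char32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return static_cast<unsigned char>(cp);
    for (int i = 0; i < 32; ++i)
        if (kCp1252High[i] == cp) return static_cast<unsigned char>(0x80 + i);
    return std::nullopt;
}

void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes UTF-8 into code points; returns nullopt on malformed input
// (overlong and surrogate checks are omitted to keep the sketch short).
std::optional<std::u32string> decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = b0 < 0x80 ? 1 : b0 < 0xC0 ? 0 : b0 < 0xE0 ? 2 : b0 < 0xF0 ? 3 : b0 < 0xF8 ? 4 : 0;
        if (len == 0 || i + len > s.size()) return std::nullopt;
        char32_t cp = len == 1 ? b0 : (b0 & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (b & 0x3F);
        }
        out += cp;
        i += len;
    }
    return out;
}

// The error: UTF-8 bytes read as windows-1252, then UTF-8 encoded again.
std::string double_encode_via_cp1252(const std::string& utf8) {
    std::string out;
    for (unsigned char b : utf8) append_utf8(out, cp1252_to_unicode(b));
    return out;
}

// Undo one layer: decode UTF-8, map each code point back to its windows-1252
// byte, and keep the result only if it is itself decodable UTF-8.
std::optional<std::string> undo_double_encoding(const std::string& s) {
    auto cps = decode_utf8(s);
    if (!cps) return std::nullopt;
    std::string bytes;
    for (char32_t cp : *cps) {
        auto b = unicode_to_cp1252(cp);
        if (!b) return std::nullopt;
        bytes += static_cast<char>(*b);
    }
    if (!decode_utf8(bytes)) return std::nullopt;
    return bytes;
}
```

Correctly encoded text is normally left alone. For example, café in UTF-8 maps back to the bytes `caf` E9, which are not valid UTF-8, so the undo step declines.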

[212] Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one - metacpan.org, 2025-06-25T08:08:29.000Z https://metacpan.org/pod/Encode::DoubleEncodedUTF8

[214] Transliteration Tools for Indian Languages | ashishware.com, 2025-02-23T07:19:42.000Z, 2025-07-13T08:53:33.598Z https://ashishware.com/2006/06/25/Transl.shtml/

[215] >>214 The first half of the body text is genuine UTF-8. The second half appears to be UTF-8 that was UTF-8 encoded a second time.
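Mixed text like the page in [214] contains both genuine UTF-8 and double-encoded UTF-8, so only the double-encoded runs should be collapsed. The sketch below follows the spirit of [212] but is not its implementation, and it assumes the second encoding went through Latin1. It looks for a two-byte sequence encoding a UTF-8 lead byte, followed by two-byte sequences encoding the right number of continuation bytes, and replaces that run with the bytes themselves. The function names are hypothetical. The sketch has three known gaps:

- CP1252-specific forms such as € for byte 0x80 are not recognized.
- Overlong forms are not rejected.
- Text that really contains Latin1 sequences such as Ã© would be changed too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// If s[i], s[i+1] is a 2-byte UTF-8 sequence for U+0080-U+00FF, store that
// code point (the original byte under a Latin1 reading) in `byte`.
bool latin1_pair(const std::string& s, std::size_t i, unsigned char& byte) {
    if (i + 1 >= s.size()) return false;
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
    if ((b0 != 0xC2 && b0 != 0xC3) || (b1 & 0xC0) != 0x80) return false;
    byte = static_cast<unsigned char>(((b0 & 0x03) << 6) | (b1 & 0x3F));
    return true;
}

// Collapse runs that look double-encoded; copy everything else unchanged.
std::string fix_mixed_double_encoding(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead;
        if (latin1_pair(s, i, lead) && lead >= 0xC2 && lead <= 0xF4) {
            // Number of continuation bytes the inner lead byte requires.
            std::size_t need = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
            std::string bytes(1, static_cast<char>(lead));
            std::size_t j = i + 2;
            unsigned char cont;
            while (bytes.size() <= need && latin1_pair(s, j, cont) && cont <= 0xBF) {
                bytes += static_cast<char>(cont);
                j += 2;
            }
            if (bytes.size() == need + 1) {
                out += bytes;
                i = j;
                continue;
            }
        }
        out += s[i];
        ++i;
    }
    return out;
}
```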

UTF8UTF8