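[1] Several methods are combined to determine the character encoding when navigating to a Web page. When none of them yields a decision, a default encoding determined by the user's locale is chosen as the last resort.
[2] The explanation given for using the locale is that the user's locale is thought to correlate with the natural languages and encodings of the Web pages the user visits often. >>177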
[4]
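Historically, the Web spread by being localized to each user's country and language,
and the character encodings that were standard in each region came into use as that region's de facto standard.
Web browsers simply used the default of the platform they ran on, that is, the encoding of the user's locale, as their standard.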
[5]
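Depending on the locale setting means that text is garbled (mojibake) whenever the locale setting differs.
This is of course undesirable: authors should specify the character encoding explicitly with the HTTP charset or
<meta charset>, and
Web browsers should determine the encoding as accurately as possible using techniques such as frequency analysis.
The locale-dependent default is used only when none of these can decide.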
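The following is a minimal Python sketch of the fallback order described in this paragraph, and of what happens when the locale default is wrong. The function name `choose_encoding` and its parameters are hypothetical, and the ordering is a simplification of this paragraph only; it is not the full algorithm of the HTML Standard.

```python
from typing import Optional


def choose_encoding(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    detected: Optional[str],
                    locale_default: str) -> str:
    """Return the first available source, in the priority order of the text.

    All parameter names are illustrative assumptions.
    """
    for candidate in (http_charset, meta_charset, detected):
        if candidate:
            return candidate
    # Nothing else decided: fall back to the locale-dependent default.
    return locale_default


# The same Shift_JIS bytes, read under two different locale defaults.
data = "日本語".encode("shift_jis")
for default in ("shift_jis", "cp1252"):
    encoding = choose_encoding(None, None, None, default)
    print(default, "->", data.decode(encoding))
```

With the Japanese default the text round-trips; with the Western default the same bytes decode without error but come out as unrelated Latin characters, which is the garbling this paragraph describes.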
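[6] Situations in which the locale-dependent default has to be used are no longer common, but it is still needed, mainly for Web sites created in earlier times. There is no possibility that such sites will increase sharply in the future, but Web browsers must keep supporting locale-dependent defaults permanently so that existing Web sites (including those held by the Internet Archive) can be passed on to later generations as they are.
[3] The HTML Standard gives a table mapping locales to default character encodings, decided on the basis of the implementations in the major Web browsers at the time it was drawn up. >>177
[7] This default is specified as implementation-defined and user-configurable. The table in the HTML Standard serves as one baseline, but it is not mandatory.
[8] Implementations are expected to keep improving in order to increase Web compatibility, taking into account the actual state of Web sites and the behavior of other implementations.
[17] The HTML Standard table directly specifies only the defaults, but locale information is also thought to be useful for narrowing down the set of encodings that techniques such as frequency analysis should give weight to.
[10] The changes across the major revisions of HTML5 (now the HTML Standard), and the implementation status referred to at the time, are as follows.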
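As a rough illustration of narrowing candidates by locale, the Python sketch below tries UTF-8 first and then only the legacy encodings associated with the locale's language. The candidate lists and the names `CANDIDATES_BY_LANGUAGE` and `guess_encoding` are assumptions made for this example and are not taken from the HTML Standard. Strict decoding stands in for real frequency analysis here, which it does not replace.

```python
from typing import Optional

# Hypothetical candidate lists per language; not from the HTML Standard.
CANDIDATES_BY_LANGUAGE = {
    "ja": ("shift_jis", "euc_jp", "iso2022_jp"),
    "ru": ("cp1251", "koi8_r"),
}


def guess_encoding(data: bytes, locale: str) -> Optional[str]:
    """Return the first candidate that decodes the bytes without error."""
    language = locale.split("-")[0].lower()
    candidates = ("utf-8",) + CANDIDATES_BY_LANGUAGE.get(language, ())
    for name in candidates:
        try:
            data.decode(name)  # strict mode raises on invalid sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("日本語".encode("shift_jis"), "ja-JP"))
```

Validity checking of this kind separates multi-byte encodings reasonably well, but most single-byte encodings accept almost any byte sequence, so for those a detector has to weigh character frequencies among the locale's candidates instead.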
az, az-Cyrl-AZ, ba-RU, be, be-BY, bs-Cyrl-BA, bs-Latn-BA, el-GR, he (he-IL), hu (hu-HU), hu-HU, ja (ja, ja-JP-mac), ka: georgian-ps >>15, kk, ko (ko-KR), ku, ky, lv (lv-LV), mk, mn, pl (pl-PL), pl-PL, prs-AF, ro, ro-RO, sah-RU, sl-SI, sq, sr-Latn-BA, sr-Latn-SP, tg-Cyrl-TJ, tk-TM, tr (tr-TR), tt, ug-CN, ur, uz, uz-Cyrl-UZ
[205] For the locale-dependent default, the original HTML5 text said no more than "Windows-1252, mainly in the West (Western)". >>204
[206] It was later replaced by a table based on Mozilla 1.9.1. >>204
<tr>
<td>ar
<td>UTF-8
<tr>
<td>be
<td>ISO-8859-5
<tr>
<td>bg
<td>windows-1251
<tr>
<td>cs<!-- -CZ -->
<td>ISO-8859-2
<tr>
<td>cy
<td>UTF-8
<tr>
<td>fa<!-- -IR -->
<td>UTF-8
<tr>
<td>he<!-- -IL -->
<td>windows-1255
<tr>
<td>hr
<td>UTF-8
<tr>
<td>hu<!-- -HU -->
<td>ISO-8859-2
<tr>
<td>ja <!-- and ja-JP-mac -->
<td>windows-31J <!-- Shift_JIS -->
<tr>
<td>kk
<td>UTF-8
<tr>
<td>ko<!-- -KR -->
<td>windows-949 <!-- EUC-KR -->
<tr>
<td>ku
<td>windows-1254 <!-- ISO-8859-9 -->
<tr>
<td>lt
<td>windows-1257
<tr>
<td>lv<!-- -LV -->
<td>ISO-8859-13
<tr>
<td>mk<!-- -MK -->
<td>UTF-8
<tr>
<td>or
<td>UTF-8
<tr>
<td>pl<!-- -PL -->
<td>ISO-8859-2
<tr>
<td>ro
<td>UTF-8
<tr>
<td>ru
<td>windows-1251
<tr>
<td>sk
<td>windows-1250
<tr>
<td>sl
<td>ISO-8859-2
<tr>
<td>sr
<td>UTF-8
<tr>
<td>th
<td>windows-874 <!-- TIS-620 -->
<tr>
<td>tr<!-- -TR -->
<td>windows-1254 <!-- ISO-8859-9 -->
<tr>
<td>uk
<td>windows-1251
<tr>
<td>vi
<td>UTF-8
<tr>
<td>zh-CN
<td>GB18030
<tr>
<td>zh-TW
<td>Big5
<tr>
<td>All other locales
<td>windows-1252
[192] The behavior of the Web browsers of the time was then investigated further. >>191
Locale  Description  Vista         Chrome        Spec/Firefox
ro      Romanian     windows-1250  ISO-8859-2    windows-1252
cs      Czech        windows-1250  windows-1250  ISO-8859-2
hu      Hungarian    windows-1250  ISO-8859-2    ISO-8859-2
lv      Latvian      windows-1257  windows-1257  ISO-8859-13
sl      Slovenian    windows-1250  ISO-8859-2    ISO-8859-2
pl      Polish       windows-1250  ISO-8859-2    ISO-8859-2
be      Belarusian   windows-1251  <none>        ISO-8859-5
el      Greek        windows-1253  ISO-8859-7    windows-1252
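One way to read the survey rows above is to ask, for each locale, whether at least two of the three sources agree. The Python sketch below does that. The two-of-three rule and the names `SURVEY` and `agreed_encoding` are assumptions made for this illustration; the source does not state that the table was derived by such a rule.

```python
from collections import Counter
from typing import Optional

# (locale, Windows Vista, Chrome, Spec/Firefox) as in the survey rows;
# None stands for "<none>".
SURVEY = [
    ("ro", "windows-1250", "ISO-8859-2", "windows-1252"),
    ("cs", "windows-1250", "windows-1250", "ISO-8859-2"),
    ("hu", "windows-1250", "ISO-8859-2", "ISO-8859-2"),
    ("lv", "windows-1257", "windows-1257", "ISO-8859-13"),
    ("sl", "windows-1250", "ISO-8859-2", "ISO-8859-2"),
    ("pl", "windows-1250", "ISO-8859-2", "ISO-8859-2"),
    ("be", "windows-1251", None, "ISO-8859-5"),
    ("el", "windows-1253", "ISO-8859-7", "windows-1252"),
]


def agreed_encoding(*values: Optional[str]) -> Optional[str]:
    """Return the value shared by at least two sources, or None."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    name, count = counts.most_common(1)[0]
    return name if count >= 2 else None


results = {loc: agreed_encoding(v, c, f) for loc, v, c, f in SURVEY}
for locale, encoding in results.items():
    print(locale, encoding or "no agreement")
```

Under this reading, cs and lv follow Windows Vista and Chrome, hu, sl and pl follow Chrome and the Spec/Firefox column, and ro, be and el have no two sources in agreement.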
[196] The history of the changes made in light of those results is recorded as comments in HTML5. >>195
<!-- af, Afrikaans, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- am, Amharic, uses windows-1252: Firefox and Chrome agreed -->
<tr>
<td>ar
<td>Arabic
<td>windows-1256 <!-- Windows Vista and Chrome agreed -->
<!-- arn-CL, Mapudungun (Chile), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- az, Azeri, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1254 -->
<!-- az-Cyrl-AZ, Azeri (Cyrillic, Azerbaijan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->
<!-- be-BY, Belarusian (Belarus), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<tr>
<td>bg
<td>Bulgarian
<td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- bn, Bengali, uses windows-1252: Firefox and Chrome agreed -->
<!-- br-FR, Breton (France), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- bs-Cyrl-BA, Bosnian (Cyrillic, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- bs-Latn-BA, Bosnian (Latin, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- ca, Catalan, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- co-FR, Corsican (France), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>cs
<td>Czech
<td>windows-1250 <!-- Windows Vista and Chrome agreed (but disagreed with Firefox, which thought the encoding should be ISO-8859-2) -->
<!-- cy-GB, Welsh (United Kingdom), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- da, Danish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- de, German, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- el, Greek, is not listed here because Windows Vista wanted windows-1253, Chrome wanted ISO-8859-7, and Firefox wanted windows-1252 -->
<!-- el-GR, Greek (Greece), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1253 -->
<!-- en, English, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- es, Spanish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<tr>
<td>et
<td>Estonian
<td>windows-1257 <!-- Windows Vista and Chrome agreed -->
<!-- eu, Basque, uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>fa
<td>Persian
<td>windows-1256 <!-- Windows Vista and Chrome agreed -->
<!-- fi, Finnish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- fil, Filipino, uses windows-1252: Firefox and Chrome agreed -->
<!-- fo, Faroese, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- fr, French, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- fy-NL, Frisian (Netherlands), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- ga-IE, Irish (Ireland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- gl, Galician, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- gsw-FR, Alsatian (France), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- gu, Gujarati, uses windows-1252: Firefox and Chrome agreed -->
<!-- ha-Latn-NG, Hausa (Latin, Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>he
<td>Hebrew
<td>windows-1255 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- hi, Hindi, uses windows-1252: Firefox and Chrome agreed -->
<tr>
<td>hr
<td>Croatian
<td>windows-1250 <!-- Windows Vista and Chrome agreed -->
<tr>
<td>hu
<td>Hungarian
<td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->
<!-- hu-HU, Hungarian (Hungary), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- id, Indonesian, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- ig-NG, Igbo (Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- is, Icelandic, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- it, Italian, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- iu-Latn-CA, Inuktitut (Latin, Canada), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>ja
<td>Japanese
<td>Shift_JIS <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- kk, Kazakh, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- kl-GL, Greenlandic (Greenland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- kn, Kannada, uses windows-1252: Firefox and Chrome agreed -->
<tr>
<td>ko
<td>Korean
<td>windows-949 <!-- Windows Vista, Chrome, and Firefox agreed -->
<tr>
<td>ku
<td>Kurdish
<td>windows-1254 <!-- Best guess -->
<!-- ky, Kyrgyz, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- lb-LU, Luxembourgish (Luxembourg), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>lt
<td>Lithuanian
<td>windows-1257 <!-- Windows Vista, Chrome, and Firefox agreed -->
<tr>
<td>lv
<td>Latvian
<td>windows-1257 <!-- Windows Vista and Chrome agreed (but disagreed with Firefox, which thought the encoding should be ISO-8859-13) -->
<!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- ml, Malayalam, uses windows-1252: Firefox and Chrome agreed -->
<!-- mn, Mongolian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- moh-CA, Mohawk (Mohawk), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- mr, Marathi, uses windows-1252: Firefox and Chrome agreed -->
<!-- ms, Malay, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- nb, Norwegian Bokmål, uses windows-1252: Firefox and Chrome agreed -->
<!-- nl, Dutch, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- nn-NO, Norwegian, Nynorsk (Norway), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- no, Norwegian, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- nso-ZA, Sesotho sa Leboa (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- oc-FR, Occitan (France), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>pl
<td>Polish
<td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->
<!-- pl-PL, Polish (Poland), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- prs-AF, Dari (Afghanistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->
<!-- pt, Portuguese, uses windows-1252: Windows Vista and Firefox agreed -->
<!-- qut-GT, K'iche (Guatemala), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- quz-BO, Quechua (Bolivia), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- quz-EC, Quechua (Ecuador), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- quz-PE, Quechua (Peru), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- rm-CH, Romansh (Switzerland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- ro, Romanian, is not listed here because Windows Vista wanted windows-1250, Chrome wanted ISO-8859-2, and Firefox wanted <none> -->
<!-- ro-RO, Romanian (Romania), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<tr>
<td>ru
<td>Russian
<td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- rw-RW, Kinyarwanda (Rwanda), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- se-FI, Sami, Northern (Finland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- se-NO, Sami, Northern (Norway), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- se-SE, Sami, Northern (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>sk
<td>Slovak
<td>windows-1250 <!-- Windows Vista, Chrome, and Firefox agreed -->
<tr>
<td>sl
<td>Slovenian
<td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->
<!-- sl-SI, Slovenian (Slovenia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- sma-NO, Sami, Southern (Norway), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- sma-SE, Sami, Southern (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- smj-NO, Sami, Lule (Norway), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- smj-SE, Sami, Lule (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- smn-FI, Sami, Inari (Finland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- sms-FI, Sami, Skolt (Finland), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- sq, Albanian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<tr>
<td>sr
<td>Serbian
<td>windows-1251 <!-- Windows Vista and Chrome agreed -->
<!-- sr-Latn-BA, Serbian (Latin, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- sr-Latn-SP, Serbian (Latin, Serbia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- sv, Swedish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- sw, Kiswahili, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->
<!-- ta, Tamil, uses windows-1252: Firefox and Chrome agreed -->
<!-- te, Telugu, uses windows-1252: Firefox and Chrome agreed -->
<!-- tg-Cyrl-TJ, Tajik (Cyrillic, Tajikistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<tr>
<td>th
<td>Thai
<td>windows-874 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- tk-TM, Turkmen (Turkmenistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->
<!-- tn-ZA, Setswana (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>tr
<td>Turkish
<td>windows-1254 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- tzm-Latn-DZ, Tamazight (Latin, Algeria), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- ug-CN, Uighur (PRC), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->
<tr>
<td>uk
<td>Ukrainian
<td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- ur, Urdu, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->
<!-- uz, Uzbek, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1254 -->
<!-- uz-Cyrl-UZ, Uzbek (Cyrillic, Uzbekistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<tr>
<td>vi
<td>Vietnamese
<td>windows-1258 <!-- Windows Vista and Chrome agreed -->
<!-- wee-DE, Lower Sorbian (Germany), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- wen-DE, Upper Sorbian (Germany), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- wo-SN, Wolof (Senegal), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- xh-ZA, isiXhosa (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->
<!-- yo-NG, Yoruba (Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td>zh-CN
<td>Chinese (People's Republic of China)
<td>GB18030 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- zh-HK, Chinese (Hong Kong S.A.R.), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->
<!-- zh-Hans, Chinese (Simplified), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted GB18030 -->
<!-- zh-Hant, Chinese (Traditional), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->
<!-- zh-MO, Chinese (Macao S.A.R.), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->
<!-- zh-SG, Chinese (Singapore), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted GB18030 -->
<tr>
<td>zh-TW
<td>Chinese (Taiwan)
<td>Big5 <!-- Windows Vista, Chrome, and Firefox agreed -->
<!-- zu-ZA, isiZulu (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->
<tr>
<td colspan=2>All other locales
<td>windows-1252 <!-- ba wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
<tr>
<td>ba
<td>Bashkir
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- be, Belarusian, was not initially listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->
<!-- further feedback has changed this: -->
<tr>
<td>be
<td>Belarusian
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- kk, Kazakh, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- further feedback has changed this: -->
<tr>
<td>kk
<td>Kazakh
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- ky, Kyrgyz, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- further feedback has changed this: -->
<tr>
<td>ky
<td>Kyrgyz
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- mk, Macedonian, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- further feedback has changed this: -->
<tr>
<td>mk
<td>Macedonian
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- sah wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
<tr>
<td>sah
<td>Yakut
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- tg wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
<tr>
<td>tg
<td>Tajik
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- tt, Tatar, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- further feedback has changed this: -->
<tr>
<td>tt
<td>Tatar
<td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 --> <!-- el, Greek, was not initially listed here because Windows Vista wanted windows-1253, Chrome wanted ISO-8859-7, and Firefox wanted ISO-8859-7 but looked liked it wanted windows-1252 -->
<!-- further feedback has changed this: -->
<tr>
<td>el
<td>Greek
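<td>ISO-8859-7 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23090 -->
[194] Why defaults exist and yet are not consistent needs investigation. Possible explanations are that no single Web browser had much influence in the market concerned, that users switched manually through the encoding menu each time they met garbled text, or that determination rarely fell through to the default. It is worth noting that many of the countries listed are in Central and Eastern Europe, regions where character encodings were in disarray, and that the encodings involved are similar to one another, which is where automatic detection tends to fail.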
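To make the similarity of these Central European encodings concrete, the Python sketch below lists the high byte values that ISO-8859-2 and windows-1250 decode differently. The helper name `differing_bytes` is hypothetical, and Python's `iso8859_2` and `cp1250` codecs are used as stand-ins for the encodings as implemented by browsers, which may differ in details.

```python
def differing_bytes(enc_a: str, enc_b: str) -> dict:
    """Map each byte 0x80..0xFF to its two decodings where they differ."""
    diffs = {}
    for value in range(0x80, 0x100):
        raw = bytes([value])
        # errors="replace" turns unassigned bytes into U+FFFD.
        a = raw.decode(enc_a, errors="replace")
        b = raw.decode(enc_b, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs


diffs = differing_bytes("iso8859_2", "cp1250")
print(len(diffs), "of 128 high byte values differ")
print(hex(0xB9), diffs[0xB9])    # a byte the two encodings read differently
print(hex(0xE1), 0xE1 in diffs)  # a byte both encodings read the same way
```

Because many letters share the same byte in both encodings and the rest still decode to plausible letters, a page in one encoding read as the other shows only a few wrong characters, which gives a detector little evidence to work with.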
[9] The current provisions of the HTML Standard >>177:
Locale language                                                  Suggested default encoding
ar                            Arabic                             windows-1256
az                            Azeri                              windows-1254
ba                            Bashkir                            windows-1251
be                            Belarusian                         windows-1251
bg                            Bulgarian                          windows-1251
cs                            Czech                              windows-1250
el                            Greek                              ISO-8859-7
et                            Estonian                           windows-1257
fa                            Persian                            windows-1256
he                            Hebrew                             windows-1255
hr                            Croatian                           windows-1250
hu                            Hungarian                          ISO-8859-2
ja                            Japanese                           Shift_JIS
kk                            Kazakh                             windows-1251
ko                            Korean                             EUC-KR
ku                            Kurdish                            windows-1254
ky                            Kyrgyz                             windows-1251
lt                            Lithuanian                         windows-1257
lv                            Latvian                            windows-1257
mk                            Macedonian                         windows-1251
pl                            Polish                             ISO-8859-2
ru                            Russian                            windows-1251
sah                           Yakut                              windows-1251
sk                            Slovak                             windows-1250
sl                            Slovenian                          ISO-8859-2
sr                            Serbian                            windows-1251
tg                            Tajik                              windows-1251
th                            Thai                               windows-874
tr                            Turkish                            windows-1254
tt                            Tatar                              windows-1251
uk                            Ukrainian                          windows-1251
vi                            Vietnamese                         windows-1258
zh-Hans, zh-CN, zh-SG         Chinese, Simplified                GBK
zh-Hant, zh-HK, zh-MO, zh-TW  Chinese, Traditional               Big5
All other locales                                                windows-1252
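[11] The course of the revisions and the differences from implementations look complicated, but the changes are basically improvements to match reality. Much of the apparent complexity comes from how subtags are handled and from the structure of the table, which makes Windows-1252 the default of the defaults.
[12] Most of the locales missing from the current HTML Standard are ones that exist in Windows Vista but not in Chrome or Firefox.
[16] Some entries appear to need adjustment.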
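The table structure discussed here, with explicit entries plus Windows-1252 for everything else, can be sketched in Python as a lookup that drops subtags from the right until an entry matches. The entries are copied from the HTML Standard table cited in [9]. The matching rule (case-insensitive, truncating subtags) and the names `SUGGESTED_DEFAULTS` and `suggested_default` are assumptions made for this example, not something the table itself prescribes.

```python
# Entries from the HTML Standard table; keys lowercased for matching.
SUGGESTED_DEFAULTS = {
    "ar": "windows-1256", "az": "windows-1254", "ba": "windows-1251",
    "be": "windows-1251", "bg": "windows-1251", "cs": "windows-1250",
    "el": "ISO-8859-7", "et": "windows-1257", "fa": "windows-1256",
    "he": "windows-1255", "hr": "windows-1250", "hu": "ISO-8859-2",
    "ja": "Shift_JIS", "kk": "windows-1251", "ko": "EUC-KR",
    "ku": "windows-1254", "ky": "windows-1251", "lt": "windows-1257",
    "lv": "windows-1257", "mk": "windows-1251", "pl": "ISO-8859-2",
    "ru": "windows-1251", "sah": "windows-1251", "sk": "windows-1250",
    "sl": "ISO-8859-2", "sr": "windows-1251", "tg": "windows-1251",
    "th": "windows-874", "tr": "windows-1254", "tt": "windows-1251",
    "uk": "windows-1251", "vi": "windows-1258",
    "zh-hans": "GBK", "zh-cn": "GBK", "zh-sg": "GBK",
    "zh-hant": "Big5", "zh-hk": "Big5", "zh-mo": "Big5", "zh-tw": "Big5",
}
FALLBACK = "windows-1252"  # "All other locales"


def suggested_default(locale: str) -> str:
    """Look up a locale, dropping trailing subtags until an entry matches."""
    subtags = locale.replace("_", "-").lower().split("-")
    while subtags:
        key = "-".join(subtags)
        if key in SUGGESTED_DEFAULTS:
            return SUGGESTED_DEFAULTS[key]
        subtags.pop()
    return FALLBACK


for tag in ("ja-JP", "zh-Hans-CN", "hu-HU", "ro-RO", "zh"):
    print(tag, "->", suggested_default(tag))
```

Under this sketch a locale with no entry, such as ro-RO, and a bare zh both end up at windows-1252, which shows how any gap in the table silently becomes the Western default.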
[15]
Georgian (ka) has no entry, but
georgian-ps appears appropriate given how encodings are used locally.
[13]
Mongolian (mn) has no entry, but
MNS 4330 appears appropriate given how encodings are used locally.
[14]
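Romanian (ro) has no entry,
perhaps because of the complicated character encoding situation at the time.
ISO-8859-2 and Windows-1250 were both used, but
both had problems with how their characters were identified with the Romanian letters,
and the fact that the bit combinations of the Romanian letters differ between the two complicates the situation.
Even so, choosing either of them seems more appropriate than Windows-1252.
Should it be ISO-8859-2, following Chrome at the time?
(The current Chrome implementation is unknown.)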
[193]
There seems to be no particular reason why Windows-1258 was chosen for Vietnamese (vi),
other than that it is the Windows ANSI code page.
Windows-1258 is an encoding that is hardly used on the Web.