Locale-dependent default character encoding

Locale-dependent default character encoding (Web)

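[1] When navigating to a Web page, several methods are combined to determine the character encoding. When none of them is conclusive, a default character encoding determined by the user's locale is chosen as the last resort.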

Specifications

Overview

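[2] The specification explains that the locale is used because the user's locale is thought to correlate with the natural language, and therefore the encoding, of the Web pages the user views most often. >>177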

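[4] Historically, the Web spread by being localized for each user language, and the character encodings that were in standard use in each region came to be used as the de facto standard there. Web browsers simply used the default of the platform they ran on, that is, the character encoding of the user's locale, as their standard. (See also: character encodings on the Web.)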

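[5] Depending on the locale setting means that text becomes garbled (mojibake) whenever the locale setting differs. This is of course undesirable: authors should declare the character encoding explicitly with the HTTP charset parameter or <meta charset>, and Web browsers should determine the encoding as accurately as possible using frequency analysis and similar techniques. The locale-dependent default is used only when none of these can decide. (See also: encoding sniffing algorithm.)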
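
The order of precedence described in [5] can be sketched as follows. This is a deliberately simplified illustration, not the encoding sniffing algorithm of the HTML Standard, which has additional steps (for example, byte order marks and user overrides). The function name `choose_encoding` and its parameters are hypothetical names introduced for this sketch.

```python
from typing import Optional


def choose_encoding(
    http_charset: Optional[str],
    meta_charset: Optional[str],
    sniffed: Optional[str],
    locale_default: str,
) -> str:
    """Pick an encoding, trying the more authoritative sources first.

    Simplified assumption: each source is either a label or None.
    The locale-dependent default is consulted only when every
    other source has failed to produce an answer.
    """
    for candidate in (http_charset, meta_charset, sniffed):
        if candidate:
            return candidate
    return locale_default
```

With this shape, a page that declares nothing and cannot be sniffed is the only case in which the user's locale affects the result.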

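[6] Today there are not many situations in which the locale-dependent default has to be used, but it is still needed, mainly for Web sites created in earlier eras. There is no prospect of such sites increasing rapidly, but Web browsers must keep supporting locale-dependent defaults permanently so that existing Web sites (including those held by the Internet Archive) can be passed on to later generations in their original form.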


[3] The HTML Standard gives a table mapping locales to default character encodings, determined from the behavior of the major Web browsers at the time it was drafted. >>177

[7] This default is specified as implementation-defined and user-configurable. The table given in the HTML Standard serves as one baseline, but it is not mandatory.

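[8] Implementations are expected to keep improving in order to increase Web compatibility, taking into account the actual state of Web sites and the behavior of other implementations.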

History of the HTML Standard's provisions

[10] The changes across the major revisions of HTML5 (now the HTML Standard), and the implementation behavior that was consulted at the time, are as follows.

l
Locale
0
HTML5 (initial) >>206
a
HTML5 r4126 >>206
vi
Windows Vista >>183
fx
Firefox >>183
ch
Chrome >>183
b
HTML5 r7958 >>183
c
HTML5 r8258 >>184, r8259 >>185
x
HTML Standard >>9
l
af
vi
windows-1252
fx
windows-1252
l
am
vi
windows-1252
ch
windows-1252
l
ar
a
UTF-8
b
windows-1256
vi
windows-1256
ch
windows-1256
x
windows-1256
l
arn-CL
fx
windows-1252
vi
windows-1252
l
az
ch
-
fx
-
vi
windows-1254
x
windows-1254
l
az-Cyrl-AZ
ch
-
fx
-
vi
windows-1251
l
ba
c
windows-1251
x
windows-1251
l
ba-RU
ch
-
fx
-
vi
windows-1251
l
be
a
ISO-8859-5
ch
<none>
vi
windows-1251
fx
ISO-8859-5
c
windows-1251
x
windows-1251
l
be-BY
ch
-
fx
-
vi
windows-1251
l
bg
a
windows-1251
b
windows-1251
ch
windows-1251
fx
windows-1251
vi
windows-1251
x
windows-1251
l
bn
fx
windows-1252
ch
windows-1252
l
br-FR
vi
windows-1252
fx
windows-1252
l
bs-Cyrl-BA
ch
-
fx
-
vi
windows-1251
l
bs-Latn-BA
ch
-
fx
-
vi
windows-1250
l
ca
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
co-FR
vi
windows-1252
fx
windows-1252
l
cs (cs-CZ)
a
ISO-8859-2
l
cs
b
windows-1250
vi
windows-1250
ch
windows-1250
fx
ISO-8859-2
x
windows-1250
l
cy
a
UTF-8
l
cy-GB
vi
windows-1252
fx
windows-1252
l
da
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
de
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
el
vi
windows-1253
ch
ISO-8859-7
fx
windows-1252 (actually ISO-8859-7 >>202)
c
ISO-8859-7
x
ISO-8859-7
l
el-GR
ch
-
fx
-
vi
windows-1253
l
en
vi
windows-1252
fx
windows-1252
l
es
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
et
b
windows-1257
vi
windows-1257
ch
windows-1257
x
windows-1257
l
eu
vi
windows-1252
fx
windows-1252
l
fa (fa-IR)
a
UTF-8
l
fa
b
windows-1256
vi
windows-1256
ch
windows-1256
x
windows-1256
l
fi
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
fil
fx
windows-1252
ch
windows-1252
l
fo
vi
windows-1252
fx
windows-1252
l
fr
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
fy-NL
vi
windows-1252
fx
windows-1252
l
ga-IE
vi
windows-1252
fx
windows-1252
l
gl
vi
windows-1252
fx
windows-1252
l
gsw-FR
vi
windows-1252
fx
windows-1252
l
gu
ch
windows-1252
fx
windows-1252
l
ha-Latn-NG
vi
windows-1252
fx
windows-1252
l
he (he-IL)
a
windows-1255
l
he
b
windows-1255
vi
windows-1255
ch
windows-1255
fx
windows-1255
x
windows-1255
l
hi
ch
windows-1252
fx
windows-1252
l
hr
a
UTF-8
b
windows-1250
vi
windows-1250
ch
windows-1250
x
windows-1250
l
hu (hu-HU)
a
ISO-8859-2
l
hu
b
ISO-8859-2
ch
ISO-8859-2
fx
ISO-8859-2
vi
windows-1250
x
ISO-8859-2
l
hu-HU
ch
-
fx
-
vi
windows-1250
l
id
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
ig-NG
vi
windows-1252
fx
windows-1252
l
is
vi
windows-1252
fx
windows-1252
l
it
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
iu-Latn-CA
vi
windows-1252
fx
windows-1252
l
ja (ja, ja-JP-mac)
a
windows-31J (Shift_JIS)
l
ja
b
Shift_JIS
vi
Shift_JIS
ch
Shift_JIS
fx
Shift_JIS
x
Shift_JIS
l
kk
a
UTF-8
ch
-
fx
-
vi
windows-1251
c
windows-1251
x
windows-1251
l
kl-GL
vi
windows-1252
fx
windows-1252
l
kn
ch
windows-1252
fx
windows-1252
l
ko (ko-KR)
a
windows-949 (EUC-KR)
l
ko
b
windows-949
vi
windows-949
ch
windows-949
fx
windows-949
x
EUC-KR
l
ku
a
windows-1254 (ISO-8859-9)
b
windows-1254 (Best guess)
x
windows-1254
l
ky
ch
-
fx
-
vi
windows-1251
c
windows-1251
x
windows-1251
l
lb-LU
vi
windows-1252
fx
windows-1252
l
lt
a
windows-1257
b
windows-1257
vi
windows-1257
fx
windows-1257
ch
windows-1257
x
windows-1257
l
lv (lv-LV)
a
ISO-8859-13
l
lv
b
windows-1257
vi
windows-1257
ch
windows-1257
fx
ISO-8859-13
x
windows-1257
l
mk (mk-MK)
a
UTF-8
l
mk
ch
-
fx
-
vi
windows-1251
c
windows-1251
x
windows-1251
l
ml
ch
windows-1252
fx
windows-1252
l
mn
ch
-
fx
-
vi
windows-1251
l
moh-CA
vi
windows-1252
fx
windows-1252
l
mr
ch
windows-1252
fx
windows-1252
l
ms
vi
windows-1252
fx
windows-1252
l
nb
ch
windows-1252
fx
windows-1252
l
nl
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
nn-NO
vi
windows-1252
fx
windows-1252
l
no
vi
windows-1252
fx
windows-1252
l
nso-ZA
vi
windows-1252
fx
windows-1252
l
oc-FR
vi
windows-1252
fx
windows-1252
l
or
a
UTF-8
l
pl (pl-PL)
a
ISO-8859-2
l
pl
b
ISO-8859-2
ch
ISO-8859-2
fx
ISO-8859-2
vi
windows-1250
x
ISO-8859-2
l
pl-PL
ch
-
fx
-
vi
windows-1250
l
prs-AF
ch
-
fx
-
vi
windows-1256
l
pt
vi
windows-1252
fx
windows-1252
l
qut-GT
vi
windows-1252
fx
windows-1252
l
quz-BO
vi
windows-1252
fx
windows-1252
l
quz-EC
vi
windows-1252
fx
windows-1252
l
quz-PE
vi
windows-1252
fx
windows-1252
l
rm-CH
vi
windows-1252
fx
windows-1252
l
ro
a
UTF-8
vi
windows-1250
ch
ISO-8859-2
fx
<none>
l
ro-RO
ch
-
fx
-
vi
windows-1250
l
ru
a
windows-1251
b
windows-1251
vi
windows-1251
ch
windows-1251
fx
windows-1251
x
windows-1251
l
rw-RW
vi
windows-1252
fx
windows-1252
l
sah
c
windows-1251
x
windows-1251
l
sah-RU
ch
-
fx
-
vi
windows-1251
l
se-FI
vi
windows-1252
fx
windows-1252
l
se-NO
vi
windows-1252
fx
windows-1252
l
se-SE
vi
windows-1252
fx
windows-1252
l
sk
a
windows-1250
vi
windows-1250
ch
windows-1250
fx
windows-1250
x
windows-1250
l
sl
a
ISO-8859-2
b
ISO-8859-2
ch
ISO-8859-2
fx
ISO-8859-2
vi
windows-1250
x
ISO-8859-2
l
sl-SI
ch
-
fx
-
vi
windows-1250
l
sma-NO
vi
windows-1252
fx
windows-1252
l
sma-SE
vi
windows-1252
fx
windows-1252
l
smj-NO
vi
windows-1252
fx
windows-1252
l
smj-SE
vi
windows-1252
fx
windows-1252
l
smn-FI
vi
windows-1252
fx
windows-1252
l
sms-FI
vi
windows-1252
fx
windows-1252
l
sq
ch
-
fx
-
vi
windows-1250
l
sr
a
UTF-8
b
windows-1251
vi
windows-1251
ch
windows-1251
x
windows-1251
l
sr-Latn-BA
vi
windows-1250
ch
-
fx
-
l
sr-Latn-SP
ch
-
fx
-
vi
windows-1250
l
sv
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
sw
vi
windows-1252
ch
windows-1252
fx
windows-1252
l
ta
ch
windows-1252
fx
windows-1252
l
te
ch
windows-1252
fx
windows-1252
l
tg
c
windows-1251
x
windows-1251
l
tg-Cyrl-TJ
vi
windows-1251
ch
-
fx
-
l
th
a
windows-874 (TIS-620)
b
windows-874
fx
windows-874
ch
windows-874
vi
windows-874
x
windows-874
l
tk-TM
ch
-
fx
-
vi
windows-1250
l
tn-ZA
vi
windows-1252
fx
windows-1252
l
tr (tr-TR)
a
windows-1254 (ISO-8859-9)
l
tr
b
windows-1254
vi
windows-1254
ch
windows-1254
fx
windows-1254
x
windows-1254
l
tt
ch
-
fx
-
vi
windows-1251
c
windows-1251
x
windows-1251
l
tzm-Latn-DZ
vi
windows-1252
fx
windows-1252
l
ug-CN
vi
windows-1256
ch
-
fx
-
l
uk
a
windows-1251
b
windows-1251
vi
windows-1251
ch
windows-1251
fx
windows-1251
x
windows-1251
l
ur
ch
-
fx
-
vi
windows-1256
l
uz
ch
-
fx
-
vi
windows-1254
l
uz-Cyrl-UZ
ch
-
fx
-
vi
windows-1251
l
vi
a
UTF-8
b
windows-1258
ch
windows-1258
vi
windows-1258
x
windows-1258
l
wee-DE
vi
windows-1252
fx
windows-1252
l
wen-DE
vi
windows-1252
fx
windows-1252
l
wo-SN
vi
windows-1252
fx
windows-1252
l
xh-ZA
vi
windows-1252
fx
windows-1252
l
yo-NG
vi
windows-1252
fx
windows-1252
l
zh-CN
a
GB18030
b
GB18030
vi
GB18030
ch
GB18030
fx
GB18030
x
GBK
l
zh-HK
ch
-
fx
-
vi
Big5
x
Big5
l
zh-Hans
ch
-
fx
-
vi
GB18030
x
GBK
l
zh-Hant
ch
-
fx
-
vi
Big5
x
Big5
l
zh-MO
ch
-
fx
-
vi
Big5
x
Big5
l
zh-SG
ch
-
fx
-
vi
GB18030
x
GBK
l
zh-TW
a
Big5
b
Big5
vi
Big5
ch
Big5
fx
Big5
x
Big5
l
zu-ZA
vi
windows-1252
fx
windows-1252
l
Western
0
windows-1252
l
Other
a
windows-1252
b
windows-1252
x
windows-1252

[205] For the locale-dependent default, HTML5 originally said nothing more than "Windows-1252, mainly for Western locales". >>204

[206] This was later replaced by a table based on Mozilla 1.9.1. >>204

      <tr>
       <td>ar
       <td>UTF-8

      <tr>
       <td>be
       <td>ISO-8859-5

      <tr>
       <td>bg
       <td>windows-1251

      <tr>
       <td>cs<!-- -CZ -->
       <td>ISO-8859-2

      <tr>
       <td>cy
       <td>UTF-8

      <tr>
       <td>fa<!-- -IR -->
       <td>UTF-8

      <tr>
       <td>he<!-- -IL -->
       <td>windows-1255

      <tr>
       <td>hr
       <td>UTF-8

      <tr>
       <td>hu<!-- -HU -->
       <td>ISO-8859-2

      <tr>
       <td>ja <!-- and ja-JP-mac -->
       <td>windows-31J <!-- Shift_JIS -->

      <tr>
       <td>kk
       <td>UTF-8

      <tr>
       <td>ko<!-- -KR -->
       <td>windows-949 <!-- EUC-KR -->

      <tr>
       <td>ku
       <td>windows-1254 <!-- ISO-8859-9 -->

      <tr>
       <td>lt
       <td>windows-1257

      <tr>
       <td>lv<!-- -LV -->
       <td>ISO-8859-13

      <tr>
       <td>mk<!-- -MK -->
       <td>UTF-8

      <tr>
       <td>or
       <td>UTF-8

      <tr>
       <td>pl<!-- -PL -->
       <td>ISO-8859-2

      <tr>
       <td>ro
       <td>UTF-8

      <tr>
       <td>ru
       <td>windows-1251

      <tr>
       <td>sk
       <td>windows-1250

      <tr>
       <td>sl
       <td>ISO-8859-2

      <tr>
       <td>sr
       <td>UTF-8

      <tr>
       <td>th
       <td>windows-874 <!-- TIS-620 -->

      <tr>
       <td>tr<!-- -TR -->
       <td>windows-1254 <!-- ISO-8859-9 -->

      <tr>
       <td>uk
       <td>windows-1251

      <tr>
       <td>vi
       <td>UTF-8

      <tr>
       <td>zh-CN
       <td>GB18030

      <tr>
       <td>zh-TW
       <td>Big5

      <tr>
       <td>All other locales
       <td>windows-1252

[192] The behavior of the Web browsers of the time was then investigated further. >>191

Locale     Description     Vista           Chrome          Spec/Firefox
ro         Romanian        windows-1250    ISO-8859-2      windows-1252
cs         Czech           windows-1250    windows-1250    ISO-8859-2
hu         Hungarian       windows-1250    ISO-8859-2      ISO-8859-2
lv         Latvian         windows-1257    windows-1257    ISO-8859-13
sl         Slovenian       windows-1250    ISO-8859-2      ISO-8859-2
pl         Polish          windows-1250    ISO-8859-2      ISO-8859-2
be         Belarusian      windows-1251    <none>          ISO-8859-5
el         Greek           windows-1253    ISO-8859-7      windows-1252

[196] The history of the changes made in light of those results is recorded as comments in the HTML5 source. >>195

      <!-- af, Afrikaans, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- am, Amharic, uses windows-1252: Firefox and Chrome agreed -->

      <tr>
       <td>ar
       <td>Arabic
       <td>windows-1256 <!-- Windows Vista and Chrome agreed -->

      <!-- arn-CL, Mapudungun (Chile), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- az, Azeri, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1254 -->

      <!-- az-Cyrl-AZ, Azeri (Cyrillic, Azerbaijan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->

      <!-- be-BY, Belarusian (Belarus), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <tr>
       <td>bg
       <td>Bulgarian
       <td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- bn, Bengali, uses windows-1252: Firefox and Chrome agreed -->

      <!-- br-FR, Breton (France), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- bs-Cyrl-BA, Bosnian (Cyrillic, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- bs-Latn-BA, Bosnian (Latin, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- ca, Catalan, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- co-FR, Corsican (France), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>cs
       <td>Czech
       <td>windows-1250 <!-- Windows Vista and Chrome agreed (but disagreed with Firefox, which thought the encoding should be ISO-8859-2) -->

      <!-- cy-GB, Welsh (United Kingdom), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- da, Danish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- de, German, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- el, Greek, is not listed here because Windows Vista wanted windows-1253, Chrome wanted ISO-8859-7, and Firefox wanted windows-1252 -->

      <!-- el-GR, Greek (Greece), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1253 -->

      <!-- en, English, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- es, Spanish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <tr>
       <td>et
       <td>Estonian
       <td>windows-1257 <!-- Windows Vista and Chrome agreed -->

      <!-- eu, Basque, uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>fa
       <td>Persian
       <td>windows-1256 <!-- Windows Vista and Chrome agreed -->

      <!-- fi, Finnish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- fil, Filipino, uses windows-1252: Firefox and Chrome agreed -->

      <!-- fo, Faroese, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- fr, French, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- fy-NL, Frisian (Netherlands), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- ga-IE, Irish (Ireland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- gl, Galician, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- gsw-FR, Alsatian (France), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- gu, Gujarati, uses windows-1252: Firefox and Chrome agreed -->

      <!-- ha-Latn-NG, Hausa (Latin, Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>he
       <td>Hebrew
       <td>windows-1255 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- hi, Hindi, uses windows-1252: Firefox and Chrome agreed -->

      <tr>
       <td>hr
       <td>Croatian
       <td>windows-1250 <!-- Windows Vista and Chrome agreed -->

      <tr>
       <td>hu
       <td>Hungarian
       <td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->

      <!-- hu-HU, Hungarian (Hungary), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- id, Indonesian, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- ig-NG, Igbo (Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- is, Icelandic, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- it, Italian, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- iu-Latn-CA, Inuktitut (Latin, Canada), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>ja
       <td>Japanese
       <td>Shift_JIS <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- kk, Kazakh, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- kl-GL, Greenlandic (Greenland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- kn, Kannada, uses windows-1252: Firefox and Chrome agreed -->

      <tr>
       <td>ko
       <td>Korean
       <td>windows-949 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <tr>
       <td>ku
       <td>Kurdish
       <td>windows-1254 <!-- Best guess -->

      <!-- ky, Kyrgyz, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- lb-LU, Luxembourgish (Luxembourg), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>lt
       <td>Lithuanian
       <td>windows-1257 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <tr>
       <td>lv
       <td>Latvian
       <td>windows-1257 <!-- Windows Vista and Chrome agreed (but disagreed with Firefox, which thought the encoding should be ISO-8859-13) -->

      <!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- ml, Malayalam, uses windows-1252: Firefox and Chrome agreed -->

      <!-- mn, Mongolian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- moh-CA, Mohawk (Mohawk), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- mr, Marathi, uses windows-1252: Firefox and Chrome agreed -->

      <!-- ms, Malay, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- nb, Norwegian Bokm&aring;l, uses windows-1252: Firefox and Chrome agreed -->

      <!-- nl, Dutch, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- nn-NO, Norwegian, Nynorsk (Norway), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- no, Norwegian, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- nso-ZA, Sesotho sa Leboa (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- oc-FR, Occitan (France), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>pl
       <td>Polish
       <td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->

      <!-- pl-PL, Polish (Poland), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- prs-AF, Dari (Afghanistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->

      <!-- pt, Portuguese, uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- qut-GT, K'iche (Guatemala), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- quz-BO, Quechua (Bolivia), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- quz-EC, Quechua (Ecuador), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- quz-PE, Quechua (Peru), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- rm-CH, Romansh (Switzerland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- ro, Romanian, is not listed here because Windows Vista wanted windows-1250, Chrome wanted ISO-8859-2, and Firefox wanted <none> -->

      <!-- ro-RO, Romanian (Romania), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <tr>
       <td>ru
       <td>Russian
       <td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- rw-RW, Kinyarwanda (Rwanda), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- se-FI, Sami, Northern (Finland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- se-NO, Sami, Northern (Norway), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- se-SE, Sami, Northern (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>sk
       <td>Slovak
       <td>windows-1250 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <tr>
       <td>sl
       <td>Slovenian
       <td>ISO-8859-2 <!-- Chrome and Firefox agreed (but disagreed with Windows Vista, which thought the encoding should be windows-1250) -->

      <!-- sl-SI, Slovenian (Slovenia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- sma-NO, Sami, Southern (Norway), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- sma-SE, Sami, Southern (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- smj-NO, Sami, Lule (Norway), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- smj-SE, Sami, Lule (Sweden), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- smn-FI, Sami, Inari (Finland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- sms-FI, Sami, Skolt (Finland), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- sq, Albanian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <tr>
       <td>sr
       <td>Serbian
       <td>windows-1251 <!-- Windows Vista and Chrome agreed -->

      <!-- sr-Latn-BA, Serbian (Latin, Bosnia and Herzegovina), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- sr-Latn-SP, Serbian (Latin, Serbia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- sv, Swedish, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- sw, Kiswahili, uses windows-1252: Windows Vista, Chrome, and Firefox agreed -->

      <!-- ta, Tamil, uses windows-1252: Firefox and Chrome agreed -->

      <!-- te, Telugu, uses windows-1252: Firefox and Chrome agreed -->

      <!-- tg-Cyrl-TJ, Tajik (Cyrillic, Tajikistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <tr>
       <td>th
       <td>Thai
       <td>windows-874 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- tk-TM, Turkmen (Turkmenistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1250 -->

      <!-- tn-ZA, Setswana (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>tr
       <td>Turkish
       <td>windows-1254 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <!-- tzm-Latn-DZ, Tamazight (Latin, Algeria), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- ug-CN, Uighur (PRC), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->

      <tr>
       <td>uk
       <td>Ukrainian
       <td>windows-1251 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- ur, Urdu, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1256 -->

      <!-- uz, Uzbek, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1254 -->

      <!-- uz-Cyrl-UZ, Uzbek (Cyrillic, Uzbekistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

      <tr>
       <td>vi
       <td>Vietnamese
       <td>windows-1258 <!-- Windows Vista and Chrome agreed -->

      <!-- wee-DE, Lower Sorbian (Germany), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- wen-DE, Upper Sorbian (Germany), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- wo-SN, Wolof (Senegal), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- xh-ZA, isiXhosa (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->

      <!-- yo-NG, Yoruba (Nigeria), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td>zh-CN
       <td>Chinese (People's Republic of China)
       <td>GB18030 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- zh-HK, Chinese (Hong Kong S.A.R.), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->

      <!-- zh-Hans, Chinese (Simplified), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted GB18030 -->

      <!-- zh-Hant, Chinese (Traditional), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->

      <!-- zh-MO, Chinese (Macao S.A.R.), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted Big5 -->

      <!-- zh-SG, Chinese (Singapore), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted GB18030 -->

      <tr>
       <td>zh-TW
       <td>Chinese (Taiwan)
       <td>Big5 <!-- Windows Vista, Chrome, and Firefox agreed -->

      <!-- zu-ZA, isiZulu (South Africa), uses windows-1252: Windows Vista and Firefox agreed -->

      <tr>
       <td colspan=2>All other locales
       <td>windows-1252
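
The comments above suggest that a locale was given a row when at least two of the three sources (Windows Vista, Chrome, Firefox) agreed, and was left out when they all disagreed or when neither browser knew the locale. The following is a minimal sketch of that reading. The rule is inferred from the comments, not stated by the specification, and `pick_by_agreement` is a hypothetical name. `None` stands for "-" or "<none>" in the survey. The "Best guess" entry for ku is an exception that this rule does not explain.

```python
from collections import Counter
from typing import Optional


def pick_by_agreement(
    vista: Optional[str],
    chrome: Optional[str],
    firefox: Optional[str],
) -> Optional[str]:
    """Return the encoding that at least two sources agree on, else None."""
    votes = Counter(v for v in (vista, chrome, firefox) if v is not None)
    if not votes:
        return None
    encoding, count = votes.most_common(1)[0]
    return encoding if count >= 2 else None
```

Applied to the surveyed values, cs resolves to windows-1250 and hu to ISO-8859-2, while ro and be yield no result, which matches the entries recorded in the comments. Locales that resolve to windows-1252 need no row of their own because they are covered by "All other locales".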

[199] Changes made in >>198

      <!-- ba wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
      <tr>
       <td>ba
       <td>Bashkir
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- be, Belarusian, was not initially listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>be
       <td>Belarusian
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- kk, Kazakh, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>kk
       <td>Kazakh
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- ky, Kyrgyz, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>ky
       <td>Kyrgyz
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- mk, Macedonian, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>mk
       <td>Macedonian
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- sah wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
      <tr>
       <td>sah
       <td>Yakut
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- tg wasn't listed at all because none of the sources knew about it. However, further feedback has changed this: -->
      <tr>
       <td>tg
       <td>Tajik
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->
      <!-- tt, Tatar, was not initially listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>tt
       <td>Tatar
       <td>windows-1251 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23089 -->

[203] >>201 >>202

      <!-- el, Greek, was not initially listed here because Windows Vista wanted windows-1253, Chrome wanted ISO-8859-7, and Firefox wanted ISO-8859-7 but looked liked it wanted windows-1252 -->
      <!-- further feedback has changed this: -->
      <tr>
       <td>el
       <td>Greek
       <td>ISO-8859-7 <!-- per https://www.w3.org/Bugs/Public/show_bug.cgi?id=23090 -->

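[194] Why the defaults differed between implementations, even though each had one, needs further examination. Possible explanations are that no single Web browser had much influence in the market concerned, that users switched manually through the encoding menu whenever they encountered mojibake, or that detection rarely fell all the way through to the default. It is worth noting that many of the countries listed are in Central and Eastern Europe, where encoding usage was in disarray, and that the encodings involved are similar enough for automatic detection to fail easily.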
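
To illustrate how similar two of these encodings are, the sketch below compares ISO-8859-2 and Windows-1250 using Python's built-in codecs `iso8859_2` and `cp1250`. It only demonstrates the point about near-identical encodings; the helper name `differing_letters` and the sample string of Polish letters are choices made for this example.

```python
from typing import List


def differing_letters(letters: str, enc_a: str, enc_b: str) -> List[str]:
    """Return the letters whose encoded bytes differ between two encodings."""
    return [ch for ch in letters if ch.encode(enc_a) != ch.encode(enc_b)]


# Polish lowercase letters with diacritics (illustrative sample).
sample = "ąćęłńóśźż"
diff = differing_letters(sample, "iso8859_2", "cp1250")

# Decoding ISO-8859-2 bytes as Windows-1250 garbles only the differing letters.
garbled = sample.encode("iso8859_2").decode("cp1250")
```

Because only a few letters differ, a page decoded with the wrong one of the two remains mostly readable, which gives frequency-based detection very little evidence to work with.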

[186] Attachment #8336745 for bug #910211 https://bugzilla.mozilla.org/attachment.cgi?id=8336745&action=diff#a/dom/encoding/domainsfallbacks.properties_sec2


[9] The table in the current HTML Standard >>177:

Locale language	Suggested default encoding
ar	Arabic	windows-1256
az	Azeri	windows-1254
ba	Bashkir	windows-1251
be	Belarusian	windows-1251
bg	Bulgarian	windows-1251
cs	Czech	windows-1250
el	Greek	ISO-8859-7
et	Estonian	windows-1257
fa	Persian	windows-1256
he	Hebrew	windows-1255
hr	Croatian	windows-1250
hu	Hungarian	ISO-8859-2
ja	Japanese	Shift_JIS
kk	Kazakh	windows-1251
ko	Korean	EUC-KR
ku	Kurdish	windows-1254
ky	Kyrgyz	windows-1251
lt	Lithuanian	windows-1257
lv	Latvian	windows-1257
mk	Macedonian	windows-1251
pl	Polish	ISO-8859-2
ru	Russian	windows-1251
sah	Yakut	windows-1251
sk	Slovak	windows-1250
sl	Slovenian	ISO-8859-2
sr	Serbian	windows-1251
tg	Tajik	windows-1251
th	Thai	windows-874
tr	Turkish	windows-1254
tt	Tatar	windows-1251
uk	Ukrainian	windows-1251
vi	Vietnamese	windows-1258
zh-Hans, zh-CN, zh-SG	Chinese, Simplified	GBK
zh-Hant, zh-HK, zh-MO, zh-TW	Chinese, Traditional	Big5
All other locales	windows-1252

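[11] The course of the revisions and the differences from implementations look complicated, but the changes are essentially refinements to match actual practice. Much of the apparent complexity comes from how language subtags are handled and from the structure of the table, in which Windows-1252 serves as the default of defaults.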
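
The structure of the table in [9] can be sketched as a lookup with a final fallback. The mapping below is copied from that table. Stripping trailing subtags until a row matches is an assumption of this sketch (the specification only says the value is implementation-defined), and the names `SUGGESTED_DEFAULTS`, `DEFAULT_OF_DEFAULTS`, and `suggested_default_encoding` are hypothetical.

```python
# Rows of the current HTML Standard table, keyed by lowercased locale.
SUGGESTED_DEFAULTS = {
    "ar": "windows-1256", "az": "windows-1254", "ba": "windows-1251",
    "be": "windows-1251", "bg": "windows-1251", "cs": "windows-1250",
    "el": "ISO-8859-7", "et": "windows-1257", "fa": "windows-1256",
    "he": "windows-1255", "hr": "windows-1250", "hu": "ISO-8859-2",
    "ja": "Shift_JIS", "kk": "windows-1251", "ko": "EUC-KR",
    "ku": "windows-1254", "ky": "windows-1251", "lt": "windows-1257",
    "lv": "windows-1257", "mk": "windows-1251", "pl": "ISO-8859-2",
    "ru": "windows-1251", "sah": "windows-1251", "sk": "windows-1250",
    "sl": "ISO-8859-2", "sr": "windows-1251", "tg": "windows-1251",
    "th": "windows-874", "tr": "windows-1254", "tt": "windows-1251",
    "uk": "windows-1251", "vi": "windows-1258",
    "zh-hans": "GBK", "zh-cn": "GBK", "zh-sg": "GBK",
    "zh-hant": "Big5", "zh-hk": "Big5", "zh-mo": "Big5", "zh-tw": "Big5",
}

# "All other locales" row: the default of defaults.
DEFAULT_OF_DEFAULTS = "windows-1252"


def suggested_default_encoding(locale: str) -> str:
    """Look up a locale, dropping trailing subtags until a row matches."""
    subtags = locale.replace("_", "-").lower().split("-")
    while subtags:
        encoding = SUGGESTED_DEFAULTS.get("-".join(subtags))
        if encoding is not None:
            return encoding
        subtags.pop()  # assumption: retry with the last subtag removed
    return DEFAULT_OF_DEFAULTS
```

Under these assumptions `ja-JP` resolves through `ja`, `zh-Hans-CN` through `zh-Hans`, and a bare `zh` or `en-US` falls through to windows-1252, which shows why rows that would merely repeat windows-1252 are absent from the table.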

[12] Most of the locales missing from the current HTML Standard are ones that existed in Windows Vista but not in Chrome or Firefox.

[13] Mongolian (mn) is missing from the table, but judging from local usage, Windows-1251 (strictly speaking, MNS 4330) seems appropriate.

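[14] Romanian (ro) is missing from the table, perhaps because of the complicated encoding situation at the time. Both ISO-8859-2 and Windows-1250 were used, but both had problems in identifying their characters with the Romanian letters, and the bit combinations of the Romanian letters differ between the two, which complicates the situation. Even so, choosing either of them seems more reasonable than Windows-1252. Should it be ISO-8859-2, matching Chrome at the time? (The current Chrome implementation is unknown.)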

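[193] There seems to be no particular reason why Windows-1258 was chosen for Vietnamese (vi), other than that it is the Windows ANSI code page. Windows-1258 is hardly used on the Web. One of the three main Vietnamese encodings that were actually used on the Web would be preferable, but it is hard to single out any one of them, and one view is that Windows-1252, which is compatible with font-dependent encodings, would be the better choice. (See also: Windows-1258; Vietnamese character encodings.)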

Notes