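charset (character encoding)

[55] A charset is a short identifier that denotes a character encoding (or the character encoding denoted by such an identifier).

Specifications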

[915] RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content (2014-06-07 01:55:45 +09:00 版) https://tools.ietf.org/html/rfc7231#section-3.1.1.2
[77] RFC 2295 - Transparent Content Negotiation in HTTP (2014-08-31 19:36:42 +09:00 版) http://tools.ietf.org/html/rfc2295#section-5.4
[88] HTML Standard (2015-01-16 08:47:54 +09:00 版) https://html.spec.whatwg.org/#attr-meta-charset

Pronunciation

[56] charset appears to be pronounced "chāsetto" (ちゃーせっと), "kyarasetto" (きゃらせっと), and so on.

Definition

A charset is a method of mapping a sequence of octets to a sequence of abstract characters. A charset is, in effect, a combination of one or more CCSs with a CES. Charset names are registered by the IANA according to procedures documented in [RFC2278].
Many protocol definitions use the term "character set" in their descriptions. The terms "charset" or "character encoding scheme" are strongly preferred over the term "character set" because "character set" has other definitions in other contexts and this can be confusing.

RFC 3536 - Terminology Used in Internationalization in the IETF (2011-01-29 02:14:52 +09:00 版) http://tools.ietf.org/html/rfc3536#page-6

Syntax

[916] In HTTP, a charset name is a token and is case-insensitive >>846.

quoted-string

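[116] When the charset parameter of a media type is specified in MIME/HTTP, the parameter value is a value, just as for any other parameter. That is, it can be given as either a single token or a single quoted-string.

Examples:

(all equivalent)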
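
The equivalence of the token and quoted-string forms can be sketched in Python with the standard library's `email.message.Message`. This is only an illustration: the helper name `charset_of` and the sample header values are assumptions made for this sketch and do not come from any specification.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased.

    charset_of is a hypothetical helper written for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes a quoted-string value and
    # lowercases the result, so token and quoted forms compare equal.
    return msg.get_content_charset()


# Sample header values (assumed for illustration): token form,
# quoted-string form, and variations in letter case.
variants = [
    "text/plain; charset=iso-2022-jp",
    'text/plain; charset="iso-2022-jp"',
    "text/plain; charset=ISO-2022-JP",
    'text/plain; CHARSET="Iso-2022-Jp"',
]
results = {charset_of(v) for v in variants}
print(results)
```

A conforming recipient should arrive at the same charset for all four forms; an implementation that keeps the quotation marks as part of the value would not.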

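[117] WinIE6 does not seem to support the quoted form.

A simple piece of corroborating evidence:

Display an entity with charset=iso-8859-1
Then one of the following:
- Display an entity with charset=iso-2022-jp
- Display an entity with charset="iso-2022-jp"

With the quotation marks, the text comes out garbled.

Needless to say, such a sloppy implementation does not conform to the standard.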

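Context

[75] In MIME, registered IANA charset names can be used in the parameter value syntax extended by RFC 2231.

[157] The following other protocol elements are also called "charset".

What is charset?

char is a term frequently used in the industry (which one?) as an abbreviation of character. set is just that, a set, but in (roughly) the same sense as the mathematical set, and the orthodox (?) Japanese rendering is 集合.

Taken literally, then, it means a set of characters (文字集合).

However, the terminology in this area is badly confused. In most cases charset covers not just a plain set of characters but also all the extras needed to handle characters on a computer: the assignment of numbers to the characters in the set, the definition of how those numbers map to their representation on the computer, and so on.

MIME charset

The charset defined by MIME is probably the best known.

RFC 2045 through 2049 give no definition of the term charset, but since the "charset" parameter is apparently a parameter that indicates a character set, it seems reasonable to treat character set = charset.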

Character Set (RFC 2045 2.2)

   The term "character set" is used in MIME to refer to a method of
   converting a sequence of octets into a sequence of characters.  Note
   that unconditional and unambiguous conversion in the other direction
   is not required, in that not all characters may be representable by a
   given character set and a character set may provide more than one
   sequence of octets to represent a particular sequence of characters.

用語「character set」「文字集合」は MIME ではオクテット列を文字列に変換する方法を表します。なお、逆方向への無条件かつ曖昧でない変換は必須ではありません。全ての文字が当該文字集合で表現可能でないかもしれませんし、その文字集合で特定の文字列を表現するのに複数のオクテット列が使えても構いません。

   This definition is intended to allow various kinds of character
   encodings, from simple single-table mappings such as US-ASCII to
   complex table switching methods such as those that use ISO 2022's
   techniques, to be used as character sets.  However, the definition
   associated with a MIME character set name must fully specify the
   mapping to be performed.  In particular, use of external profiling
   information to determine the exact mapping is not permitted.

この定義は、 US-ASCII のように簡単な単一表対応付けから ISO 2022 技術を使うものまで、様々な種類の文字符号化を文字集合として使うことを認めるものです。しかし、 MIME 文字集合名に関連付けられる定義は使われる対応付けを完全に規定するものでなければなりません。特に、正確な対応付けに外部プロファイル情報を使うのは認められません。

   NOTE: The term "character set" was originally to describe such
   straightforward schemes as US-ASCII and ISO-8859-1 which have a
   simple one-to-one mapping from single octets to single characters.
   Multi-octet coded character sets and switching techniques make the
   situation more complex. For example, some communities use the term
   "character encoding" for what MIME calls a "character set", while
   using the phrase "coded character set" to denote an abstract mapping
   from integers (not octets) to characters.

参考: 用語「character set」「文字集合」は元々は US-ASCII や ISO-8859-1 のように単一オクテットと単一文字の一対一の単純な対応付けを持つ分かりやすい方式を指していました。多オクテット符号化文字集合や切り替え技術のおかげで状況が複雑になりました。例えば、用語「文字符号化 character encoding」を MIME で言うところの「文字集合 character set」に使い、語句「符号化文字集合 coded character set」で整数 (オクテットで無しに) から文字への抽象的な対応付けを示す世間もあります。 (訳注: CES/CCS 論のことらしい。 Ned じーさんは、 CCS/CES 論は charset の1種ではあるが charset はそれに限定されないとおっしゃってます。)

[66] Re: Media types for RDF languages N3 and Turtle (Sean B. Palmer 著, 2007-12-17 15:51:29 +09:00 版) http://permalink.gmane.org/gmane.ietf.types/953

IANA charset

[103] The IANA character encoding registry (Character Sets registry) is the IETF registry, maintained by IANA (IANAREG), of the names of character encodings (charsets).

[917] charset names ought to be registered with IANA according to RFC 2978 >>846.

[104] CHARACTER SETS http://www.iana.org/assignments/character-sets

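List of charset values

[194] MIME charset / IANA charset

The `charset` parameter of `text/*`

[885] For the charset parameter of the text/* MIME types, see text/*.

Specifications

Semantics

[127] The charset parameter of text/plain specifies a character set >>265.

[129] Other text/* MIME types may also be defined to use this charset parameter >>265. Its semantics should then be the same as for text/plain >>265.

[134] Other MIME types may likewise be defined to use this charset parameter >>265. It is mainly intended for textual information, but it can be used elsewhere too >>265.

[135] The charset parameter cannot be used with MIME types that do not define it.

Syntax

[133] In addition to the values defined in RFC 2046, values can be registered in the IANA registry >>265.

[58] The charset parameter of a text/* MIME type registered in the IANA registry must take an RFC 2978 charset as its value >>57.

[143] In Internet mail, only values defined in RFC 2046, values in the IANA registry, and values beginning with x- may be used >>265.

[130] text/* MIME types other than text/plain may also restrict the values >>265.

[128] Values are case-insensitive >>265.

[59] Actual MIME types do not necessarily follow this rule. For example, the charset parameter of text/html and text/css is interpreted as an Encoding Standard encoding label, not as an IANA charset.

[131] In MIME, character sets that do not follow the line-break rules of text/* cannot be used >>265.

[132] UTF-16 and EBCDIC cannot be used.

[136] Where the charset parameter applies to MIME types other than text/*, such character sets can be used even in MIME >>265.

Context

[142] If the character encoding is anything other than US-ASCII, it must always be specified explicitly >>265.

[60] The charset parameter should not be specified when the charset information is contained in the payload itself >>57.

[61] For example, when the encoding pseudo-attribute is specified in text/xml >>57, the charset parameter should not be specified.

[62] This only says that both should not be specified; it does not seem to say that the in-band declaration is the one that should be used.

[152] In a text/plain body part of multipart/form-data it may be used, but typical implementations do not use it >>153.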

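Default value

[137] Specifying the charset parameter explicitly is strongly recommended >>265.

[63] When a charset parameter is defined, it should be defined as a required parameter. If there is a strong reason not to require it, a default value may be defined, or it may be stated that there is no default. A default value should be UTF-8. >>57

[64] The stated reason for making it required is that no default then needs to be specified >>57. But whether or not it is required, how is interoperability supposed to be ensured if no default is defined?

[65] In any case, when a text/* type is registered in the IANA registry, how the charset is determined must be clearly specified >>57.

[264] According to RFC 2046 (and the earlier MIME RFCs), the default value of the charset parameter of text/* was US-ASCII >>265.

[267] RFC 2616 (and the earlier HTTP RFCs) changed this default (for HTTP) to ISO-8859-1 >>266.

[268] RFC 6657 considers neither to match current practice and updates RFC 2046 as follows >>262, >>57.

[269] A MIME type registration should take one of the following forms.
- [270] The charset information is contained in the data itself, so the MIME type does not use a charset parameter. Consequently there is no default either.
- [271] The charset parameter is required, and no default is needed.
[272] If a charset default must be defined regardless, it should be UTF-8.
[273] How the charset is determined must be specified.
[274] Protocols that use MIME must not override the default.

[275] Making the charset parameter required does not seem very realistic for some protocols and implementations, though...

[276] It does not appear to override the HTTP rules.

[215] RFC 7231, published later, removes the default of >>267 >>214.

[154] In a text/plain body part of multipart/form-data, the character encoding of the form submission may be used >>151, while the charset parameter is not mandatory >>153.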

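Processing

[145] In general, software that generates MIME should use the "lowest common denominator" character set wherever possible >>265 (charset minimization).

[146] For example, if only the left half of ISO-8859-X is used, US-ASCII should be used >>265.

[147] Today, however, always using UTF-8 may well be more interoperable.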
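
Charset minimization can be sketched in Python as trying candidate charsets from the most restricted to the most general and labelling the text with the first one that can encode it. The candidate list and the function name `minimal_charset` are assumptions made for this sketch; RFC 2046 does not prescribe a particular list or algorithm.

```python
# Candidates ordered from most restricted to most general.
# This particular list is an assumption chosen for illustration.
CANDIDATES = ("us-ascii", "iso-8859-1", "utf-8")


def minimal_charset(text: str, candidates=CANDIDATES) -> str:
    """Return the first candidate charset able to encode the whole text."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent some character
        return name
    raise ValueError("no candidate charset can encode the text")


print(minimal_charset("hello"))
print(minimal_charset("caf\u00e9"))
print(minimal_charset("\u65e5\u672c\u8a9e"))
```

The whole text has to be known before the label can be chosen, which is why the rule sits awkwardly with dynamically generated or streamed content.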

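[102] In MIME, an unknown text/* MIME type is to be treated as text/plain. How the charset parameter is handled in that case is unclear.

History

The `charset` parameter of XML

[101] The XML MIME types also have a charset parameter. The details are not specified (terrible), but it is understood to designate a MIME charset and to take precedence over the XML encoding pseudo-attribute.

[923] For details, see character encodings in XML.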

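Character set

charset looks like an abbreviation. It presumably was one originally, but by now it may be best to regard charset as charset and nothing but charset. When translating, too, it seems best to leave it as charset, since rendering it in another language risks losing the nuance of the original (not that much nuance may survive anyway).

character set, which looks like the full form, has the settled Japanese translation 文字集合. This term (though subject to the same kind of confusion as charset) seems to be used in a sense closer to the original one, a set of characters.

[1] In any case, unless you assume that the synonyms of charset listed under character encoding each mean something different in each context, you will end up with needless misunderstandings.
[2] Doesn't the MIME charset minimization rule seem fundamentally incompatible with dynamic generation such as HTTP CGI? Or with anything streaming-like. Using a trailer header of the chunked encoding might just about work, but then the client cannot render data as it arrives, which defeats the purpose (it is no different from the server determining the charset after generating all the data and sending it in one go), so it is of no use for streaming.
[3] >>2 Perhaps the only solution is to give up the minimization rule. With CGI dynamic generation the maximum range is probably known, but with streaming it may not be (a multilingual conference?), so the only option seems to be to assume the largest conceivable charset (UTF-8?) in advance.
[4] >>3 That approach amounts to maximizing the charset label, the exact opposite of the MIME approach.
[5] >>4 The MIME minimization rule is aimed at maximizing interoperability in the first place. With SMTP/822 there is basically no second attempt, so the message has to get through to the recipient in one go (conceptually). With HTTP and the like, on the other hand, there is content negotiation, and (for historical reasons) the browser user can select the character encoding by hand (mailers have this too, but it is probably outside the MIME way of thinking), so there may be little need to get it across in one go. The difference in thinking seems to show through here.
[6] By the way, MIME zealots are annoying. MIME (IANA) charset names are not the only names for character encodings. I don't mind people trying to use IANA names, since a further proliferation of encoding names would be depressing.
[7] >>6 I don't mind, but insisting "it differs from the IANA name, so it is wrong, fix it!" is just silly.
[8] >>6-7 The names that have already proliferated cannot be eliminated, so we should at least make an effort not to create new ones. Enforcing IANA names is a separate matter. The IANA registry is a mess, many character encodings are not registered, and adopting it where compatibility is not at stake does not seem like a good idea.
[9] >>8 That said, there is no decent registry other than the IANA names. In practice it will probably settle on "anything goes, IANA names recommended".
[10] >>6-9 And yet there are noisy zealots who insist that the IANA name is the official name and that no other name should be made the standard name. One might suppose that in MIME the MIME preferred name among the IANA names is recommended, that by common sense the name in the Name column is more "official" than an Alias, and that in the IETF/IANA world the IANA name is "official"; but once you step outside that world, I don't think there is any such thing as "official".
[11] >>10 In short, opinions that a certain name should be adopted, backed by a discussion of the points raised here and the other pros and cons of choosing names, are welcome. But claiming in the non-IETF world, with no further justification, that a name is official because IANA decided it is just annoying.
[12] A charset name that has become strangely widespread recently is none. This is because of the curious widespread belief that one should write AddDefaultCharset none using the AddDefaultCharset directive in the Apache configuration file. (They probably meant AddDefaultCharset off. Then again, people who cannot even read the documentation shipped with Apache are unlikely to use AddDefaultCharset off correctly. The AddDefaultCharset directive is not a feature for beginners.)
[13] For the record, >>12 is used in the media type given in the HTTP Content-Type field.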

[180] HTTP::Message - search.cpan.org (2017-09-10 18:28:51 +09:00) http://search.cpan.org/~oalders/HTTP-Message-6.13/lib/HTTP/Message.pm interprets the charset value none as meaning no conversion.

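The `charset` parameter in HTTP

[213] HTTP reuses the MIME charset mechanism, but differs from it in various details.

The biggest problem of all is that the charset rules are ignored in HTTP even more than in MIME.

As a result, layer upon layer has been piled on inside and outside HTTP, and the various standards and their interrelationships have become a quagmire.

[214] The most important difference from MIME is that the default when the charset parameter of the Content-Type field is omitted is ISO-8859-1 (as defined through RFC 2616) rather than MIME's US-ASCII.

However, the scope of that rule is troublesomely vague. (Does it apply to all text/* types, or only to those with the same definition as MIME text/plain? If the latter, does it also apply to non-text types?) (This problem arises in the first place because the protocol changed the definition of charset, which is supposed to be a parameter of the media type, not of the field. (And the root cause is that MIME gives special treatment to text/* and charset...))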

RFC 1945 (HTTP/1.0); RFC 2068・2616 (HTTP/1.1) 3.4 Character Sets

HTTP uses the same definition of the term "character set" as that described for MIME:

HTTP は、用語「文字集合」を、 MIME で説明されているのと同じ定義で使用します :

The term "character set" is used in this document to refer to a method used with one or more tables to convert a sequence of octets into a sequence of characters. Note that unconditional conversion in the other direction is not required, in that not all characters may be available in a given character set and a character set may provide more than one sequence of octets to represent a particular character. This definition is intended to allow various kinds of character encodings, from simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ~~ISO 2022's~~ ISO-2022's {2616} techniques. However, the definition associated with a MIME character set name ~~{1945} must~~ MUST fully specify the mapping to be performed from octets to characters. In particular, use of external profiling information to determine the exact mapping is not permitted.

用語「文字集合」は、この文書では、オクテットの列を文字の列に変換する一つ以上の表とともに使用する方式を参照するために使います。逆方向への無条件の変換は必須ではなく、すべての文字が与えられた文字集合で利用可能ではないかもしれず、複数のオクテット列がある一つの特定の文字を示すかもしれないことに注意してください。この定義は、 US-ASCII のような単純な単一表写像から ISO 2022 の技術を使った複雑な表切り替え方式まで種々の文字符号化を認めることを意図しています。しかし、 MIME 文字集合名と関連付けられる定義はオクテットから文字への写像を完全に規定していなければなりません。特に、正確な写像の決定のために必要な外部プロファイル情報の使用は認めていません。

Note: This use of the term "character set" is more commonly referred to as a "character encoding." However, since HTTP and MIME share the same registry, it is important that the terminology also be shared.

注意 : 用語「文字集合」のこの用法は、より一般的には「文字符号化」と呼ばれます。しかし、 HTTP と MIME は同じ登録簿を共有していますから、用語も共有することが重要です。

HTTP character sets are identified by case-insensitive tokens. The complete set of tokens ~~{1945} are~~ is defined by the IANA Character Set registry ~~{1945} [15]~~ . {1945} However, because that registry does not define a single, consistent token for each character set, we define here the preferred names for those character sets most likely to be used with HTTP entities. These character sets include those registered by RFC 1521 [5] -- the US-ASCII [17] and ISO-8859 [18] character sets -- and other names specifically recommended for use within MIME charset parameters.

HTTP 文字集合は大文字・小文字を区別しない字句で識別します。字句の完全な集合は IANA 文字集合登録簿で定義します。しかし、この登録簿は各文字集合に単一の一貫した字句を定義してはいませんから、ここでそうした文字集合で HTTP 実体に使用するのに最も好ましい名前を定義します。この文字集合には、 RFC1521 により登録された US-ASCII 文字集合や ISO8859 文字集合や、 MIME charset 引数で使用することが特に推奨される他の名前を含みます。

{1945}
charset = "US-ASCII" | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3" | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6" | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9" | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR" | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8" | token

{2068,2616} charset = token

Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA Character Set registry ~~{1945} [15] must~~ MUST represent the character set defined by that registry. Applications ~~{1945} should~~ SHOULD limit their use of character sets to those defined by the IANA registry [19].

In other words, HTTP accepts any token as a charset value, but a token with a predefined value in the IANA Character Set registry must denote the character set that the registry defines, and applications should restrict themselves to character sets defined there.

{1945} The character set of an entity body should be labelled as the lowest common denominator of the character codes used within that body, with the exception that no label is preferred over the labels US-ASCII or ISO-8859-1.
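That is, the character set of an entity body should be labelled as the lowest common denominator of the character codes used in that body; however, omitting the label altogether is preferred to labelling it US-ASCII or ISO-8859-1.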

{2616} Implementors should be aware of IETF character set requirements [38] [41].

実装者は、 IETF 文字集合要件に注意されたい。

{errata} HTTP uses charset in two contexts: within an Accept-Charset request header (in which the charset value is an unquoted token) and as the value of a parameter in a Content-type header (within a request or response), in which case the parameter value of the charset parameter may be quoted.

HTTP は charset を2つの文脈で使用します。 Accept-Charset 要求頭中で (charset 値は非引用字句。) および (要求または応答の中の) Content-Type 頭の引数の値としてで、 charset 引数の引数値としての場合は引用しても構いません。

RFC 2616 3.4.1 Missing Charset

Some HTTP/1.0 software has interpreted a Content-Type header without charset parameter incorrectly to mean "recipient should guess." Senders wishing to defeat this behavior MAY include a charset parameter even when the charset is ISO-8859-1 and SHOULD do so when it is known that it will not confuse the recipient.

HTTP/1.0 ソフトウェアの中には charset 引数なしの Content-Type 頭を誤って「受信者は推測するべし」を意味すると解釈するものがあります。この動作を撃破したいと思う送信者は、 charset が ISO-8859-1 であっても charset 引数を含めても構いませんし、そうすることによって受信者を混乱させないだろうと分かっているのであれば、そうするべきです。

Unfortunately, some older HTTP/1.0 clients did not deal properly with an explicit charset parameter. HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset, rather than the recipient's preference, when initially displaying a document. See section 3.7.1.

不幸にも、古い HTTP/1.0 クライアントの中には陽に示した charset 引数を適切に処理しないものがありました。 HTTP/1.1 受信者は送信者が提供した charset 札を尊重しなければなりません。そして charset を「推測」する用意のある利用者エージェントは、 content-type 欄の charset に対応していれば、最初の文書を表示する時にはその受信者の設定でではなく、その charset を使わなければなりません。

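[216] WinIE 6.0 misbehaves as follows: if a page at some URI reference u is displayed with charset iso-8859-1, and pressing Enter in the address bar immediately afterwards returns it with charset iso-2022-jp, the browser still tries to interpret it as iso-8859-1 and the text is garbled.

The symptom does not occur with the reload button. (Anonymous 2004-05-03 04:43:16 +00:00)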

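The `charset` parameter in CGI

Default value in CGI requests

[861] For the CGI meta-variable CONTENT_TYPE, the default value is defined in the following order of priority (highest first).

- A system-defined default may be set depending on the MIME type.
- For text/*, it is ISO-8859-1, following RFC 2616.
- If the MIME type defines a default, that default.
- US-ASCII.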
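
The priority order above can be sketched in Python as a simple chain of lookups. The function name `cgi_default_charset` and the two dictionaries passed to it are hypothetical, introduced only for this illustration; RFC 3875 defines the order but not any API.

```python
from typing import Mapping, Optional


def cgi_default_charset(
    media_type: str,
    system_defaults: Optional[Mapping[str, str]] = None,
    type_defaults: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve the default charset for a CONTENT_TYPE without charset.

    system_defaults: hypothetical per-type defaults configured by the system.
    type_defaults: hypothetical defaults taken from media type specifications.
    """
    media_type = media_type.strip().lower()
    # 1. A system-defined default, if one exists for this media type.
    if system_defaults and media_type in system_defaults:
        return system_defaults[media_type]
    # 2. text/* defaults to ISO-8859-1 (following RFC 2616).
    if media_type.startswith("text/"):
        return "ISO-8859-1"
    # 3. A default defined by the media type specification itself.
    if type_defaults and media_type in type_defaults:
        return type_defaults[media_type]
    # 4. Otherwise US-ASCII.
    return "US-ASCII"


print(cgi_default_charset("text/plain"))
print(cgi_default_charset("text/plain", system_defaults={"text/plain": "IBM1047"}))
print(cgi_default_charset("application/beep", type_defaults={"application/beep": "UTF-8"}))
print(cgi_default_charset("application/octet-stream"))
```

Because the text/* rule comes before the media type's own default, a default defined by a text/* type is never reached in this order.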

[862] RFC 3875 - The Common Gateway Interface (CGI) Version 1.1 (2011-11-20 06:09:05 +09:00 版) http://tools.ietf.org/html/rfc3875#page-13

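[863] The practical difference from HTTP is that the first item, a system-defined default, is allowed. CGI allows the server to apply various conversions according to system conventions before passing values to the CGI script, so perhaps this is meant to allow charset to be handled in the same way.

Default value in CGI responses

[864] Unless there is a system definition, the client assumes that the default charset of text/* is ISO-8859-1 for HTTP and US-ASCII otherwise. CGI scripts should therefore include the charset parameter. >>865

[865] RFC 3875 - The Common Gateway Interface (CGI) Version 1.1 (2011-11-20 06:09:05 +09:00 版) http://tools.ietf.org/html/rfc3875#section-6.3.1

[866] Although this is not stated explicitly, if there is a system definition the implementation will presumably convert the charset as appropriate, and rewrite the parameter as appropriate when the charset is missing or unsuitable.

EBCDIC/POSIX

[868] In POSIX environments that use EBCDIC, the default charset is IBM1047. This applies to text/* and to other implementation-defined MIME types. >>867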

[867] RFC 3875 - The Common Gateway Interface (CGI) Version 1.1 (2011-11-20 06:09:05 +09:00 版) http://tools.ietf.org/html/rfc3875#section-7.3

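The `charset` attribute of the `meta` element (HTML)

[89] The charset attribute of the meta element is a character encoding declaration that specifies the document's character encoding >>88.

Attribute value

[90] The attribute value must be an ASCII case-insensitive match for UTF-8 >>88.

[183] Formerly this requirement applied only to XML documents. In XML documents <meta charset> has no effect and was allowed solely to ease conversion between HTML and XML >>88.

[91] The October 2017 revision changed the requirement so that HTML documents must be UTF-8, and so this rule now applies to HTML documents as well.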
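
As a rough illustration of these authoring requirements, the following Python sketch collects the `<meta charset>` elements of a document with the standard library's `html.parser` and reports values that are not a case-insensitive match for UTF-8, as well as documents with more than one such element. The class and function names are hypothetical, and `html.parser` is not a conforming HTML parser, so this is a simplified check rather than a validator.

```python
from html.parser import HTMLParser
from typing import List


class MetaCharsetCollector(HTMLParser):
    """Collect the charset attribute values of all meta elements."""

    def __init__(self) -> None:
        super().__init__()
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "meta":
            for name, value in attrs:
                if name == "charset":
                    self.values.append(value or "")


def check_meta_charset(html: str) -> List[str]:
    """Return a list of problems found (an empty list means none)."""
    collector = MetaCharsetCollector()
    collector.feed(html)
    collector.close()
    problems = []
    if len(collector.values) > 1:
        problems.append("more than one <meta charset> element")
    for value in collector.values:
        # str.lower() is sufficient for comparing against "utf-8" here.
        if value.lower() != "utf-8":
            problems.append(f'value {value!r} is not "utf-8"')
    return problems


print(check_meta_charset('<meta charset="UTF-8"><title>ok</title>'))
print(check_meta_charset("<meta charset=Shift_JIS>"))
```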

[224] Welcome to Microsoft Canada's Homepage, 2025-11-27T00:58:47.000Z https://web.archive.org/web/20010610151919id_/http://www.microsoft.com/canada/

[PRE

      <title>Welcome to Microsoft Canada's Homepage</title>
      <meta name='description' content="The entry page to Microsoft Canada's Web site. Find software">
      <meta name='keywords' content="products; headlines; downloads; news; Web site; what is new;">
      <meta name='MS.LOCALE' content='EN'>
      <meta http-equiv='Content-Type' content='text/html; charset=code page'>
	  <body topmargin="0" leftmargin="0">
]PRE]

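Context

[92] <meta charset> elements can be used as metadata content.

[93] There must not be more than one <meta charset> element in a document >>88.

[184] Historically, specifying the character encoding in the charset parameter of the HTTP header Content-Type: was considered the orthodox approach, so to this day <meta charset> is not required when the charset parameter is specified.

History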

[223] iCab日本語版問題点, 2025-11-26T09:40:11.000Z, 2003-02-04T01:13:14.047Z https://web.archive.org/web/20030204011035/http://www1.harenet.ne.jp/~take-jun/iCab/problem.html

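The `@charset` rule (CSS)

[54] The CSS at-rule @charset indicates the character encoding of the CSS style sheet in question.

[53] See @charset.

The `charset` attribute in variant descriptions

[78] In a variant description, the charset variant attribute corresponds to the charset parameter of the Content-Type: header >>77.

[79] The charset parameter is excluded from the type variant attribute.

Syntax

[80] The attribute value is a charset >>77.

[81] The definition in RFC 2068 is referenced.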

`charset` (SDP)

[208] RFC 4566: SDP: Session Description Protocol, 2023-12-02T07:30:22.000Z, 2023-12-02T12:13:30.622Z https://www.rfc-editor.org/rfc/rfc4566.html#page-28
[209] RFC Errata Report » RFC Editor, 2023-12-02T12:15:36.000Z https://www.rfc-editor.org/errata/rfc4566

charset in CAP

[211] RFC 4324: Calendar Access Protocol (CAP), 2023-12-02T07:30:13.000Z, 2023-12-02T13:35:06.153Z https://www.rfc-editor.org/rfc/rfc4324.html#section-8.11

[210] See also <greeting localize>.

Ranges

[918] The HTTP Accept-Charset: header and the HTML accept-charset attribute can specify a list of charsets. The HTTP header can additionally specify priorities and a "*" that stands for everything else.
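
The list, priority, and "*" features of Accept-Charset: can be sketched in Python as follows. The function names and the sample header value are assumptions for illustration, the parser is deliberately simplified (it does not validate tokens or the range of q), and treating an unlisted charset as unacceptable when no "*" is present is a simplifying assumption of this sketch.

```python
from typing import List, Tuple


def parse_accept_charset(value: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Charset value into (charset, q) pairs, best first."""
    prefs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a missing q parameter means q=1
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val.strip())
        prefs.append((name.lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def quality(prefs: List[Tuple[str, float]], charset: str) -> float:
    """Return the q value that applies to the given charset."""
    charset = charset.lower()
    wildcard = None
    for name, q in prefs:
        if name == charset:
            return q
        if name == "*":
            wildcard = q
    # Simplifying assumption: unlisted charsets without "*" get q=0.
    return wildcard if wildcard is not None else 0.0


prefs = parse_accept_charset("iso-8859-5, unicode-1-1;q=0.8, *;q=0.1")
print(prefs)
print(quality(prefs, "UTF-8"))
```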

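List of charsets

[138] MIME held that a single character set for use in Internet mail would be preferable but that unification could not be expected in the near future, and therefore defined only a small number of standard character sets >>265.

[139] In the 1990s, when MIME was developed, unification on Unicode also seemed difficult.

[140] MIME defines the following >>265.

[141] These are stated to be only the ones that were relatively uncontroversial, not a recommendation of any particular character set other than US-ASCII >>265. In the MIME era, the choice of character encoding was a major concern that included quasi-religious disputes.

[144] Implementors should not define new character sets unless absolutely necessary (discouraged) >>265.

[150] The following values have existed.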

ISO-8859-1
ISO-2022-JP
ISO-2022-JP-1
ISO-2022-JP-2
ISO-2022-JP-3
ISO-2022-CN
ISO-2022-CN-EXT
ISO-2022-KR
EUC-JP
EUC-KR
EUC-TW
Shift_JIS
Big5
Big5-HKSCS
unknown-8bit
IDontUseDefaultCharsetsAndIKnowWhatImDoingAndParanoidsKnowWhatIamDoing

[212] Character Set Recognition, InetSDK, 2024-08-17T04:06:35.000Z, 2000-12-02T08:57:31.151Z https://web.archive.org/web/20001202085600/http://msdn.microsoft.com/workshop/Author/dhtml/reference/charsets/charset4.asp

MIME types that use the `charset` parameter, and variations in its definition

Definitions compatible with the MIME `charset` parameter

[14]

- text/plain (平文)
  - ~~RFC 1341~~, ~~RFC 1521~~, RFC 2046
- text/html (HTML)
  - HTML4 以前の定義
- text/csv (CSV)
  - RFC 4180
  - text/* で使えるものは使えるという定義。
  - 省略可能。
- application/sgml-open-catalog (SGML型録)
- text/troff
  - RFC 4263
  - RFC 2046 を参照。
  - 省略可能。
- [830] text/owl-functional、text/owl-manchester
  - OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax (2009-10-27 23:15:00 +09:00 版) http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#Appendix:_Internet_Media_Type.2C_File_Extension.2C_and_Macintosh_File_Type
  - OWL 2 Web Ontology Language Manchester Syntax (2009-10-27 23:14:56 +09:00 版) http://www.w3.org/TR/2009/NOTE-owl2-manchester-syntax-20091027/#Appendix:_Internet_Media_Type.2C_File_Extension_and_Macintosh_File_Type
  - 規定が曖昧ながら、 MIME の定義を上書きする意図はないように見えます。
  - ただし MIME を参照しているわけではありません。
  - UTF-8 を推奨。
- [831] application/owl+xml
  - OWL 2 Web Ontology Language XML Serialization (2009-10-27 23:15:01 +09:00 版) http://www.w3.org/TR/2009/REC-owl2-xml-serialization-20091027/#Appendix:_Internet_Media_Type.2C_File_Extension.2C_and_Macintosh_File_Type
  - 規定が曖昧ながら、 MIME の定義を採用しているように見えます。
  - ただし MIME を参照しているわけではありません。
  - RFC 3023 は参照していません。
[83] application/news-groupinfo
- RFC 5537 - Netnews Architecture and Protocols (2014-09-14 17:08:11 +09:00 版) http://tools.ietf.org/html/rfc5537#section-4.2
- text/plain と同じ
- 省略可能
- 既定値は US-ASCII
[84] application/news-checkgroups
- RFC 5537 - Netnews Architecture and Protocols (2014-09-14 17:08:11 +09:00 版) http://tools.ietf.org/html/rfc5537#section-4.3
- 省略可能
- 既定値は US-ASCII
[149] application/index.response
- 「通常の MIME の意味」
- 省略可能
- [148] RFC 2652 - MIME Object Definitions for the Common Indexing Protocol (CIP) (2015-05-03 22:21:07 +09:00 版) https://tools.ietf.org/html/rfc2652#section-2.2
[167] text/markdown
- [168] RFC 7763 - The text/markdown Media Type (2016-03-28 02:57:21 +09:00 版) https://tools.ietf.org/html/rfc7763#section-2
- [169] RFC 6838 を参照。
- 必須
[205] SGML MIME型
- [206] RFC 1874 - SGML Media Types, 2023-08-04T13:21:14.000Z https://datatracker.ietf.org/doc/html/rfc1874#section-2.3
- [207] 非 SGML 対応ソフトウェア用 SGML MIME型

Definitions that are a subset of the MIME `charset` parameter

[838]

[888] application/shf+xml (SHF)
- RFC 4194
- 値は UTF-8 のみ。
- 必須。
[889] text/provenance-notation
- PROV-N: The Provenance Notation (2013-04-25 04:01:26 +09:00 版) http://www.w3.org/TR/2013/REC-prov-n-20130430/#media-type
- 値は UTF-8 のみ。
- 必須。
[72] text/n3, text/rdf+n3, application/n3
- Notation3 (N3): A readable RDF syntax (2008-01-16 03:12:43 +09:00 版) http://www.w3.org/TeamSubmission/2008/SUBM-n3-20080114/#charset
- 値は utf-8 のみ。
- ASCII の場合のみ、省略可能。
[875] text/owl-manchester
- OWL 2 Web Ontology Language Manchester Syntax (2012-10-18 22:45:56 +09:00 版) http://www.w3.org/TR/2012/NOTE-owl2-manchester-syntax-20121018/#Appendix:_Internet_Media_Type.2C_File_Extension_and_Macintosh_File_Type
- 省略可能。
- 値は UTF-8 のみ。
[876] text/ping
- HTML Standard (2012-10-19 23:02:58 +09:00 版) http://www.whatwg.org/specs/web-apps/current-work/#text/ping
- Web Applications 1.0 r7468 Update the IANA registration templates to allow charset= parameters so that when we change to the new registration mechanism for MIME types, we don't forget to allow these. (2012-10-18 08:53:00 +09:00 版) http://html5.org/tools/web-apps-tracker?from=7467&to=7468
- 省略可能。
- 値は utf-8 のみ。
[877] text/event-stream
- HTML Standard (2012-10-19 23:02:58 +09:00 版) http://www.whatwg.org/specs/web-apps/current-work/#text/event-stream
- 省略可能。
- 値は utf-8 のみ。
[878] text/cache-manifest
- HTML Standard (2012-10-19 23:02:58 +09:00 版) http://www.whatwg.org/specs/web-apps/current-work/#text/cache-manifest
- Web Applications 1.0 r7468 Update the IANA registration templates to allow charset= parameters so that when we change to the new registration mechanism for MIME types, we don't forget to allow these. (2012-10-18 08:53:00 +09:00 版) http://html5.org/tools/web-apps-tracker?from=7467&to=7468
- 省略可能。
- 値は utf-8 のみ。

Turtle

- [73] text/turtle, application/x-turtle
  - Turtle - Terse RDF Triple Language (2008-01-16 02:03:24 +09:00 版) http://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/#sec-mime
  - Turtle - Terse RDF Triple Language (2008-01-16 02:03:24 +09:00 版) http://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/#sec-mediaReg
  - 値は UTF-8 のみ。
  - text/* で charset 引数なしで UTF-8 が使えるようになるまでは、 charset 引数を明示することを推奨。
  - ASCII の場合のみ、省略可能。

[873] R2RML: RDB to RDF Mapping Language (2012-09-27 00:23:35 +09:00 版) http://www.w3.org/TR/2012/REC-r2rml-20120927/#syntax

[874] R2RML である Turtle について: charset 引数を使うべきです >>873。

[903] text/turtle
- [902] RDF 1.1 Turtle (2014-03-07 08:53:19 +09:00 版) https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-turtle/index.html#h2_sec-mediaReg

[904] >>903 によれば値は UTF-8 のみ。非ASCII文字を含むなら必須。

[850] text/event-stream
- 値は utf-8 のみ。
- 省略可能。
- http://www.whatwg.org/specs/web-apps/current-work/complete.html#text/event-stream
[851] text/html, text/html-sandboxed (HTML MIME型)
- 優先MIME名でなければならないなどいくつか制約あり。
- http://www.whatwg.org/specs/web-apps/current-work/complete.html#text/html
- http://www.whatwg.org/specs/web-apps/current-work/complete.html#text/html-sandboxed

Same definition as the RFC 3023 `charset`

[840]

[841] application/rif+xml
- RIF Core Dialect (2010-06-22 23:52:50 +09:00 版) http://www.w3.org/TR/2010/REC-rif-core-20100622/#Appendix:_RIF_Media_Type_Registration
- RFC 3023 を参照。
- 省略可能。
[879] RFC 6787 - Media Resource Control Protocol Version 2 (MRCPv2) (2012-11-13 15:18:40 +09:00 版) http://tools.ietf.org/html/rfc6787#section-13.2.1
[919] RFC 4662 - A Session Initiation Protocol (SIP) Event Notification Extension for Resource Lists (2014-06-15 09:36:33 +09:00 版) http://tools.ietf.org/html/rfc4662#section-8.2
- RFC 3023 を参照。
- 省略可能。

Same definition as RFC 3023 `application/xml`

[839]

[920] application/xml-external-parsed-entity
- RFC 3023
- 省略可能
[922] application/xml-dtd
- RFC 3023
- 省略可能

- [15] application/cdl+xml
  - Web Services Choreography Description Language Version 1.0 http://www.w3.org/TR/2005/CR-ws-cdl-10-20051109/#Mime-Type-definition
  - 省略可能。
- [16] application/ccxml+xml
  - RFC 4267
  - 省略可能。
- [17] application/voicexml+xml
  - RFC 4267
  - 省略可能。
- [18] application/srgs+xml
  - RFC 4267
  - 省略可能。
- [19] application/ssml+xml
  - RFC 4267
  - 省略可能。
- [20] application/pls+xml
  - RFC 4267
  - 省略可能。
- [20] application/atom+xml
  - RFC 4287 7.
  - 省略可能。
- [823] application/atomcat+xml
  - RFC 5023
  - 省略可能。
- [824] application/atomsvc+xml
  - RFC 5023
  - 省略可能。
- [883] application/atomdeleted+xml
  - [884] RFC 6721 - The Atom "deleted-entry" Element (2012-12-31 01:08:20 +09:00 版) http://tools.ietf.org/html/rfc6721#section-8
  - 省略可能。
- [21] application/sparql-result+xml
  - SPARQL Query Results XML Format http://www.w3.org/TR/2006/CR-rdf-sparql-XMLres-20060406/#mime-form
  - SPARQL Query Results XML Format (Second Edition) (2013-03-21 20:27:51 +09:00 版) http://www.w3.org/TR/2013/REC-rdf-sparql-XMLres-20130321/#mime-form
  - 省略可能。
- [22] application/xv+xml
  - RFC 4374
  - 省略可能。
  - 旧 I-D では application/xhtml-voice+xml
- [23] application/smil+xml, application/smil
  - RFC 4536
  - 省略可能。
- [26] application/simple-filter+xml
  - RFC 4661
  - 省略可能。
- [29] application/vnd.sun.wadl+xml
  - 省略可能。
- [31] application/mediaservercontrol+xml
  - RFC 4722
  - 省略可能
- [33] application/docbook+xml
  - DocBook 仕様書
  - 省略可能
- [35] application/xslt+xml
  - XSL Transformations (XSLT) Version 2.0 http://www.w3.org/TR/2007/REC-xslt20-20070123/#media-type-registration
  - 省略可能
- [36] application/xquery+xml
  - XML Syntax for XQuery 1.0 (XQueryX) http://www.w3.org/TR/2007/REC-xqueryx-20070123/#xqueryx-mime-registration
  - XQueryX 3.0 (2014-04-08 08:22:50 +09:00 版) http://www.w3.org/TR/xqueryx-3/#xqueryx-mime-registration
  - XQueryX 3.1 (2017-03-20 09:32:28 +09:00) https://www.w3.org/TR/2017/REC-xqueryx-31-20170321/#xqueryx-mime-registration
  - 省略可能
- [37] application/cpl+xml
  - RFC 3880
  - 省略可能
- [44] application/resource-lists+xml
  - RFC 4826
  - 省略可能
- [45] application/rls-services+xml
  - RFC 4826
  - 省略可能
- [46] application/wsdl+xml
  - Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language http://www.w3.org/TR/2007/REC-wsdl20-20070626/#ietf-reg
  - 省略可能
- [50] application/wspolicy+xml
  - Web Services Policy 1.5 - Framework http://www.w3.org/TR/2007/REC-ws-policy-20070904/#ietf-reg
  - 省略可能
- [67] application/sparql-result+xml
  - SPARQL Query Results XML Format http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/#mime-form
  - 省略可能
[70] application/patch-ops-error+xml
- RFC 5261 (IETF 提案標準) urn:ietf:rfc:5261
- 省略可能
- RFC 5261 - An Extensible Markup Language (XML) Patch Operations Framework Utilizing XML Path Language (XPath) Selectors (2013-08-11 16:56:51 +09:00 版) http://tools.ietf.org/html/rfc5261#section-10.2
[837] application/xproc+xml
- XProc: An XML Pipeline Language (2010-05-11 22:38:07 +09:00 版) http://www.w3.org/TR/2010/REC-xproc-20100511/#media-type-registration
- XProc 2.0: An XML Pipeline Language (2016-07-21 14:35:49 +09:00) https://www.w3.org/TR/2016/NOTE-xproc20-20160721/#xproc-media-type
- 省略可能
[827] application/emma+xml
- EMMA: Extensible MultiModal Annotation markup language http://www.w3.org/TR/2009/REC-emma-20090210/#media-type-registration
- 省略可能
[845] application/xquery+xml
- XML Syntax for XQuery 1.0 (XQueryX) (Second Edition) http://www.w3.org/TR/2010/REC-xqueryx-20101214/#xqueryx-mime-registration
- 省略可能
[847] application/tei+xml
- RFC 6129 - The 'application/tei+xml' Media Type (2011-04-21 05:04:28 +09:00 版) http://tools.ietf.org/html/rfc6129#section-5.1
- 省略可能
[852] application/xhtml+xml
- http://www.whatwg.org/specs/web-apps/current-work/complete.html#application/xhtml+xml
- 省略可能
[853] application/metalink4+xml
- RFC 5854 - The Metalink Download Description Format (2011-07-03 17:23:02 +09:00 版) http://tools.ietf.org/html/rfc5854#section-6.2
- 省略可能
[854] application/inkml+xml
- Ink Markup Language (InkML) (2011-09-20 17:16:49 +09:00 版) http://www.w3.org/TR/2011/REC-InkML-20110920/#media-type-registration
- 省略可能
[858] application/evd+xml
- Web Services Event Descriptions (WS-EventDescriptions) (2011-12-13 20:09:52 +09:00 版) http://www.w3.org/TR/2011/REC-ws-event-descriptions-20111213/#EVD_MIME
- 省略可能
[886] application/xenc+xml
- XML Encryption Syntax and Processing Version 1.1 (2013-04-13 02:02:32 +09:00 版) http://www.w3.org/TR/2013/REC-xmlenc-core1-20130411/#sec-MediaType-Registration
- 省略可能
[887] application/provenance+xml
- PROV-XML: The PROV XML Schema (2013-04-30 13:05:52 +09:00 版) http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/#media-type
- 省略可能
[892] application/xop+xml
- XML-binary Optimized Packaging (2005-01-24 23:00:54 +09:00 版) http://www.w3.org/TR/xop10/#id2270207
- 省略可能
[896] message/imdn+xml
- RFC 5438 - Instant Message Disposition Notification (IMDN) (2013-10-06 10:18:10 +09:00 版) http://tools.ietf.org/html/rfc5438#section-15.1
- 省略可能
[901] application/mathml+xml, application/mathml-presentation+xml, application/mathml-content+xml
- 省略可能
- Mathematical Markup Language (MathML) Version 3.0 2nd Edition -- single page HTML + MathML Version (2014-02-10 20:00:21 +09:00 版) http://www.w3.org/Math/draft-spec/mathml.html#appendixb_media-types-reg
[909] application/xacml+xml
- 省略可能
- RFC 7061 - eXtensible Access Control Markup Language (XACML) XML Media Type (2014-03-02 21:45:15 +09:00 版) http://tools.ietf.org/html/rfc7061#section-2.1
[912] application/emotionml+xml
- 省略可能
- Emotion Markup Language (EmotionML) 1.0 (2014-05-20 20:02:30 +09:00 版) http://www.w3.org/TR/emotionml/#media-type-registration
[914] application/its+xml
- Optional
- Internationalization Tag Set (ITS) Version 2.0 (2013-10-27 19:39:43 +09:00 版) http://www.w3.org/TR/its20/#its-mime-type
[76] application/alps+xml (ALPS)
- Optional
- draft-amundsen-richardson-foster-alps-00 - Application-Level Profile Semantics (ALPS) (2014-10-16 14:34:48 +09:00 版) https://tools.ietf.org/html/draft-amundsen-richardson-foster-alps-00#section-4.1
[82] application/metalink4+xml (Metalink)
- Optional
- RFC 5854 - The Metalink Download Description Format (2014-09-14 16:54:14 +09:00 版) http://tools.ietf.org/html/rfc5854#section-6.2
[452] application/cmisquery+xml, application/cmisallowableactions+xml, application/cmistree+xml, application/cmisatom+xml, application/cmisacl+xml
- Optional
- OASIS Specification Template (2010-04-13 02:41:48 +09:00 版) http://docs.oasis-open.org/cmis/CMIS/v1.0/cs01/cmis-spec-v1.0.html#_Toc235259822
[155] application/scxml+xml
- Optional
- State Chart XML (SCXML): State Machine Notation for Control Abstraction (2015-09-01 05:30:17 +09:00 版) http://www.w3.org/TR/scxml/#media-type-registration
application/davmount+xml
- Optional

Same definition as the `charset` parameter of RFC 7303 `application/xml`

[160] application/rfc+xml
[175] application/gml+xml
- [176] (2017-01-10 14:01:06 +09:00) https://www.iana.org/assignments/media-types/application/gml+xml
  - "See Section 3 of RFC 7303."

Definitions that are a subset of the `charset` parameter of RFC 3023 `application/xml`

[27]

- [28] application/cellml+xml (CellML)
  - RFC 4642 (IETF Informational RFC) urn:ietf:rfc:4642
  - Optional.
  - UTF-8 is the only valid value.
  - User agents may support other values.

Same definition as the `charset` parameter of RFC 3023 `text/xml`

[40]

[921] text/xml-external-parsed-entity
- RFC 3023
- Optional

- [41] text/cmml
  - The Continuous Media Markup Language (CMML), Version 2.1 (2006-05-06 10:39:46 +09:00 版) (expired (~~IETF I-D~~)) http://annodex.net/TR/cmml.html#rfc.section.10
[24] text/javascript, text/ecmascript, application/javascript, application/ecmascript
- RFC 4329
- Optional.
[32] application/beep (BEEP)
- RFC 3080 (IETF Proposed Standard) urn:ietf:rfc:3080
- The specification states only that the default is UTF-8 and that the charset parameter is required when anything other than UTF-8 is used.

Definitions that reference RFC 3023

[199] application/pidf+xml
- [198] rfc3863, 2021-06-10T01:46:50.000Z https://datatracker.ietf.org/doc/html/rfc3863#section-5.1
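- [200] The description of the charset parameter is the specification's own.
- [201] The default value of the charset parameter is UTF-8.
- [202] The specification explains that application/pidf+xml is a specialization of RFC 3023 application/xml.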

TTML

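[172] application/ttml+xml has a charset parameter. It is optional >>170.

[171] Its value is defined as the same as the XML encoding declaration or, when there is no encoding declaration, the actual encoding >>170. One might wonder whether a value that matches the encoding declaration is allowed to differ from the actual encoding, but TTML apparently restricts encodings to UTF-8 and UTF-16, so such a document would violate that restriction instead. Note that how the "actual encoding" is to be written is not stated. Perhaps it was thought to be obvious, but it is unclear whether the value is case-sensitive, whether a value such as UTF-16BE is allowed, and so on.

[173] Nothing specifies how this parameter should be used when processing TTML. RFC 3023 is also referenced with regard to encodings, so a guess would be that it is meant to work the same way as for the XML MIME types. Because encodings are limited to UTF-8 and UTF-16 and the XML encoding declaration is available, this parameter is essentially unnecessary.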

[170] TTML Media Type Definition and Profile Registry (2016-05-07 03:08:58 +09:00) https://w3c.github.io/tt-profile-registry/#mediatype

OWL

SQL

[891] application/sql (SQL)
- RFC 6922 - The application/sql Media Type (2013-08-04 17:13:41 +09:00 版) http://tools.ietf.org/html/rfc6922#section-3

Media types that allow only `UTF-8`

[162]

Media types that explicitly state they do not define `charset`

[906] application/json (>>305) is often sent with a charset parameter, but RFC 7159 explicitly states that no charset parameter is defined.

[905] RFC 7159 - The JavaScript Object Notation (JSON) Data Interchange Format (2014-03-07 18:11:43 +09:00 版) http://tools.ietf.org/html/rfc7159

[100] For what happens in practice, see application/json.
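
Because RFC 7159 defines no charset parameter and requires JSON text to be UTF-8, UTF-16, or UTF-32, one possible receiver strategy is to ignore any charset on application/json and let the decoder detect the encoding from the bytes. The following is only a sketch of that idea in Python; the function name `decode_json_body` is hypothetical, and it relies on the detection built into the standard `json` module.

```python
import json


def decode_json_body(body: bytes, content_type: str = "application/json"):
    """Parse a JSON body while deliberately ignoring any charset parameter.

    content_type is accepted only to make clear that it is not consulted;
    json.loads() itself detects UTF-8, UTF-16, or UTF-32 from the bytes.
    """
    return json.loads(body)
```

A stricter receiver might instead reject a charset parameter it does not expect; which behavior is appropriate depends on the deployment.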

History

RFC 1345

[124] RFC 1345 defined what later became the basis of the IANA charsets.

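2002-09-23 (Mon) 11:42:47 名無しさん : Among the ISO-IR-sourced entries in the first half of the RFC 1345 charset table, quite a few are bogus charsets.
2002-09-23 (Mon) 11:43:22 名無しさん : They are written as though supplementary character sets were used on their own.
2002-09-23 (Mon) 11:44:21 名無しさん : A dozen or so of the charsets have surely never been used even once.
2002-09-23 (Mon) 11:46:07 名無しさん : I read the ietf-822ext posts from the time it was written; I suppose it could not be helped.
2002-09-23 (Mon) 11:46:45 名無しさん : Also, table items such as g0esc are purely informative and have nothing to do with the charset itself.
2002-09-23 (Mon) 11:47:20 名無しさん : All of the 1345 charsets are character sets on a plane.
2002-09-23 (Mon) 11:48:42 名無しさん : The four multibyte character sets have no control characters defined. (The description says so too. And yet the table shows them as G0.)
2002-09-23 (Mon) 11:50:06 名無しさん : I think they were defined half knowing that they would be unusable as they stood. That is the impression I got after reading the old ietf-822ext posts.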

MIME

[105] RFC 2046

A critical parameter that may be specified in the Content-Type field for "text/plain" data is the character set. This is specified with a "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.

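
As an illustration of the two rules quoted above from RFC 2046 (charset values are not case sensitive, and US-ASCII is assumed when the parameter is absent), here is a minimal sketch using Python's standard `email.message.Message`. The helper name `effective_charset` is hypothetical and not part of any specification; the sketch also accepts the quoted-string form of the value.

```python
from email.message import Message


def effective_charset(content_type: str, default: str = "us-ascii") -> str:
    """Return the charset named by a Content-Type value, in lower case.

    Falls back to `default` (US-ASCII, the RFC 2046 default for
    text/plain) when no charset parameter is present.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes the parameter value and lowercases
    # it, which matches the case-insensitive comparison rule.
    return msg.get_content_charset(failobj=default)
```

With this helper, `charset=ISO-8859-1`, `charset=iso-8859-1`, and `charset="iso-8859-1"` all normalize to the same name.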

   The specification for any future subtypes of "text" must specify
   whether or not they will also utilize a "charset" parameter, and may
   possibly restrict its values as well.  For other subtypes of "text"
   than "text/plain", the semantics of the "charset" parameter should be
   defined to be identical to those specified here for "text/plain",
   i.e., the body consists entirely of characters in the given charset.
   In particular, definers of future "text" subtypes should pay close
   attention to the implications of multioctet character sets for their
   subtype definitions.


   The charset parameter for subtypes of "text" gives a name of a
   character set, as "character set" is defined in RFC 2045.  The rules
   regarding line breaks detailed in the previous section must also be
   observed -- a character set whose definition does not conform to
   these rules cannot be used in a MIME "text" subtype.

An initial list of predefined character set names can be found at the end of this section. Additional character sets may be registered with IANA.


Other media types than subtypes of "text" might choose to employ the charset parameter as defined here, but with the CRLF/line break restriction removed. Therefore, all character sets that conform to the general definition of "character set" in RFC 2045 can be registered for MIME use.

   Note that if the specified character set includes 8-bit characters
   and such characters are used in the body, a Content-Transfer-Encoding
   header field and a corresponding encoding on the data are required in
   order to transmit the body via some mail transfer protocols, such as
   SMTP [RFC-821].


   The default character set, US-ASCII, has been the subject of some
   confusion and ambiguity in the past.  Not only were there some
   ambiguities in the definition, there have been wide variations in
   practice.  In order to eliminate such ambiguity and variations in the
   future, it is strongly recommended that new user agents explicitly
   specify a character set as a media type parameter in the Content-Type
   header field. "US-ASCII" does not indicate an arbitrary 7-bit
   character set, but specifies that all octets in the body must be
   interpreted as characters according to the US-ASCII character set.
   National and application-oriented versions of ISO 646 [ISO-646] are
   usually NOT identical to US-ASCII, and in that case their use in
   Internet mail is explicitly discouraged.  The omission of the ISO 646
   character set from this document is deliberate in this regard.  The
   character set name of "US-ASCII" explicitly refers to the character
   set defined in ANSI X3.4-1986 [US- ASCII].  The new international
   reference version (IRV) of the 1991 edition of ISO 646 is identical
   to US-ASCII.  The character set name "ASCII" is reserved and must not
   be used for any purpose.


   NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
   version of the American Standard.  Insofar as one of the purposes of
   specifying a media type and character set is to permit the receiver
   to unambiguously determine how the sender intended the coded message
   to be interpreted, assuming anything other than "strict ASCII" as the
   default would risk unintentional and incompatible changes to the
   semantics of messages now being transmitted.  This also implies that
   messages containing characters coded according to other versions of
   ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
   procedures (e.g., those of ISO 2022), as well as 8bit or multiple
   octet character encodings MUST use an appropriate character set
   specification to be consistent with MIME.

   The complete US-ASCII character set is listed in ANSI X3.4- 1986.
   Note that the control characters including DEL (0-31, 127) have no
   defined meaning in apart from the combination CRLF (US-ASCII values
   13 and 10) indicating a new line.  Two of the characters have de
   facto meanings in wide use: FF (12) often means "start subsequent
   text on the beginning of a new page"; and TAB or HT (9) often (though
   not always) means "move the cursor to the next available column after
   the current position where the column number is a multiple of 8
   (counting the first column as column 0)."  Aside from these
   conventions, any use of the control characters or DEL in a body must
   either occur

    (1)   because a subtype of text other than "plain"
          specifically assigns some additional meaning, or

    (2)   within the context of a private agreement between the
          sender and recipient. Such private agreements are
          discouraged and should be replaced by the other
          capabilities of this document.

   NOTE: An enormous proliferation of character sets exist beyond US-
   ASCII.  A large number of partially or totally overlapping character
   sets is NOT a good thing.  A SINGLE character set that can be used
   universally for representing all of the world's languages in Internet
   mail would be preferrable.  Unfortunately, existing practice in
   several communities seems to point to the continued use of multiple
   character sets in the near future.  A small number of standard
   character sets are, therefore, defined for Internet use in this
   document.

The defined charset values are:


    (1)   US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].


    (2)   ISO-8859-X -- where "X" is to be replaced, as
          necessary, for the parts of ISO-8859 [ISO-8859].  Note
          that the ISO 646 character sets have deliberately been
          omitted in favor of their 8859 replacements, which are
          the designated character sets for Internet mail.  As of
          the publication of this document, the legitimate values
          for "X" are the digits 1 through 10.


Characters in the range 128-159 has no assigned meaning in ISO-8859-X. Characters with values below 128 in ISO-8859-X have the same assigned meaning as they do in US-ASCII.


   Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
   alphabet) includes both characters for which the normal writing
   direction is right to left and characters for which it is left to
   right, but do not define a canonical ordering method for representing
   bi-directional text.  The charset values "ISO-8859-6" and "ISO-8859-
   8", however, specify that the visual method is used [RFC-1556].

   All of these character sets are used as pure 7bit or 8bit sets
   without any shift or escape functions.  The meaning of shift and
   escape sequences in these character sets is not defined.

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME.  This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

Note that the character set used, if anything other than US- ASCII, must always be explicitly specified in the Content-Type field.


   No character set name other than those defined above may be used in
   Internet mail without the publication of a formal specification and
   its registration with IANA, or by private agreement, in which case
   the character set name must begin with "X-".

Implementors are discouraged from defining new character sets unless absolutely necessary.


   The "charset" parameter has been defined primarily for the purpose of
   textual data, and is described in this section for that reason.
   However, it is conceivable that non-textual data might also wish to
   specify a charset value for some purpose, in which case the same
   syntax and values should be used.

   In general, composition software should always use the "lowest common
   denominator" character set possible.  For example, if a body contains
   only US-ASCII characters, it SHOULD be marked as being in the US-
   ASCII character set, not ISO-8859-1, which, like all the ISO-8859
   family of character sets, is a superset of US-ASCII.  More generally,
   if a widely-used character set is a subset of another character set,
   and a body contains only characters in the widely-used subset, it
   should be labelled as being in that subset.  This will increase the
   chances that the recipient will be able to view the resulting entity
   correctly.

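[125] Discussion of charsets takes place on the ietf-charset mailing list.

[126] There appear to be a number of charsets that really exist but whose registration was abandoned after discussion on this mailing list.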

RFC 1922

[106] RFC 1922 defined the charset-edition and charset-extension parameters for use together with the charset parameter.

[107] RFC 1922 4. Two New MIME parameters

Here we define two new MIME parameters to be used with "charset" parameters.


4.1. "charset-edition"

   This parameter is used after the MIME "charset" parameter, using four
   digits (AD) to indicate what the year of edition is for the character
   set standard shown in "charset".  Its use is optional.
   Implementations should ignore this parameter unless the
   implementation has specific support for that particular character set
   edition.

   The reason for defining this parameter is that there are often
   differences in the defined characters between editions of a character
   set standard.  Sometimes, the difference can not be ignored,
   otherwise implementations would have problems when processing it.
   There are only two ways to indicate this difference, in the current
   MIME syntax.  One way is to indicate the edition in the charset name,
   such as CN-GB-1988-80 (the 1980's edition of GB 1988).  The other way
   is to define a new optional parameter such as "charset-edition".  The
   latter way is better because receiving applications that can only
   process an older edition can still recognize the character set and
   offer to display the text in the older edition.  This display may
   have a few mistakes, but it is better than refusing to display any
   text at all or defaulting to an inappropriate character set such as
   US-ASCII or ISO-8859-1.

4.2. "charset-extension"

   This parameter is also used after the MIME "charset" parameter.  It
   is case-insensitive and optional, and any value of this parameter
   should be registered in IANA.  Unregistered value should start with
   "x-" as with any MIME extension-token.  Implementations should ignore
   this parameter unless the implementation has specific support for
   that particular character set extension.

   A character set extension has displayed glyphs for code points that
   are not assigned in the character set, for example, vendor-specific
   extensions of standard character sets.  This parameter provides the
   option of using these extensions.  Although character set extensions
   may cause interoperability problems, we recognize the existence of
   such extensions.

   For example:
      Content-Type: text/plain; charset=CN-Big5; charset-edition=1984;
       charset-extension=ETen-2.00.03-DOS

   This may indicate Eten company's extension of Big5: ETen 2.00.03 for
   DOS, assuming that "ETen-2.00.03-DOS" is registered with the IANA..

4.3. Formal Syntax:

The following changes and additions are made to the MIME syntax:


   charset-edition   := "charset-edition" "=" 4DIGIT
                         ; year of edition in four digits

   charset-extension := "charset-extension" "=" extension-token

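[192] The value of charset-edition was defined as a four-digit Common Era year.

[193] It was intended for use with CN-GB (GB 2312, for the People's Republic of China) and CN-Big5 (Big5, for the Republic of China, Taiwan). The People's Republic of China numbers years by the Common Era (公元), whereas the Republic of China uses the Republic of China calendar. That difference was not taken into account, and only Common Era years were allowed.

[108] charset-edition is defined as 4DIGIT, but with the year 10000 problem in mind, 4*DIGIT would seem better. There should be no need to add 1900 to a 2DIGIT value in the name of year 2000 compliance. If such a value is received, why not simply treat it as a charset established in the first century AD?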
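
The difference between the RFC 1922 rule (4DIGIT) and the 4*DIGIT relaxation suggested in [108] can be sketched as follows. This is only an illustration in Python; the function name `parse_charset_edition` and the `relaxed` flag are hypothetical, and two-digit values are rejected rather than guessed at.

```python
import re

# RFC 1922: charset-edition := "charset-edition" "=" 4DIGIT
STRICT_EDITION = re.compile(r"[0-9]{4}")
# Relaxation suggested in [108]: four or more digits (4*DIGIT).
RELAXED_EDITION = re.compile(r"[0-9]{4,}")


def parse_charset_edition(value: str, relaxed: bool = False):
    """Return the edition year as an int, or None if the value is invalid.

    ASCII digits only are accepted; nothing is added to short values.
    """
    pattern = RELAXED_EDITION if relaxed else STRICT_EDITION
    if pattern.fullmatch(value) is None:
        return None
    return int(value)
```

For example, `1984` is accepted under both rules, while `10000` is accepted only when `relaxed=True`.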

[109] There does not appear to be an IANA registry for charset-extension.

The charset parameter recognition problem in old Netscape Navigator

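[110] Old Netscape Navigator (2 and earlier?) has a problem: when the charset parameter has a value other than certain names (names hard-coded in the implementation), it is no longer even possible to select the encoding manually, and as a result the text is (in many cases) garbled.

[111] For Japanese encodings in particular, it supports only the private names x-euc-jp and x-sjis. Text is garbled when the IANA names euc-jp and shift_jis, which were registered after NN2, or windows-31j are specified.

[113] No problem occurs with iso-2022-jp.

[112] This declaration appears to be ignored when it is given in the Content-Type: field of the HTTP header; it has to be specified with the http-equiv attribute of the HTML meta element.

[114] Part of a document is sometimes garbled regardless of this declaration, but the cause is not well understood.

[115] 2002-12-01 (Sun) 10:38 Confirmed the problems in >>110-114 with Netscape Navigator 2.01. (Really, a version this old should be thrown away as soon as possible.)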

IANA charset registry

[195] RFC 2278 - IANA Charset Registration Procedures, 2021-01-24T16:24:33.000Z, 2021-03-23T11:55:30.317Z https://tools.ietf.org/html/rfc2278

[196] RFC 2978 - IANA Charset Registration Procedures, 2021-04-11T14:19:36.000Z, 2021-04-20T09:19:26.369Z https://tools.ietf.org/html/rfc2978

[197] RFC Errata Report » RFC Editor, 2021-04-20T09:19:36.000Z https://www.rfc-editor.org/errata_search.php?rfc=2978

HTTP `Content-Type` `charset`

[218] Configuring WWW Server for ISO 8859-2, 2025-06-16T14:19:26.000Z, 2006-05-14T03:06:41.563Z https://web.archive.org/web/20060514030239/http://nl.ijs.si/gnusl/cee/app/httpd.html

Alis Technologies, Inc. has modified NCSA HTTP server code and added some support for non-ISO 8859-1. The two features added are the transmission of Charset parameter with Content-Type field and the implementation of the Accept-Language protocol. Their approach was to add a SGML-style header including meta-information in the beginning of HTML document.

`<meta http-equiv=Content-Type>`

[219] How to display and input HANGUL - The Korean characters on your Windows, 2021-12-31T02:07:47.000Z, 2025-06-17T09:59:14.903Z http://www.kmml.net/ehc/ehc2.html

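Hangul HTML created with FrontPage, Microsoft's HTML authoring environment, contains the following tag, whose purpose is to change the browser's display language setting automatically (forcibly) without the user changing it explicitly:
<meta http-equiv="Content-Type" content="text/html; charset=ks_c_5601-1987">
This charset (character set) name is recognized only by Microsoft's WWW browser. In other words, the other leading WWW browser, Netscape Navigator (including Communicator), cannot recognize this charset name in versions up to 4.0x.
With Netscape Navigator up to 4.0x on Japanese Windows, this tag forces a switch to the Latin 1 (Western European languages) encoding, and Hangul cannot be viewed as it is.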

[220] Making Home Pages in Korean - EHC, 2021-12-31T02:07:47.000Z, 2025-06-17T10:01:26.745Z http://www.kmml.net/ehc/hhpage.html#_four

If you want to write HTML in which forced switching of the display language works correctly in both Netscape Navigator and Internet Explorer, either use a tag that both Netscape Navigator and Internet Explorer recognize, such as
<meta http-equiv="Content-Type" content="text/html; charset=EUC-KR">
or use a tag that lists both names, such as
<meta http-equiv="Content-Type" content="text/html; charset=EUC-KR; charset=ks_c_5601-1987">
Always write EUC-KR first.
Of course, not specifying charset= at all is another option.

`<meta charset>`

[217] ISO-8859-8

Notes

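[118] RFC 3023 specifies the charset parameter for XML media types.

[119] >>118 For text/xml and similar types, the default when the parameter is omitted is us-ascii in both MIME and HTTP. For application/xml and similar types, there is no default when it is omitted, and the XML declaration and the like are consulted instead.

[120] >>118-119 Other +xml media types often reference this RFC in their definitions, so its influence is large.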
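
The rule summarized in >>119 can be sketched in Python as below. This is an illustration of the behavior as described here, not a full RFC 3023 implementation: the function name `xml_charset_from_header` is hypothetical, and returning `None` stands for "no default, consult the BOM or XML declaration", which is left to the XML processor.

```python
from typing import Optional


def xml_charset_from_header(media_type: str,
                            charset_param: Optional[str]) -> Optional[str]:
    """Return the charset implied by the header alone, or None.

    media_type    -- e.g. "text/xml" or "application/xml" (no parameters)
    charset_param -- the charset parameter value, or None when omitted
    """
    if charset_param:
        # An explicit parameter wins; names are compared case-insensitively.
        return charset_param.lower()
    if media_type.lower().startswith("text/"):
        # text/xml and similar types: the default is us-ascii when omitted.
        return "us-ascii"
    # application/xml and similar types: no default; the caller should look
    # at the BOM or the XML declaration instead.
    return None
```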

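[122] Bug 1697 - multipart/form-data 送出時に Content-Type が送られない http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=1697, Bug 116346 - Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[from sub] http://bugzilla.mozilla.org/show_bug.cgi?id=116346 : The problem that Mozilla did not attach a charset parameter when sending multipart/form-data. This is not specific to a particular media type or a particular implementation, however; it actually involves a large problem. How does an application that generates MIME entities, such as "file attachment", obtain meta-information about data that merely passes through it? How should an implementation that handles multiple character encodings deal with charset information? MIME is a specification from ten years ago and never considered any of this, so what will happen from here on? Perhaps the only way is to keep muddling through and maintaining the status quo, as before.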

[121] I came across charset=136, and the content was Big5. How did it end up with a value like that?

[123] Incidentally, >>121 was spam.

[25] Charset parameter of +xml media types from Bjoern Hoehrmann on 2006-09-05 (www-archive@w3.org from September 2006) http://lists.w3.org/Archives/Public/www-archive/2006Sep/0003 (名無しさん 2006-09-07 23:19:07 +00:00)

[30] I'm not a Klingon : Expected names of Microsoft Windows "ANSI" Code Pages (Encodings) http://blogs.msdn.com/shawnste/archive/2006/11/06/expected-names-of-microsoft-windows-ansi-code-pages-encodings.aspx (名無しさん [sage])

[34] Re: Unicode distribution? from Erik van der Poel on 2007-01-05 (www-international@w3.org from January to March 2007) (Erik van der Poel (erikv@google.com) 著, 2007-01-07 23:27:03 +09:00 版) http://lists.w3.org/Archives/Public/www-international/2007JanMar/0004 (名無しさん 2007-01-12 21:09:39 +00:00)

[43] ハーブティー、アロマオイル、アロマランプを種類豊富に販売ユーン卸販売 (2007-06-11 01:15:22 +09:00 版) http://www.yuwn.com/

All of the settings on this server seem to be wrong. It sends charset=0. (名無しさん 2007-06-10 16:18:27 +00:00)

[48] openssl-engine-0.9.6a/test/CAss.cnf - Google Code Search (2007-09-19 10:28:09 +09:00 版) http://www.google.com/codesearch?hl=en&q=show:gp2a770O6xs:Oe1g97XTGPk:6T5s7D-EE9I&sa=N&ct=rd&cs_p=http://www.openssl.org/source/openssl-engine-0.9.6a.tar.gz&cs_f=openssl-engine-0.9.6a/test/CAss.cnf&start=1

Content-Type: text/html; charset=UTF-8; charset=UTF-8

(名無しさん)

[49] >>48 Apparently, depending on User-Agent: or something similar, the latter value can also become ISO-8859-1.

(名無しさん 2007-09-19 01:33:01 +00:00)
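
Headers such as the one in >>48, or the double-charset meta tag recommended in >>220, carry more than one charset parameter, so a consumer has to decide which one to honor. The following Python sketch simply collects every charset value in order, leaving the first-wins or last-wins choice to the caller. The function name `charset_params` is hypothetical, and the splitting is deliberately naive: it assumes no `;` appears inside a quoted-string.

```python
def charset_params(content_type: str) -> list:
    """Return all charset parameter values of a Content-Type, in order.

    Values are unquoted and lowercased. Parameters other than charset
    and malformed segments without "=" are skipped.
    """
    values = []
    # The first segment is the media type itself, so skip it.
    for segment in content_type.split(";")[1:]:
        name, sep, value = segment.partition("=")
        if sep and name.strip().lower() == "charset":
            values.append(value.strip().strip('"').lower())
    return values
```

A caller that wants the first declaration can take `charset_params(value)[0]` after checking that the list is not empty; whether real user agents pick the first or the last value is not established by the notes above.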

[38] TMX Specification (2007-02-24 17:51:20 +09:00 版) http://www.lisa.org/standards/tmx/tmx.html#base (名無しさん [sage])

[39] TMX Specification (2007-02-24 17:51:20 +09:00 版) http://www.lisa.org/standards/tmx/tmx.html#oencoding (名無しさん [sage])

[42] Kazuho@Cybozu Labs: Re: PoCo::Client::HTTP が勝手に文字コードを変えてしまう件 (2007-04-24 06:52:47 +09:00 版) http://labs.cybozu.co.jp/blog/kazuho/archives/2007/04/poco_patch.php

charset=iso-8859-1 default Breaks the Web. (名無しさん 2007-04-23 21:54:55 +00:00)

[47] hoshikuzu | star_dust の書斎 (2007-08-10 22:40:44 +09:00 版) http://d.hatena.ne.jp/hoshikuzu/20070807#p1 (名無しさん)

[51] feel部屋:皆様に幸と笑いあれ♪ - 轟轟戦隊 (2007-09-24 14:10:56 +09:00 版) http://feel.g.hatena.ne.jp/keyword/%e8%bd%9f%e8%bd%9f%e6%88%a6%e9%9a%8a

<link rel="stylesheet" href="/diary_css/base.css" type="text/css" media="all" charset="euc-jp">
<link rel="stylesheet" href="/theme/hazakura/hazakura.css" type="text/css" media="all" charset="euc-jp">

The linked CSS style sheets have an @charset declaration, but no charset is specified in the HTTP Content-Type: header. (名無しさん)

[52] メモ - 葉っぱ日記 (2007-11-28 12:10:50 +09:00 版) http://d.hatena.ne.jp/hasegawayosuke/20071120/p1

[68] Re: Unsupported transport-layer encodings (Alexey Proskuryakov 著, 2008-07-22 20:55:40 +09:00 版) http://lists.w3.org/Archives/Public/public-html/2008Jul/0279.html (名無しさん)

[69] Charset usage data (Philip Taylor 著, 2008-03-06 02:52:53 +09:00 版) http://lists.w3.org/Archives/Public/public-html/2008Mar/0029.html (名無しさん)

[71] HTML5のFPWDとmeta要素のcharset属性 - vantguarde - web:g (2008-10-07 14:36:29 +09:00 版) http://web.g.hatena.ne.jp/vantguarde/20080123/1201089589

[74] TAG Finding: Internet Media Type registration, consistency of use (2002-12-17 22:06:12 +09:00 版) http://www.w3.org/2001/tag/2002/0129-mime#char-encoding

[825] MAMA: Document Encodings - Opera Developer Community (2008-11-25 20:20:22 +09:00 版) http://dev.opera.com/articles/view/mama-document-encodings/

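[826] When no script charset is specified, Firefox 3 follows the charset of the script's Content-Type (probably; whether auto-detection is involved has not been investigated), whereas WinIE 7 apparently does not (it probably assumes the same encoding as the HTML characterSet).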

[828] The WHATWG Blog » Blog Archive » The Road to HTML 5: character encoding (2009-02-21 10:51:17 +09:00 版) http://blog.whatwg.org/the-road-to-html-5-character-encoding

[829] (X)HTML5 Tracking (2009-10-13 22:55:03 +09:00 版) http://html5.org/tools/web-apps-tracker?from=4125&to=4126

[832] RFC 5537 - Netnews Architecture and Protocols (2009-12-29 07:15:42 +09:00 版) http://tools.ietf.org/html/rfc5537

[833] 世界史講義録 (2010-02-28 19:44:19 +09:00 版) http://www.geocities.jp/timeway/

This page is encoded in euc-jp, but its Content-Type is text/html (with no charset parameter). The page includes ...

<meta name="META HTTP-EQUIV="
content="text/html;CHRSET=iso-2022-jp">

... but it is broken in various ways, sigh.

Firefox correctly decodes the page as euc-jp, while Chrome fails to decode the page, by recognizing the content as shift_jis.

[834] HTML5 Revision Tracker (2010-04-14 21:50:48 +09:00 版) http://html5.org/tools/web-apps-tracker?from=5041&to=5042

[835] XProc: An XML Pipeline Language (2010-05-11 22:38:07 +09:00 版) http://www.w3.org/TR/2010/REC-xproc-20100511/#p.data

[836] XProc: An XML Pipeline Language (2010-05-11 22:38:07 +09:00 版) http://www.w3.org/TR/2010/REC-xproc-20100511/#c.response_body

[842] Web Applications 1.0 r5295 <meta charset> should only work for ASCII-compatible encodings. Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=10260 ( (2010-08-17 04:27:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=5294&to=5295

[843] Web Applications 1.0 r5414 Allow authors to override WebSRT's encoding using <track charset>. ( (2010-09-04 09:14:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=5413&to=5414

[844] IRC logs: freenode / #whatwg / 20101126 ( (2010-12-06 08:58:36 +09:00 版)) http://krijnhoetmer.nl/irc-logs/whatwg/20101126#l-354

[848] RFC 6047 - iCalendar Message-Based Interoperability Protocol (iMIP) ( (2011-03-13 15:32:14 +09:00 版)) http://tools.ietf.org/html/rfc6047#section-2.4

[849] draft-melnikov-mime-default-charset-00 - Update to MIME regarding Charset Parameter Handling in Textual Media Types (2011-06-15 09:11:40 +09:00 版) http://tools.ietf.org/html/draft-melnikov-mime-default-charset-00

[855] Web Applications 1.0 r6641 Mention that application/x-www-form-urlencoded;charset does nothing. ( (2011-10-06 08:08:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=6640&to=6641

[856] RFC 6365 - Terminology Used in Internationalization in the IETF (2011-09-10 19:11:51 +09:00 版) http://tools.ietf.org/html/rfc6365#page-8

[857]

However, it is important to point out that the MIME concept of 'charset' in some cases cuts across several layers of components in our model. While this can be accepted in existing registrations, we also recommend that the MIME registration procedure for character sets be modified to show how a proposed character set deals with the CCS and the CES. Most 'charsets' have a well defined CCS and CES, they should merely be teased apart for the registration.

RFC 2130 - The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996 (2011-09-04 12:12:00 +09:00 版) http://tools.ietf.org/html/rfc2130#page-10

[859] RFC 2277 - IETF Policy on Character Sets and Languages ( (2011-11-20 13:18:05 +09:00 版)) http://tools.ietf.org/html/rfc2277#section-3

[860] IRC logs: freenode / #whatwg / 20111218 ( (2011-12-20 08:31:26 +09:00 版)) http://krijnhoetmer.nl/irc-logs/whatwg/20111218

[869] Web Applications 1.0 r7160 Require an encoding declaration. ( (2012-06-30 05:42:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=7159&to=7160

[870] [whatwg] Encoding sniffing algorithm ( (2012-09-07 04:55:05 +09:00 版)) http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-September/037190.html

[871] Web Applications 1.0 r7324 Attempt to slightly more closely align with reality. ( (2012-09-07 13:02:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=7323&to=7324

[872] Web Applications 1.0 r7360 Make a BOM override HTTP headers. ( (2012-09-16 12:55:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=7359&to=7360

[890] Web Applications 1.0 r7958 New encoding defaults based on more data. ( (2013-06-12 13:49:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=7957&to=7958

[893] IRC logs: freenode / #whatwg / 20131029 ( (2013-10-30 20:25:08 +09:00 版)) http://krijnhoetmer.nl/irc-logs/whatwg/20131029#l-348

[894] IRC logs: freenode / #whatwg / 20131030 ( (2013-11-01 21:16:51 +09:00 版)) http://krijnhoetmer.nl/irc-logs/whatwg/20131030#l-312

[895] Attachment #8336745 for bug #910211 ( (2013-11-23 17:34:23 +09:00 版)) https://bugzilla.mozilla.org/attachment.cgi?id=8336745&action=diff#a/dom/encoding/domainsfallbacks.properties_sec2

[898] Bug 14703 – Integrate style sheet loading with CSSOM ( (2014-01-14 09:00:15 +09:00 版)) https://www.w3.org/Bugs/Public/show_bug.cgi?id=14703

[899] Web Applications 1.0 r8391 Define how <link charset> works. ( (2014-01-14 08:12:00 +09:00 版)) http://html5.org/tools/web-apps-tracker?from=8390&to=8391

[900] Please consider if your locale could default to UTF-8 for outgoing email - Google グループ ( (2014-02-01 11:33:00 +09:00 版)) https://groups.google.com/forum/#!topic/mozilla.dev.l10n/PH7tF9m8vUY

[907] Welcome to Netscape Navigator Version 2.0 ( (2014-04-07 08:51:19 +09:00 版)) http://web.archive.org/web/20030202175634/http://wp.netscape.com/eng/mozilla/2.0/relnotes/windows-2.0.html#Images

[908] Welcome to Netscape Navigator Version 1.1 ( (2014-04-07 08:53:46 +09:00 版)) http://web.archive.org/web/20030203042026/http://wp.netscape.com/eng/mozilla/1.1/relnotes/windows-1.1.html#Images

[910] Character Model for the World Wide Web 1.0: Fundamentals ( (2005-02-15 14:24:00 +09:00 版)) http://www.w3.org/TR/charmod/#sec-EncodingIdent

[911] Bug 25534 – HTML spec should not encourage to auto-detect UTF-8 ( (2014-05-05 08:06:57 +09:00 版)) https://www.w3.org/Bugs/Public/show_bug.cgi?id=25534

[913] Internationalization Tag Set (ITS) Version 2.0 ( (2013-10-27 19:39:43 +09:00 版)) http://www.w3.org/TR/its20/#storagesize

[924] Bug 26655 – Support mistakenly `utf-`-prefixed encodings seen in the wild ( (2014-08-26 06:06:48 +09:00 版)) https://www.w3.org/Bugs/Public/show_bug.cgi?id=26655

[925] RFC 5987 - Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters ( (2014-08-10 01:17:43 +09:00 版)) http://tools.ietf.org/html/rfc5987#section-3.2

[85] OASIS Open Document Format for Office Applications (OpenDocument) Version 1.2 - Part 1: OpenDocument Schema (2011-09-29 13:00:00 +09:00 版) http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#a18_3_35textEncoding

A character encoding in the notation described in the §4.3.3 of [XML1.0], or the value x-symbol. The value is x-symbol means that the character encoding is not enumerated by §4.3.3 of [XML1.0].

[86] OASIS Open Document Format for Office Applications (OpenDocument) Version 1.2 - Part 1: OpenDocument Schema (version of 2011-09-29 13:00:00 +09:00) http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#a19_479style_font-charset

The value of this attributes can be x-symbol or a character encoding in the notation described in the §4.3.3 of [XML1.0]. If the value is x-symbol, the font does not define glyphs according to the semantics of [UNICODE].

[87] OASIS Open Document Format for Office Applications (OpenDocument) Version 1.2 - Part 1: OpenDocument Schema (version of 2011-09-29 13:00:00 +09:00) http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#a20_260style_font-charset

The value of this attributes can be x-symbol or a character encoding in the notation described in the §4.3.3 of [XML1.0]. If the value is x-symbol, the font does not define glyphs according to the semantics of [UNICODE]. If the value is one of the encodings or transformations of [UNICODE], the font does define glyphs according to the semantics of [UNICODE]. The use of other values is deprecated.

[94] (version of 1997-12-25 04:10:04 +09:00) http://lynx.isc.org/current/CHANGES2.6

06-07-96
* Moved the Kanji handling variables into HText structure elements to make
the GridText.c functions reentrant for them, and added code for regulating
them via charset parameters in server headers or META tags. The recognized
parameters are EUC-JP, Shift-JIS, ISO-2022-JP, ISO-2022-JP-2, and EUC-KR.
E.g., a META with:
HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift-JIS"
will set up handling of the document as Shift-JIS. - FM
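
The behaviour this changelog entry describes can be sketched roughly as follows. This is only an illustration, not Lynx's actual code: the helper names `charset_of` and `is_recognized` are hypothetical, and parameter parsing is delegated to Python's standard `email.message.Message` rather than reimplemented. The set of recognized values is the one listed in the entry above.

```python
from email.message import Message

# charset values the 06-07-96 entry lists as recognized,
# stored in lower case because charset names are case-insensitive.
RECOGNIZED = {"euc-jp", "shift-jis", "iso-2022-jp", "iso-2022-jp-2", "euc-kr"}


def charset_of(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def is_recognized(content_type):
    """True if the charset parameter is one of the values listed above."""
    return charset_of(content_type) in RECOGNIZED


if __name__ == "__main__":
    # The CONTENT value from the META example in the changelog entry.
    value = "text/html; charset=Shift-JIS"
    print(charset_of(value), is_recognized(value))
```

Because the standard parser unquotes parameter values, the quoted form `charset="iso-2022-jp"` yields the same result as the unquoted one in this sketch.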

[95] ハングル工房綾瀬 - internet上のハングル (version of 2008-04-29 17:49:31 +09:00) http://www.han-lab.gr.jp/~mizuno/com/web.html

[96] Welcome to Netscape Navigator Version 2.0 (version of 2015-01-24 23:10:57 +09:00) http://web.archive.org/web/20030202175634/http://wp.netscape.com/eng/mozilla/2.0/relnotes/windows-2.0.html

MIME charsets supported include:
"us-ascii", "iso-8859-1", "x-mac-roman", "iso-8859-2", "x-mac-ce"
"iso-2022-jp","x-sjis", "x-euc-jp",
"euc-kr", "iso-2022-kr",
"gb2312", "gb_2312-80"
"x-euc-tw", "x-cns11643-1", "x-cns11643-2", "big5"

[97] Welcome to Netscape Navigator Version 2.0 (version of 2015-01-24 23:12:25 +09:00) http://web.archive.org/web/20030202175634/http://wp.netscape.com/eng/mozilla/2.0/relnotes/windows-2.0.html

The IETF draft on Internationalization of the Hypertext Markup Language proposes an extension to the HTML META tag to allow MIME charset information to be contained in the HTML document:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">
This support is in Navigator 2.0 and the MIME charset values supported includes all the charsets mentioned above in the section on "Additional Languages and MIME Charsets" Supported.

[98] XHTML™ 2.0 - XHTML Embedding Attributes Module (version of 2010-12-17 00:44:37 +09:00) http://www.w3.org/TR/2010/NOTE-xhtml2-20101216/mod-embedding.html

[99] Authoring Techniques for XHTML & HTML Internationalization: Characters and Encodings 1.0 (version of 2015-01-24 02:19:22 +09:00) http://www.w3.org/TR/2015/NOTE-i18n-html-tech-char-20150127/

[156] Only replace charset parameter values that do not case-insensitively … · whatwg/xhr@a1f8e14 (version of 2015-09-22 11:22:25 +09:00) https://github.com/whatwg/xhr/commit/a1f8e140fef9e3ee8255ef58a2c71ff9d75933d2

[158] 各ブラウザが解釈可能なcharsetのリスト (version of 2015-06-07 16:46:19 +09:00) http://l0.cm/encodings/list/

[159] Add <script type="module"> and module resolution/fetching/evaluation · whatwg/html@cd1a9fb (version of 2016-01-21 22:16:02 +09:00) https://github.com/whatwg/html/commit/cd1a9fb1e83f7d0bc30be8b34ecdaf444a0b19a4

[161] Update integration with Encoding Standard · whatwg/html@6a31c26 (version of 2016-02-14 18:46:21 +09:00) https://github.com/whatwg/html/commit/6a31c26cf12e39dab1a488e75dd56c03d6786d39

[174] oauth.access method | Slack (by Slack, 2016-08-27 16:26:44 +09:00) https://api.slack.com/methods/oauth.access

superfluous_charset
The method was called via a POST request, and the specified Content-Type is not defined to understand the charset parameter. However, charset was in fact present. Specifically, form-data content types (e.g. multipart/form-data) are the ones for which charset is superfluous.
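
A client could check for this condition before sending a request. The sketch below is an illustration only, not Slack's implementation: `has_superfluous_charset` is a hypothetical helper, and treating `multipart/form-data` as the only form-data content type is an assumption based on the single example given in the description above.

```python
from email.message import Message

# Assumption: only the type named in the error description counts as "form-data".
FORM_DATA_TYPES = {"multipart/form-data"}


def has_superfluous_charset(content_type):
    """True if a form-data Content-Type value carries a charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    is_form_data = msg.get_content_type() in FORM_DATA_TYPES
    return is_form_data and msg.get_param("charset") is not None


if __name__ == "__main__":
    print(has_superfluous_charset("multipart/form-data; boundary=x; charset=utf-8"))
    print(has_superfluous_charset("multipart/form-data; boundary=x"))
```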

[177] 18338 – Registries (IANA): text/html MIME type definition should require that charset="" value be valid and correct (2017-07-23 16:34:33 +09:00) https://www.w3.org/Bugs/Public/show_bug.cgi?id=18338

[178] [c] (0) Outlaw <meta http-equiv=content-type content=text/html… (by Hixie, 2008-05-23 07:18:45 +09:00) https://github.com/whatwg/html/commit/ea48293c495d6c863ff4813cac17f37d32fdaf63

[179] [ct] (0) Require that if a <meta> charset is included, the encoding b… (by Hixie, 2008-05-24 19:20:43 +09:00) https://github.com/whatwg/html/commit/ac3cdabc9466b5530e5d7f21f4586c95e19c3b5e

[181] Improve <style> and <script> processing and conformance (by domenic, 2017-09-14 18:42:48 +09:00) https://github.com/whatwg/html/commit/9c612ac8641b5174849a2d3cb924fe662a8d3a09

[182] Require UTF-8 (by sideshowbarker, 2017-10-06 19:09:17 +09:00) https://github.com/whatwg/html/commit/fae77e3c558b9f083dfb9086752863a4789268f5

[185] Require utf-8 when specifying character encoding by sideshowbarker · Pull Request #3091 · whatwg/html (2017-11-03 19:52:26 +09:00) https://github.com/whatwg/html/pull/3091

[186] Invitation to attend upcoming IETF meetings · Issue #58 · whatwg/meta (2018-03-20 17:39:19 +09:00) https://github.com/whatwg/meta/issues/58

[187] Define Content-Type manipulation in terms of MIME Sniffing (by annevk, 2018-04-16 18:51:34 +09:00) https://github.com/whatwg/xhr/commit/edc6f8f16f58d201afb49e40ca166b8bc1ae74a3

[188] Fix send() charset overriding to use new MIME type infrastructure · Issue #188 · whatwg/xhr (2018-04-17 23:08:41 +09:00) https://github.com/whatwg/xhr/issues/188

[189] Define Content-Type manipulation in terms of MIME Sniffing by annevk · Pull Request #176 · whatwg/xhr (2018-04-17 23:09:29 +09:00) https://github.com/whatwg/xhr/pull/176

[190] Fix overrideMimeType() again (by annevk, 2018-04-17 17:54:28 +09:00) https://github.com/whatwg/xhr/commit/121cee50b6f51215f046266642964b4c53a02a7c

[191] Look at overrideMimeType() again · Issue #157 · whatwg/xhr (2018-04-18 13:57:08 +09:00) https://github.com/whatwg/xhr/issues/157

[203] rfc3808, 2021-06-11T05:01:42.000Z https://datatracker.ietf.org/doc/html/rfc3808

[204] rfc3659 (2021-07-16T03:25:36.000Z) https://datatracker.ietf.org/doc/html/rfc3659#section-7.5

[221] Počešťování Lynxu pro DOS, 2025-10-26T07:11:47.000Z, 1999-02-03T16:01:50.085Z https://web.archive.org/web/19990203160054/http://www.cestina.cz/cestina/pocestovani/dos/WWW/lynx.html

    Code page                          MIME name

    ISO Latin 1                        iso-8859-1
    ISO Latin 2                        iso-8859-2
    Other ISO Latin                    x-iso-8859-other
    WinLatin1 (cp1252)                 iso-8859-1-windows-3.1-latin-1
    DEC Multinational                  dec-mcs
    Macintosh (8 bit)                  macintosh
    NeXT character set                 x-next
    KOI8-R Cyrillic                    koi8-r
    Chinese                            euc-cn
    Japanese (EUC)                     euc-jp
    Japanese (SJIS)                    shift_jis
    Korean                             euc-kr
    Taipei (Big5)                      big5
    Vietnamese (VISCII)                viscii
    7 bit approximations               us-ascii
    Transparent                        x-transparent
    IBM PC character set               cp437
    IBM PC codepage 850                cp850
    PC Latin2 CP 852                   cp852
    DosCyrillic (cp866)                cp866
    DosGreek (cp737)                   cp737
    DosGreek2 (cp869)                  cp869
    DosArabic (cp864)                  cp864
    DosHebrew (cp862)                  cp862
    WinLatin2 (cp1250)                 windows-1250
    WinCyrillic (cp1251)               windows-1251
    WinGreek (cp1253)                  windows-1253
    WinHebrew (cp1255)                 windows-1255
    WinArabic (cp1256)                 windows-1256
    ISO Latin 3                        iso-8859-3
    ISO Latin 4                        iso-8859-4
    ISO 8859-5 Cyrillic                iso-8859-5
    ISO 8859-6 Arabic                  iso-8859-6
    ISO 8859-7 Greek                   iso-8859-7
    ISO 8859-8 Hebrew                  iso-8859-8
    ISO 8859-9 (Latin 5)               iso-8859-9
    ISO 8859-10                        iso-8859-10
    UNICODE UTF 8                      unicode-1-1-utf-8
    RFC 1345 w/o Intro                 mnemonic+ascii+0
    RFC 1345 Mnemonic                  mnemonic
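
Not every name in this table is understood by other software. As a rough illustration, the sketch below asks Python's standard `codecs` registry whether it can resolve a few names copied from the table. The helper name `python_knows` and the choice of sample names are illustrative, and the result says something only about Python's codec registry, not about browsers or the IANA registry.

```python
import codecs

# A few MIME names copied from the table above.
SAMPLE_NAMES = ["iso-8859-2", "koi8-r", "cp852", "x-transparent", "mnemonic"]


def python_knows(name):
    """True if Python's codec registry can resolve the given charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    for name in SAMPLE_NAMES:
        print(name, python_knows(name))
```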

[222] Multipurpose Internet Mail Extensions, 2025-10-26T07:20:03.000Z, 1999-01-17T09:40:05.571Z https://web.archive.org/web/19990117093828/http://www.cestina.cz/cestina/MIME.html#charset
