<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><section><h1>Introduction</h1><p>This document describes how to validate files with respect to their <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding</anchor>
in an implementation that decodes a file for the purpose of conformance checking.</p><example xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:"><p xmlns="http://www.w3.org/1999/xhtml">An example of such an implementation is perl-web-encodings
<anchor-external xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:" a0:resScheme="URI" xmlns:a0="urn:x-suika-fam-cx:markup:suikawiki:0:9:" a0:resParameter="https://github.com/manakai/perl-web-encodings">https://github.com/manakai/perl-web-encodings</anchor-external>.</p></example></section><section><h1>Terminology</h1><p>This document depends on the <cite>Infra Standard</cite>.</p><p>The terms
<dfn>byte</dfn>,
<dfn>byte string</dfn>,
<dfn>code point</dfn>, and
<dfn>string</dfn>
are defined by the <cite>Infra Standard</cite>.</p><p>The terms
<dfn>encoding</dfn>,
<dfn>encode</dfn>,
<dfn>decode</dfn>,
<dfn>decoder</dfn>,
<dfn><code>fatal</code></dfn>,
<dfn>error</dfn>,
<dfn>GB18030</dfn>,
<dfn>GBK</dfn>,
<dfn>Big5</dfn>,
<dfn>Shift_JIS</dfn>,
<dfn>EUC-JP</dfn>,
<dfn>ISO-2022-JP</dfn>,
<dfn>ISO-2022-JP decoder output state</dfn>,
<dfn>ASCII</dfn>,
and 
<dfn>EUC-KR</dfn>
are defined by the <cite>Encoding Standard</cite>.</p><p>The term <dfn>BOM</dfn> is defined by the <cite>Unicode Standard</cite>.</p><p>The term <dfn><var>encoding</var> string</dfn>, where <var>encoding</var> is an <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding</anchor>,
represents a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">byte string</anchor> which is intended to be <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor>
by <var>encoding</var>'s <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoder</anchor>.</p><p><dfn>Bytes for <var>code point</var> in <var>encoding</var></dfn>,
where <var>code point</var> is a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> and <var>encoding</var> is an <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding</anchor>,
are the result of <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">encode</title></anchor> <var>code point</var> in <var>encoding</var>.</p><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> <var>char</var> is <dfn>encodable</dfn> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding</anchor> <var>encoding</var>
if <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">encode</title></anchor> the <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">string</anchor> consisting solely of <var>char</var> in <var>encoding</var> with
<var>error mode</var> <code>fatal</code> would not result in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">error</anchor>.</p><p>An <var>encoding</var> string <var>string</var> is <dfn>in error</dfn> if
<anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoding<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> <var>string</var> with <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encoding</anchor> <var>encoding</var> and
<var>error mode</var> <code>fatal</code> would result in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">error</anchor>.</p></section><section><h1>General rules</h1><p>An <var>encoding</var> string <MUST xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">MUST NOT</MUST> be <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">in error</anchor>.</p><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">BOM</anchor> <SHOULD xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">SHOULD NOT</SHOULD> be used.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">A BOM is not roundtrippable, and it causes any encoding metadata to be ignored.</comment-p></section><section><h1>GB18030</h1><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">GB18030</anchor> or <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">GBK</anchor> string is discouraged from containing the <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">byte</anchor> 0x80 where it is
<anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to the <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> U+20AC, or the <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">bytes<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">byte</title></anchor> 0xA3 0xA0 where they are
<anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to U+3000, in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">GB18030</anchor> or <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">GBK</anchor>.</p></section><section><h1>Big5</h1><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor> string is discouraged from containing bytes that are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>
in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor> which is not <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encodable</anchor> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, use of HKSCS extensions is discouraged, as they are not
roundtrippable.</comment-p><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor> string is discouraged from containing <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">bytes<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">byte</title></anchor> <var>bytes</var> that
are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor>
if the bytes for <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Big5</anchor> are not equal to <var>bytes</var>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, when there are multiple byte representations for a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>,
non-canonical representations are discouraged, as they are not roundtrippable.</comment-p></section><section><h1>Shift_JIS</h1><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor> string is discouraged from containing bytes that are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>
in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor> which is not <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encodable</anchor> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, use of EUDCs (end-user-defined characters) is discouraged, as they are not
roundtrippable and, in fact, not interoperable at all.</comment-p><p>A <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor> string is discouraged from containing <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">bytes<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">byte</title></anchor> <var>bytes</var> that
are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor>
if the bytes for <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">Shift_JIS</anchor> are not equal to <var>bytes</var>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, when there are multiple byte representations for a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>,
non-canonical representations are discouraged, as they are not roundtrippable.</comment-p></section><section><h1>EUC-JP</h1><p>An <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor> string is discouraged from containing bytes that are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>
in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor> which is not <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">encodable</anchor> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, use of JIS X 0212 characters is discouraged, as they are not
roundtrippable.</comment-p><p>An <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor> string is discouraged from containing <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">bytes<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">byte</title></anchor> <var>bytes</var> that
are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor>
if the bytes for <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">EUC-JP</anchor> are not equal to <var>bytes</var>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, when there are multiple byte representations for a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>,
non-canonical representations are discouraged, as they are not roundtrippable.</comment-p></section><section><h1>ISO-2022-JP</h1><p>An <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor> string is discouraged from containing <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">bytes<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">byte</title></anchor> <var>bytes</var> that
are <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> to a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor> <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor>
if the bytes for <var>char</var> in <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor> are not equal to <var>bytes</var>.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">In other words, when there are multiple byte representations for a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">code point</anchor>,
non-canonical representations are discouraged, as they are not roundtrippable.</comment-p><p>An <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor> string <MUST xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">MUST NOT</MUST> be a <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">byte string</anchor> for which,
when it is <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">decoded<title xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">decode</title></anchor> as <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor>, the final value of the <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP decoder output state</anchor> is not <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ASCII</anchor>.</p>
<p>An <anchor xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">ISO-2022-JP</anchor> string <MUST xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:9:">MUST NOT</MUST> contain the byte sequence 0x1B 0x24 0x40.</p><comment-p xmlns="urn:x-suika-fam-cx:markup:suikawiki:0:10:">This escape sequence designates JIS C 6226-1978, an obsolete standard.</comment-p></section></body></html>