RFC 3987//1

本項は歴史的事項を説明しています。本項の内容の一部または全部は、現在の状況とは異なるかもしれません。

(なお本項の内容の一部または全部は、互換性または歴史的連続性のために現在も有効な場合もあります。しかし新たに利用することは避けるべきです。)

1. Introduction#✎

1.1. Overview and Motivation#✎

A Uniform Resource Identifier (URI) is defined in [RFC3986] as a sequence of characters chosen from a limited subset of the repertoire of US-ASCII [ASCII] characters.

統一資源識別子 (URI) は RFC 3986 において US-ASCII 文字のレパートリの限られた部分集合から選ばれた文字の列として定義されています。

The characters in URIs are frequently used for representing words of natural languages. This usage has many advantages: Such URIs are easier to memorize, easier to interpret, easier to transcribe, easier to create, and easier to guess. For most languages other than English, however, the natural script uses characters other than A - Z. For many people, handling Latin characters is as difficult as handling the characters of other scripts is for those who use only the Latin alphabet. Many languages with non-Latin scripts are transcribed with Latin letters. These transcriptions are now often used in URIs, but they introduce additional ambiguities.

URI の文字はよく自然言語の語を表現するために使われます。この用法は、覚えやすい、分かりやすい、転写しやすい、作りやすい、推定しやすいなど多くの利点があります。しかし英語以外のほとんどの言語では、自然な表記では A〜Z 以外の文字を使います。多くの人々にとってラテン文字を扱うのはラテン字母だけを使う人が他の種類の文字を扱うのと同じくらい難しいことです。多くの非ラテン文字を使う言語はラテン文字に転写されます。この転写はいま URI でよく使われていますが、より曖昧さを増させています。

The infrastructure for the appropriate handling of characters from local scripts is now widely deployed in local versions of operating system and application software. Software that can handle a wide variety of scripts and languages at the same time is increasingly common. Also, increasing numbers of protocols and formats can carry a wide range of characters.

局所用字系の文字を適切に扱う基盤がいま局所版オペレーティング・システムや応用ソフトウェアで広く採用されています。広範囲の用字系や言語を同時に扱えるソフトウェアも益々増えています。また、広範囲の文字を伝播できるプロトコルや書式も増えています。

This document defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines "internationalized" versions corresponding to other constructs from [RFC3986], such as URI references. The syntax of IRIs is defined in section 2, and the relationship between IRIs and URIs in section 3.

この文書は URI の構文をより広い文字レパートリに拡張した国際化資源識別子 (IRI) という新しいプロトコル要素を定義します。また、 RFC 3986 の URI参照などの他の構造に対応する国際化版も定義します。 IRI の構文は2章で定義しており、 IRI と URI の関係は3章で定義しています。

Using characters outside of A - Z in IRIs brings some difficulties. Section 4 discusses the special case of bidirectional IRIs, section 5 various forms of equivalence between IRIs, and section 6 the use of IRIs in different situations. Section 7 gives additional informative guidelines, and section 8 security considerations.

IRI で A〜Z 以外の文字を使うことによって困ったことも起きます。 4章は双方向的 IRI の特殊な場合を議論し、 5章は IRI の色々な等価形を議論し、 6章は色々な場面での IRI の使用について扱います。 7章は更に参考となる指針を示し、 8章は安全性について考えます。

1.2. Applicability#✎

IRIs are designed to be compatible with recommendations for new URI schemes [RFC2718]. The compatibility is provided by specifying a well-defined and deterministic mapping from the IRI character sequence to the functionally equivalent URI character sequence. Practical use of IRIs (or IRI references) in place of URIs (or URI references) depends on the following conditions being met:

IRI は新しい URI scheme に関する推奨と互換になるよう設計されています。この互換性は IRI 文字列から機能的に等価な URI 文字列への良く定義された決定的な写像を規定することによって提供されます。実際に IRI (や IRI参照) を URI (や URI参照) の代わりに使うかどうかは次の条件が満たされるかどうかによります。

a. A protocol or format element should be explicitly designated to be able to carry IRIs. The intent is not to introduce IRIs into contexts that are not defined to accept them. For example, XML schema [XMLSchema] has an explicit type "anyURI" that includes IRIs and IRI references. Therefore, IRIs and IRI references can be in attributes and elements of type "anyURI". On the other hand, in the HTTP protocol [RFC2616], the Request URI is defined as a URI, which means that direct use of IRIs is not allowed in HTTP requests.

プロトコルや書式は明示的に IRI を伝播できると述べるべきです。 IRI を認めると定義されていない場所に IRI を入れないということです。例えば、 XML Schema は anyURI という IRI と IRI参照を含む型を特に持っています。ですから、型が anyURI の属性や要素では IRI と IRI参照を使うことができます。しかし、 HTTP では Request-URI は URI として定義されており、 IRI は HTTP 要求で認められないことを意味します。

b. The protocol or format carrying the IRIs should have a mechanism to represent the wide range of characters used in IRIs, either natively or by some protocol- or format-specific escaping mechanism (for example, numeric character references in [XML1]).

IRI を伝播するプロトコルや書式は IRI で使われる広範囲の文字を生で直接的に、または何らかのプロトコルや書式が規定する逃避の仕組み (例えば XML の数値文字参照) で表現する仕組みを持つべきです。

c. The URI corresponding to the IRI in question has to encode original characters into octets using UTF-8. For new URI schemes, this is recommended in [RFC2718]. It can apply to a whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384], or the URN syntax [RFC2141]). It can apply to a specific part of a URI, such as the fragment identifier (e.g., [XPointer]). It can apply to a specific URI or part(s) thereof. For details, please see section 6.4.

当該 IRI に対応する URI が元の文字を UTF-8 を使ってオクテットに符号化していなければなりません。新しい URI scheme ではこれが RFC 2718 で推奨されています。これは scheme 全体に適用できます (例えば IMAP URL や POP URL や URN 構文)。 URI の特定の部分のみ、例えば素片識別子のみに適用することもできます (例えば XPointer)。特定の URI やその部分に適用することもできます。詳細は6.4節をご覧下さい。

1.3. Definitions#✎

The following definitions are used in this document; they follow the terms in [RFC2130], [RFC2277], and [ISO10646].

次の定義をこの文書で使います。 RFC 2130, RFC 2277, ISO/IEC 10646 の用語に従っています。

character
A member of a set of elements used for the organization, control, or representation of data. For example, "LATIN CAPITAL LETTER A" names a character.

文字: データの組織化、制御、表現に使う要素の集合の一員。例えば LATIN CAPITAL LETTER A は文字の名前です。

octet
An ordered sequence of eight bits considered as a unit.

オクテット: 単位と考える8ビットの順序列。

character repertoire
A set of characters (in the mathematical sense).

文字レパートリ: 文字の集合 (数学的意味で)。

sequence of characters
A sequence of characters (one after another).

文字列: 文字の列 (ある文字の後に次の文字)。

sequence of octets
A sequence of octets (one after another).

オクテット列: オクテットの列 (あるオクテットの後に次のオクテット)。

character encoding
A method of representing a sequence of characters as a sequence of octets (maybe with variants). Also, a method of (unambiguously) converting a sequence of octets into a sequence of characters.

文字符号化: 文字列をオクテット列として表現する方法 (複数たり得ます)。また、オクテット列を文字列に (曖昧なく) 変換する方法。

charset
The name of a parameter or attribute used to identify a character encoding.

charset: 文字符号化を識別するために使う引数や属性の名前。

UCS
Universal Character Set. The coded character set defined by ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV4].

普遍文字集合。 ISO/IEC 10646 と Unicode規格で定義された符号化文字集合。

IRI reference
Denotes the common usage of an Internationalized Resource Identifier. An IRI reference may be absolute or relative. However, the "IRI" that results from such a reference only includes absolute IRIs; any relative IRI references are resolved to their absolute form. Note that in [RFC2396] URIs did not include fragment identifiers, but in [RFC3986] fragment identifiers are part of URIs.

IRI参照: 国際化資源識別子の共通な用法を指します。 IRI参照は絶対でも相対でも構いません。しかし、 IRI参照から得られる IRI は必ず絶対IRI です。相対IRI参照は絶対形に解決されます。 RFC 2396 で URI は素片識別子を含みませんでしたが、 RFC 3986 では素片識別子も URI の一部であることに注意して下さい。

running text
Human text (paragraphs, sentences, phrases) with syntax according to orthographic conventions of a natural language, as opposed to syntax defined for ease of processing by machines (e.g., markup, programming languages).

普通の文章: 機械が処理しやすいように定義された構文 (例えばマーク付け言語やプログラム言語) に対して、自然言語の辞書的規則に従った構文によって書かれた人間の文章 (段落、文、語句)。

protocol element
Any portion of a message that affects processing of that message by the protocol in question.

プロトコル要素: 当該プロトコルのメッセージの処理に影響を与えるメッセージの部分。

presentation element
A presentation form corresponding to a protocol element; for example, using a wider range of characters.

表現要素: プロトコル要素に対応する表現形。例えば、広範囲の文字を使用します。

create (a URI or IRI)
With respect to URIs and IRIs, the term is used for the initial creation. This may be the initial creation of a resource with a certain identifier, or the initial exposition of a resource under a particular identifier.

(URI や IRI の) 作成: URI や IRI に関しては、最初の作成の時にこの用語を使います。これはある識別子の資源の最初の作成かもしれませんし、特定の識別子に資源を最初に結び付けることかもしれません。

generate (a URI or IRI)
With respect to URIs and IRIs, the term is used when the IRI is generated by derivation from other information.

(URI や IRI の) 生成: URI や IRI に関しては、 IRI が他の情報から派生して生成される時にこの用語を使います。

1.4. Notation#✎

RFCs and Internet Drafts currently do not allow any characters outside the US-ASCII repertoire. Therefore, this document uses various special notations to denote such characters in examples.

RFC と Internet Draft は現在 US-ASCII レパートリの範囲外の文字を認めていません。従って、この文書は例示で US-ASCII の範囲外の文字を示すために特別な表記法を使います。

In text, characters outside US-ASCII are sometimes referenced by using a prefix of 'U+', followed by four to six hexadecimal digits.

文章中で US-ASCII の範囲外の文字を接頭辞 U+ に続けて4〜6桁の十六進数字を使って参照することがあります。

To represent characters outside US-ASCII in examples, this document uses two notations: 'XML Notation' and 'Bidi Notation'.

例示中で US-ASCII の範囲外の文字を表現するためにこの文書では XML 表記法と Bidi 表記法の2つの表記法を使います。

XML Notation uses a leading '&#x', a trailing ';', and the hexadecimal number of the character in the UCS in between. For example, я stands for CYRILLIC CAPITAL LETTER YA. In this notation, an actual '&' is denoted by '&'.

XML 表記法は最初に &#x, 最後に ; を付け、その間に UCS における文字の十六進番号を挟みます。例えば я は CYRILLIC CAPITAL LETTER YA を表します。この表記法では実際の & は & と示します。

Bidi Notation is used for bidirectional examples: Lowercase letters stand for Latin letters or other letters that are written left to right, whereas uppercase letters represent Arabic or Hebrew letters that are written right to left.

Bidi 表記法は双方向的な例示で使います。小文字はラテン文字やその他の左から右に書く文字を表し、大文字はアラビア文字やヘブライ文字のように右から左に書く文字を表します。

To denote actual octets in examples (as opposed to percent-encoded octets), the two hex digits denoting the octet are enclosed in "<" and ">". For example, the octet often denoted as 0xc9 is denoted here as <c9>.

例示中で (百分率符号化したオクテットに対して) 実際のオクテットを示すため2桁のオクテットを示す数字を < と > で囲って示します。例えば、よく 0xc9 と書かれるオクテットはここでは <c9> と示します。

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119].

この文書では、鍵語 しなければなりません, してはなりません, 必須, するべきです, するべきではありません, 推奨します, して構いません, 任意選択 は RFC 2119 で説明されているように解釈します。

License#✎

RFCのライセンス

RFC 3987//1

RFC 3987//1

目次

1. Introduction#✎

1.1. Overview and Motivation#✎

1.2. Applicability#✎

1.3. Definitions#✎

1.4. Notation#✎

License#✎

メモ#✎