A Uniform Resource Identifier (URI) is defined in [RFC3986] as a sequence of characters chosen from a limited subset of the repertoire of US-ASCII [ASCII] characters.
The characters in URIs are frequently used for representing words of natural languages. This usage has many advantages: Such URIs are easier to memorize, easier to interpret, easier to transcribe, easier to create, and easier to guess. For most languages other than English, however, the natural script uses characters other than A - Z. For many people, handling Latin characters is as difficult as handling the characters of other scripts is for those who use only the Latin alphabet. Many languages with non-Latin scripts are transcribed with Latin letters. These transcriptions are now often used in URIs, but they introduce additional ambiguities.
The infrastructure for the appropriate handling of characters from local scripts is now widely deployed in local versions of operating system and application software. Software that can handle a wide variety of scripts and languages at the same time is increasingly common. Also, increasing numbers of protocols and formats can carry a wide range of characters.
This document defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines "internationalized" versions corresponding to other constructs from [RFC3986], such as URI references. The syntax of IRIs is defined in section 2, and the relationship between IRIs and URIs in section 3.
Using characters outside of A - Z in IRIs brings some difficulties. Section 4 discusses the special case of bidirectional IRIs, section 5 various forms of equivalence between IRIs, and section 6 the use of IRIs in different situations. Section 7 gives additional informative guidelines, and section 8 security considerations.
IRIs are designed to be compatible with recommendations for new URI schemes [RFC2718]. The compatibility is provided by specifying a well-defined and deterministic mapping from the IRI character sequence to the functionally equivalent URI character sequence. Practical use of IRIs (or IRI references) in place of URIs (or URI references) depends on the following conditions being met:
a. A protocol or format element should be explicitly designated to be able to carry IRIs. The intent is not to introduce IRIs into contexts that are not defined to accept them. For example, XML Schema [XMLSchema] has an explicit type "anyURI" that includes IRIs and IRI references. Therefore, IRIs and IRI references can be used in attributes and elements of type "anyURI". On the other hand, in the HTTP protocol [RFC2616], the Request-URI is defined as a URI, which means that direct use of IRIs is not allowed in HTTP requests.
b. The protocol or format carrying the IRIs should have a mechanism to represent the wide range of characters used in IRIs, either natively or by some protocol- or format-specific escaping mechanism (for example, numeric character references in [XML1]).
c. The URI corresponding to the IRI in question has to encode original characters into octets using UTF-8. For new URI schemes, this is recommended in [RFC2718]. It can apply to a whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384], or the URN syntax [RFC2141]). It can apply to a specific part of a URI, such as the fragment identifier (e.g., [XPointer]). It can apply to a specific URI or part(s) thereof. For details, please see section 6.4.
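Condition (c) can be illustrated with a minimal sketch, assuming the simplest form of the IRI-to-URI mapping: every character outside US-ASCII is encoded as UTF-8 and each resulting octet is percent-encoded. The function name `iri_to_uri_sketch` is hypothetical, and the sketch deliberately omits normalization and any special treatment of host names; the mapping itself is specified in section 3.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    Illustrative only: US-ASCII characters are passed through unchanged,
    and no normalization or host-name handling is attempted.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            # US-ASCII characters are already legal URI material here.
            parts.append(ch)
        else:
            # Encode the character as UTF-8, then percent-encode each octet.
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    # U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is the octets C3 A9 in UTF-8.
    print(iri_to_uri_sketch("http://example.org/ros\u00e9"))
```

Because the output depends only on the input characters, the mapping is deterministic, which is the property the compatibility argument above relies on.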
The following definitions are used in this document; they follow the terms in [RFC2130], [RFC2277], and [ISO10646].
- character
- A member of a set of elements used for the organization, control, or representation of data. For example, "LATIN CAPITAL LETTER A" names a character.
- octet
- An ordered sequence of eight bits considered as a unit.
- character repertoire
- A set of characters (in the mathematical sense).
- sequence of characters
- A sequence of characters (one after another).
- sequence of octets
- A sequence of octets (one after another).
- character encoding
- A method of representing a sequence of characters as a sequence of octets (maybe with variants). Also, a method of (unambiguously) converting a sequence of octets into a sequence of characters.
- charset
- The name of a parameter or attribute used to identify a character encoding.
- UCS
- Universal Character Set. The coded character set defined by ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV4].
- IRI reference
- Denotes the common usage of an Internationalized Resource Identifier. An IRI reference may be absolute or relative. However, the "IRI" that results from such a reference only includes absolute IRIs; any relative IRI references are resolved to their absolute form. Note that in [RFC2396] URIs did not include fragment identifiers, but in [RFC3986] fragment identifiers are part of URIs.
- running text
- Human text (paragraphs, sentences, phrases) with syntax according to orthographic conventions of a natural language, as opposed to syntax defined for ease of processing by machines (e.g., markup, programming languages).
- protocol element
- Any portion of a message that affects processing of that message by the protocol in question.
- presentation element
- A presentation form corresponding to a protocol element; for example, using a wider range of characters.
- create (a URI or IRI)
- With respect to URIs and IRIs, the term is used for the initial creation. This may be the initial creation of a resource with a certain identifier, or the initial exposition of a resource under a particular identifier.
- generate (a URI or IRI)
- With respect to URIs and IRIs, the term is used when the IRI is generated by derivation from other information.
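The distinction drawn above between a character, a sequence of octets, and a character encoding can be made concrete with a small sketch. It uses only the Python standard library; the helper name `octets_for` is hypothetical and exists purely for illustration.

```python
import unicodedata


def octets_for(text: str, charset: str) -> list[int]:
    """Return the sequence of octets that `charset` assigns to `text`."""
    return list(text.encode(charset))


if __name__ == "__main__":
    # A character is identified by its name, independent of any encoding.
    print(unicodedata.name("A"))

    # The same character maps to different octet sequences
    # under different character encodings.
    e_acute = "\u00e9"
    print(octets_for(e_acute, "utf-8"))
    print(octets_for(e_acute, "iso-8859-1"))

    # Decoding converts a sequence of octets back into characters.
    print(bytes([0xC3, 0xA9]).decode("utf-8") == e_acute)
```

The `charset` argument plays the role of the "charset" parameter defined above: it is only a name that identifies which character encoding to apply.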
RFCs and Internet Drafts currently do not allow any characters outside the US-ASCII repertoire. Therefore, this document uses various special notations to denote such characters in examples.
In text, characters outside US-ASCII are sometimes referenced by using a prefix of 'U+', followed by four to six hexadecimal digits.
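As a minimal sketch of the 'U+' convention, the hypothetical helper below formats a character's code point with at least four uppercase hexadecimal digits, which yields four to six digits across the UCS range.

```python
def u_plus(ch: str) -> str:
    """Format a single character as 'U+' followed by 4 to 6 hex digits."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Pad to four digits; larger code points naturally use five or six.
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    print(u_plus("\u00e9"))
    print(u_plus("\U00010300"))
```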
To represent characters outside US-ASCII in examples, this document uses two notations: 'XML Notation' and 'Bidi Notation'.
XML Notation uses a leading '&#x', a trailing ';', and the hexadecimal number of the character in the UCS in between. For example, &#x44F; stands for CYRILLIC SMALL LETTER YA. In this notation, an actual '&' is denoted by '&amp;'.
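The XML Notation described above can be sketched as a short conversion routine. The function name `to_xml_notation` is hypothetical; the sketch assumes that only characters outside US-ASCII and the ampersand need rewriting, and it emits the hexadecimal number without leading zeros.

```python
def to_xml_notation(text: str) -> str:
    """Rewrite `text` so that it contains only US-ASCII characters."""
    parts = []
    for ch in text:
        if ch == "&":
            # An actual ampersand must be escaped so it is not
            # mistaken for the start of a character reference.
            parts.append("&amp;")
        elif ord(ch) < 0x80:
            parts.append(ch)
        else:
            # Leading '&#x', hexadecimal UCS number, trailing ';'.
            parts.append(f"&#x{ord(ch):X};")
    return "".join(parts)


if __name__ == "__main__":
    print(to_xml_notation("http://example.org/\u044f?a=1&b=2"))
```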
Bidi Notation is used for bidirectional examples: Lowercase letters stand for Latin letters or other letters that are written left to right, whereas uppercase letters represent Arabic or Hebrew letters that are written right to left.
To denote actual octets in examples (as opposed to percent-encoded octets), the two hex digits denoting the octet are enclosed in "<" and ">". For example, the octet often denoted as 0xc9 is denoted here as <c9>.
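To make the difference between actual octets and percent-encoded octets concrete, the following sketch renders a byte string in the angle-bracket notation used here. The helper name `octet_notation` is hypothetical.

```python
def octet_notation(data: bytes) -> str:
    """Render each octet as two lowercase hex digits inside '<' and '>'."""
    return "".join(f"<{octet:02x}>" for octet in data)


if __name__ == "__main__":
    # The single octet commonly written 0xc9.
    print(octet_notation(b"\xc9"))
    # The two actual octets of U+00E9 in UTF-8, as opposed to the
    # six US-ASCII characters of its percent-encoded form "%C3%A9".
    print(octet_notation("\u00e9".encode("utf-8")))
```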
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119].