ドメイン名における文字

ドメイン名における文字

[2] ドメイン名ではいろいろな文字が使えますが、厳しい制限もあります。

ドメイン名 (ASCII) の構文

[37] ドメイン名では当初 ASCII文字のみが使われていました。

[38] しかも英数字- のみに限られ、更に英字以外は位置にも制限が加わっています。 文字数制限もあります。 [ ラベル

[39] これをラベルといい、 . で連結してドメイン名となります。 ドメイン名

[48] 実際には _ もいくらか使われています。 ラベル

[40] ASCII大文字・小文字不区別です。

[41] 古い時代には大文字が好んで使われました。 近年は小文字が好んで使われます。ドメイン名に限らず見られる傾向です。

[3] RFC 882 "DOMAIN NAMES - CONCEPTS and FACILITIES", November 1983. (Obsoleted by RFC 1035)

[4] >>3
Appendix 1 - Domain Name Syntax Specification

   The preferred syntax of domain names is given by the following BNF
   rules.  Adherence to this syntax will result in fewer problems with
   many applications that use domain names (e.g., mail, TELNET).  Note
   that some applications described in [14] use domain names containing
   binary information and hence do not follow this syntax.

      <domain> ::=  <subdomain> | " "

      <subdomain> ::=  <label> | <subdomain> "." <label>

      <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

      <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

      <let-dig-hyp> ::= <let-dig> | "-"

      <let-dig> ::= <letter> | <digit>

      <letter> ::= any one of the 52 alphabetic characters A through Z
      in upper case and a through z in lower case

      <digit> ::= any one of the ten digits 0 through 9

   Note that while upper and lower case letters are allowed in domain
   names no significance is attached to the case.  That is, two names
   with the same spelling but different case are to be treated as if
   identical.

   The labels must follow the rules for ARPANET host names.  They must
   start with a letter, end with a letter or digit, and have as interior
   characters only letters, digits, and hyphen.  There are also some
   restrictions on the length.  Labels must be 63 characters or less.

   For example, the following strings identify hosts in the ARPA
   Internet:

      F.ISI.ARPA     LINKABIT-DCN5.ARPA     UCL-TAC.ARPA

[6] >>4 注意: 当時既に IN-ADDR.ARPA が使われていたわけですが。。。

[7] RFC 1035 DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION, November 1987.

[8] >>7

2.3.1. Preferred name syntax

The DNS specifications attempt to be as general as possible in the rules for constructing domain names. The idea is that the name of any existing object can be expressed as a domain name with minimal changes.

However, when assigning a domain name for an object, the prudent user will select a name which satisfies both the rules of the domain system and any existing rules for the object, whether these rules are published or implied by existing programs.

For example, when naming a mail domain, the user should satisfy both the rules of this memo and those in RFC-822. When creating a new host name, the old rules for HOSTS.TXT should be followed. This avoids problems when old software is converted to use domain names.

The following syntax will result in fewer problems with many

applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

For example, the following strings identify hosts in the Internet:

A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA

2.3.3. Character Case

For all parts of the DNS that are part of the official protocol, all comparisons between character strings (e.g., labels, domain names, etc.) are done in a case-insensitive manner. At present, this rule is in force throughout the domain system without exception. However, future additions beyond current usage may need to use the full binary octet capabilities in names, so attempts to store domain names in 7-bit ASCII or use of special bytes to terminate labels, etc., should be avoided.

When data enters the domain system, its original case should be preserved whenever possible. In certain circumstances this cannot be done. For example, if two RRs are stored in a database, one at x.y and one at X.Y, they are actually stored at the same place in the database, and hence only one casing would be preserved. The basic rule is that case can be discarded only when data is used to define structure in a database, and two names are identical when compared in a case insensitive manner.

Loss of case sensitive data must be minimized. Thus while data for x.y and X.Y may both be stored under a single location x.y or X.Y, data for a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In general, this preserves the case of the first label of a domain name, but forces standardization of interior node labels.

Systems administrators who enter data into the domain database should take care to represent the data they supply to the domain system in a case-consistent manner if their system is case-sensitive. The data distribution system in the domain system will ensure that consistent representations are preserved.

2.3.4. Size limits

Various objects and parameters in the DNS have size limits. They are listed below. Some could be easily changed, others are more fundamental.

labels 63 octets or less

names 255 octets or less

TTL positive values of a signed 32 bit number.

UDP messages 512 octets or less

3.1. Name space definitions

Domain names in messages are expressed in terms of a sequence of labels. Each label is represented as a one octet length field followed by that number of octets. Since every domain name ends with the null label of the root, a domain name is terminated by a length byte of zero. The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.

To simplify implementations, the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less.

Although labels can contain any 8 bit values in octets that make up a label, it is strongly recommended that labels follow the preferred syntax described elsewhere in this memo, which is compatible with existing host naming conventions. Name servers and resolvers must compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII with zero parity. Non-alphabetic codes must match exactly.

[10] >>9

3.1. Network name syntax

   The current syntax for network names, as defined by [RFC 952] is an
   alphanumeric string of up to 24 characters, which begins with an
   alpha, and may include "." and "-" except as first and last
   characters.  This is the format which was also used for host names
   before the DNS.  Upward compatibility with existing names might be a
   goal of any new scheme.
   However, the present syntax has been used to define a flat name
   space, and hence would prohibit the same distributed name allocation
   method used for host names.  There is some sentiment for allowing the
   NIC to continue to allocate and regulate network names, much as it
   allocates numbers, but the majority opinion favors local control of
   network names.  Although it would be possible to provide a flat space
   or a name space in which, for example, the last label of a domain
   name captured the old-style network name, any such approach would add
   complexity to the method and create different rules for network names
   and host names.
   For these reasons, we assume that the syntax of network names will be
   the same as the expanded syntax for host names permitted in [HR].
   The new syntax expands the set of names to allow leading digits, so
   long as the resulting representations do not conflict with IP
   addresses in decimal octet form.  For example, 3Com.COM and 3M.COM
   are now legal, although 26.0.0.73.COM is not.  See [HR] for details.
   The price is that network names will get as complicated as host
   names.  An administrator will be able to create network names in any
   domain under his control, and also create network number to name
   entries in IN-ADDR.ARPA domains under his control.  Thus, the name
   for the ARPANET might become NET.ARPA, ARPANET.ARPA or Arpa-
   network.MIL., depending on the preferences of the owner.
[12] 2.1 Host Names and Numbers (抜粋)
      The syntax of a legal Internet host name was specified in RFC-952
      [DNS:4].  One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.
      Host software MUST handle host names of up to 63 characters and
      SHOULD handle host names of up to 255 characters.

ドメイン名 (非ASCII) における構文

[42] IDNA によって非ASCII文字に拡張され、 IDN と呼ばれています。

[43] IDNASCIIドメイン名の単純な拡張ではなく、

の二重構造で実現されています。 IDN, IDNA

[46] Uラベルで利用できる文字には複雑な制限があり、 IDNA の版やUnicodeの版によっても違いがあります。 IDNA, Uラベル

HOSTS.TXT におけるドメイン名

[15] RFC 608 "HOST NAMES ON-LINE", January 10, 1974. (Obsoleted by RFC 810)

[16] >>15 (抜粋)
            in which <host-name> will be the official Host Name, a
            string obtained through negotiation between the Host and the
            NIC, governed by these constraints:
                up to 48 characters drawn from the alphabet (A-Z),
                digits (0-9), and the minus sign (-) ... specifically,
                no blank or space characters allowed;
                no distinction between upper and lower case letters;
                the first character is a letter;
                the last character is NOT a minus sign;
                no other restrictions on content or syntax.
            Note:  The Host Name may be prefixed with an Official
            Network Name of up to 24 characters enclosed in parentheses
            ().  The Network Name designates the Network in which the
            Host resides.
                (The characters used in the Network Name are drawn from
                the same character set as those in the Host Name, with
                the same constraints [except the length] as listed
                above.)
                The ASCII text file will only contain the Official
                Network name for Hosts NOT on the ARPANET; for ARPANET
                Hosts there will be no Network Name prefix.

[17] RFC 810 "DoD INTERNET HOST TABLE SPECIFICATION", 1 March 1982. (Obsoleted by RFC 952)

[18] >>17 (抜粋)
   1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
   to 24 characters drawn from the alphabet (A-Z), digits (0-9), and the
   minus sign (-) and period (.).  No blank or space characters are
   permitted as part of a name.  No distinction is made between upper
   and lower case.  The first character must be a letter.  The last
   character must not be a minus sign or period.  A host which serves as
   a GATEWAY should have "-GATEWAY" or "-GW" as part of its name.  A
   host which is a TIP or a TAC should have  "-TIP" or "-TAC" as part of
   its host name, if it is an ARPANET or DoD host.

   <name> ::= <letter>[*[<letter-or-digit-or-hyphen>]<letter-or-digit>]

[19] RFC 952 "DOD INTERNET HOST TABLE SPECIFICATION", October 1985.

[20] >>19 (抜粋)
   1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
   to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
   sign (-), and period (.).  Note that periods are only allowed when
   they serve to delimit components of "domain style names". (See
   RFC-921, "Domain Name System Implementation Schedule", for
   background).  No blank or space characters are permitted as part of a
   name. No distinction is made between upper and lower case.  The first
   character must be an alpha character.  The last character must not be
   a minus sign or period.  A host which serves as a GATEWAY should have
   "-GATEWAY" or "-GW" as part of its name.  Hosts which do not serve as
   Internet gateways should not use "-GATEWAY" and "-GW" as part of
   their names. A host which is a TAC should have "-TAC" as the last
   part of its host name, if it is a DoD host.  Single character names
   or nicknames are not allowed.

      <hname> ::= <name>*["."<name>]
      <name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]

DNS での利用

[47] DNS では一般的なドメイン名の他にもいろいろな情報を格納できるデータベースです。 DNSラベルLDHラベル以外の種別もあり、 - 以外の記号類を含めたり、バイナリーデータを含めたりもできます。 ラベル

[49] 特に _ は、 DNS の通常の名前解決以外のいろいろな処理に使うデータを格納するために使われています。

インターネットメールでの利用

[50] インターネットメールRFC 822メッセージSMTP やその他の関連プロトコルでは、 メールアドレスの一部としてドメイン名が使われています。

[51] 当初の仕様ではメッセージIDの一部としてもドメイン名が使われていました。 現在はドメイン名に近い独特の構文とされています。

[52] RFC 822メッセージでは . を区切りとする字句化の規則の対象となるため、 空白注釈. の前後に挿入できるとされていました。 RFC 2822 以後そうしたものの生成は禁止されていますが、 受信はできる必要があります。 しかしこうしたものはドメイン名ラベルの一部を構成するものとはみなされません。

[53] また、ドメイン名部分には domain-literal としてインターネットドメイン名の構文と異なるものも記述できることに (少なくても仕様上は) なっています。

[21] RFC 819 "The Domain Naming Convention for Internet User Applications", August 1982.

[22] >>21 (抜粋)
   <domain> ::= <naming-domain> | <naming-domain> "." <domain>
   <naming-domain> ::=  <simple-name> | <address>
   <simple-name> ::= <a> <ldh-str> <let-dig>
   <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
   <let-dig> ::= <a> | <d>
   <let-dig-hyp> ::= <a> | <d> | "-"
   <address> :: =  "#" <number> | "[" <dotnum> "]"
   <number> ::= <d> | <d> <number>
   <dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
   <snum> ::= one, two, or three digits representing a decimal integer
   value in the range 0 through 255
   <a> ::= any one of the 52 alphabetic characters A through Z in upper
   case and a through z in lower case
   <d> ::= any one of the ten digits 0 through 9
   The simple names that make up a domain may contain both upper and
   lower case letters (as well as digits and hyphen), but these names
   are not case sensitive.

[23] RFC 821 "SIMPLE MAIL TRANSFER PROTOCOL", August 1982. (Obsoleted by RFC 2821)

[24] >>23 (抜粋)
            <domain> ::=  <element> | <element> "." <domain>
            <element> ::= <name> | "#" <number> | "[" <dotnum> "]"
            <name> ::= <a> <ldh-str> <let-dig>
            <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
            <let-dig> ::= <a> | <d>
            <let-dig-hyp> ::= <a> | <d> | "-"
            <a> ::= any one of the 52 alphabetic characters A through Z
                      in upper case and a through z in lower case
            <d> ::= any one of the ten digits 0 through 9

[25] RFC 822 "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES", August 13, 1982. (Obsoleted by RFC 2822)

[26] >>25 (抜粋)

domain = sub-domain *("." sub-domain)

sub-domain = domain-ref / domain-literal

domain-ref = atom ; symbolic reference

[27] RFC 2821 "Simple Mail Transfer Protocol", April 2001.

[28] >>27 (抜粋)

Domain = (sub-domain 1*("." sub-domain)) / address-literal

sub-domain = Let-dig [Ldh-str]

Let-dig = ALPHA / DIGIT

Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig

[29] RFC 2822 "Internet Message Format", March 2001.

[30] >>29 (抜粋)

domain = dot-atom / domain-literal / obs-domain

dot-atom = [CFWS] dot-atom-text [CFWS]

dot-atom-text = 1*atext *("." 1*atext)

obs-domain = atom *("." atom)

URL における利用

[54] URL においては authority としてドメイン名を記述でき(ることがあり)ます。

[55] また、 mailto: URLメールアドレスの一部としてドメイン名を記述できるように、 URL scheme 依存の構文としてドメイン名が使えることがあります。

[56] authority としてのドメイン名は、 IDN を含む任意の文字を含められます。 ただし URL として直接記述できない場合はパーセント符号化することになります。 記述できる場合もパーセント符号化を使えます。

[57] ドメイン名の解釈は URL scheme によりますが、 https: など通常の場合はシステムの名前解決によって解釈されます。 DNS の場合もあれば、 /etc/hostsプラットフォーム依存のネットワークプロトコルが使われることもあります。

[58] 従ってインターネットドメイン名では認められない文字URL に出現することがあり、正常に使える場合があります。

[59] ただし authority の構文と処理には . によるラベルの区切りや IDNA の処理が組み込まれているので、これと矛盾する使い方はできません。 URLの構文解析

[60] mailto: の場合はメールアドレスとして解釈されるので、 インターネットメール関連仕様の規定による構文を適宜パーセント符号化したものが使えます。 authority の場合と構文や処理に違いがあり、要注意です。 他の URL scheme 固有の構文でも同様の場合があります。

[61] 歴史的には ASCIIドメイン名のみが認められており、その後非ASCII文字に拡張されました。

[31] RFC 1630 "Universal Resource Identifiers in WWW", June 1994.

[32] >>31 (抜粋)
  host                   hostname | hostnumber

  hostname               ialpha [  .  hostname ]

  ialpha                 alpha [ xalphas ]

  xalphas                xalpha [ xalphas ]

  xalpha                 alpha | digit | safe | extra | escape

  alpha                  a | b | c | d | e | f | g | h | i | j | k |
                         l | m | n | o  | p | q | r | s | t | u | v |
                         w | x | y | z | A | B | C  | D | E | F | G |
                         H | I | J | K | L | M | N | O | P |  Q | R |
                         S | T | U | V | W | X | Y | Z

  digit                  0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

  safe                   $ | - | _ | @ | . | &  | + | -

  extra                  ! | * |  " |  ' | ( | )  | ,

  escape                 % hex hex

  hex                    digit | a | b | c | d | e | f | A | B | C |
                         D | E | F

[33] RFC 1738 "Uniform Resource Locators (URL)", December 1994. (Updated by RFC 2396)

[34] >>33 (抜粋)
    host
        The fully qualified domain name of a network host, or its IP
        address as a set of four decimal digit groups separated by
        ".". Fully qualified domain names take the form as described
        in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
        [5]: a sequence of domain labels separated by ".", each domain
        label starting and ending with an alphanumerical character and
        possibly also containing "-" characters. The rightmost domain
        label will never start with a digit, though, which
        syntactically distinguishes all domain names from the IP
        addresses.

host = hostname | hostnumber

hostname = *[ domainlabel "." ] toplabel

domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit

alphadigit = alpha | digit

[35] RFC 2396 "Uniform Resource Identifiers (URI): Generic Syntax", August 1998.

[36] >>35 (抜粋)

host = hostname | IPv4address

hostname = *( domainlabel "." ) toplabel [ "." ]

domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum

toplabel = alpha | alpha *( alphanum | "-" ) alphanum

alphanum = alpha | digit

文脈

ラベル, ドメイン名

メモ

[1] とりあえず、 IDN 以前ということで。。。