[2] ドメイン名ではいろいろな文字が使えますが、厳しい制限もあります。
[37] ドメイン名では当初 ASCII文字のみが使われていました。
[38]
しかも英数字と -
のみに限られ、更に英字以外は位置にも制限が加わっています。
文字数制限もあります。
[39]
これをラベルといい、 .
で連結してドメイン名となります。
[41] 古い時代には大文字が好んで使われました。 近年は小文字が好んで使われます。ドメイン名に限らず見られる傾向です。
[3] RFC 882 "DOMAIN NAMES - CONCEPTS and FACILITIES", November 1983. (Obsoleted by RFC 1035)
Appendix 1 - Domain Name Syntax Specification The preferred syntax of domain names is given by the following BNF rules. Adherence to this syntax will result in fewer problems with many applications that use domain names (e.g., mail, TELNET). Note that some applications described in [14] use domain names containing binary information and hence do not follow this syntax. <domain> ::= <subdomain> | " " <subdomain> ::= <label> | <subdomain> "." <label> <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str> <let-dig-hyp> ::= <let-dig> | "-" <let-dig> ::= <letter> | <digit> <letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case <digit> ::= any one of the ten digits 0 through 9 Note that while upper and lower case letters are allowed in domain names no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical. The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less. For example, the following strings identify hosts in the ARPA Internet: F.ISI.ARPA LINKABIT-DCN5.ARPA UCL-TAC.ARPA
[6]
>>4
注意: 当時既に IN-ADDR.ARPA
が使われていたわけですが。。。
[7] RFC 1035 DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION, November 1987.
2.3.1. Preferred name syntax
The DNS specifications attempt to be as general as possible in the rules for constructing domain names. The idea is that the name of any existing object can be expressed as a domain name with minimal changes.
However, when assigning a domain name for an object, the prudent user will select a name which satisfies both the rules of the domain system and any existing rules for the object, whether these rules are published or implied by existing programs.
For example, when naming a mail domain, the user should satisfy both the rules of this memo and those in RFC-822. When creating a new host name, the old rules for HOSTS.TXT should be followed. This avoids problems when old software is converted to use domain names.
The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET).
<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.
The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.
For example, the following strings identify hosts in the Internet:
A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA
2.3.3. Character Case
For all parts of the DNS that are part of the official protocol, all comparisons between character strings (e.g., labels, domain names, etc.) are done in a case-insensitive manner. At present, this rule is in force throughout the domain system without exception. However, future additions beyond current usage may need to use the full binary octet capabilities in names, so attempts to store domain names in 7-bit ASCII or use of special bytes to terminate labels, etc., should be avoided.
When data enters the domain system, its original case should be preserved whenever possible. In certain circumstances this cannot be done. For example, if two RRs are stored in a database, one at x.y and one at X.Y, they are actually stored at the same place in the database, and hence only one casing would be preserved. The basic rule is that case can be discarded only when data is used to define structure in a database, and two names are identical when compared in a case insensitive manner.
Loss of case sensitive data must be minimized. Thus while data for x.y and X.Y may both be stored under a single location x.y or X.Y, data for a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In general, this preserves the case of the first label of a domain name, but forces standardization of interior node labels.
Systems administrators who enter data into the domain database should take care to represent the data they supply to the domain system in a case-consistent manner if their system is case-sensitive. The data distribution system in the domain system will ensure that consistent representations are preserved.
2.3.4. Size limits
Various objects and parameters in the DNS have size limits. They are listed below. Some could be easily changed, others are more fundamental.
labels 63 octets or less
names 255 octets or less
TTL positive values of a signed 32 bit number.
UDP messages 512 octets or less
3.1. Name space definitions
Domain names in messages are expressed in terms of a sequence of labels. Each label is represented as a one octet length field followed by that number of octets. Since every domain name ends with the null label of the root, a domain name is terminated by a length byte of zero. The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.
To simplify implementations, the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less.
Although labels can contain any 8 bit values in octets that make up a label, it is strongly recommended that labels follow the preferred syntax described elsewhere in this memo, which is compatible with existing host naming conventions. Name servers and resolvers must compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII with zero parity. Non-alphabetic codes must match exactly.
3.1. Network name syntax
The current syntax for network names, as defined by [RFC 952] is an alphanumeric string of up to 24 characters, which begins with an alpha, and may include "." and "-" except as first and last characters. This is the format which was also used for host names before the DNS. Upward compatibility with existing names might be a goal of any new scheme.However, the present syntax has been used to define a flat name space, and hence would prohibit the same distributed name allocation method used for host names. There is some sentiment for allowing the NIC to continue to allocate and regulate network names, much as it allocates numbers, but the majority opinion favors local control of network names. Although it would be possible to provide a flat space or a name space in which, for example, the last label of a domain name captured the old-style network name, any such approach would add complexity to the method and create different rules for network names and host names.For these reasons, we assume that the syntax of network names will be the same as the expanded syntax for host names permitted in [HR]. The new syntax expands the set of names to allow leading digits, so long as the resulting representations do not conflict with IP addresses in decimal octet form. For example, 3Com.COM and 3M.COM are now legal, although 26.0.0.73.COM is not. See [HR] for details.The price is that network names will get as complicated as host names. An administrator will be able to create network names in any domain under his control, and also create network number to name entries in IN-ADDR.ARPA domains under his control. Thus, the name for the ARPANET might become NET.ARPA, ARPANET.ARPA or Arpa- network.MIL., depending on the preferences of the owner.
The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.
Host software MUST handle host names of up to 63 characters and SHOULD handle host names of up to 255 characters.
[42] IDNA によって非ASCII文字に拡張され、 IDN と呼ばれています。
[43] IDN はASCIIドメイン名の単純な拡張ではなく、
[46]
Uラベルで利用できる文字には複雑な制限があり、
IDNA の版やUnicodeの版によっても違いがあります。
HOSTS.TXT
におけるドメイン名[15] RFC 608 "HOST NAMES ON-LINE", January 10, 1974. (Obsoleted by RFC 810)
in which <host-name> will be the official Host Name, a string obtained through negotiation between the Host and the NIC, governed by these constraints:
up to 48 characters drawn from the alphabet (A-Z), digits (0-9), and the minus sign (-) ... specifically, no blank or space characters allowed;
no distinction between upper and lower case letters;
the first character is a letter;
the last character is NOT a minus sign;
no other restrictions on content or syntax.
Note: The Host Name may be prefixed with an Official Network Name of up to 24 characters enclosed in parentheses (). The Network Name designates the Network in which the Host resides.
(The characters used in the Network Name are drawn from the same character set as those in the Host Name, with the same constraints [except the length] as listed above.)
The ASCII text file will only contain the Official Network name for Hosts NOT on the ARPANET; for ARPANET Hosts there will be no Network Name prefix.
[17] RFC 810 "DoD INTERNET HOST TABLE SPECIFICATION", 1 March 1982. (Obsoleted by RFC 952)
1. A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), and the minus sign (-) and period (.). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be a letter. The last character must not be a minus sign or period. A host which serves as a GATEWAY should have "-GATEWAY" or "-GW" as part of its name. A host which is a TIP or a TAC should have "-TIP" or "-TAC" as part of its host name, if it is an ARPANET or DoD host. <name> ::= <letter>[*[<letter-or-digit-or-hyphen>]<letter-or-digit>]
[19] RFC 952 "DOD INTERNET HOST TABLE SPECIFICATION", October 1985.
1. A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period. A host which serves as a GATEWAY should have "-GATEWAY" or "-GW" as part of its name. Hosts which do not serve as Internet gateways should not use "-GATEWAY" and "-GW" as part of their names. A host which is a TAC should have "-TAC" as the last part of its host name, if it is a DoD host. Single character names or nicknames are not allowed. <hname> ::= <name>*["."<name>] <name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
[47]
DNS では一般的なドメイン名の他にもいろいろな情報を格納できるデータベースです。
DNS のラベルは LDHラベル以外の種別もあり、
-
以外の記号類を含めたり、バイナリーデータを含めたりもできます。
[50] インターネットメールのRFC 822メッセージや SMTP やその他の関連プロトコルでは、 メールアドレスの一部としてドメイン名が使われています。
[51] 当初の仕様ではメッセージIDの一部としてもドメイン名が使われていました。 現在はドメイン名に近い独特の構文とされています。
[52]
RFC 822メッセージでは .
を区切りとする字句化の規則の対象となるため、
空白や注釈を .
の前後に挿入できるとされていました。
RFC 2822 以後そうしたものの生成は禁止されていますが、
受信はできる必要があります。
しかしこうしたものはドメイン名やラベルの一部を構成するものとはみなされません。
[53] また、ドメイン名部分には domain-literal としてインターネットのドメイン名の構文と異なるものも記述できることに (少なくても仕様上は) なっています。
[21] RFC 819 "The Domain Naming Convention for Internet User Applications", August 1982.
<domain> ::= <naming-domain> | <naming-domain> "." <domain><naming-domain> ::= <simple-name> | <address><simple-name> ::= <a> <ldh-str> <let-dig><ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str><let-dig> ::= <a> | <d><let-dig-hyp> ::= <a> | <d> | "-"<address> :: = "#" <number> | "[" <dotnum> "]"<number> ::= <d> | <d> <number><dotnum> ::= <snum> "." <snum> "." <snum> "." <snum><snum> ::= one, two, or three digits representing a decimal integer value in the range 0 through 255<a> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case<d> ::= any one of the ten digits 0 through 9The simple names that make up a domain may contain both upper and lower case letters (as well as digits and hyphen), but these names are not case sensitive.
[23] RFC 821 "SIMPLE MAIL TRANSFER PROTOCOL", August 1982. (Obsoleted by RFC 2821)
<domain> ::= <element> | <element> "." <domain><element> ::= <name> | "#" <number> | "[" <dotnum> "]"<name> ::= <a> <ldh-str> <let-dig><ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str><let-dig> ::= <a> | <d><let-dig-hyp> ::= <a> | <d> | "-"<a> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case<d> ::= any one of the ten digits 0 through 9
[25] RFC 822 "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES", August 13, 1982. (Obsoleted by RFC 2822)
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom ; symbolic reference
[27] RFC 2821 "Simple Mail Transfer Protocol", April 2001.
Domain = (sub-domain 1*("." sub-domain)) / address-literal
sub-domain = Let-dig [Ldh-str]
Let-dig = ALPHA / DIGIT
Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
[29] RFC 2822 "Internet Message Format", March 2001.
domain = dot-atom / domain-literal / obs-domain
dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
obs-domain = atom *("." atom)
[54] URL においては authority としてドメイン名を記述でき(ることがあり)ます。
[55]
また、 mailto:
URL にメールアドレスの一部としてドメイン名を記述できるように、
URL scheme 依存の構文としてドメイン名が使えることがあります。
[56] authority としてのドメイン名は、 IDN を含む任意の文字を含められます。 ただし URL として直接記述できない場合はパーセント符号化することになります。 記述できる場合もパーセント符号化を使えます。
[57]
ドメイン名の解釈は URL scheme によりますが、 https:
など通常の場合はシステムの名前解決によって解釈されます。
DNS の場合もあれば、 /etc/hosts
やプラットフォーム依存のネットワークプロトコルが使われることもあります。
[58] 従ってインターネットのドメイン名では認められない文字が URL に出現することがあり、正常に使える場合があります。
[59]
ただし authority の構文と処理には .
によるラベルの区切りや IDNA
の処理が組み込まれているので、これと矛盾する使い方はできません。
[60]
mailto:
の場合はメールアドレスとして解釈されるので、
インターネットメール関連仕様の規定による構文を適宜パーセント符号化したものが使えます。
authority の場合と構文や処理に違いがあり、要注意です。
他の URL scheme 固有の構文でも同様の場合があります。
[61] 歴史的には ASCII のドメイン名のみが認められており、その後非ASCII文字に拡張されました。
[31] RFC 1630 "Universal Resource Identifiers in WWW", June 1994.
host hostname | hostnumber hostname ialpha [ . hostname ] ialpha alpha [ xalphas ] xalphas xalpha [ xalphas ] xalpha alpha | digit | safe | extra | escape alpha a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 safe $ | - | _ | @ | . | & | + | - extra ! | * | " | ' | ( | ) | , escape % hex hex hex digit | a | b | c | d | e | f | A | B | C | D | E | F
[33] RFC 1738 "Uniform Resource Locators (URL)", December 1994. (Updated by RFC 2396)
host The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.
host = hostname | hostnumber
hostname = *[ domainlabel "." ] toplabel
domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit = alpha | digit
[35] RFC 2396 "Uniform Resource Identifiers (URI): Generic Syntax", August 1998.
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
alphanum = alpha | digit