IETFのABNF

[1] RFC 2234 の ABNF (増補 BNF) は、非常によく用いられていた RFC 822 の ABNF を整理・拡張して独立な仕様としたものです。 RFC 2234 は IETF の提案標準となっており、新しい IETF の規格を中心によく利用されています。

[2] 仕様書:

RFC 2234 urn:ietf:rfc:2234

[12] 改訂版の RFC 4234 が発行されました。 (名無しさん 2005-10-16 05:55:41 +00:00)

[18] RFC 4234 は更に改訂され、 RFC 5234 (インターネット標準 STD 68) となりました。(差異については RFC 5234 の項を参照。)

仕様書

[26] RFC 7405 - Case-Sensitive String Support in ABNF (2014-12-24 15:03:44 +09:00 版) https://tools.ietf.org/html/rfc7405

文字列リテラル

[27] 終端である ASCII 文字列について、 RFC 2234/RFC 4234/RFC 5234 は (RFC 822 から引き継いだ) " で括る構文を規定しています。この場合、ASCII大文字・小文字不区別を表しています。

[28] "yes" は、 yes, YES, YeS などを表します。

[30] 引用符内には、ASCII印字可能文字のうち " 以外のものを0文字以上含めることができます。

"
*
1. |
  1. U+0020
  2. U+0021
  3. U+0023...U+007E
"

[29] RFC 7405 はこの構文を拡張し、直前に %i をつけてASCII大文字・小文字不区別を、 %s をつけて大文字・小文字区別を表すこととしました >>26。なおこの i と s もASCII大文字・小文字不区別です >>26。

?
1. %
2. |
  1. s
  2. S
  3. i
  4. I
引用符で括られた文字列

他の仕様書の参照

[34] 他の仕様書で定義された生成規則を参照する時に、自然言語による説明を表す < ... > 構文を使うことがあります。

[33] RFC 7838 - HTTP Alternative Services (2016-04-17 23:25:39 +09:00 版) https://tools.ietf.org/html/rfc7838

OWS = <OWS, see [RFC7230], Section 3.2.3>
delta-seconds = <delta-seconds; see [RFC7234], Section 1.2.1>

[35] 記述方法に特に規定はなく、 RFC 7838 の例 (>>33) のように同じ文書でも表記が揺れているものもあったりします。

空文字列

[13] RFC 3986 では空文字列たることを明示するために 0<pchar> のような表現を使っています。

変種

[5] RFC 2234 では二重引用符で括った引用文字列は大文字・小文字を区別しないと定義していますが、その解釈を変えて、大文字・小文字は区別するとしている仕様書もあります。

[6] P3P で使われている ABNF は RFC 2234 の ABNF の変種です。詳しくは P3PのABNF の項を参照してください。

[10] RFC 3080 >>17 と RFC 3391 では 0..2147483647 のように、整数の範囲指定が使われています。

[16] ちなみに、 RFC 3391 は MIME 式の ABNF も使っています。

[9] RFC 3981 など IRIS RFC では

unreserved         = // as specified by RFC2396

... のように他の規格で定義されていることを表す表現が (undocumented で) 使われています。

[21] RFC 3080 では

   mapping    = ;; each transport mapping may define additional frames

... のような表記も使われています >>17。

[14] 変種については ABNF の項も参照してください。

[17] RFC 3080 - The Blocks Extensible Exchange Protocol Core (2014-08-17 14:28:11 +09:00 版) http://tools.ietf.org/html/rfc3080#section-2.2.1

[25] 媒体素片の仕様書 >>24 は ABNF のような方法で構文を記述していますが、それが何であるのかは明記していません。ただ RFC 3986 を参照しており、 IETF の ABNF との類似性を感じさせます。

name = fragment - "&" - "="

... のような構文により使用できない文字を記述しているようです (が説明がありません)。 - の左辺は文字列で、右辺が文字であることにも注意が必要です。 (右辺の文字列が認められないのではなく、右辺の文字を含む文字列が認められないと解釈するのが自然でしょう。)

[24] Media Fragments URI 1.0 (basic) (2012-09-27 23:08:40 +09:00 版) http://www.w3.org/TR/media-frags/#general-structure

意味的な変種

文字コード

[3] RFC 2234 では ABNF によって定義されるものが (US-ASCII を用いた) オクテット列またはビット列としています。しかし、符号化文字集合とは独立に文字列だけを定義したい場合や、符号化文字集合が ASCII 以外である場合にも ABNF を使いたいという要求があります。そのため、 RFC 2234 を参照しながらもこの辺の解釈を変えている仕様書も少なからずあります。

[4] RFC 3986 (URI 4^th) は、 URI は符号化からは独立した文字列として定義されるとしています。 RFC 3986 における ABNF は、それによって定義されるオクテット列 (の集合) に ASCII で対応する文字列 (の集合) であるとしています。

[8] 同様に RFC 3987 (IRI) は ABNF の終端を UCS としています。

[6] RFC 3676 (text/plain; format=flowed) は charset 引数によって実際のオクテット列は変わってくることを指摘しています (が不完全です)。

[20] HTML5 は「文字集合は Unicode」としつつ ABNF を利用しています。

HTML 5 (2009-08-01 09:07:43 +09:00 版) http://www.whatwg.org/specs/web-apps/current-work/#inline-documentation-for-external-scripts

[7] この他、元の仕様が厳密に意味的に RFC 2234 に従って US-ASCII のオクテット列として定義していても、それを引用している仕様がそうでなかったりすることもあります。例えば RFC 3066 の言語札の構文は RFC 2234 の ABNF で記述されていますが、言語札を採用している仕様がすべて US-ASCII のオクテット列による表現を使っているわけではありません。ほとんどの場合、そのような違いは読めばわかるからか明確に規定されていることはありません。

[15] RFC 4646 (言語札) は RFC 4234 を参照していますが、終端はオクテットではなく、文字だとしています。

空白の省略

[22] 古くからの慣習で、空白 (LWSP や FWS など) の挿入を省略しているものもあります。そのような仕様の中には、どこに空白が挿入されるのかを明記していないものも少なくありません。

[23] 例えば RFC 6690 は空白を構文に明記していません。

詐称

[19] W3C の仕様書では RFC 2234のABNF であるとしながらも実際には XMLのEBNF を採用していることがあります。 XMLのEBNF や LEIRIのABNF の項をご覧ください。

歴史

[49] RFC 2234: Augmented BNF for Syntax Specifications: ABNF, 2021-08-17T11:47:02.000Z https://www.rfc-editor.org/rfc/rfc2234
[48] RFC Errata Report » RFC Editor, 2021-08-17T11:46:24.000Z https://www.rfc-editor.org/errata/rfc2234

[47] RFC Errata Report » RFC Editor, 2021-08-17T11:45:09.000Z https://www.rfc-editor.org/errata/rfc4234

[38] Diff: rfc4234.txt - rfc5234.txt (2008-09-12 22:50:54 +09:00 版) http://tools.ietf.org/tools/rfcdiff/rfcdiff.pyht?url1=http%3A%2F%2Fwww.ietf.org%2Frfc%2Frfc4234.txt&url2=http%3A%2F%2Fwww.ietf.org%2Frfc%2Frfc5234.txt

[39] >>38 RFC 4234 からの変更点は、編集上の変更の他は、 LWSP は RFC 2822 では使われておらず、相互運用性上の問題が生じる虞があるので使うなという注意書きが加わったくらいのようです。

[40] RFC 5234 で ABNF は IETF インターネット標準 (STD 68) になりました。

メモ

[11] RFC 2234 ABNF を使う場合で、終端として文字列表記 (引用文字列) を使う時は、大文字・小文字の区別の有無を (区別ありであれ、なしであれ) 明記しておいた方が安全です。

[31] RFC 6455 - The WebSocket Protocol (2015-03-11 20:42:50 +09:00 版) http://tools.ietf.org/html/rfc6455#section-5.2

the ABNF in this section is
operating on groups of bits. The length of each group of bits is
indicated in a comment. When encoded on the wire, the most
significant bit is the leftmost in the ABNF

[32] RFC 6455 - The WebSocket Protocol (2015-03-11 20:42:50 +09:00 版) http://tools.ietf.org/html/rfc6455#section-5.2

The base framing protocol is formally defined by the following ABNF
[RFC5234]. It is important to note that the representation of this
data is binary, not ASCII characters. As such, a field with a length
of 1 bit that takes values %x0 / %x1 is represented as a single bit
whose value is 0 or 1, not a full byte (octet) that stands for the
characters "0" or "1" in the ASCII encoding. A field with a length
of 4 bits with values between %x0-F again is represented by 4 bits,
again NOT by an ASCII character or full byte (octet) with these
values. [RFC5234] does not specify a character encoding: "Rules
resolve into a string of terminal values, sometimes called
characters. In ABNF, a character is merely a non-negative integer.
In certain contexts, a specific mapping (encoding) of values into a
character set (such as ASCII) will be specified." Here, the
specified encoding is a binary encoding where each terminal value is
encoded in the specified number of bits, which varies for each field.

[36] RFC 7950 - The YANG 1.1 Data Modeling Language (2016-09-15 03:32:37 +09:00) https://tools.ietf.org/html/rfc7950#page-206

[37] RFC 3651 - Handle System Namespace and Service Definition (2016-11-06 19:13:47 +09:00) https://tools.ietf.org/html/rfc3651

<Handle> = <NamingAuthority> "/" <LocalName>
<NamingAuthority> = *(<NamingAuthority> ".") <NAsegment>

[41] draft-seantek-unicode-in-abnf-03 - Unicode in ABNF (2017-05-09 16:44:43 +09:00) https://tools.ietf.org/html/draft-seantek-unicode-in-abnf-03

[42] RFC 5323 - Web Distributed Authoring and Versioning (WebDAV) SEARCH (2017-10-01 13:49:21 +09:00) https://tools.ietf.org/html/rfc5323#section-5.15.1

(Note that the ABNF above is defined in terms of Unicode code points
([UNICODE5]); when a query is transmitted as an XML document over
WebDAV, these characters are typically encoded in UTF-8 or UTF-16.)

[43] Editorial: use %s ABNF notation (annevk著, 2018-12-12 20:25:58 +09:00) https://github.com/whatwg/fetch/commit/e69e9c2b73b1aac124de47e8f32ee8979dfdb77a

[44] Use ABNF "%s" notation for case-sensitive values · Issue #845 · whatwg/fetch (2019-05-31 14:48:43 +09:00) https://github.com/whatwg/fetch/issues/845

[45] Use "%s" notation for case-sensitive strings in ABNF (#133) by reschke · Pull Request #184 · httpwg/http-core (2019-05-31 14:48:59 +09:00) https://github.com/httpwg/http-core/pull/184

[46] Editorial: use %s ABNF notation by annevk · Pull Request #850 · whatwg/fetch (2019-05-31 14:49:13 +09:00) https://github.com/whatwg/fetch/pull/850