Literal (RDF)

[1] In RDF, a literal is a value that is not a resource, that is, one that cannot be denoted by an RDF URI reference.

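Specifications

Components

[6] A literal consists of the following three components >>4.

[7] Lexical form: a Unicode string. It should be in NFC.
[8] Datatype IRI: the IRI of the datatype that determines how the lexical form maps to a literal value.
[9] Language tag: a BCP 47 language tag. It must be a well-formed, non-empty language tag >>4. A literal that has one is a language-tagged string.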

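[26] The rule that the lexical form should be in NFC is not an absolute requirement, but it is a strict one all the same. NFC normalization is a destructive operation and can corrupt text, so relying on it for arbitrary natural-language text is risky. See also: normalization.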
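
The concern in >>26 can be illustrated with a minimal Python sketch using only the standard `unicodedata` module. The helper name `is_nfc` is hypothetical, and U+212B ANGSTROM SIGN is just one convenient example of a character that NFC replaces with a different code point.

```python
import unicodedata


def is_nfc(lexical_form: str) -> bool:
    """Return True if the string is already in Normalization Form C."""
    # unicodedata.is_normalized is available in Python 3.8 and later.
    return unicodedata.is_normalized("NFC", lexical_form)


# U+212B ANGSTROM SIGN has a singleton canonical decomposition, so NFC
# replaces it with U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
original = "\u212B"
normalized = unicodedata.normalize("NFC", original)

print(is_nfc(original))        # the original is not in NFC
print(normalized == "\u00C5")  # normalization changed the code point
print(is_nfc(normalized))      # the result is in NFC
```

After normalization the original code point cannot be recovered from the string alone, which is why a producer that must preserve the exact characters may prefer to check the lexical form and report it rather than normalize silently.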

[10] A literal has a language tag if and only if its datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. Such a literal is called a language-tagged string.
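
As a rough sketch, the three components and the constraint in >>10 could be modeled as below. The class name `Literal`, its field names, and the constants are illustrative assumptions, not part of any RDF library. The sketch only checks that the tag is non-empty; it does not verify BCP 47 well-formedness.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal: three components."""
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None

    def __post_init__(self) -> None:
        has_tag = self.language_tag is not None
        is_lang_string = self.datatype_iri == RDF_LANGSTRING
        # A language tag is present if and only if the datatype is rdf:langString.
        if has_tag != is_lang_string:
            raise ValueError("language tag and rdf:langString must occur together")
        # The tag must not be the empty string (well-formedness is not checked here).
        if has_tag and self.language_tag == "":
            raise ValueError("language tag must be non-empty")

    @property
    def is_language_tagged_string(self) -> bool:
        return self.language_tag is not None


greeting = Literal("hello", RDF_LANGSTRING, "en")
plain = Literal("hello", XSD_STRING)
print(greeting.is_language_tagged_string, plain.is_language_tagged_string)
```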

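Literal value

[11] The literal value is defined as follows >>4.

[12] For a language-tagged string, it is the pair of the lexical form and the language tag, in that order.
[13] If the datatype IRI is not among the recognized datatype IRIs, it is undefined.
[14] Otherwise:
- [15] If the lexical form is in the lexical space, it is the result of applying the lexical-to-value mapping.
- [16] If not, the literal is ill-typed and has no literal value.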

[17] The case in >>16 is a semantic inconsistency, but it is not syntactically ill-formed. Implementations must accept ill-typed literals and produce RDF graphs from them. They may issue a warning. >>4
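
The case analysis in >>12 through >>17 can be sketched as follows. The function `literal_value`, the status strings, and the `RECOGNIZED` table are assumptions made for illustration. Only xsd:integer is treated as recognized here, with a simplified check of its lexical space.

```python
import re
import warnings
from typing import Any, Callable, Dict, Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_xsd_integer(lexical_form: str) -> int:
    """Lexical-to-value mapping for xsd:integer (simplified)."""
    if not _INTEGER.fullmatch(lexical_form):
        raise ValueError("not in the lexical space of xsd:integer")
    return int(lexical_form)


# Hypothetical table of recognized datatype IRIs and their mappings.
RECOGNIZED: Dict[str, Callable[[str], Any]] = {XSD_INTEGER: parse_xsd_integer}


def literal_value(lexical_form: str, datatype_iri: str,
                  language_tag: Optional[str] = None) -> Tuple[str, Any]:
    """Return (status, value); status is 'value', 'undefined' or 'ill-typed'."""
    if datatype_iri == RDF_LANGSTRING:
        # Language-tagged string: the pair (lexical form, language tag).
        return ("value", (lexical_form, language_tag))
    mapping = RECOGNIZED.get(datatype_iri)
    if mapping is None:
        return ("undefined", None)  # datatype IRI is not recognized
    try:
        return ("value", mapping(lexical_form))
    except ValueError:
        # Ill-typed: accept the literal anyway, but optionally warn.
        warnings.warn(f"ill-typed literal: {lexical_form!r}^^<{datatype_iri}>")
        return ("ill-typed", None)


print(literal_value("042", XSD_INTEGER))
print(literal_value("abc", XSD_INTEGER))
print(literal_value("chat", RDF_LANGSTRING, "fr"))
```

Note that the ill-typed case returns a status instead of raising, reflecting the requirement that such literals still be accepted into the graph.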

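Equality

[18] Two RDF literals are equal if and only if their lexical forms, datatype IRIs, and language tags all compare equal character by character >>4.

[19] Literals with the same literal value are still not equal literals if their lexical forms differ.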
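
The difference between equality of literals (>>18) and equality of literal values (>>19) can be shown with a small sketch. The `Literal` tuple and `term_equal` function are hypothetical names, and `int()` stands in for the xsd:integer lexical-to-value mapping.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical three-component literal."""
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    """Literals are equal only if all three components match exactly."""
    # Python compares str values code point by code point.
    return (a.lexical_form == b.lexical_form
            and a.datatype_iri == b.datatype_iri
            and a.language_tag == b.language_tag)


one = Literal("1", XSD_INTEGER)
padded_one = Literal("01", XSD_INTEGER)

# Both lexical forms map to the same integer value ...
same_value = int(one.lexical_form) == int(padded_one.lexical_form)
# ... but the literals themselves are different terms.
same_term = term_equal(one, padded_one)

print(same_value, same_term)
```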

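History

[2] In RDF 1.0, literals were of two kinds: plain literals and typed literals.

[5] In RDF 1.1, all literals are typed literals. The former plain literals correspond to simple literals and language-tagged strings.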

[20] What’s New in RDF 1.1 (version of 2014-02-21 12:31:39 +09:00) http://www.w3.org/TR/rdf11-new/#h3_literals

[21] RDF 1.0 appears to have allowed arbitrary code points. In RDF 1.1 the lexical form is an xs:string, so U+0000 cannot be used, and implementations apparently need not accept U+0001-U+001F >>20.

[22] NLP Interchange Format (NIF) 2.0 - Core Specification (by Sebastian Hellmann, 2013-12-05 10:36:30 +09:00) http://persistence.uni-leipzig.org/nlp2rdf/specification/core.html

According to the RDF 1.1 specification (3.3 Literals), RDF literals are Unicode strings, which should be in Normal Form C (NFC). In NIF, we will follow this recommendation in general. There are, however, circumstances which require the use of Normal Form D (NFD) or even NFKC or NFKD. Therefore NIF allows NFD, NFKC and NFKD, if the use case justifies the usage.
One such use case is, if a linguistic annotator has the requirement to annotate individual diacritics or parts of precomposed characters and syllables. For linguists with this use case or applicable languages, using NFD is obvious and well-justified. We will only give examples here and refer the interested reader to these three documents: Gernot Katzer's page about the Korean Writing system, Wikipedia article about the Korean Hangul, Unicode Normal Form specification.

[23] RDF Literal Direction Working Group Charter (2019-06-14 03:20:23 +09:00) https://w3c.github.io/rdf-dir-literal/draft-charter.html

[24] RDF Literals and Base Directions (2019-06-14 03:20:23 +09:00) https://w3c.github.io/rdf-dir-literal/

[25] w3c/rdf-dir-literal: Proposal to add base direction to RDF Literals (2019-06-21 14:34:06 +09:00) https://github.com/w3c/rdf-dir-literal