Newline-delimited JSON

[1] A sequence of JSON values separated by newlines is sometimes used to exchange data representable in JSON. This is variously called newline-delimited JSON, LDJSON, NDJSON, JSON Lines, and so on.

[85] There is no single unified name; it is referred to in various ways. It is also sometimes used without any particular name at all.

[101] It has been adopted by many applications and has become the de facto standard for streaming transfer of JSON data.

Syntax

[2] Zero or more JSON values are separated by newlines.

  1. ?
    1. JSON
    2. *
      1. newline
      2. JSON
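Producing such a stream is straightforward; for example, with Python's standard json module (the records here are made up for illustration):

```python
import json

# Hypothetical records, for illustration only.
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# json.dumps emits no raw newlines unless pretty-printing is requested,
# so each serialized value fits on exactly one line.
ndjson = "\n".join(json.dumps(r) for r in records) + "\n"
print(ndjson, end="")
```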

[3] How to interpret consecutive newlines (i.e., empty lines) or a newline at the end of the file depends on the implementation.

[4] Whether the newline is LF or CRLF, or either is allowed, also depends on the implementation.

[38] If LF is interpreted as the delimiter, CR is JSON whitespace and is therefore ignored naturally.

[5] A JSON value may in principle contain newlines as whitespace between tokens, but this format does not use them. As a result, one line = one JSON value, which makes processing easy.

[6] Whitespace in a JSON value has no meaning beyond improving human readability, so it is unnecessary for data interchange. Note that a newline inside a string must always be escaped within a JSON value, so raw newlines never appear there in the first place. (Therefore, naively removing all newlines from a JSON value that contains them yields a JSON value usable in this format.)
[7] Even if the newlines inside JSON values are left as-is, it is still possible to parse them as intended. Line-oriented processing is no longer possible, but you simply read up to the line where the end of a JSON value appears and process that as one JSON value. Wikipedia calls such forms, including those with no delimiter at all, Concatenated JSON >>44.

[37] Some implementations accept any JSON value, while others accept only values of particular forms, such as objects or arrays.

[50] As the character encoding, UTF-8 is normally used.

Processing

[86] Split the input into lines, and decode each line as JSON.

[87] Line-oriented processing is in general use on Unix and elsewhere, and is easy to implement with the standard library of almost any programming language.

[88] Nothing special is needed for the JSON processing either; simply calling an ordinary library that converts JSON into internal data structures is enough.

[89] Beyond that, depending on where it is used, take care not to forget to implement the handling for receiving an empty line, receiving invalid JSON, and reaching the end of the input stream.

[90] There are quite a few dedicated libraries that do all of this in one go, but the processing is simple enough that writing it by hand each time is also perfectly fine.
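The processing described above can be sketched as follows (Python; treating empty lines as skippable and letting invalid JSON raise an error are policy choices, not requirements of the format):

```python
import io
import json

def iter_ndjson(stream):
    """Yield one decoded value per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if not line:
            # Policy choice: tolerate empty lines rather than fail.
            continue
        # json.loads raises json.JSONDecodeError on invalid input.
        yield json.loads(line)
    # End of stream is handled naturally by the loop finishing.

stream = io.StringIO('{"a": 1}\n\n[1, 2]\n')
values = list(iter_ndjson(stream))
```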

MIME type

[51] The NDJSON specification defines the MIME type application/x-ndjson >>23.

[56] Wikipedia states that application/x-ldjson should be used as the MIME type >>14.

[62] Occasionally application/json is also used. (This is not correct.)

Implementations

[10] jq outputs newline-delimited values when the result consists of multiple JSON values. By default, however, it inserts whitespace, including newlines, into the JSON values to make them human-readable. When the -c / --compact-output option is given, the output contains no whitespace and matches the format described here.
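The same compact-versus-pretty distinction exists elsewhere, e.g. in Python's standard json module (a sketch; the sample value is made up):

```python
import json

value = {"a": [1, 2], "b": "x"}

# Default output has spaces after ':' and ',':  {"a": [1, 2], "b": "x"}
# Compact separators drop them, much like jq's -c:
compact = json.dumps(value, separators=(",", ":"))
print(compact)  # {"a":[1,2],"b":"x"}
```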

[12] Twitter uses CRLF-delimited JSON values >>292. A tweet is said to possibly contain LF (but never CR) >>292, but it is unclear whether raw LF can appear inside strings or only between tokens (the former should be impossible under the JSON specification).

[13] As an optional format, Twitter also provides one that outputs the length of the JSON value in decimal before the value >>292. It appears to be provided for implementations for which reading a specified number of characters is more convenient than scanning for CRLF.
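A reader for such a length-prefixed variant might look like the following sketch (Python; the exact framing here, a decimal length line ending in CRLF followed by exactly that many characters of JSON, is an assumption for illustration, not Twitter's documented format):

```python
import io
import json

def read_length_prefixed(stream):
    """Read frames of the form 'length CRLF payload' and decode each payload."""
    values = []
    while True:
        header = stream.readline()   # decimal length, then the line break
        if not header:
            break                    # end of stream
        n = int(header.strip())
        payload = stream.read(n)     # read exactly n characters, no scanning
        values.append(json.loads(payload))
    return values

# '{"a": 1}' is 8 characters, '{"b": [2, 3]}' is 13.
frames = io.StringIO('8\r\n{"a": 1}13\r\n{"b": [2, 3]}')
values = read_length_prefixed(frames)
```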

[83] >>82 states that the values are often newline-delimited, but that other whitespace delimiters (or, where unambiguous, no whitespace at all) are also accepted. In other implementations this format is called Concatenated JSON.

[11] In addition, the following implementations use newline-delimited sequences of JSON values.

[63] Various Web applications and Web APIs have adopted this format (or some variation of it).

[57] The Chrome developer tools can also display a response that is a sequence of JSON values as if it were an array of JSON values.

Related

[8] The entire sequence of JSON values could also be made a single JSON array, but the problem is that this is awkward to handle when you do not want to read to the very end before processing (i.e., when you want streaming processing).

[84] It is also possible to parse JSON itself in a streaming fashion, and such implementations do exist, but the format described here is far simpler to realize. (Ordinary JSON implementations with no streaming support can be reused.)

[9] JSON text sequences (RFC 7464) appears to be derived from the format described here, but uses a special control character. It is not that much more complex than this format, but it offers no particular advantages either. This format seems to have better affinity with Unix and other existing line-oriented libraries and tools.
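For comparison, RFC 7464 places the RS control character (U+001E) before each JSON text; a minimal decoder sketch (Python; the sample sequence is made up):

```python
import json

RS = "\x1e"  # RFC 7464 record separator, placed before each JSON text

def parse_json_seq(buf):
    """Decode an in-memory JSON text sequence; each text typically ends in LF."""
    return [json.loads(part) for part in buf.split(RS) if part.strip()]

seq = "\x1e{\"a\": 1}\n\x1e[2, 3]\n"
values = parse_json_seq(seq)
```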

Note that some explanations confuse this format with JSON text sequences, so beware.

History

[81] NDJSON was formerly called LDJSON >>80.

Notes

[14] Line Delimited JSON - Wikipedia, the free encyclopedia ( 版) http://en.wikipedia.org/wiki/Line_Delimited_JSON

[15] standard content-type for "streaming JSON" (newline delimited JSON objects)? - Google グループ ( 版) https://groups.google.com/forum/#!topic/nodejs/0ohwx0vF-SY

[16] websocket - Choice of transports for JSON over TCP - Stack Overflow ( 版) http://stackoverflow.com/questions/6573870/choice-of-transports-for-json-over-tcp

[17] How We Built Filmgrain, Part 2 of 2 | Filmgrain Blog ( 版) http://blog.filmgrainapp.com/2013/07/02/how-we-built-filmgrain-part-2-of-2/

[18] ldjson-stream ( 版) https://www.npmjs.com/package/ldjson-stream

[19] maxogden/ndjson ( 版) https://github.com/maxogden/ndjson

[55] In mongoimport >>53 / mongoexport >>54, "JSON" in principle means a newline-delimited sequence of JSON values.

[20] ld-jsonstream ( 版) https://www.npmjs.com/package/ld-jsonstream

[21] timkuijsten/node-ld-jsonstream ( 版) https://github.com/timkuijsten/node-ld-jsonstream

[22] how to parse a large, Newline-delimited JSON file by JSONStream module in node.js? - Stack Overflow ( 版) http://stackoverflow.com/questions/15121584/how-to-parse-a-large-newline-delimited-json-file-by-jsonstream-module-in-node-j

[23] NDJSON - Newline delimited JSON - Data Protocols - Open Knowledge Foundation (Open Knowledge Foundation and others 著, 版) http://dataprotocols.org/ndjson/

[24] Storing Data as Newline Delimited JSON Entries - Will Anderson (Will Anderson 著, 版) http://willi.am/blog/2014/07/16/storing-data-as-newline-delimited-json-entries/

[25] JSON Lines ( 版) http://jsonlines.org/

[26] ndjson ( 版) http://ndjson.org/

[27] Streaming API | Plotly Developers ( 版) https://plot.ly/streaming/

Once the stream has been connected over http, write to the request stream with newline separated JSON.

httpRequestSocket.write('{ "x": 3, "y": 1 }\n')

The newline is extremely important. Without this delimiter the Streaming Endpoint will not delineate your data, and will terminate the stream. You can send multiple streams to the same plot by nesting stream tokens within the corresponding data trace object. Similarly you can use the same token for multiple traces in a plot (they will show the same stream, so this is useful only in when using subplots).

[52] draft-hallambaker-jsonl-01 - JSON Log Format (JSON-L) ( ( 版)) http://tools.ietf.org/html/draft-hallambaker-jsonl-01

[28] Lesspress.net (Christopher Brown, io@henrian.com 著, 版) http://lesspress.net/#38

[29] ndjson/ndjson-spec ( 版) https://github.com/ndjson/ndjson-spec

[30] dat/rest-api.md at master · maxogden/dat ( 版) https://github.com/maxogden/dat/blob/master/docs/rest-api.md#post-apibulk

Data can be either newline-delimited JSON (with content-type application/json) or CSV (with content-type text/csv).

The response will be a stream of newline-delimited JSON objects.

[31] Bubbles: Python ETL Framework (prototype) - Open Knowledge Labs ( 版) http://okfnlabs.org/blog/2014/09/01/bubbles-python-etl.html

JSON newlline delimited file in a store represented by a directory of JSOND files

[32] logstash - open source log management ( 版) http://logstash.net/docs/1.4.2/codecs/json_lines

This codec will decode streamed JSON that is newline delimited. For decoding line-oriented JSON payload in the redis or file inputs, for example, use the json codec instead. Encoding will emit a single JSON string ending in a ‘\n’

[33] Storing alerts as JSON — OSSEC 2.8.1 documentation ( 版) http://ossec-docs.readthedocs.org/en/latest/manual/output/json-alert-log-output.html

With the json output, you can write alerts as a newline separated json file which other programs can easily consume.

[34] Class: Traject::MarcReader — Documentation for traject/traject (master) ( 版) http://www.rubydoc.info/github/traject/traject/master/Traject/MarcReader

"json" The "marc-in-json" format, encoded as newline-separated json. (synonym 'ndj'). A simplistic newline-separated json, with no comments allowed, and no unescpaed internal newlines allowed in the json objects -- we just read line by line, and assume each line is a marc-in-json.

[35] log-defer-viz - search.cpan.org ( 版) http://search.cpan.org/dist/Log-Defer-Viz/bin/log-defer-viz

INPUT FORMAT

$ log-defer-viz --input-format=json ## default is newline separated JSON

[36] python-stream-json ( 版) http://www.enricozini.org/2011/tips/python-stream-json/

[39] Feed exports — Scrapy 0.24.5 documentation ( 版) http://doc.scrapy.org/en/latest/topics/feed-exports.html#topics-feed-format-jsonlines

[40] Add support for the JSON lines format. [#937474] | Drupal.org ( 版) https://www.drupal.org/node/937474

[41] OpenCPU - High performance JSON streaming in R: Part 1 (Jeroen Ooms 著, 版) https://www.opencpu.org/posts/jsonlite-streaming/

The json streaming format

Because parsing huge JSON strings is difficult and inefficient, JSON streaming is done using lines of minified JSON records.

[42] bitdivine/jline ( 版) https://github.com/bitdivine/jline

[43] API Basics — Scrapinghub documentation ( 版) http://doc.scrapinghub.com/api.html#data-formats

To better support streaming with many popular JSON parsers, we provide JSON Lines format by default, but JSON and CSV are also available.

[44] JSON Streaming - Wikipedia, the free encyclopedia ( 版) http://en.wikipedia.org/wiki/JSON_Streaming

[45] Streaming | Flowdock API ( 版) https://www.flowdock.com/api/streaming

JSON Stream (application/json)

The JSON stream returns messages as JSON objects that are delimited by carriage return (\r). Newline characters (\n) may occur in messages, but carriage returns should not.

Parsers must be tolerant of occasional extra newline characters placed between messages. These characters are sent as periodic "keep-alive" messages to tell clients and NAT firewalls that the connection is still alive during low message volume periods.

[46] The JSON Streaming Record (JSRec) data format « Huy Nguyen ( 版) http://www.huyng.com/posts/json-streaming-record-data-format/

Files of this format have .jsrec as their file extension

Each line in the file is a json hash map

Empty lines and lines beginning with ‘#’ are considred comments and ignored during parsing

[47] Docker Remote API v1.17 - Docker Documentation ( 版) https://docs.docker.com/reference/api/docker_remote_api_v1.17/

Content-Type: application/json

{"stream": "Step 1..."}

{"stream": "..."}

{"error": "Error...", "errorDetail": {"code": 123, "message": "Error..."}}

[48] Extend your community - Meetup ( 版) http://www.meetup.com/meetup_api/docs/stream/2/open_events/

A response message is one HTTP chunk, the body of which is a single json object, described below, terminated by a newline.

[49] Streaming ( 版) http://developer.oanda.com/rest-live/streaming/

All data written to the stream are encoded in the JSON format. Events sent to the stream are either heartbeats (every 15 seconds) to ensure that HTTP connection remains active or transactions reporting the following events:

[58] Talk:Line Delimited JSON - Wikipedia, the free encyclopedia ( 版) http://en.wikipedia.org/wiki/Talk:Line_Delimited_JSON

[59] Catmandu::Importer::JSON - search.cpan.org ( 版) http://search.cpan.org/dist/Catmandu/lib/Catmandu/Importer/JSON.pm

multiline switches optionally between line-delimited JSON and multiline JSON or arrays. the default is line-delimited JSON.

[60] Farsight Security/Blog/NMSG and JSON encoding (Mike Schiffman 著, 版) https://www.farsightsecurity.com/Blog/20150506-mschiffm-nmsg-nmsgjsontool/

Currently, the only type of JSON njt expects and emits is newline-delimited JSON. Each record is expected to be terminated with a literal \n and no \n's can appear inside the JSON.

[61] Re: Please help with writing spec for async JSON APIs ( 版) https://www.mail-archive.com/es-discuss@mozilla.org/msg36495.html

[64] Streaming API | DataSift Developers ( 版) http://dev.datasift.com/docs/api/streaming-api

The response body contains a list of JSON objects separated by new lines.

[65] pull | DataSift Developers ( 版) http://dev.datasift.com/docs/api/rest-api/endpoints/pull

X-DataSift-Format: json_new_line

Content-Type: application/ldjson

[66] Preparing Data for BigQuery - BigQuery — Google Cloud Platform (Google 著, 版) https://cloud.google.com/bigquery/preparing-data-for-bigquery

One JSON object, including any nested/repeated fields, must appear on each line.

The following example shows sample nested/repeated data:

{"kind": "person", "fullName": "John Doe", "age": 22, "gender": "Male", "citiesLived": [{ "place": "Seattle", "numberOfYears": 5}, {"place": "Stockholm", "numberOfYears": 6}]}

{"kind": "person", "fullName": "Jane Austen", "age": 24, "gender": "Female", "citiesLived": [{"place": "Los Angeles", "numberOfYears": 2}, {"place": "Tokyo", "numberOfYears": 2}]}

[67] Niconico "Snapshot Search API" Guide ( 版) http://search.nicovideo.jp/docs/api/snapshot.html#toc5

Search response:

{"dqnid":"c0676eea-cc77-4317-b442-d626c5f34558","type":"hits","values":[{"_rowid":0,"cmsid":"sm13208019","title":"【DIVA 2nd】 鏡音八八花合戦 【EDIT PV】","view_counter":9999},{"_rowid":1,"cmsid":"sm12215733","title":"メイドイン俺でミニゲーム その5","view_counter":9997},{"_rowid":2,"cmsid":"sm3495465","title":"【初音ミクオリジナル】~プレゼント~【Independence Free】","view_counter":9997}]}

{"dqnid":"c0676eea-cc77-4317-b442-d626c5f34558","type":"stats","values":[{"_rowid":0,"service":"video","total":213353}]}

{"dqnid":"c0676eea-cc77-4317-b442-d626c5f34558","endofstream":true,"type":"hits"}

{"dqnid":"c0676eea-cc77-4317-b442-d626c5f34558","endofstream":true,"type":"stats"}

Search response specification

The response comes back split into several chunks, with each newline (\n)-delimited unit being one chunk.

[68] thenativeweb/json-lines-client ( 版) https://github.com/thenativeweb/json-lines-client

[69] API Basics — Scrapinghub documentation ( 版) http://doc.scrapinghub.com/api.html

To better support streaming with many popular JSON parsers, we provide JSON Lines format by default, but JSON and CSV are also available.

[70] mattdesl/garnish ( 版) https://github.com/mattdesl/garnish

Typically, you would use bole or ndjson to write the content to garnish. You can also write ndjson to stdout like so:

[71] rvagg/bole ( 版) https://github.com/rvagg/bole

Newline separated JSON output to arbitrary streams

[72] JSON Lines | Hacker News ( 版) https://news.ycombinator.com/item?id=10280483

[73] Storing Data as Newline Delimited JSON Entries - Will Anderson (Will Anderson 著, 版) http://willi.am/blog/2014/07/16/storing-data-as-newline-delimited-json-entries/

[74] json(1) - JSON love for your command line ( 版) http://trentm.com/json/

"Adjacent" objects means objects separated by a newline, or by no space at all. Adjacent arrays means separate by a newline. These conditions are chosen as a balance between (a) not being ambiguous to parse with a simple regex and (b) enough to be useful for common cases.

[75] Everyone loves R markdown and Github; stories from the R Summit, day two | 4D Pie Charts ( 版) http://4dpiecharts.com/2015/06/28/everyone-loves-r-markdown-and-github-stories-from-the-r-summit-day-two/

jsonlite also supports ndjson, where each line is a JSON object. This is important for large files: you can just parse one line at a time, then return the whole thing as a list or data frame, or you can define a line specific behaviour.

[76] R: Streaming JSON input/output ( 版) http://finzi.psych.upenn.edu/library/jsonlite/html/stream_in.html

Because parsing huge JSON strings is difficult and inefficient, JSON streaming is done using lines of minified JSON records, a.k.a. ndjson. This is pretty standard: JSON databases such as dat or MongoDB use the same format to import/export datasets. Note that this means that the total stream combined is not valid JSON itself; only the individual lines are. Also note that because line-breaks are used as separators, prettified JSON is not permitted: the JSON lines must be minified.

[77] stephenplusplus/jsonl ( 版) https://github.com/stephenplusplus/jsonl

Transform a stream of JSON into a stream of Line Delimited JSON

[78] I released embulk-parser-jsonl! - Qiita ( 版) http://qiita.com/shun0102/items/8989e6ed2ee0f46a0fa9

What I released is jsonl: a parser for reading a format with one JSON value per line.

[79] axiaoxin/vim-json-line-format ( 版) https://github.com/axiaoxin/vim-json-line-format

[82] Orchestrate (Orchestrate 著, 版) https://orchestrate.io/docs/apiref#bulk

Or alternatively, to send a JSON stream, set the Content-Type header to "application/orchestrate-export-stream+json". JSON stream data is often sent with one JSON object per line, using newline characters to delimit adjacent objects, but this isn't strictly necessary with Orchestrate. You can format the data with arbitrary whitespace (or put them direclty adjacent to one another, as long as the stream can be parsed as a sequence of valid JSON objects.

[91] Event endpoint · Appuri ( ()) http://developer.appuri.com/v1/docs/event-endpoint

Events are sent via an HTTPS POST request with the content type set to application/x-ldjson (Line-Delimited JSON).

[92] REST API ( ()) http://quasar-analytics.org/docs/restapi/

The following values are supported for the Accept header:

Value Description

None “Human-Readable” results, one result per line. Note: not parseable as a single JSON object.

application/json Nicely formatted JSON array

application/ldjson;mode=precise One result per line

[93] REST API ( ()) http://quasar-analytics.org/docs/restapi/

Note that if you remove the Accept header, then you will receive Precise JSON, which is the default format. The response would then look like this:

{ "city": "WESTOVER AFB", "state": "MA", "pop": 1764, "loc": [ -72.558657, 42.196672 ] }

{ "city": "CUMMINGTON", "state": "MA", "pop": 1484, "loc": [ -72.905767, 42.435296 ] }

[94] trentm/node-bunyan: a simple and fast JSON logging module for node.js services ( ()) https://github.com/trentm/node-bunyan

Bunyan log records are JSON. A few fields are added automatically: "pid", "hostname", "time" and "v".

$ node hi.js

{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}

{"name":"myapp","hostname":"banana.local","pid":40161,"level":40,"lang":"fr","msg":"au revoir","time":"2013-01-04T18:46:23.853Z","v":0}

[95] trentm/node-bunyan: a simple and fast JSON logging module for node.js services ( ()) https://github.com/trentm/node-bunyan

Bunyan log output is a stream of JSON objects.

[96] AnyEvent::Handle - search.cpan.org ( ()) http://search.cpan.org/dist/AnyEvent/lib/AnyEvent/Handle.pm

A simple RPC protocol that interoperates easily with other languages is to send JSON arrays (or objects, although arrays are usually the better choice as they mimic how function argument passing works) and a newline after each JSON text:

[97] Documentation () http://docs.appbase.io/scalr/rest/intro.html#quick-start-to-the-rest-api-step-1-making-requests

In the new document update, we can see the price change (5595 -> 6034) being reflected. Subsequent changes will be streamed to the resonse as raw JSON objects. As we see, there are no delimiters between between two consecutive JSON responses.

[98] Fun hacks for faster content - JakeArchibald.com () https://jakearchibald.com/2016/fun-hacks-faster-content/#newline-delimited-json

[99] Streaming | Flowdock API () https://www.flowdock.com/api/streaming

JSON Stream (application/json)

The JSON stream returns messages as JSON objects that are delimited by carriage return (\r). Newline characters (\n) may occur in messages, but carriage returns should not.

Parsers must be tolerant of occasional extra newline characters placed between messages. These characters are sent as periodic “keep-alive” messages to tell clients and NAT firewalls that the connection is still alive during low message volume periods.

[100] Event V1/V2 (deprecated) - SendGrid Documentation | SendGrid ( (SendGrid著, )) https://sendgrid.com/docs/API_Reference/Webhooks/event_deprecated.html

Batched event POSTs have a content-type header of application/json, and contain exactly one JSON string per line, with each line representing one event. Please note that currently the POST headers define this post as application/json, though it’s not; each line is a valid JSON string, but the overall POST body is not.