URL patterns

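[1] Several protocols define pattern-like constructs that match URI references of a particular form. For convenience, this page calls them URI patterns.

Various specifications have proposed various schemes for URI patterns over the years, but none has been widely adopted. Instead, the common practice is to treat the URL as a plain string and match it with regular expressions or similar. However, a purely string-based approach leaves too much to account for, such as differences only in percent-encoding or the presence or absence of a port number, and an attempt to be strict ends up complicated and hard to follow.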
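As a rough illustration of why plain string matching falls short, the following Python sketch builds a comparison key that ignores letter case in the scheme and host, an explicit default port, and percent-encoding of unreserved characters. The names `comparison_key` and `DEFAULT_PORTS` are assumptions made for this example, and only a few of the possible normalizations are covered.

```python
import re
from urllib.parse import urlsplit

# Hypothetical table of default ports, introduced for this example only.
DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _decode_unreserved(text: str) -> str:
    """Decode %XX only when it encodes an unreserved character."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        # Reserved characters stay encoded; only the hex case is unified.
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def comparison_key(url: str):
    """Return a tuple that is equal for URLs differing only trivially."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    path = _decode_unreserved(parts.path) or "/"
    return (scheme, host, port, path, _decode_unreserved(parts.query))


a = "http://example.com/%7Euser/"
b = "HTTP://Example.COM:80/~user/"
print(a == b)                                  # False
print(comparison_key(a) == comparison_key(b))  # True
```

Even this small key needs a port table and a character class, which hints at how quickly a regular expression over the raw string becomes unwieldy.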

URL

[30] For the syntax and method of deciding whether two URLs match by comparison or normalization, see URL comparison.

URLpattern

[2] PICSRules defines URLpattern, a string notation that resembles a URI. The string looks very much like a URI but departs from a real URI in places, for example by allowing * in several of its parts.

PICSRules Specification http://www.w3.org/TR/REC-PICSRules-971229#URLfilter

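Extended URI

[4] P3P extended URIs with wildcards (URI patterns): the P3P policy reference file defines an extended URI form that permits wildcards, for describing matches against URIs (RFC 2396).

As a result, because the wildcard chosen was *, a character that is reserved and whose equivalence before and after percent-encoding is not guaranteed, the processing can no longer be independent of the URI scheme.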

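Regular expressions

[5] Earlier Working Drafts of WCAG 2.0 called a description of a set of URIs, written as an XML Schema regular expression, a URI pattern (pattern).

Later Working Drafts simply call it a regular expression.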

[11] Rules-based Resource Property Sets in RDF (version of 2005-01-18 00:03:38 +09:00) http://www.w3.org/2004/12/q/doc/rdf-rulesets.html

Uses Perl 5 regular expressions.

[13] .htaccess: regular expressions can be used.

Prefix matching

[3]

[14] Service Modeling Language Interchange Format Version 1.1 http://www.w3.org/TR/2009/REC-sml-if-20090512/#URI_prefix_matching

[40] domain="" (Digest authentication) also uses prefix matching.

[43] See also URL comparison.
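Prefix matching looks trivial, but a raw string prefix can match more than intended. The Python sketch below contrasts a naive string prefix test with a component-wise one. Both function names are hypothetical, and the component-wise rule is only one possible stricter interpretation, not what any of the specifications cited in this section mandates.

```python
from urllib.parse import urlsplit


def naive_prefix_match(prefix: str, url: str) -> bool:
    """Plain string prefix test."""
    return url.startswith(prefix)


def component_prefix_match(prefix: str, url: str) -> bool:
    """Require equal scheme, host and port, then compare whole path segments."""
    p, u = urlsplit(prefix), urlsplit(url)
    if (p.scheme, p.hostname, p.port) != (u.scheme, u.hostname, u.port):
        return False
    p_path = p.path or "/"
    u_path = u.path or "/"
    if p_path.endswith("/"):
        return u_path.startswith(p_path)
    # Without a trailing slash, match the path itself or anything below it.
    return u_path == p_path or u_path.startswith(p_path + "/")


# The string prefix also matches an unrelated host.
print(naive_prefix_match("http://example.com", "http://example.com.evil.example/"))      # True
print(component_prefix_match("http://example.com", "http://example.com.evil.example/"))  # False
```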

URI Templates

[29] RFC 6570 defines URI Templates, a template language for constructing URLs. Because it is meant for generating URLs, however, it is not a means of testing whether a URL matches, which is what this page deals with.
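To make the generation-only nature concrete, here is a minimal Python sketch of the simplest form of template expansion, `{name}`, in which every character other than an unreserved one is percent-encoded. The function name `expand_simple` is an assumption for this example, and none of the operators or modifiers defined by RFC 6570 are handled.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, variables: dict) -> str:
    """Expand {name} expressions; undefined variables expand to nothing."""
    def repl(m):
        value = variables.get(m.group(1))
        if value is None:
            return ""
        # safe="" leaves only unreserved characters unencoded.
        return quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", repl, template)


print(expand_simple("http://example.com/search?q={term}", {"term": "Hello World!"}))
# http://example.com/search?q=Hello%20World%21
```

The function maps variables to a URL; nothing in it answers whether a given URL could have been produced by a template.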

Notes

[6] SML Interchange Format Version 1.0 (version of 2007-03-08 05:32:26 +09:00) http://www.w3.org/Submission/2007/SUBM-sml-if-20070321/#URI_prefix_matching (anonymous)

[7] Protocol for Web Description Resources (POWDER): Web Description Resources Datatypes (WDRD) (version of 2007-09-27 06:24:18 +09:00) http://www.w3.org/TR/2007/WD-powder-xsd-20070925/ (anonymous)

[8] Service Modeling Language Interchange Format Version 1.1 (version of 2007-09-27 00:24:13 +09:00) http://www.w3.org/TR/2007/WD-sml-if-20070926/#URI_prefix_matching (anonymous)

[9] URISpace (version of 2001-02-16 04:12:14 +09:00) http://www.w3.org/TR/2001/NOTE-urispace-20010215 (anonymous)

[10] URI Pattern Matching for Groups of Resources (version of 2006-06-21 03:51:25 +09:00) http://www.w3.org/2005/Incubator/wcl/matching.html (anonymous)

[12] robots.txt: originally it supported only simple exact matching or matching on the directory part, but it was later extended.

[15] ESI Invalidation Protocol 1.0 http://www.w3.org/TR/esi-invp

[16] R2RML: RDB to RDF Mapping Language http://www.w3.org/TR/2012/REC-r2rml-20120927/#from-template

[17] RFC 6415 - Web Host Metadata http://tools.ietf.org/html/rfc6415#section-3.1.1.1

[18] Extensible Resource Descriptor (XRD) Version 1.0 http://docs.oasis-open.org/xri/xrd/v1.0/xrd-1.0.html#link.attribute.template

[19] Website Parse Template http://www.w3.org/Submission/WPT/#urls_section

[20] Protocol for Web Description Resources (POWDER): Grouping of Resources http://www.w3.org/TR/2009/REC-powder-grouping-20090901/

[21] Protocol for Web Description Resources (POWDER): Formal Semantics http://www.w3.org/TR/2009/REC-powder-formal-20090901/#iriSets

[22] Content Scripts - Google Chrome Extensions - Google Code http://code.google.com/chrome/extensions/content_scripts.html

[23] Match Patterns - Google Chrome Extensions - Google Code http://code.google.com/chrome/extensions/match_patterns.html

[24] URL patterns - Custom Search Help https://support.google.com/customsearch/answer/71826?hl=en

[25] Resource Description Framework (RDF) Model and Syntax Specification http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#URIPrefix

[26] Google Developers Console Help — Google Developers https://developers.google.com/console/help/new/#whitelistingbyhost

[27] The Platform for Privacy Preferences 1.0 (P3P1.0) Specification http://www.w3.org/TR/P3P/#ref_file_wildcards

[28] The Platform for Privacy Preferences 1.0 (P3P1.0) Specification http://www.w3.org/TR/P3P/#hints

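[31] OAuth 2.0 defines, as one way of registering a redirect URL, specifying the URL scheme, authority, and path (without the query and fragment identifier). In that case the URL actually supplied has to be compared with the registered one to see whether it matches. The specification refers to RFC 3986 but does not define the concrete comparison operation.

[32] mod_proxy - Apache HTTP Server Version 2.4 http://httpd.apache.org/docs/current/en/mod/mod_proxy.html#Proxy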

Syntax: <Proxy wildcard-url> ...</Proxy>

Directives placed in <Proxy> sections apply only to matching proxied content. Shell-style wildcards are allowed.

<Proxy http://example.com/foo/*>

[33] Using the Proxy API - Opera 15+ extensions documentation https://dev.opera.com/extensions/tut_proxy.html

This list may contain the following entries:

[<scheme>://]<host-pattern>[:<port>]

Match all hostnames that match the pattern <host-pattern>. A leading "." is interpreted as a "*.".

Examples: "foobar.com", "*foobar.com", "*.foobar.com", "*foobar.com:99", "https://x.*.y.com:99".
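The host-pattern rule quoted above (a leading "." is read as "*.") can be sketched in Python with shell-style wildcard matching. The function name `host_matches` is hypothetical, the optional scheme and port parts of an entry are ignored, and `fnmatchcase` also gives special meaning to `?` and `[...]`, which the quoted text does not mention.

```python
from fnmatch import fnmatchcase


def host_matches(pattern: str, host: str) -> bool:
    """Match a host name against a bypass-list style host pattern."""
    if pattern.startswith("."):
        # A leading "." is interpreted as "*.".
        pattern = "*" + pattern
    return fnmatchcase(host.lower(), pattern.lower())


print(host_matches(".foobar.com", "www.foobar.com"))   # True
print(host_matches(".foobar.com", "foobar.com"))       # False
print(host_matches("*foobar.com", "foofoobar.com"))    # True
```

The last line shows that a pattern without the "." separator also matches unrelated hosts that merely end in the same string.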

[34] Match patterns - Opera 15+ extensions documentation https://dev.opera.com/extensions/tut_match_patterns.html

Match pattern syntax

Here's the basic syntax:

<url-pattern> := <scheme>://<host><path>

<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp' | 'chrome-extension'

<host> := '*' | '*.' <any char except '/' and '*'>+

<path> := '/' <any chars>

The meaning of '*' depends on whether it's in the scheme, host, or path part. If the scheme is *, then it matches either http or https. If the host is just *, then it matches any host. If the host is *.hostname, then it matches the specified host or any of its subdomains. In the path section, each '*' matches 0 or more characters. The following table shows some valid patterns.
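The grammar and the wildcard rules quoted above can be turned into a small matcher. The Python sketch below is an approximation written for this page, not the browsers' implementation: `compile_match_pattern` is a hypothetical name, the path pattern is assumed to be compared against the path plus query string, and `file` patterns with an empty host are not handled.

```python
import re
from urllib.parse import urlsplit

_PATTERN = re.compile(
    r"(\*|http|https|file|ftp|chrome-extension)://"  # <scheme>
    r"(\*|(?:\*\.)?[^/*]+)"                          # <host>
    r"(/.*)"                                         # <path>
)


def compile_match_pattern(pattern: str):
    """Return a predicate that tests URLs against a match pattern."""
    m = _PATTERN.fullmatch(pattern)
    if m is None:
        raise ValueError("not a valid match pattern: " + pattern)
    scheme_pat, host_pat, path_pat = m.groups()
    # In the path, each '*' matches zero or more characters.
    path_re = re.compile(
        "".join(".*" if c == "*" else re.escape(c) for c in path_pat),
        re.DOTALL,
    )

    def matches(url: str) -> bool:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if scheme_pat == "*":
            if parts.scheme not in ("http", "https"):
                return False
        elif parts.scheme != scheme_pat:
            return False
        if host_pat.startswith("*."):
            base = host_pat[2:].lower()
            # The host itself or any of its subdomains.
            if host != base and not host.endswith("." + base):
                return False
        elif host_pat != "*" and host != host_pat.lower():
            return False
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        return path_re.fullmatch(target) is not None

    return matches


is_match = compile_match_pattern("*://*.example.com/foo*")
print(is_match("https://www.example.com/foobar"))  # True
print(is_match("ftp://example.com/foo"))           # False
```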

[35] uniformResourceIdentifier in the name constraints certificate extension

[36] Crosswalk - xwalk_hosts (by Crosswalk) https://crosswalk-project.org/documentation/manifest/xwalk_hosts.html

The field takes an array of URL patterns representing hosts which the application should be able to access. The values can be fully qualified host names, like this:

"http://crosswalk-project.org/"

Or patterns with wild-card characters, such as:

"http://*.org/"

"https://*/"

[37] <data> | Android Developers http://developer.android.com/guide/topics/manifest/data-element.html

[38] This morning's work. · w3c/webappsec-csp@d669817 https://github.com/w3c/webappsec-csp/commit/d6698170c8eede388bd351773dada7b633988be8

[39] oEmbed http://oembed.com/

The URL scheme may contain one or more wildcards (specified with an asterisk). Wildcards may be present in the domain portion of the URL, or in the path. Within the domain portion, wildcards may only be used for subdomains. Wildcards may not be used in the scheme (to support HTTP and HTTPS, provide two url/endpoint pairs).

Some examples:

http://www.flickr.com/photos/* OK

http://www.flickr.com/photos/*/foo/ OK

http://*.flickr.com/photos/* OK

http://*.com/photos/* NOT OK

*://www.flickr.com/photos/* NOT OK

[41] BEACON link dump format https://gbv.github.io/beaconspec/beacon.html

A URI pattern in this specification is an URI Template, as defined in [RFC6570], with all template expressions being either {ID} for simple string expansion or {+ID} for reserved expansion.

[42] Match Patterns - Google Chrome https://developer.chrome.com/extensions/match_patterns

[44] cURL - How To Use https://curl.haxx.se/docs/manpage.html

You can specify multiple URLs or parts of URLs by writing part sets within braces as in:

  http://site.{one,two,three}.com

or you can get sequences of alphanumeric series by using [] as in:

  ftp://ftp.example.com/file[1-100].txt

  ftp://ftp.example.com/file[001-100].txt (with leading zeros)

  ftp://ftp.example.com/file[a-z].txt

Nested sequences are not supported, but you can use several ones next to each other:

  http://example.com/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.

You can specify a step counter for the ranges to get every Nth number or letter:

  http://example.com/file[1-100:10].txt

  http://example.com/file[a-z:2].txt
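The brace sets and bracket ranges described in the quoted manual text can be imitated with a short expander. The Python sketch below is written from that description only and is not curl's parser: `expand` and `_range_values` are hypothetical names, escaping is not handled, and any `{}` or `[]` in the URL (for example an IPv6 literal) is treated as a glob.

```python
import itertools
import re

_TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")


def _range_values(spec: str):
    """Expand '1-100', '001-100', 'a-z', each with an optional ':step'."""
    m = re.fullmatch(r"(\d+)-(\d+)(?::(\d+))?", spec)
    if m:
        start, end, step = m.group(1), m.group(2), int(m.group(3) or 1)
        # Keep the zero padding when the start value has leading zeros.
        width = len(start) if len(start) > 1 and start.startswith("0") else 0
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1, step)]
    m = re.fullmatch(r"([A-Za-z])-([A-Za-z])(?::(\d+))?", spec)
    if m:
        step = int(m.group(3) or 1)
        return [chr(c) for c in range(ord(m.group(1)), ord(m.group(2)) + 1, step)]
    raise ValueError("unsupported range: [" + spec + "]")


def expand(url: str):
    """Return every URL produced by the sets and ranges in `url`."""
    pieces, pos = [], 0
    for m in _TOKEN.finditer(url):
        pieces.append([url[pos:m.start()]])       # literal text
        if m.group(1) is not None:
            pieces.append(m.group(1).split(","))  # {one,two,three}
        else:
            pieces.append(_range_values(m.group(2)))
        pos = m.end()
    pieces.append([url[pos:]])
    return ["".join(combo) for combo in itertools.product(*pieces)]


print(expand("http://site.{one,two,three}.com"))
print(len(expand("http://example.com/archive[1996-1999]/vol[1-4]/part{a,b,c}.html")))  # 48
```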

[45] cURL - How To Use https://curl.haxx.se/docs/manpage.html#-g

-g, --globoff

This option switches off the "URL globbing parser". When you set this option, you can specify URLs that contain the letters {}[] without having them being interpreted by curl itself. Note that these letters are not normal legal URL contents but they should be encoded according to the URI standard.

[46] Match patterns - Mozilla | MDN https://developer.mozilla.org/en-US/Add-ons/WebExtensions/Match_patterns

[47] chrome.proxy - Google Chrome https://developer.chrome.com/extensions/proxy

Bypass list

Individual servers may be excluded from being proxied with the bypassList. This list may contain the following entries:

[<scheme>://]<host-pattern>[:<port>]

Match all hostnames that match the pattern <host-pattern>. A leading "." is interpreted as a "*.".

Examples: "foobar.com", "*foobar.com", "*.foobar.com", "*foobar.com:99", "https://x.*.y.com:99".

Pattern           Matches                                            Does not match
".foobar.com"     "www.foobar.com"                                   "foobar.com"
"*.foobar.com"    "www.foobar.com"                                   "foobar.com"
"foobar.com"      "foobar.com"                                       "www.foobar.com"
"*foobar.com"     "foobar.com", "www.foobar.com", "foofoobar.com"

[<scheme>://]<ip-literal>[:<port>]

Match URLs that are IP address literals.

Conceptually this is the similar to the first case, but with special cases to handle IP literal canonicalization. For example, matching on "[0:0:0::1]" is the same as matching on "[::1]" because the IPv6 canonicalization is done internally.

Examples: "127.0.1", "[0:0::1]", "[::1]", "http://[::1]:99"

<ip-literal>/<prefix-length-in-bits>

Match any URL containing an IP literal within the given range. The IP range is specified using CIDR notation.

Examples: "192.168.1.1/16", "fefe:13::abc/33"

<local>

Match local addresses. An address is local if the host is "127.0.0.1", "::1", or "localhost".

Example: "<local>"
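The IP-literal canonicalization and the CIDR range entries quoted above can be approximated with Python's standard `ipaddress` module. This is a sketch under the assumption that the host has already been extracted from the URL; `in_range` is a hypothetical helper, not part of any browser API.

```python
import ipaddress


def in_range(cidr: str, host: str) -> bool:
    """Test whether an IP literal falls inside a CIDR range."""
    # strict=False accepts a range such as 192.168.1.1/16 with host bits set.
    network = ipaddress.ip_network(cidr, strict=False)
    address = ipaddress.ip_address(host.strip("[]"))
    return address.version == network.version and address in network


# Different spellings of the same IPv6 address compare equal once parsed.
print(ipaddress.ip_address("0:0:0::1") == ipaddress.ip_address("::1"))  # True
print(in_range("192.168.1.1/16", "192.168.200.5"))                      # True
print(in_range("192.168.1.1/16", "10.0.0.1"))                           # False
```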

[48] Network Settings - The Chromium Projects https://www.chromium.org/developers/design-documents/network-settings

--proxy-bypass-list=(<trailing_domain>|<ip-address>)[:<port>][;...]

This tells chrome to bypass any specified proxy for the given semi-colon-separated list of hosts. This flag must be used (or rather, only has an effect) in tandem with --proxy-server.

Note that trailing-domain matching doesn't require "." separators so "*google.com" will match "igoogle.com" for example.

[49] Sniffly for ports. (by mikewest) https://github.com/w3c/webappsec-csp/commit/22d08b990290e49f5a666fad08de16d75bb369e7

[50] Part 2.3: What is an intersection of the two source expressions? (#144) (by Sun77789) https://github.com/w3c/webappsec-csp/commit/5da961eb94294f739207e183d21cbe19d3516fa3

[51] RFC 8007 - Content Delivery Network Interconnection (CDNI) Control Interface / Triggers https://tools.ietf.org/html/rfc8007#section-5.2.4

[52] Web view (by Jwmsft) https://msdn.microsoft.com/ja-jp/windows/uwp/controls-and-patterns/web-view

You can also include subdomain wildcards (for example, https://*.microsoft.com), but not domain wildcards (for example, https://*.com or https://*.*).

[53] Handling App Links | Android Developers https://developer.android.com/training/app-links/index.html

android:host attribute with a domain URI pattern

[54] Creating Safari Content-Blocking Rules https://developer.apple.com/library/content/documentation/Extensions/Conceptual/ContentBlockingRules/CreatingRules/CreatingRules.html#//apple_ref/doc/uid/TP40016265-CH2-SW1

The url-filter string format is a strict subset of JavaScript regular expressions, shown in Table 1. Syntactically, everything supported by JavaScript is reserved but only the subset will be accepted by the parser. An unsupported expression results in a parse error.

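[55] Module ngx_http_core_module https://nginx.org/en/docs/http/ngx_http_core_module.html#location

[56] Cache invalidation overview | Cloud CDN documentation | Google Cloud Platform https://cloud.google.com/cdn/docs/cache-invalidation-overview?hl=ja

An invalidation request specifies a path pattern that identifies one or more objects to invalidate. The path pattern can be a specific path, such as /cat.jpg, or an entire directory structure, such as /pictures/*. The following rules apply to path patterns.

The path pattern must start with /.

It cannot include ? or #.

It must not include * except as the final character following a /.

If it ends with /*, the preceding string is a prefix, and all objects whose paths begin with that prefix are invalidated.

The path pattern is compared with the path component of the URL, which is everything between the hostname and any ? or # that might be present.

If a URL includes a query string (for example, /images.php?image=fred.png), objects that differ only in the query string cannot be invalidated individually. For example, given two images, /images.php?image=fred.png and /images.php?image=barney.png, fred.png alone cannot be invalidated. To invalidate all images served by images.php, use /images.php as the path pattern.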
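The path pattern rules listed above are simple enough to sketch directly. The Python code below is an illustration written from those rules, not Cloud CDN's implementation: both function names are hypothetical, and the prefix for a pattern ending in /* is assumed to include the trailing slash.

```python
from urllib.parse import urlsplit


def validate_path_pattern(pattern: str) -> None:
    """Raise ValueError when the pattern breaks one of the listed rules."""
    if not pattern.startswith("/"):
        raise ValueError("path pattern must start with /")
    if "?" in pattern or "#" in pattern:
        raise ValueError("? and # are not allowed")
    if "*" in pattern and not (pattern.count("*") == 1 and pattern.endswith("/*")):
        raise ValueError("* is only allowed as the final character, after a /")


def path_pattern_matches(pattern: str, url: str) -> bool:
    """Compare the pattern with the path component only; the query is ignored."""
    validate_path_pattern(pattern)
    path = urlsplit(url).path
    if pattern.endswith("/*"):
        # Assumption: the prefix keeps the slash, so "/pictures/*" -> "/pictures/".
        return path.startswith(pattern[:-1])
    return path == pattern


print(path_pattern_matches("/pictures/*", "https://example.com/pictures/cats/1.jpg"))      # True
print(path_pattern_matches("/images.php", "https://example.com/images.php?image=fred.png"))  # True
```

The second call shows why objects that differ only in their query string cannot be told apart: the query never takes part in the comparison.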

[57] Microsummary XML grammar reference - Archive of obsolete content | MDN https://developer.mozilla.org/en-US/docs/Archive/Mozilla/Microsummary_topics/XML_grammar_reference

<include> (optional)

A regular expression matching the URLs of pages that the generator is able to summarize.

<exclude> (optional)

A regular expression matching the URLs of pages that the generator is not able to summarize.

[58] Match patterns - Mozilla | MDN https://developer.mozilla.org/en-US/Add-ons/WebExtensions/Match_patterns

[59] Dev.Opera — Match Patterns https://dev.opera.com/extensions/match-patterns/

[60] Access and Permissions https://developer.apple.com/library/content/documentation/Tools/Conceptual/SafariExtensionGuide/ExtensionPermissions/ExtensionPermissions.html

[61] Why invent a new URL template syntax? · Issue #31 · WICG/web-share-target https://github.com/WICG/web-share-target/issues/31

[62] Share - Archive of obsolete content | MDN https://developer.mozilla.org/en-US/docs/Archive/Social_API/Share

In your manifest, you can define your shareURL in this format:

"shareURL": "https://yoursite.com/share?u=%{url}&t=%{title}

[63] Headers & Basic Auth | Netlify https://www.netlify.com/docs/headers-and-basic-auth/

Paths can contain * or :placeholders. A :placeholder matches anything except / while a * matches anything.
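The two wildcard forms in the quoted sentence map naturally onto a regular expression. The Python sketch below is one reading of that sentence, not Netlify's implementation: `compile_path` is a hypothetical name, and a `:placeholder` is assumed to require at least one character.

```python
import re

_TOKEN = re.compile(r"(\*)|(:[A-Za-z_][A-Za-z0-9_]*)")


def compile_path(pattern: str):
    """Translate a path with * and :placeholders into a compiled regex."""
    out, pos = [], 0
    for m in _TOKEN.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))  # literal text
        # '*' matches anything; a :placeholder matches anything except '/'.
        out.append(".*" if m.group(1) else "[^/]+")
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out))


print(bool(compile_path("/blog/:slug").fullmatch("/blog/hello")))        # True
print(bool(compile_path("/blog/:slug").fullmatch("/blog/a/b")))          # False
print(bool(compile_path("/assets/*").fullmatch("/assets/css/site.css")))  # True
```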

[64] URLs and Hashing | Safe Browsing APIs (v4) | Google Developers https://developers.google.com/safe-browsing/v4/urls-hashing

[46] curl - How To Use https://curl.haxx.se/docs/manpage.html#-g

[65] curl - How To Use, https://curl.haxx.se/docs/manpage.html#-o

[66] GNU Wget 1.20 Manual https://www.gnu.org/software/wget/manual/wget.html#index-types-of-files

[67] chrome.declarativeNetRequest - Chrome Developers, https://developer.chrome.com/docs/extensions/reference/declarativeNetRequest/

[68] WICG/urlpattern https://github.com/WICG/urlpattern

[75] urlpattern/explainer.md at main · WICG/urlpattern · GitHub, https://github.com/WICG/urlpattern/blob/main/explainer.md

[69] New standard: URLPattern · Issue #215 · whatwg/sg · GitHub, https://github.com/whatwg/sg/issues/215

[78] WHATWG migration · Issue #190 · WICG/urlpattern · GitHub, https://github.com/WICG/urlpattern/issues/190

[70] GitHub - pillarjs/path-to-regexp: Turn a path string such as `/user/:name` into a regular expression, https://github.com/pillarjs/path-to-regexp

[76] GitHub - wanderview/urlpattern-polyfill, https://github.com/wanderview/urlpattern-polyfill

[77] GitHub - denoland/rust-urlpattern: Rust implementation of the `URLPattern` web API, https://github.com/denoland/rust-urlpattern

[71] URLPattern · Issue #525 · web-platform-tests/interop · GitHub, https://github.com/web-platform-tests/interop/issues/525

[72] URLPattern API · Issue #61 · WebKit/standards-positions · GitHub, https://github.com/WebKit/standards-positions/issues/61

[73] Request for Position: URLPattern · Issue #566 · mozilla/standards-positions · GitHub, https://github.com/mozilla/standards-positions/issues/566

[74] Intent to Ship: URLPattern, https://groups.google.com/a/chromium.org/g/blink-dev/c/-T5pJtBO8h4/m/cAkpQec1AwAJ

[80] urlpattern/202012-update.md at main · whatwg/urlpattern · GitHub, https://github.com/whatwg/urlpattern/blob/main/202012-update.md

[79] GitHub - whatwg/urlpattern: URL Pattern Standard, https://github.com/whatwg/urlpattern

[81] URL Pattern Standard, https://urlpattern.spec.whatwg.org/

[82] Editorial: update the standard for the WHATWG move by domenic · Pull Request #193 · whatwg/urlpattern · GitHub, https://github.com/whatwg/urlpattern/pull/193

[83] URL Fields | Elastic Common Schema (ECS) Reference [8.11] | Elastic, https://www.elastic.co/guide/en/ecs/8.11/ecs-url.html