[342] This specification defines a set of steps that can be used to
parse feeds in the wild.
[346] This specification supports feeds that use the following formats and modules:
[15] A feed has
entries , which is a list of entries , and
authors , which is a list of persons .
They are initially empty.
[39] A feed has
page URL ,
feed URL ,
previous feed URL ,
next feed URL ,
icon ,
logo , and
updated .
They are initially null .
[226] A feed has
title ,
subtitle , and
description .
They are initially null . They can be null , a string , or a Node
.
[28] An entry has
authors , which is a list of persons ,
categories , which is a set of strings, and
enclosures , which is a list of enclosures .
They are initially empty.
[88] An entry has
feed ,
page URL ,
thumbnail ,
duration ,
published , and
updated .
They are initially null .
[227] An entry has
title ,
summary , and
content .
They are initially null . They can be null , a string , or a Node
.
[81] To get the computed authors of an entry entry , run these steps:[82] If entry 's authors is not empty,
return entry 's authors .[83] Otherwise, if entry 's feed is not null,
return entry 's feed 's authors .[84] Otherwise, return an empty list.
[93] To get the computed updated of an entry entry , run these steps:[94] If entry 's updated is not null ,
return entry 's updated .[97] Otherwise, if entry 's published is not null ,
return entry 's published .[95] Otherwise, if entry 's feed is not null,
return entry 's feed 's updated .[96] Otherwise, return the current timestamp.
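The two fallback chains above can be sketched in Python as follows. This is a minimal sketch, not a normative implementation: the `Feed` and `Entry` dataclasses and the function names are illustrative assumptions, authors are reduced to plain strings, and only the fields the two algorithms read are modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Feed:
    # Hypothetical, reduced model of a feed: only the fields used below.
    authors: List[str] = field(default_factory=list)
    updated: Optional[datetime] = None


@dataclass
class Entry:
    # Hypothetical, reduced model of an entry.
    authors: List[str] = field(default_factory=list)
    feed: Optional[Feed] = None
    published: Optional[datetime] = None
    updated: Optional[datetime] = None


def computed_authors(entry: Entry) -> List[str]:
    """Entry authors, else the owning feed's authors, else an empty list."""
    if entry.authors:
        return entry.authors
    if entry.feed is not None:
        return entry.feed.authors
    return []


def computed_updated(entry: Entry) -> Optional[datetime]:
    """Entry updated, else published, else the feed's updated, else now."""
    if entry.updated is not None:
        return entry.updated
    if entry.published is not None:
        return entry.published
    if entry.feed is not None:
        # Followed literally: the feed's updated may itself be null.
        return entry.feed.updated
    return datetime.now(timezone.utc)
```

Note that, read literally, the third step returns the feed's updated value even when it is null, so the sketch types the result as optional.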
[29] A person is a tuple of
name ,
email ,
page URL , and
icon .
They are initially null .
[160] An image has
URL ,
width , and
height .
They are initially null .
[188] An enclosure has
URL ,
MIME type , and
length .
They are initially null .
[344] An implementation that supports this specification MUST use the steps to
process a feed response to parse a response as a feed.
[345] An implementation that supports this specification MUST use the steps to
process a feed document to parse a Document
as a feed.
[372] To process a URL element element , run these steps:[35] Let text be element 's child text content .[196] If text is the empty string , return null and abort these steps.[36] Parse text relative to element 's node document .[38] If not failed, return the resulting URL string .[373] Otherwise, return null .
[130] The link relation of a link
element
in the Atom namespace or in the Atom 0.3 namespace
element is the value returned by the following steps:[131] If element does not have a rel
attribute:[132] Return http://www.iana.org/assignments/relation/alternate
.[134] Otherwise:[135] Let rel be element 's rel
attribute value.[136] If rel contains a :
character:[137] Return rel .[138] Otherwise:[133] Return http://www.iana.org/assignments/relation/
followed by rel .
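The link relation steps above can be sketched in Python as follows. This is a minimal sketch: the name `link_relation` and the constant `IANA_PREFIX` are illustrative and do not appear in this specification, and the `rel` attribute is passed in as a string (or `None` when the attribute is absent) instead of being read from a DOM element.

```python
from typing import Optional

# Prefix used by the steps above for registered (colon-free) relation names.
IANA_PREFIX = "http://www.iana.org/assignments/relation/"


def link_relation(rel: Optional[str]) -> str:
    """Return the link relation for a `rel` attribute value (None if absent)."""
    if rel is None:
        # No rel attribute: the relation defaults to "alternate".
        return IANA_PREFIX + "alternate"
    if ":" in rel:
        # A value containing ":" is returned unchanged.
        return rel
    return IANA_PREFIX + rel
```

For example, `link_relation("self")` yields the `self` relation under the IANA prefix, while a value such as `http://example.com/rel` is returned as-is.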
[140] The href URL of a link
element
in the Atom namespace or in the Atom 0.3 namespace
element is the value returned by the following steps:[141] Let text be element 's href
attribute value,
if any, or the empty string.[142] Parse text relative to element 's node document .[143] If failed, return null .[144] Otherwise, return the resulting URL record .
[234] To process an Atom link element for object
with type , run these steps:
[139] If element's link relation is http://www.iana.org/assignments/relation/alternate :
  [145] If object's page URL is null:
    [146] Set object's page URL to element's href URL.
[235] If type is feed :
  [150] If element's link relation is http://www.iana.org/assignments/relation/self :
    [151] If object's feed URL is null:
      [152] Set object's feed URL to element's href URL.
  [199] If element's link relation is http://www.iana.org/assignments/relation/prev or http://www.iana.org/assignments/relation/previous :
    [201] If object's previous feed URL is null:
      [202] Set object's previous feed URL to element's href URL.
  [203] If element's link relation is http://www.iana.org/assignments/relation/next :
    [204] If object's next feed URL is null:
      [205] Set object's next feed URL to element's href URL.
[269] If type is entry :
  [270] If element's link relation is http://www.iana.org/assignments/relation/enclosure :
    [336] Let url be element's href URL.
    [337] If url is not null:
      [271] Let enclosure be an enclosure, whose URL is url.
      [273] Set enclosure's type to element's type attribute value.
      [275] Let length be element's length attribute value.
      [276] If length is not null:
        [277] Let n be the result of applying the rules for parsing non-negative integers to length.
        [278] If n is not an error and n is greater than zero:
          [274] Set enclosure's length to n.
      [272] Append enclosure to object's enclosures.
[176] To process an image element element with string
attribute name , run these steps:
[161] Let text be element 's attribute name attribute value.[170] If text is not null or the empty string :[179] Parse text relative to element 's node document .[187] If not failed:[228] Let image be an image .[229] Set image 's URL to the resulting URL string .[297] Let w be the result of applying the
rules for parsing non-negative integers to element 's width
attribute value, if any, or the empty string.[298] If w is not an error, set image 's width
to w .[299] Let h be the result of applying the
rules for parsing non-negative integers to element 's height
attribute value, if any, or the empty string.[300] If h is not an error, set image 's height
to h .[296] Return image and abort these steps.[177] Return null .[24] To process an Atom person element , run these steps:[54] Let person be a person .[55] For each element child in element 's children , in order,
run these substeps:[56] Switch by child 's namespace and local name :name
element in the Atom namespace or in the Atom 0.3 namespace
If person 's name is null ,
set person 's name to the result of
processing a string element child . email
element in the Atom namespace or in the Atom 0.3 namespace
If person 's email is null ,
set person 's email to the result of
processing a string element child . uri
element in the Atom namespace or in the Atom 0.3 namespace
If person 's page URL is null ,
set person 's page URL to the result of
processing a URL element child . image
element in the GData namespace
If person 's icon is null ,
set person 's icon to the result of
processing an image element child
with attribute name src
. [53] Return person .
[247] To process an RSS 2.0 person element , run these steps:[254] Let person be a person .[253] Let text be element 's child text content .[318] If text is the empty string :[319] Return null .[157] Otherwise, if text is
one or more Unicode code points that are not space characters ,
followed by one or more space characters ,
followed by a (
character,
followed by one or more Unicode code points ,
followed by a )
character:[158] Set person 's name to the substring
between (
and )
characters in text , not inclusive.[159] Set person 's email to the substring
before the first space character in text .[320] Otherwise:[255] Set person 's name to text .[256] Return person .
[60] To process a string element element , run these steps:[98] Let text be element 's child text content .[99] If text contains a Unicode code point that is not a space character :[100] Return text .[164] Otherwise:[174] Return null .
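As an illustration of the string element steps above, the following sketch uses Python's `xml.etree.ElementTree` in place of a DOM. The helper names are assumptions, and child text content is approximated as the element's own text plus the tails of its direct children, which corresponds to the Text node children in ElementTree's data model.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed set of space characters: space, tab, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def child_text_content(element: ET.Element) -> str:
    """Concatenate the text directly inside element, skipping descendants' text."""
    return (element.text or "") + "".join(child.tail or "" for child in element)


def process_string_element(element: ET.Element) -> Optional[str]:
    """Return the child text content, or None if it is empty or only spaces."""
    text = child_text_content(element)
    if text.strip(SPACE_CHARACTERS) != "":
        return text
    return None
```

The text is returned untrimmed, as in the steps above; only the test for a non-space code point uses stripping.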
[50] To process an Atom text element , run these steps:[61] Let type be the type
attribute value
of element , if any, or null .[62] If type is html
:[267] Return the result of parsing escaped HTML content of element .[63] Otherwise, if type is xhtml
and
element 's children contains a div
element:[66] Let div be a clone of element 's
first div
element child, with clone children flag set.[67] Let fragment be a DocumentFragment
whose node document is element 's node document .[68] For each child node in div 's children , in order,
insert node into fragment .[104] Sanitize fragment .[105] If fragment has significant content :[69] Return fragment .[106] Otherwise:[107] Return null .[64] Otherwise:[65] Let text be element 's child text content .[45] If text contains a Unicode code point that is not a space character :[46] Return text .[47] Otherwise:[48] Return null .
[153] To process an Atom 0.3 content element , run these steps:[155] Let mode be element 's mode
attribute value,
if any, or xml
.[154] Let type be element 's type
attribute value,
if any, or text/plain
.[175] Let text be element 's child text content .[162] If mode is escaped
:[180] If type is equal to text/html
(ASCII case-insensitive ):[268] Return the result of parsing escaped HTML content of element and abort these steps.[49] If text contains a Unicode code point that is not a space character :[57] Return text .[58] Otherwise:[59] Return null .
[178] To parse escaped HTML content of element element , run these steps:[101] Let text be element 's child text content .[73] Let div be a div
element
whose node document is element 's node document .[74] Let fragment be a DocumentFragment
whose node document is element 's node document .[70] Let nodes be the result of running the HTML fragment parsing algorithm
with context set to div and input set to text .[71] For each item node in nodes , in order,
insert node into fragment .[108] Sanitize fragment .[109] If fragment has significant content :[339] If fragment 's children contains exactly one Text
:[340] Return fragment 's children 's first item's data .[341] Otherwise:[72] Return fragment .[110] Otherwise:[111] Return null .
[112] To sanitize Node node, run these steps:
[113] If there is an img element whose width attribute value is 1 and height attribute value is 1, remove it from its parent.
[114] A Node
node has significant content if
there is a feed significant content inclusive descendant of node ,
which is not an inclusive descendant of
an element matching one of the following conditions
which is an inclusive descendant of node :
[129] A Node
is a feed significant content if
it is a palpable content and is an embedded content .
[362] Two inputs A and B have same text content iff:
both A and B are strings and they are equal, or
both A and B are Nodes and they are equal.
[262] To cleanup entry entry, run these steps:
[263] If entry's page URL is not null and there is an enclosure whose URL is equal to entry's page URL in entry's enclosures:
  [264] Set entry's page URL to null.
[332] If entry's page URL is not null, entry's thumbnail is null, and there is an enclosure whose type starts with image/ (ASCII case-insensitive) or whose type is null and URL ends with .jpeg , .jpg , or .png :
  [334] Let enclosure be the first such enclosure.
  [333] Let image be an image whose URL is enclosure's URL.
  [335] Remove enclosure from entry's enclosures.
[265] If entry's title and entry's subtitle have same text content, set entry's subtitle to null.
[266] If entry's title and entry's summary have same text content, set entry's summary to null.
[291] If entry's summary and entry's content have same text content, set entry's content to null.
[52] To process an Atom entry element , run these steps:[78] Let entry be an entry .[79] For each element child in element 's children , in order,
run these substeps:[80] Switch by child 's namespace and local name :author
element in the Atom namespace or in the Atom 0.3 namespace
Append the result of processing an Atom person child
to entry 's authors . category
element in the Atom namespace [30] If child has a term
attribute:[44] Let term be child 's term
attribute value.[89] If term is not an empty string,
add term to entry 's categories .subject
element in the Dublin Core namespace [90] Let term be child 's child text content .[91] If term is not an empty string,
add term to entry 's categories .published
element in the Atom namespace
If entry 's published is null ,
set entry 's published to the result of
processing an Atom date child . created
element in the Atom 0.3 namespace or in the Atom namespace
If entry 's published is null ,
set entry 's published to the result of
processing a W3C-DTF date child . updated
element in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing an Atom date child . modified
element in the Atom 0.3 namespace or in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing a W3C-DTF date child . title
element in the Atom namespace
If entry 's title is null ,
set entry 's title to the result of
processing an Atom text child . title
element in the Atom 0.3 namespace
If entry 's title is null ,
set entry 's title to the result of
processing an Atom 0.3 content child . summary
element in the Atom namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing an Atom text child . summary
element in the Atom 0.3 namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing an Atom 0.3 content child . content
element in the Atom namespace
If entry 's content is null ,
set entry 's content to the result of
processing an Atom text child . content
element in the Atom 0.3 namespace
If entry 's content is null ,
set entry 's content to the result of
processing an Atom 0.3 content child . link
element in the Atom namespace or in the Atom 0.3 namespace
Process an Atom link child for entry , with type entry
. thumbnail
element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name url
. group
element in the Media RSS namespace [116] For each element gc in child 's children , in order,
run these substeps:[117] Switch by gc 's namespace and local name :title
element in the Media RSS namespace
If entry 's title is null ,
set entry 's title to the result of
processing a string element gc . description
element in the Media RSS namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing a string element gc . thumbnail
element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element gc
with attribute name url
. content
element in the Media RSS namespace [123] Let enclosure be an enclosure .[124] Let text be gc 's url
attribute value.[125] If text is not null or the empty string :[126] Parse text relative to gc 's node document .[127] If not failed:[163] Set enclosure 's URL to the resulting URL string .[165] Set enclosure 's type to gc 's
type
attribute value.[171] Append enclosure to entry 's enclosures .[246] Return entry .
RSS 2.0 and RSS 0.9x items
[249] To process an RSS 2.0 item element , run these steps:[248] Let entry be an entry .[250] For each element child in element 's children , in order,
run these substeps:[251] Switch by child 's namespace and local name :category
element in the null namespace [92] Let term be child 's child text content .[293] If term is not the empty string,
add term to entry 's categories .author
element in the null namespace [355] Let person be the result of processing an RSS 2.0 person child .[361] If person is not null :[356] If person 's email is not null ,
or if there is no person whose name is person 's name
in entry 's authors :[354] Append person to entry 's authors .creator
element in the Dublin Core namespace or author
element in the iTunes namespace [118] Let text be child 's child text content .[119] If text is not the empty string and
there is no person whose name is text
in entry 's authors :[120] Append a person whose name is text
to entry 's authors .pubDate
element in the null namespace
If entry 's published is null ,
set entry 's published to the result of
processing an RSS 2.0 date child . updated
element in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing an Atom date child . link
element in the null namespace
If entry 's page URL is null ,
set entry 's page URL to the result of
processing a URL element child . thumbnail
element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name url
. image
element in the iTunes namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name href
. content
element in the Media RSS namespace [292] Let enclosure be an enclosure .[364] Let text be child 's url
attribute value.[366] If text is not null or the empty string :[367] Parse text relative to child 's node document .[368] If not failed:[369] Set enclosure 's URL to the resulting URL string .[370] Set enclosure 's type to child 's
type
attribute value.[371] Append enclosure to entry 's enclosures . enclosure
element in the null namespace [279] Let enclosure be an enclosure .[280] Let text be child 's url
attribute value.[281] If text is not null or the empty string :[282] Parse text relative to child 's node document .[283] If not failed:[195] Set enclosure 's URL to the resulting URL string .[284] Set enclosure 's type to child 's
type
attribute value.[285] Let length be child 's length
attribute value.[286] If length is not null:[287] Let n be the result of applying the
rules for parsing non-negative integers to length .[288] If n is not an error and n is greater than zero:[289] Set enclosure 's length to n .[290] Append enclosure to entry 's enclosures . title
element in the null namespace
If entry 's title is null ,
set entry 's title to the result of
processing a string element child . subtitle
element in the iTunes namespace
If entry 's subtitle is null ,
set entry 's subtitle to the result of
processing a string element child . description
element in the null namespace
If entry 's summary is null ,
set entry 's summary to the result of
parsing escaped HTML content of child . summary
element in the iTunes namespace
If entry 's subtitle is null ,
set entry 's subtitle to the result of
processing a string element child . encoded
element in the RSS content namespace
If entry 's content is null ,
set entry 's content to the result of
parsing escaped HTML content of child . duration
element in the iTunes namespace
[218] If entry's duration is null:
  [198] Let text be child's child text content.
  [212] If text is one or more ASCII digits:
    [217] Set entry's duration to the ASCII digits in text, interpreted as a decimal number.
  [215] Otherwise, if text is one or more ASCII digits, followed by a : character, followed by one or more ASCII digits:
    [223] Let m be the first sequence of ASCII digits in text, interpreted as a decimal number.
    [224] Let s be the second sequence of ASCII digits in text, interpreted as a decimal number.
    [230] Set entry's duration to m × 60 + s.
  [216] Otherwise, if text is one or more ASCII digits, followed by a : character, followed by one or more ASCII digits, followed by a : character, followed by one or more ASCII digits:
    [231] Let h be the first sequence of ASCII digits in text, interpreted as a decimal number.
    [232] Let m be the second sequence of ASCII digits in text, interpreted as a decimal number.
    [233] Let s be the third sequence of ASCII digits in text, interpreted as a decimal number.
    [257] Set entry's duration to h × 3600 + m × 60 + s.
[252] Return entry.
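The three duration patterns in the iTunes duration case above (seconds, minutes:seconds, and hours:minutes:seconds) can be sketched as a single Python function. This is a minimal sketch; the function name is an assumption, and the result is a number of seconds, or `None` when the text matches none of the patterns (in which case the entry's duration would stay unset).

```python
import re
from typing import Optional

_ASCII_DIGITS = re.compile(r"[0-9]+")


def parse_itunes_duration(text: str) -> Optional[int]:
    """Convert 'S', 'M:S', or 'H:M:S' to seconds; None if text matches none."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        return None
    # Every component must be one or more ASCII digits (str.isdigit would
    # also accept non-ASCII digits, so a regular expression is used instead).
    if not all(_ASCII_DIGITS.fullmatch(part) for part in parts):
        return None
    seconds = 0
    for part in parts:
        # Each further component shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds
```

Accumulating with a factor of 60 per component gives m × 60 + s for two components and h × 3600 + m × 60 + s for three, matching the formulas in the steps.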
[1] To process a feed response res , run these steps:[9] If res is a network error or
res 's status is not 200
,
return null and abort these steps.[2] Let type be res 's computed MIME type .[3] If type is an XML MIME type :[7] Let doc be a Document
.[6] Let parser be an XML parser associated with doc .
The XML parser MUST implement XML5 .
It MUST NOT fetch and process external entities .[8] Run parser , using res 's body as its input byte stream .
The charset
parameter value in the
Content-Type
header value of res , if any, is used as
the encoding label provided by the underlying transport.[11] Set doc 's address to
res 's url .[27] Set doc 's character encoding
to the character encoding used by parser .[5] Return the result of processing a feed document doc .[4] Otherwise, return null .
[10] To process a feed document doc , run these steps:[12] Let root be doc 's root element .[17] Switch by root :[16] If it is a feed
element in the Atom namespace or in the Atom 0.3 namespace
Let feed be the result of processing an Atom feed element root . [19] If it is an rss
element in the null namespace
Let feed be the result of processing an rss element root . [18] If it is an RDF
element in the RDF namespace
Let feed be the result of processing an RDF element root . [350] If feed is not null , cleanup feed feed .[13] Return feed .
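The root element switch above can be illustrated with a small dispatch function. This is a sketch using `xml.etree.ElementTree` rather than a DOM; the function name and the returned labels are assumptions, and the namespace URLs are the commonly used ones for Atom 1.0, Atom 0.3, and RDF, which this section does not itself list.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URLs assumed for illustration; not stated in this section.
ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM03_NS = "http://purl.org/atom/ns#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def classify_root(root: ET.Element) -> Optional[str]:
    """Return which feed-processing branch applies to a document's root element."""
    # ElementTree spells namespaced names as "{namespace}localname".
    if root.tag in (f"{{{ATOM_NS}}}feed", f"{{{ATOM03_NS}}}feed"):
        return "atom"
    if root.tag == "rss":  # rss element in the null namespace
        return "rss"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rdf"
    return None
```

A caller would then invoke the matching element-processing steps and, if the result is not null, the feed cleanup steps.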
[351] To cleanup feed feed , run these steps:[352] If feed 's logo is not null ,
feed 's icon is not null , and
feed 's logo 's URL is
feed 's icon 's URL :[353] Set feed 's icon to null .
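The feed cleanup step above amounts to dropping an icon that duplicates the logo. A minimal sketch, assuming hypothetical `Image` and `Feed` dataclasses that model only the fields involved:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    # Hypothetical, reduced model of an image.
    url: Optional[str] = None


@dataclass
class Feed:
    # Hypothetical, reduced model of a feed.
    icon: Optional[Image] = None
    logo: Optional[Image] = None


def cleanup_feed(feed: Feed) -> None:
    """Clear the icon when it points at the same URL as the logo."""
    if (
        feed.logo is not None
        and feed.icon is not None
        and feed.logo.url == feed.icon.url
    ):
        feed.icon = None
```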
[20] To process an Atom feed element element , run these steps:[14] Let feed be a feed .[21] For each element child in element 's children , in order,
run these substeps:[22] Switch by child 's namespace and local name :title
element in the Atom namespace
If feed 's title is null ,
set feed 's title to the result of
processing an Atom text child . title
element in the Atom 0.3 namespace
If feed 's title is null ,
set feed 's title to the result of
processing an Atom 0.3 content child . subtitle
element in the Atom namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing an Atom text child . tagline
element in the Atom 0.3 namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing an Atom 0.3 content child . updated
element in the Atom namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing an Atom date child . modified
element in the Atom 0.3 namespace or in the Atom namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing a W3C-DTF date child . link
element in the Atom namespace or in the Atom 0.3 namespace
Process an Atom link child for feed , with type feed
. icon
element in the Atom namespace [181] If feed 's icon is null :[184] Let image be an image .[183] Set image 's URL to the result of
processing a URL element child .[182] If image 's URL is not null ,
set feed 's icon to image .logo
element in the Atom namespace [185] If feed 's logo is null :[186] Let image be an image .[197] Set image 's URL to the result of
processing a URL element child .[225] If image 's URL is not null ,
set feed 's logo to image .author
element in the Atom namespace or in the Atom 0.3 namespace
Append the result of processing an Atom person child
to feed 's authors . entry
element in the Atom namespace or in the Atom 0.3 namespace [86] Let entry be the result of processing an Atom entry
child .[85] Set entry 's feed to feed .[258] Cleanup entry entry .[87] Append entry to feed 's entries .[25] Return feed .
RSS 2.0 and RSS 0.9x feeds
[301] To process an rss element element , run these steps:[239] Let feed be a feed .[240] For each element child in element 's children , in order,
run these substeps:[302] Switch by child 's namespace and local name :channel
element in the null namespace
Process an RSS 2.0 channel child with feed . item
element in the null namespace [242] Let entry be the result of processing an RSS 2.0 item
child .[243] Set entry 's feed to feed .[259] Cleanup entry entry .[244] Append entry to feed 's entries .[245] Return feed .
[206] To process an RSS 2.0 channel element with feed feed ,
run these steps:[208] For each element child in element 's children , in order,
run these substeps:[209] Switch by child 's namespace and local name :image
element in the null namespace [211] If feed 's logo is null :[213] Let element be child 's first url
child element
in the null namespace .[214] If element is not null :[308] Let url be the result of
processing a URL element element .[37] If url is not null :[310] Let image be an image .[314] Set image 's URL to url .[309] Set feed 's logo to image .image
element in the iTunes namespace
If feed 's icon is null ,
set feed 's icon to the result of
processing an image element child
with attribute name href
. creator
element in the Dublin Core namespace or author
element in the iTunes namespace [316] Let text be child 's child text content .[317] If text is not the empty string and
there is no person whose name is text
in feed 's authors :[315] Append a person whose name is text
to feed 's authors .managingEditor
element in the null namespace [357] Let person be the result of processing an RSS 2.0 person child .[360] If person is not null :[358] If person 's email is not null ,
or if there is no person whose name is person 's name
in feed 's authors :[359] Append person to feed 's authors .pubDate
or lastBuildDate
element in the null namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing an RSS 2.0 date child . title
element in the null namespace
If feed 's title is null ,
set feed 's title to the result of
processing a string element child . subtitle
element in the iTunes namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing a string element child . description
element in the null namespace or summary
element in the iTunes namespace
If feed 's description is null ,
set feed 's description to the result of
processing a string element child . link
element in the null namespace
If feed 's page URL is null ,
set feed 's page URL to the result of
processing a URL element child . link
element in the Atom namespace
Process an Atom link child for feed , with type feed
. item
element in the null namespace [147] Let entry be the result of processing an RSS 2.0 item
child .[148] Set entry 's feed to feed .[261] Cleanup entry entry .[149] Append entry to feed 's entries .
[43] The key word MUST is defined by RFC 2119 .
[34] The terms ASCII digits and URL record
are defined by the URL Standard .
[42] The terms MIME type , computed MIME type ,
type , and
parse a MIME type are defined by the MIME Sniffing Standard .
[41] The terms response , status ,
url , and network error
are defined by the Fetch Standard .
[26] The interfaces Node
and
DocumentFragment
are defined by the DOM Standard .
[31] The terms parent , children , inclusive descendant ,
insert , remove , clone ,
equals (of Node
s),
local name , namespace , and
node document are defined by the DOM Standard .
[23] The terms
XML MIME type ,
Unicode code point , space characters , ASCII case-insensitive ,
parse a URL , resulting URL string , resulting URL record ,
rules for parsing non-negative integers ,
document's address , document's character encoding ,
palpable content , embedded content ,
child text content ,
input byte stream ,
HTML fragment parsing algorithm , HTML parser , and XML parser
are defined by the HTML Standard .
[32] The div
and img
elements are defined by the HTML Standard .