[342] This specification defines a set of steps that can be used to
parse feeds in the wild.
[346] This specification supports feeds using the following formats and modules:
[15] A feed has
entries , which is a list of entries , and
authors , which is a list of persons .
They are initially empty.
[39] A feed has
page URL ,
feed URL ,
previous feed URL ,
next feed URL ,
icon ,
logo , and
updated .
They are initially null .
[226] A feed has
title ,
subtitle , and
description .
They are initially null . They can be null , a string , or a Node .
[28] An entry has
authors , which is a list of persons ,
categories , which is a set of strings, and
enclosures , which is a list of enclosures .
They are initially empty.
[88] An entry has
feed ,
page URL ,
thumbnail ,
duration ,
published , and
updated .
They are initially null .
[227] An entry has
title ,
summary , and
content .
They are initially null . They can be null , a string , or a Node .
[81] To get the computed authors of an entry entry , run these steps:[82] If entry 's authors is not empty,
return entry 's authors .[83] Otherwise, if entry 's feed is not null,
return entry 's feed 's authors .[84] Otherwise, return an empty list.
[93] To get the computed updated of an entry entry , run these steps:[94] If entry 's updated is not null ,
return entry 's updated .[97] Otherwise, if entry 's published is not null ,
return entry 's published .[95] Otherwise, if entry 's feed is not null,
return entry 's feed 's updated .[96] Otherwise, return the current timestamp.
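The two fallback chains above can be sketched as follows. This is a minimal illustration, not a normative implementation: the `Feed` and `Entry` dataclasses are hypothetical stand-ins for the feed and entry concepts, and persons are represented as plain strings for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Feed:
    # Hypothetical, reduced model of a feed: only the fields used below.
    authors: List[str] = field(default_factory=list)
    updated: Optional[datetime] = None


@dataclass
class Entry:
    # Hypothetical, reduced model of an entry.
    authors: List[str] = field(default_factory=list)
    feed: Optional[Feed] = None
    published: Optional[datetime] = None
    updated: Optional[datetime] = None


def computed_authors(entry: Entry) -> List[str]:
    """Entry authors, else the owning feed's authors, else an empty list."""
    if entry.authors:
        return entry.authors
    if entry.feed is not None:
        return entry.feed.authors
    return []


def computed_updated(entry: Entry) -> Optional[datetime]:
    """Entry updated, else published, else the feed's updated, else now."""
    if entry.updated is not None:
        return entry.updated
    if entry.published is not None:
        return entry.published
    if entry.feed is not None:
        # As in the steps above, this may itself be null (None).
        return entry.feed.updated
    return datetime.now(timezone.utc)
```

Note that, following the steps literally, an entry with a feed whose updated is null yields null rather than the current timestamp.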
[29] A person is a tuple of
name ,
email ,
page URL , and
icon .
They are initially null .
[160] An image has
URL ,
width , and
height .
They are initially null .
[188] An enclosure has
URL ,
MIME type , and
length .
They are initially null .
[344] An implementation that supports this specification MUST use the steps to
process a feed response to parse a response as a feed.
[345] An implementation that supports this specification MUST use the steps to
process a feed document to parse a Document as a feed.
[372] To process a URL element element , run these steps:[35] Let text be element 's child text content .[196] If text is the empty string , return null and abort these steps.[36] Parse text relative to element 's node document .[38] If not failed, return the resulting URL string .[373] Otherwise, return null .
[130] The link relation of a link element
in the Atom namespace or in the Atom 0.3 namespace
element is the value returned by the following steps:[131] If element does not have a rel attribute:[132] Return http://www.iana.org/assignments/relation/alternate .[134] Otherwise:[135] Let rel be element 's rel attribute value.[136] If rel contains a : character:[137] Return rel .[138] Otherwise:[133] Return http://www.iana.org/assignments/relation/
followed by rel .
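As an illustration, the link relation steps reduce to a small function. The function name and the convention of passing `None` for a missing rel attribute are assumptions made for this sketch.

```python
from typing import Optional

IANA_PREFIX = "http://www.iana.org/assignments/relation/"


def link_relation(rel: Optional[str]) -> str:
    """Return the link relation for a rel attribute value (None if absent)."""
    if rel is None:
        # No rel attribute: the relation defaults to "alternate".
        return IANA_PREFIX + "alternate"
    if ":" in rel:
        # Values containing a colon are returned unchanged.
        return rel
    # Short names are expanded by prefixing the IANA registry URL.
    return IANA_PREFIX + rel
```

For example, `link_relation("self")` gives `http://www.iana.org/assignments/relation/self`, while a value such as `http://example.com/rel` is returned as is.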
[140] The href URL of a link element
in the Atom namespace or in the Atom 0.3 namespace
element is the value returned by the following steps:[141] Let text be element 's href attribute value,
if any, or the empty string.[142] Parse text relative to element 's node document .[143] If failed, return null .[144] Otherwise, return the resulting URL record .
[234] To process an Atom link element for object
with type mode , run these steps:
[139] If element 's link relation is
http://www.iana.org/assignments/relation/alternate :[145] If object 's page URL is null :[146] Set object 's page URL to
child 's href URL .[235] If mode is feed :[150] If element 's link relation is
http://www.iana.org/assignments/relation/self :[151] If object 's feed URL is null :[152] Set object 's feed URL to
child 's href URL .[199] If element 's link relation is
http://www.iana.org/assignments/relation/prev or
http://www.iana.org/assignments/relation/previous :[201] If object 's previous feed URL is null :[202] Set object 's previous feed URL to
child 's href URL .[203] If element 's link relation is
http://www.iana.org/assignments/relation/next :[204] If object 's next feed URL is null :[205] Set object 's next feed URL to
child 's href URL .[269] If mode is entry :[270] If element 's link relation is
http://www.iana.org/assignments/relation/enclosure :[336] Let url be child 's href URL .[337] If url is not null :[271] Let enclosure be an enclosure ,
whose URL is url .[273] Set enclosure 's type to child 's
type attribute value.[275] Let length be child 's length attribute value.[276] If length is not null:[277] Let n be the result of applying the
rules for parsing non-negative integers to length .[278] If n is not an error and n is greater than zero:[274] Set enclosure 's length to n .[272] Append enclosure to object 's enclosures .
[176] To process an image element element with string
attribute name , run these steps:
[161] Let text be element 's attribute name attribute value.[170] If text is not null or the empty string :[179] Parse text relative to element 's node document .[187] If not failed:[228] Let image be an image .[229] Set image 's URL to the resulting URL string .[297] Let w be the result of applying the
rules for parsing non-negative integers to element 's width
attribute value, if any, or the empty string.[298] If w is not an error, set image 's width
to w .[299] Let h be the result of applying the
rules for parsing non-negative integers to element 's height
attribute value, if any, or the empty string.[300] If h is not an error, set image 's height
to h .[296] Return image and abort these steps.[177] Return null .
[24] To process an Atom person element , run these steps:[54] Let person be a person .[55] For each element child in element 's children , in order,
run these substeps:[56] Switch by child 's namespace and local name :name element in the Atom namespace or in the Atom 0.3 namespace
If person 's name is null ,
set person 's name to the result of
processing a string element child . email element in the Atom namespace or in the Atom 0.3 namespace
If person 's email is null ,
set person 's email to the result of
processing a string element child . uri element in the Atom namespace or in the Atom 0.3 namespace
If person 's page URL is null ,
set person 's page URL to the result of
processing a URL element child . image element in the GData namespace
If person 's icon is null ,
set person 's icon to the result of
processing an image element child
with attribute name src . [53] Return person .
[247] To process an RSS 2.0 person element , run these steps:[254] Let person be a person .[253] Let text be element 's child text content .[318] If text is the empty string :[319] Return null .[157] Otherwise, if text is
one or more Unicode code points that are not space characters ,
followed by one or more space characters ,
followed by a ( character,
followed by one or more Unicode code points ,
followed by a ) character:[158] Set person 's name to the substring
between ( and ) characters in text , not inclusive.[159] Set person 's email to the substring
before the first space character in text .[320] Otherwise:[255] Set person 's name to text .[256] Return person .
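The `email (name)` convention handled above can be sketched with a regular expression. This is an illustrative approximation: the `Person` dataclass and function name are hypothetical, the input is assumed to be the element's child text content, and the space characters are assumed to be the five HTML space characters (space, tab, LF, FF, CR).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed set of space characters: space, tab, LF, FF, CR.
_SPACE = " \t\n\f\r"

# non-spaces, spaces, "(", one or more code points, ")"
_EMAIL_NAME = re.compile(
    rf"([^{_SPACE}]+)[{_SPACE}]+\((.+)\)", re.DOTALL
)


@dataclass
class Person:
    # Hypothetical, reduced model of a person.
    name: Optional[str] = None
    email: Optional[str] = None


def parse_rss_person(text: str) -> Optional[Person]:
    """Parse the text of an RSS 2.0 person element such as <author>."""
    if text == "":
        return None
    match = _EMAIL_NAME.fullmatch(text)
    if match:
        # "jane@example.com (Jane Doe)" -> email before the space, name in parentheses.
        return Person(name=match.group(2), email=match.group(1))
    # Anything else is treated as a bare name.
    return Person(name=text)
```

A value such as `Jane Doe` does not match the pattern, because the first run of spaces is not followed by `(`, so the whole text becomes the name.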
[60] To process a string element element , run these steps:[98] Let text be element 's child text content .[99] If text contains a Unicode code point that is not a space character :[100] Return text .[164] Otherwise:[174] Return null .
[50] To process an Atom text element , run these steps:[61] Let type be the type attribute value
of element , if any, or null .[62] If type is html :[267] Return the result of parsing escaped HTML content of element .[63] Otherwise, if type is xhtml and
element 's children contains a div element:[66] Let div be a clone of element 's
first div element child, with clone children flag set.[67] Let fragment be a DocumentFragment
whose node document is element 's node document .[68] For each child node in div 's children , in order,
insert node into fragment .[104] Sanitize fragment .[105] If fragment has significant content :[69] Return fragment .[106] Otherwise:[107] Return null .[64] Otherwise:[65] Let text be element 's child text content .[45] If text contains a Unicode code point that is not a space character :[46] Return text .[47] Otherwise:[48] Return null .
[153] To process an Atom 0.3 content element , run these steps:[155] Let mode be element 's mode attribute value,
if any, or xml .[154] Let type be element 's type attribute value,
if any, or text/plain .[175] Let text be element 's child text content .[162] If mode is escaped :[180] If type is equal to text/html (ASCII case-insensitive ):[268] Return the result of parsing escaped HTML content of element and abort these steps.[49] If text contains a Unicode code point that is not a space character :[57] Return text .[58] Otherwise:[59] Return null .
[178] To parse escaped HTML content of element element , run these steps:[101] Let text be element 's child text content .[73] Let div be a div element
whose node document is element 's node document .[74] Let fragment be a DocumentFragment
whose node document is element 's node document .[70] Let nodes be the result of running the HTML fragment parsing algorithm
with context set to div and input set to text .[71] For each item node in nodes , in order,
insert node into fragment .[108] Sanitize fragment .[109] If fragment has significant content :[339] If fragment 's children contains exactly one Text :[340] Return fragment 's children 's first item's data .[341] Otherwise:[72] Return fragment .[110] Otherwise:[111] Return null .
[112] To sanitize Node node , run these steps:[113] If there is
an img element whose
width attribute value is 1 and
height attribute value is 1 ,
remove it from its parent .
[114] A Node node has significant content if
there is a feed significant content inclusive descendant of node ,
which is not an inclusive descendant of
an element matching one of the following conditions
which is an inclusive descendant of node :
[129] A Node is a feed significant content if
it is a palpable content and is an embedded content .
[362] Two inputs A and B are same text content
iff: both A and B are strings and they are equal, or both A and B are Nodes and they are equal .
[262] To cleanup entry entry , run these steps:[263] If entry 's page URL is not null and
there is an enclosure whose URL is equal to
entry 's page URL in
entry 's enclosures :[264] Set entry 's page URL to null .[332] If entry 's page URL is not null ,
entry 's thumbnail is null ,
and there is an enclosure whose
type starts with image/ (ASCII case-insensitive ) or
whose type is null and
URL ends with .jpeg , .jpg , or .png :[334] Let enclosure be the first such enclosure .[333] Let image be an image whose URL
is enclosure 's URL . Set entry 's thumbnail to image .[335] Remove enclosure from entry 's enclosures .[265] If
entry 's title and
entry 's subtitle
have same text content ,
set entry 's subtitle to null .[266] If
entry 's title and
entry 's summary
have same text content ,
set entry 's summary to null .[291] If
entry 's summary and
entry 's content
have same text content ,
set entry 's content to null .
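The deduplication part of these steps can be sketched as below. This is a simplified illustration under stated assumptions: the `Entry` dataclass is hypothetical, enclosures are reduced to a list of URL strings, only the string branch of same text content is modeled (Node equality is not), and the image enclosure step is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical, reduced model of an entry; text fields are strings only.
    page_url: Optional[str] = None
    title: Optional[str] = None
    subtitle: Optional[str] = None
    summary: Optional[str] = None
    content: Optional[str] = None
    enclosure_urls: List[str] = field(default_factory=list)


def same_text_content(a: object, b: object) -> bool:
    """String branch only: both inputs are strings and they are equal."""
    return isinstance(a, str) and isinstance(b, str) and a == b


def cleanup_entry(entry: Entry) -> None:
    # A page URL that merely repeats an enclosure URL carries no information.
    if entry.page_url is not None and entry.page_url in entry.enclosure_urls:
        entry.page_url = None
    # Drop text fields that duplicate an earlier field, in the order given.
    if same_text_content(entry.title, entry.subtitle):
        entry.subtitle = None
    if same_text_content(entry.title, entry.summary):
        entry.summary = None
    if same_text_content(entry.summary, entry.content):
        entry.content = None
```

Because two null values are not same text content, an entry whose title, summary, and content are all identical keeps its content: the summary is cleared first, and the cleared summary no longer matches the content.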
[52] To process an Atom entry element , run these steps:[78] Let entry be an entry .[79] For each element child in element 's children , in order,
run these substeps:[80] Switch by child 's namespace and local name :author element in the Atom namespace or in the Atom 0.3 namespace
Append the result of processing an Atom person child
to entry 's authors . category element in the Atom namespace [30] If child has a term attribute:[44] Let term be child 's term attribute value.[89] If term is not an empty string,
add term to entry 's categories .subject element in the Dublin Core namespace [90] Let term be child 's child text content .[91] If term is not an empty string,
add term to entry 's categories .published element in the Atom namespace
If entry 's published is null ,
set entry 's published to the result of
processing an Atom date child . created element in the Atom 0.3 namespace or in the Atom namespace
If entry 's published is null ,
set entry 's published to the result of
processing a W3C-DTF date child . updated element in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing an Atom date child . modified element in the Atom 0.3 namespace or in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing a W3C-DTF date child . title element in the Atom namespace
If entry 's title is null ,
set entry 's title to the result of
processing an Atom text child . title element in the Atom 0.3 namespace
If entry 's title is null ,
set entry 's title to the result of
processing an Atom 0.3 content child . summary element in the Atom namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing an Atom text child . summary element in the Atom 0.3 namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing an Atom 0.3 content child . content element in the Atom namespace
If entry 's content is null ,
set entry 's content to the result of
processing an Atom text child . content element in the Atom 0.3 namespace
If entry 's content is null ,
set entry 's content to the result of
processing an Atom 0.3 content child . link element in the Atom namespace or in the Atom 0.3 namespace
Process an Atom link child for entry , with type entry . thumbnail element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name url . group element in the Media RSS namespace [116] For each element gc in child 's children , in order,
run these substeps:[117] Switch by gc 's namespace and local name :title element in the Media RSS namespace
If entry 's title is null ,
set entry 's title to the result of
processing a string element gc . description element in the Media RSS namespace
If entry 's summary is null ,
set entry 's summary to the result of
processing a string element gc . thumbnail element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element gc
with attribute name url . content element in the Media RSS namespace [123] Let enclosure be an enclosure .[124] Let text be gc 's url attribute value.[125] If text is not null or the empty string :[126] Parse text relative to gc 's node document .[127] If not failed:[163] Set enclosure 's URL to the resulting URL string .[165] Set enclosure 's type to gc 's
type attribute value.[171] Append enclosure to entry 's enclosures .[246] Return entry .
RSS 2.0 and RSS 0.9x items
[249] To process an RSS 2.0 item element , run these steps:[248] Let entry be an entry .[250] For each element child in element 's children , in order,
run these substeps:[251] Switch by child 's namespace and local name :category element in the null namespace [92] Let term be child 's child text content .[293] If term is not the empty string,
add term to entry 's categories .author element in the null namespace [355] Let person be the result of processing an RSS 2.0 person child .[361] If person is not null :[356] If person 's email is not null ,
or if there is no person whose name is person 's name
in entry 's authors :[354] Append person to entry 's authors .creator element in the Dublin Core namespace or author element in the iTunes namespace [118] Let text be child 's child text content .[119] If text is not the empty string and
there is no person whose name is text
in entry 's authors :[120] Append a person whose name is text
to entry 's authors .pubDate element in the null namespace
If entry 's published is null ,
set entry 's published to the result of
processing an RSS 2.0 date child . updated element in the Atom namespace
If entry 's updated is null ,
set entry 's updated to the result of
processing an Atom date child . link element in the null namespace
If entry 's page URL is null ,
set entry 's page URL to the result of
processing a URL element child . thumbnail element in the Media RSS namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name url . image element in the iTunes namespace
If entry 's thumbnail is null ,
set entry 's thumbnail to the result of
processing an image element child
with attribute name href . content element in the Media RSS namespace [292] Let enclosure be an enclosure .[364] Let text be child 's url attribute value.[366] If text is not null or the empty string :[367] Parse text relative to child 's node document .[368] If not failed:[369] Set enclosure 's URL to the resulting URL string .[370] Set enclosure 's type to child 's
type attribute value.[371] Append enclosure to entry 's enclosures .enclosure element in the null namespace [279] Let enclosure be an enclosure .[280] Let text be child 's url attribute value.[281] If text is not null or the empty string :[282] Parse text relative to child 's node document .[283] If not failed:[195] Set enclosure 's URL to the resulting URL string .[284] Set enclosure 's type to child 's
type attribute value.[285] Let length be child 's length attribute value.[286] If length is not null:[287] Let n be the result of applying the
rules for parsing non-negative integers to length .[288] If n is not an error and n is greater than zero:[289] Set enclosure 's length to n .[290] Append enclosure to entry 's enclosures .title element in the null namespace
If entry 's title is null ,
set entry 's title to the result of
processing a string element child . subtitle element in the iTunes namespace
If entry 's subtitle is null ,
set entry 's subtitle to the result of
processing a string element child . description element in the null namespace
If entry 's summary is null ,
set entry 's summary to the result of
parsing escaped HTML content of child . summary element in the iTunes namespace
If entry 's subtitle is null ,
set entry 's subtitle to the result of
processing a string element child . encoded element in the RSS content namespace
If entry 's content is null ,
set entry 's content to the result of
parsing escaped HTML content of child . duration element in the iTunes namespace [218] If entry 's duration is null :[198] Let text be child 's child text content .[212] If text is one or more ASCII digits :[217] Set entry 's duration to
the ASCII digits in text , interpreted as a decimal number.[215] Otherwise, if text is one or more ASCII digits ,
followed by a : character,
followed by one or more ASCII digits :[223] Set m to the first sequence of ASCII digits in text ,
interpreted as a decimal number.[224] Set s to the second sequence of ASCII digits in text ,
interpreted as a decimal number.[230] Set entry 's duration to
m × 60 + s .[216] Otherwise, if text is one or more ASCII digits ,
followed by a : character,
followed by one or more ASCII digits ,
followed by a : character,
followed by one or more ASCII digits :[231] Set h to the first sequence of ASCII digits in text ,
interpreted as a decimal number.[232] Set m to the second sequence of ASCII digits in text ,
interpreted as a decimal number.[233] Set s to the third sequence of ASCII digits in text ,
interpreted as a decimal number.[257] Set entry 's duration to
h × 3600 + m × 60 + s .[252] Return entry .
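The three accepted duration formats (`s`, `m:s`, and `h:m:s`) can be sketched as follows. The function name is hypothetical, and the input is assumed to be the duration element's child text content; no whitespace trimming is applied, since the steps above do not describe any.

```python
import re
from typing import Optional


def parse_itunes_duration(text: str) -> Optional[int]:
    """Return the duration in seconds, or None if the text matches no format."""
    # [0-9] is used instead of \d so that only ASCII digits match.
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    match = re.fullmatch(r"([0-9]+):([0-9]+)", text)
    if match:
        m, s = (int(g) for g in match.groups())
        return m * 60 + s
    match = re.fullmatch(r"([0-9]+):([0-9]+):([0-9]+)", text)
    if match:
        h, m, s = (int(g) for g in match.groups())
        return h * 3600 + m * 60 + s
    # Any other text leaves the duration unset.
    return None
```

For example, `parse_itunes_duration("1:02:03")` evaluates to 1 × 3600 + 2 × 60 + 3 = 3723 seconds.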
[1] To process a feed response res , run these steps:[9] If res is a network error or
res 's status is not 200 ,
return null and abort these steps.[2] Let type be res 's computed MIME type .[3] If type is an XML MIME type :[7] Let doc be a Document .[6] Let parser be an XML parser associated with doc .
The XML parser MUST implement XML5 .
It MUST NOT fetch and process external entities .[8] Run parser , using res 's body as its input byte stream .
The charset parameter value in the
Content-Type header value of res , if any, is used as
the encoding label provided by the underlying transport.[11] Set doc 's address to
res 's url .[27] Set doc 's character encoding
to the character encoding used by parser .[5] Return the result of processing a feed document doc .[4] Otherwise, return null .
[10] To process a feed document doc , run these steps:[12] Let root be doc 's root element .[17] Switch by root :[16] If it is a feed element in the Atom namespace or in the Atom 0.3 namespace
Let feed be the result of processing an Atom feed element root . [19] If it is an rss element in the null namespace
Let feed be the result of processing an rss element root . [18] If it is an RDF element in the RDF namespace
Let feed be the result of processing an RDF element root . [350] If feed is not null , cleanup feed feed .[13] Return feed .
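The dispatch on the root element can be sketched with Python's standard `xml.etree.ElementTree`. Several things here are assumptions for illustration: the namespace URLs are the commonly used ones for Atom, Atom 0.3, and RDF and are not stated in this section; the function returns a label naming which algorithm would run rather than a feed; and ElementTree is a conventional XML parser, not the XML5 parser required above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumed namespace URLs (not defined in this section).
ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM03_NS = "http://purl.org/atom/ns#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_tag(tag: str) -> Tuple[Optional[str], str]:
    """Split ElementTree's '{namespace}local' form into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return None, tag


def feed_kind(xml_text: str) -> Optional[str]:
    """Name the algorithm that would process the document's root element."""
    root = ET.fromstring(xml_text)
    namespace, local = split_tag(root.tag)
    if local == "feed" and namespace in (ATOM_NS, ATOM03_NS):
        return "atom"
    if local == "rss" and namespace is None:
        return "rss"
    if local == "RDF" and namespace == RDF_NS:
        return "rdf"
    # Any other root element is not a supported feed.
    return None
```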
[351] To cleanup feed feed , run these steps:[352] If feed 's logo is not null ,
feed 's icon is not null , and
feed 's logo 's URL is
feed 's icon 's URL :[353] Set feed 's icon to null .
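A minimal sketch of the feed cleanup step, assuming hypothetical `Image` and `Feed` dataclasses that carry only the fields involved:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    # Hypothetical, reduced model of an image.
    url: Optional[str] = None


@dataclass
class Feed:
    # Hypothetical, reduced model of a feed.
    icon: Optional[Image] = None
    logo: Optional[Image] = None


def cleanup_feed(feed: Feed) -> None:
    """Drop the icon when it points at the same URL as the logo."""
    if (
        feed.logo is not None
        and feed.icon is not None
        and feed.logo.url == feed.icon.url
    ):
        feed.icon = None
```

The logo is kept and the icon is discarded, so a consumer never shows the same image twice.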
[20] To process an Atom feed element element , run these steps:[14] Let feed be a feed .[21] For each element child in element 's children , in order,
run these substeps:[22] Switch by child 's namespace and local name :title element in the Atom namespace
If feed 's title is null ,
set feed 's title to the result of
processing an Atom text child . title element in the Atom 0.3 namespace
If feed 's title is null ,
set feed 's title to the result of
processing an Atom 0.3 content child . subtitle element in the Atom namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing an Atom text child . tagline element in the Atom 0.3 namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing an Atom 0.3 content child . updated element in the Atom namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing an Atom date child . modified element in the Atom 0.3 namespace or in the Atom namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing a W3C-DTF date child . link element in the Atom namespace or in the Atom 0.3 namespace
Process an Atom link child for feed , with type feed . icon element in the Atom namespace [181] If feed 's icon is null :[184] Let image be an image .[183] Set image 's URL to the result of
processing a URL element child .[182] If image 's URL is not null ,
set feed 's icon to image .logo element in the Atom namespace [185] If feed 's logo is null :[186] Let image be an image .[197] Set image 's URL to the result of
processing a URL element child .[225] If image 's URL is not null ,
set feed 's logo to image .author element in the Atom namespace or in the Atom 0.3 namespace
Append the result of processing an Atom person child
to feed 's authors . entry element in the Atom namespace or in the Atom 0.3 namespace [86] Let entry be the result of processing an Atom entry
child .[85] Set entry 's feed to feed .[258] Cleanup entry entry .[87] Append entry to feed 's entries .[25] Return feed .
RSS 2.0 and RSS 0.9x feeds
[301] To process an rss element element , run these steps:[239] Let feed be a feed .[240] For each element child in element 's children , in order,
run these substeps:[302] Switch by child 's namespace and local name :channel element in the null namespace
Process an RSS 2.0 channel element with feed . item element in the null namespace [242] Let entry be the result of processing an RSS 2.0 item
child .[243] Set entry 's feed to feed .[259] Cleanup entry entry .[244] Append entry to feed 's entries .[245] Return feed .
[206] To process an RSS 2.0 channel element with feed feed ,
run these steps:[208] For each element child in element 's children , in order,
run these substeps:[209] Switch by child 's namespace and local name :image element in the null namespace [211] If feed 's logo is null :[213] Let element be child 's first url child element
in the null namespace .[214] If element is not null :[308] Let url be the result of
processing a URL element element .[37] If url is not null :[310] Let image be an image .[314] Set image 's URL to url .[309] Set feed 's logo to image .image element in the iTunes namespace
If feed 's icon is null ,
set feed 's icon to the result of
processing an image element child
with attribute name href . creator element in the Dublin Core namespace or author element in the iTunes namespace [316] Let text be child 's child text content .[317] If text is not the empty string and
there is no person whose name is text
in feed 's authors :[315] Append a person whose name is text
to feed 's authors .managingEditor element in the null namespace [357] Let person be the result of processing an RSS 2.0 person child .[360] If person is not null :[358] If person 's email is not null ,
or if there is no person whose name is person 's name
in feed 's authors :[359] Append person to feed 's authors .pubDate or lastBuildDate element in the null namespace
If feed 's updated is null ,
set feed 's updated to the result of
processing an RSS 2.0 date child . title element in the null namespace
If feed 's title is null ,
set feed 's title to the result of
processing a string element child . subtitle element in the iTunes namespace
If feed 's subtitle is null ,
set feed 's subtitle to the result of
processing a string element child . description element in the null namespace or summary element in the iTunes namespace
If feed 's description is null ,
set feed 's description to the result of
processing a string element child . link element in the null namespace
If feed 's page URL is null ,
set feed 's page URL to the result of
processing a URL element child . link element in the Atom namespace
Process an Atom link child for feed , with type feed . item element in the null namespace [147] Let entry be the result of processing an RSS 2.0 item
child .[148] Set entry 's feed to feed .[261] Cleanup entry entry .[149] Append entry to feed 's entries .
[43] The key word MUST is defined by RFC 2119 .
[34] The terms ASCII digits and URL record
are defined by the URL Standard .
[42] The terms MIME type , computed MIME type ,
type , and
parse a MIME type are defined by the MIME Sniffing Standard .
[41] The terms response , status ,
url , and network error
are defined by the Fetch Standard .
[26] The interfaces Node and
DocumentFragment are defined by the DOM Standard .
[31] The terms parent , children , inclusive descendant ,
insert , remove , clone ,
equals (of Node s),
local name , namespace , and
node document are defined by the DOM Standard .
[23] The terms
XML MIME type ,
Unicode code point , space characters , ASCII case-insensitive ,
parse a URL , resulting URL string , resulting URL record ,
rules for parsing non-negative integers ,
document's address , document's character encoding ,
palpable content , embedded content ,
child text content ,
input byte stream ,
HTML fragment parsing algorithm , HTML parser , and XML parser
are defined by the HTML Standard .
[32] The div and img
elements are defined by the HTML Standard .