inessential by Brent Simmons

Bug in some RSS feeds

I’m seeing a bug in some RSS feeds (both RSS 1.0 and RSS 2.0).

The bug is using the same item_uri or guid in multiple items in a feed. (I’m not going to point to any examples, because I don’t want to embarrass anybody.)

In RSS 1.0, unless I read the spec wrong, the {item_uri} in <item rdf:about="{item_uri}"> should be unique for each item.

And in RSS 2.0, the guid should also be unique.

Something like this would be incorrect:

<item rdf:about="http://example.com/somePage.html">
   ...title, link, description stuff...
</item>
<item rdf:about="http://example.com/somePage.html">
   ...some other title, link, and description...
</item>

Or, in RSS 2.0, this would be incorrect:

<item>
   ...title, link, description stuff...
   <guid>http://example.com/somePage.html</guid>
</item>
<item>
   ...some other title, link, and description...
   <guid>http://example.com/somePage.html</guid>
</item>
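
If you publish a feed, here is a rough way to check for this yourself. It's a minimal sketch in Python using only the standard library. The function name duplicate_identifiers and the sample feed are made up for illustration, and it assumes the feed is well-formed XML. It looks at rdf:about on each item first (RSS 1.0) and falls back to a guid child (RSS 2.0).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Fully qualified name of the rdf:about attribute used by RSS 1.0 items.
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def duplicate_identifiers(feed_xml):
    """Return a sorted list of identifiers used by more than one item.

    Hypothetical helper: checks rdf:about (RSS 1.0), then guid (RSS 2.0).
    Items with no identifier are skipped, since identifiers are optional.
    """
    root = ET.fromstring(feed_xml)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        identifier = element.get(RDF_ABOUT)
        if identifier is None:
            for child in element:
                if local_name(child.tag) == "guid" and child.text:
                    identifier = child.text.strip()
                    break
        if identifier:
            counts[identifier] += 1
    return sorted(ident for ident, n in counts.items() if n > 1)


# Made-up sample feed with the bug described above.
sample_feed = """<rss version="2.0"><channel>
<item><title>One</title><guid>http://example.com/somePage.html</guid></item>
<item><title>Two</title><guid>http://example.com/somePage.html</guid></item>
<item><title>Three</title><guid>http://example.com/otherPage.html</guid></item>
</channel></rss>"""

print(duplicate_identifiers(sample_feed))
# prints ['http://example.com/somePage.html']
```

An empty list means every identifier in the feed is used only once.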

Using non-unique identifiers is worse than using none at all. The reason to have identifiers is so that newsreaders can identify an item. If two items have the same identifier, a newsreader may think it’s really the same item.
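
To make that concrete, here is a sketch of how a newsreader might file items by identifier. This is a hypothetical, simplified reader, not how any particular app works: the merge_fetched_items function, the dictionary-based store, and the sample items are all assumptions for illustration.

```python
def merge_fetched_items(store, fetched_items):
    """Merge fetched items into store (identifier -> item).

    Returns the items this hypothetical reader would show as new.
    """
    new_items = []
    for item in fetched_items:
        identifier = item.get("guid")
        if identifier is None:
            # No identifier: this sketch just treats the item as new.
            # (A real reader would need some other way to recognize it.)
            new_items.append(item)
        elif identifier in store:
            # Same identifier as a known item: treated as an update to it.
            store[identifier] = item
        else:
            store[identifier] = item
            new_items.append(item)
    return new_items


store = {}
fetched = [
    {"guid": "http://example.com/somePage.html", "title": "First post"},
    {"guid": "http://example.com/somePage.html", "title": "Second post"},
    {"title": "Post without an identifier"},
]
new_items = merge_fetched_items(store, fetched)

print([item["title"] for item in new_items])
# prints ['First post', 'Post without an identifier']
print(store["http://example.com/somePage.html"]["title"])
# prints Second post
```

In this sketch the second post never shows up as new, and it overwrites the first in the store. The item with no identifier at all comes through fine.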

It’s like email: if you look at the headers, you see a Message-Id that identifies each message. If it’s missing, or worse, not unique, email apps could get confused.

Update 3:24 p.m.: Message-Ids are not required, so a missing ID should not confuse an email app. The same goes for RSS: identifiers are not required. The problem arises only when they are present but not unique.

P.S. If I’m wrong in my understanding of the RSS 1.0 spec, please don’t hesitate to let me know.