inessential by Brent Simmons

RSS, updated items, and links

One of the big differences between RSS and email is that RSS items can change.

So the question for any newsreader developer is this: what changes should trigger an item being marked as unread?

Specifically, today I’m thinking about a subset of that question: changes to <link> elements.

Take this hypothetical case:

1. An item looks like this:

<title>A title</title>
<description>A description</description>
<link>http://example.org/1</link>
<guid>123</guid>

2. Then the feed updates, and the <link> element changes:

<title>A title</title>
<description>A description</description>
<link>http://example.org/2</link>
<guid>123</guid>

The question: should it be marked as unread or not?

Right now, NetNewsWire marks it as unread. My policy has always been to err on the side of marking things unread—on the assumption that, otherwise, you might miss something important.

(Actually, NetNewsWire marks it as updated, and there’s a pref that says whether or not to mark updated items as unread. Most people leave it turned on.)

The RSS 2.0 spec doesn’t address the issue of when to consider an item significantly changed—and I’m not sure the spec should say anything about that.

Of course, I don’t want new prefs or special cases. I want the best policy.

Perhaps changes to <link>s should not cause an item to be marked unread. What do you think?
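
To make that policy concrete, here is a minimal sketch. It is not NetNewsWire's actual code: the `Item` type, `SIGNIFICANT_FIELDS`, and `is_significantly_changed` are hypothetical names made up for illustration, and the choice of which fields count as significant is an assumption.

```python
from dataclasses import dataclass


# Hypothetical item model holding the four elements from the example above.
@dataclass(frozen=True)
class Item:
    guid: str
    title: str
    description: str
    link: str


# Assumption: only these fields count as a significant change.
# <link> is deliberately left out.
SIGNIFICANT_FIELDS = ("title", "description")


def is_significantly_changed(old: Item, new: Item) -> bool:
    """Return True if any significant field differs between two versions."""
    return any(getattr(old, f) != getattr(new, f) for f in SIGNIFICANT_FIELDS)


before = Item("123", "A title", "A description", "http://example.org/1")
after = Item("123", "A title", "A description", "http://example.org/2")

# Only the <link> changed, so under this policy the item stays read.
print(is_significantly_changed(before, after))  # False
```

Under this sketch, switching between the two policies is just a matter of adding or removing `"link"` from `SIGNIFICANT_FIELDS`.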

On a related topic...

People have often asked me why we can’t use the <link> element as a unique identifier and permalink for news items.

It can’t be a unique identifier because it may change. The New York Times feeds, for instance, change their <link>s frequently: they include a query string that allows you to get past the registration system, and that query string changes. (It’s the New York Times feeds that prompted the discussion above about ignoring <link> changes.)

<link>s can’t be permalinks because there is nothing that says that a <link> has to be a permalink—it could point to another site entirely.

That’s why I always ask people to use <guid>s in their feeds: so we can identify items, and so we can know when an item is an updated version of a previous item.
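
A rough sketch of what a <guid> buys a newsreader, assuming items are stored as plain dictionaries keyed by guid. The `classify` function and the data shapes are hypothetical, not how any particular reader does it.

```python
def classify(stored: dict, incoming: list) -> dict:
    """Sort incoming items into new, updated, and unchanged, matching by guid.

    `stored` maps guid -> item dict from the previous fetch; `incoming` is the
    list of item dicts from the current fetch. Both shapes are assumptions.
    """
    result = {"new": [], "updated": [], "unchanged": []}
    for item in incoming:
        previous = stored.get(item["guid"])
        if previous is None:
            result["new"].append(item["guid"])        # never seen this guid
        elif previous != item:
            result["updated"].append(item["guid"])    # same guid, different content
        else:
            result["unchanged"].append(item["guid"])
    return result


stored = {
    "123": {"guid": "123", "title": "A title", "link": "http://example.org/1"},
}
incoming = [
    {"guid": "123", "title": "A title", "link": "http://example.org/2"},
    {"guid": "124", "title": "Another title", "link": "http://example.org/3"},
]

print(classify(stored, incoming))
# {'new': ['124'], 'updated': ['123'], 'unchanged': []}
```

If the items were keyed by <link> instead, item 123 with its changed link would look like a brand-new item rather than an updated one.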

(About Atom there is one thing I adore: that guids—called <id>s there—are mandatory. I wish they could be mandatory in RSS too.)