inessential by Brent Simmons

Non-unique unique IDs (which Joe?)

The biggest time-waster in writing aggregators is the issue of unique IDs.

One thing that separates RSS and Atom from email is that items can change. So, when an item changes, how does your aggregator know it’s a change rather than a new item? It looks at the item’s unique ID.
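Here is a minimal sketch of that lookup, not how any particular aggregator does it. The `Item` type, its fields, and the `merge_items` function are hypothetical names made up for illustration. The only idea taken from the text is that a fetched item whose ID is already known counts as a change, and one whose ID is unknown counts as new.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical item shape; real feeds carry many more fields.
    unique_id: str
    title: str
    body: str


def merge_items(stored, fetched):
    """Classify fetched items as new or changed by comparing unique IDs.

    `stored` maps unique ID -> Item and is updated in place.
    Returns (new_ids, changed_ids).
    """
    new_ids, changed_ids = [], []
    for item in fetched:
        previous = stored.get(item.unique_id)
        if previous is None:
            # Never seen this ID before: treat it as a new item.
            new_ids.append(item.unique_id)
        elif previous != item:
            # Known ID, different content: treat it as an edit.
            changed_ids.append(item.unique_id)
        stored[item.unique_id] = item
    return new_ids, changed_ids
```

Everything here hinges on `unique_id` really being unique. If two different items share an ID, the second one looks like an edit of the first.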

Unfortunately, not all feeds have unique IDs. Which is too bad. But what’s worse—what’s far worse, what makes me foam at the mouth and blow steam out my eyeballs—is when a feed has unique IDs that are not actually unique. That’s just perverse.

I don’t want to pick on an individual or small company—so I’ll pick on Yahoo. Consider this feed that does a news search on “apple.”

If you look at the source, you’ll note that each news item begins with <item rdf:about="*">. The URL is supposed to be unique in the feed.

From the RSS 1.0 spec: “{item_uri} must be unique with respect to any other rdf:about attributes in the RSS document and is a URI which identifies the item. {item_uri} should be identical to the value of the <link> sub-element of the <item> element, if possible.”
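A feed can be checked against that rule mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` to list every `rdf:about` value that appears more than once in a document. The function name `duplicate_about_uris` and the sample feed with its example.com URLs are invented for illustration and are not taken from any real feed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def duplicate_about_uris(feed_xml):
    """Return the rdf:about values that occur more than once in the document."""
    root = ET.fromstring(feed_xml)
    about_attr = f"{{{RDF_NS}}}about"  # ElementTree's {namespace}name form
    counts = Counter(
        el.get(about_attr) for el in root.iter() if el.get(about_attr) is not None
    )
    return sorted(uri for uri, n in counts.items() if n > 1)


# Made-up sample: two items share one rdf:about value.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/feed"><title>Search</title></channel>
  <item rdf:about="http://example.com/search"><title>One</title></item>
  <item rdf:about="http://example.com/search"><title>Two</title></item>
  <item rdf:about="http://example.com/story/3"><title>Three</title></item>
</rdf:RDF>"""

duplicates = duplicate_about_uris(SAMPLE)
```

An empty result means the feed satisfies the uniqueness requirement quoted above. Anything else is a list of Joes.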

I’ll say it again: the URL must be unique. If the URL isn’t unique, it’s like having ten guys named Joe in a room: you say, “Hey, Joe!” and they all turn around to look at you.

It would be better if they didn’t even have names, so you could say, “Hey, brown-haired guy with the wire-rim glasses wearing a polka-dot shirt!”
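In code, "describing the guy" might mean building a key from the item's other fields when it declares no ID. This is only a sketch under assumptions. The dictionary keys (`id`, `title`, `link`, `description`) and the choice of SHA-256 are illustrative, and it is not a description of what NetNewsWire or any other aggregator actually does.

```python
import hashlib


def fallback_key(title, link, description):
    """Describe an item by its content when it has no declared ID."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join((title, link, description))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def item_key(item):
    """Use the declared ID if there is one, else fall back to a description.

    `item` is a plain dict here; the field names are assumptions.
    """
    declared = item.get("id")
    if declared:
        # The aggregator has to trust this value, even when it is not unique.
        return declared
    return fallback_key(
        item.get("title", ""), item.get("link", ""), item.get("description", "")
    )
```

The fallback has a cost: when an item with no ID is edited, its key changes and it looks like a new item. But a feed with a duplicated ID never reaches the fallback at all, which is why a non-unique ID is worse than none.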

Yahoo isn’t the only creator of feeds that makes this mistake. Fixing Yahoo’s feed wouldn’t fix the problem as a whole, because it occurs in other places too.

(But it would be nice if Yahoo fixed the bug, so I could include Yahoo as a search engine provider in NetNewsWire the way we include Blogdigger, Daypop, and Feedster. In Yahoo’s case, the easy thing for them to do is probably make the rdf:about URL match the link URL.)

P.S. I’m calm again now. I just have to vent about this occasionally. Breathe in, breathe out...