How to deal with broken feeds

Tim Bray writes about how to deal with broken PEAW feeds: “I would absolutely require basic XML well-formedness.”

Me, I would absolutely love it if I, as an aggregator developer, could require well-formedness.

In other words, if a feed isn’t well-formed, then NetNewsWire would neither parse nor display it.
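To make the strict policy concrete, here is a minimal sketch in Python, using the standard library's expat-based parser. It is not NetNewsWire's code; the function name `is_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_text: str) -> bool:
    """Return True only if the feed parses as well-formed XML."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError:
        # A strict aggregator would stop here and show the user nothing.
        return False
    return True
```

Under this policy a single stray character anywhere in the feed is enough to hide every item in it.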

The thing is, that doesn’t work now for RSS. That’s not because of anything special to RSS; it’s because feed generators don’t always produce well-formed XML. There’s no reason to expect PEAW feed generators would be any different. (Both RSS and PEAW require well-formedness. No difference there.)

The single most common cause of non-well-formedness that I see is unencoded ampersands. They appear in a feed as & rather than as &amp;. This is most often in <title>s.
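One way to spot and repair a bare ampersand is a regular expression that matches any & not already starting something shaped like an entity or character reference. This is a rough sketch, not how any particular aggregator does it; `BARE_AMPERSAND` and `escape_bare_ampersands` are hypothetical names.

```python
import re

# An "&" that is NOT followed by something shaped like a reference:
# a name (&amp;), a decimal reference (&#38;), or a hex reference (&#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace each unencoded ampersand with &amp;."""
    return BARE_AMPERSAND.sub("&amp;", text)
```

The sketch has known gaps. It would also rewrite an ampersand inside a CDATA section, where a bare & is legal, and it leaves alone anything that merely looks like an entity (such as an undeclared `&nbsp;`), which a strict parser may still reject.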

In my experience this most often afflicts larger publications, not weblogs using Movable Type or Radio or whatever. My guess is it’s because these larger publications have their own in-house systems. Those systems don’t get tested the way weblog systems get tested. A weblog system will have many thousands of users, but an in-house system has just one user (the publication). (I mean user in the sense of publisher.)

Another thing about these larger publications is that their feeds are often very popular. So when one is non-well-formed, I get a ton of bug reports about it until they fix it.

Actually, that used to happen, but NetNewsWire has gotten progressively better about dealing with the ampersand problem, so I don’t get so many bug reports.
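A liberal approach can be sketched as: try the strict parse first, and only if it fails, repair the ampersands and try once more. This is an illustration of the general idea, not a description of what NetNewsWire does internally; `parse_liberally` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# "&" not followed by a name, decimal, or hex reference ending in ";".
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_liberally(feed_text: str) -> ET.Element:
    """Parse a feed, tolerating unencoded ampersands."""
    try:
        # Well-formed feeds take this path and are never modified.
        return ET.fromstring(feed_text)
    except ET.ParseError:
        repaired = _BARE_AMP.sub("&amp;", feed_text)
        # Still raises ParseError if something other than ampersands is wrong.
        return ET.fromstring(repaired)
```

Parsing strictly first means the repair can never damage a feed that was already correct.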

According to Tim, aggregators should consider it a fatal error and not process these feeds. If I agreed, NetNewsWire users would pay the price.

Tim writes: “Granted that the RSS legacy necessarily required the use of liberal parsers, but hey, that was then, we have better tools now.”

Setting aside the note about “the RSS legacy,” which isn’t relevant to the issue of well-formedness, I want to note that though we indeed “have better tools now,” they aren’t evenly distributed.

And, ironically and interestingly, John Q. NewToBlogging has better feed-generating tools than many of the large publications.

19 Aug 2003
