inessential by Brent Simmons

But I can tell that it’s about the same thing

One of the common feature requests we get for NetNewsWire is to handle this situation:

Say you’re subscribed to several Mac news feeds. Then one day Microsoft updates Office for Macintosh, and each feed includes a news item about it, so you have several news items about it.

You only need to read that particular piece of news once. So why not make it so NetNewsWire detects that these are all about the same thing, and mark them as read automatically once you read the first one?

Here’s why:

Consider these two news items, ripped from today’s headlines...


Title: Microsoft Office 2004 update released
Description: Microsoft's Macintosh Business Unit (Mac BU) has posted Office 2004 for Mac Update 11.1.1, which includes improvements to Excel add-in calculation, increased PowerPoint and Word 2004 stability, additional support for device drivers and enhanced appearance of imported graphics...


Title: Microsoft Office 2004 for Mac updated to 11.1.1
Description: Microsoft Corp.'s Mac Business Unit (Mac BU) on Monday announced the release of Microsoft Office 2004 for Mac Update 11.1.1. The update includes improvements to Excel add-in calculation, improves stability for PowerPoint and Word, adds support for new device drivers and improves the appearance of imported graphics.

They’re about the same thing

Both items are obviously about the same thing. You can tell by looking, instantly, no thought required.

But computers aren’t that smart. How does a piece of software know that these two are about the same thing?

The titles, descriptions, and links are different.

There are many of the same words—but you really don’t want your aggregator to start making guesses here. Imagine two completely different stories, but each one has “Apple iTunes” in the title. “Songs on Apple iTunes Music Store now free” and “Apple iTunes sold to SixApart” are not the same piece of news.
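To see why word overlap is a shaky signal, here's a minimal sketch that scores pairs of titles with Jaccard similarity: the number of distinct words the two titles share, divided by the number of distinct words across both. This is not anything NetNewsWire does. The tokenizer, the use of Jaccard, and the function names are assumptions for illustration.

```python
import re


def title_words(title: str) -> set[str]:
    """Lowercase a title and split it into a set of words.

    Dots are kept inside tokens so version numbers like 11.1.1 stay whole.
    """
    return set(re.findall(r"[\w.]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words in both titles."""
    wa, wb = title_words(a), title_words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# The two Office items: really the same news.
same_story = jaccard(
    "Microsoft Office 2004 update released",
    "Microsoft Office 2004 for Mac updated to 11.1.1",
)
# The two iTunes headlines: completely different news.
different_stories = jaccard(
    "Songs on Apple iTunes Music Store now free",
    "Apple iTunes sold to SixApart",
)

print(f"same story:        {same_story:.2f}")         # same story:        0.30
print(f"different stories: {different_stories:.2f}")  # different stories: 0.18
```

The genuine duplicates score only 0.30 (even "update" and "updated" don't count as a match), while the unrelated stories score 0.18. Any cutoff between those two numbers is a guess, and the next pair of headlines could easily land on the wrong side of it.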

What’s the solution?

Artificial intelligence would be helpful here. But we don’t have that.

One possibility would be a new kind of link element: an external link meant to identify the source of the story. For instance, if you read the full versions of the example news items above, you'll find that both stories link to the same page on Microsoft's site, a page about this update to Office.

Were that link to be included in the feed, with that item, as a special link-to-the-source link, then an aggregator could know that the news items were really about the same thing.

One nice thing about this is that it’s likely that the folks at MacMinute and MacCentral would pick the same link. They wouldn’t have to coordinate, it would just work. (At least in this example. It wouldn’t always be so clear-cut.)
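Here's a minimal sketch of how an aggregator might use such an element, assuming a hypothetical `source_link` field on each item. No feed format defines that field; the name, the `NewsItem` class, the `mark_read` function, and the example.com URLs are all made up for illustration. When you read one item, every unread item that shares its source link gets marked read too.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NewsItem:
    feed: str
    title: str
    # Hypothetical link-to-the-source element (not part of any real feed format).
    source_link: Optional[str] = None
    read: bool = False


def mark_read(item: NewsItem, all_items: List[NewsItem]) -> List[NewsItem]:
    """Mark item read, then mark any unread item with the same source link read.

    Returns the items that were marked read as a side effect.
    Assumes source links are compared as exact strings.
    """
    item.read = True
    if item.source_link is None:
        return []
    duplicates = [
        other for other in all_items
        if other is not item
        and not other.read
        and other.source_link == item.source_link
    ]
    for other in duplicates:
        other.read = True
    return duplicates


office_update = "https://example.com/office-2004-update-11.1.1"  # placeholder URL
items = [
    NewsItem("MacMinute", "Microsoft Office 2004 update released", office_update),
    NewsItem("MacCentral", "Microsoft Office 2004 for Mac updated to 11.1.1", office_update),
    NewsItem("MacX", "Songs on Apple iTunes Music Store now free",
             "https://example.com/itunes-free"),  # placeholder URL
]

also_read = mark_read(items[0], items)
print([i.feed for i in also_read])  # ['MacCentral']
print([i.read for i in items])      # [True, True, False]
```

Note that this trusts the feed completely: if a publisher used one generic link for every Apple story, this code would happily mark all of those stories read.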

The bad thing about this idea is the potential for abuse, or just plain laziness. What if people made the link-to-the-source link the same generic link for every story about Apple? You'd end up with stories that are not about the same thing being marked as read. Nuts.

Another problem is that you still might miss something interesting. Say MacX posted a basic news report, but MacY posted a lengthy piece with interviews and all kinds of goodies. You wouldn't want to miss MacY's report, but you would, since it was marked as read when you read MacX's news item.

In other words, I don’t know what the solution is, but it’s worth thinking about.