Feb 2010

Core Data post follow-up notes

(This is a follow-up to the previous post.)

I like Wolf’s NSManagedObjectOperation idea.

I’m checking out Aaron Hillegass’s BNRPersistence. I ran across Tokyo Cabinet a couple months ago when building mutt, and it sounded pretty cool, though I didn’t try it any projects.

Justin asked me if it was worth doing this optimization for the extreme case — 10,000 unread items on a first-generation iPod Touch. It’s not always worth optimizing for extreme cases, not at the expense of other things. But there are several reasons why it was right this time:

  • High unread-items-count isn’t the only cause of performance degradation.

  • Lots of people still have original iPhones and iPods Touch.

  • The performance difference is significant even in a more typical case, say 1,000 unread on an iPhone 3GS.

  • The performance difference allows me to do other things that people are asking for.

Update 5:45 pm: I just looked, and the Tokyo Cabinet license is LGPL. Even with the leading L that means I can’t use it, because I don’t want to be a legal test case. (If it came pre-installed on Macs and iPhones, I would feel okay about using it.)

On switching away from Core Data

A lot of the work I’ve been doing the last several months is optimizing performance for NetNewsWire for iPhone. The changes haven’t shipped yet, because I’m not quite finished. But one part of this might be interesting to other developers, so I figured I’d write it up.

I optimized as much as I could, spent tons of time in Shark, went all multi-threaded with Core Data, switched away from my own queuing system to NSOperationQueue, optimized the XML parsing, etc. But performance and memory use on my first-generation iPod Touch (my development test device) was still not nearly good enough with a big unread count (of around 10,000 items).

At that point, having done everything else, the remaining issue was clearly Core Data. So I tried more things, re-read everything I could about Core Data performance (for the nth time), ran experiments, spent tons more time in Shark. Trying to get it good. No go.

Finally I realized I had to switch away from Core Data and use SQLite more directly. Not completely directly — I use FMDB, a lightweight Objective-C interface that works on Macs and iPhones. Gus wrote it. It’s good.

That meant a bunch more work — it’s not like Core Data and FMDB are similar or meant to be similar. So it was no drop-in replacement. Not intended to be.

But why?

I bet Core Data is the right way to go 95% of the time. Or more. It’s easy to work with. It’s fast (in most cases). It has schema upgrade tools.

The important thing to know, though, is that it’s not a database. It’s an object graph and persistence manager. (Check out the post on Cocoa with Love that goes into detail.)

But surely you’re using objects

The difference between Core Data and a database was never that clear to me — until I found concrete examples.

After all, under the hood, in the code, every news item in a feed is an object. Why wouldn’t I use an object persistence framework for that? They’re objects, and I want to persist them. Duh. Seems like I should use Core Data.

So here are some concrete examples where direct database access made more sense than using Core Data.

1. Marking lots of news items as read or unread

The app gets from the Google Reader API a big list of item IDs that have been marked read or unread.

In Core Data, I had to loop through the list, change the status for each individual item. The list could be up to 10,000 items long. Not a good idea.

This is a very database-y operation. With one query the app can set the status for a whole bunch of items at once, without having to instantiate them as objects: update newsItems set read = 1 where...

2. Deleting lots of items

Similar to #1 above — from time to time the app deletes old, read, non-starred items from storage. We can’t just let storage grow forever, especially not on an iPhone or iPod Touch.

With Core Data, I ran a query to figure out what items to delete. Then ran a loop that deleted them. Expensive.

With SQLite access, I just did a sinqle query: delete from newsItems where...

3. Dealing with unique IDs from outside system

Core Data does uniquing, but that’s not what this is. The news items have an assigned unique ID that comes from another database.

When refreshing feeds, it’s common to see news items that the app has seen before. They might have been downloaded previously or they might have changed. (We try to avoid the former, of course.)

This means that for each item in a feed, before it’s saved, the app first has to get the existing news item. This is slow. (I tried various techniques: pre-fetching, fetching as needed, fetching only IDs of existing items for a feed, storing existing IDs in a set or dictionary, etc. Nothing helped much. Usually the solution was worse than the original problem.)

Because many thousands of items may come in during a refresh session, and every item has to be checked to see if it exists already, this was a huge performance hit. Better not to do the fetch, right?

With more-direct access, I could just do a insert or replace into newsItems... and it would add the item or replace the existing item. Fast.

4. Testing for the existence of an item

Sometimes the app just needs to know if something exists in the database. With Core Data, it’s a fetch.

With SQLite, here’s one of my favorite tricks: select 1 from someTable where uniqueID = whatever.

In theory it should hit the index only, since it doesn’t actually retrieve anything from the table itself. It’s fast, at any rate.

My favorite magic

Once I had the above (and everything else) working, there was still more optimization to do.

I had created a set of indexes that I thought would do the trick — but there’s nothing like actually seeing what will happen when a query runs. With direct access, with control over the indexes, I could test and iterate until I got the right set of indexes.

The magic is SQLite’s explain query plan command. It tells you what indexes will be used.

In the end

I didn’t entirely switch away from Core Data. Feeds and folders are still Core Data objects. Since there was no performance gain to be had by switching those over, I left them as-is.

It’s just news items that got switched — but that’s almost all the data.

Making the switch did mean I had to do some things manually that Core Data would have done for me: keeping any in-memory items synced with the database storage, mostly.

But, still, in the end, the new version of the system was less code than the Core Data version. That will not be the case for most apps. I took it as further indication that this was the right move for this particular app.

Warning

This isn’t about being a hardcore low-level developer or some crap like that. I like Core Data a ton. (I recommend Marcus Zarra’s book, by the way, which I read twice.) If I could have stuck with Core Data for everything, I would have. (Rule: always work at the highest level possible.)

But how do you know when you might be better off with FMDB or other more-direct SQLite access? I think it goes like this, at least based on my experience:

  1. Is performance good? Then stick with Core Data. (That should cover 95% or more of data-driven apps right there.)

  2. Is Core Data really the cause of your performance problems? Can you optimize other things? Can you optimize your use of Core Data? Will going multi-threaded do the trick? Try. If you can get performance good, then stick with Core Data.

  3. Are your remaining performance problems really database-y things? In other words, are you doing things like setting one or a few properties across a big range of items; deleting lots of items based on a condition; or having to handle unique IDs from another database, and so you’re constantly doing fetches? In other words, can you benefit by not treating your data as objects sometimes? If switching to direct database access won’t help, then stick with Core Data.

My warning: you probably don’t need to switch away from Core Data. It’s the right answer almost every time.

(By the way, were this a Mac app only, Core Data would probably have been fine. But it runs on iPhones too, and that’s where performance optimization becomes so much more critical.)

Anyway: Core Data is the right answer, except when it’s not, and hopefully I’ve made it a little easier to figure out when it’s not the right answer.

Voices that Matter iPhone conference - Seattle, late April

Join me at Voices that Matter iPhone Developers Conference

I missed the Voices that Matter iPhone developers conference last year in Boston — but I heard great things about it, and I ended up being sad I missed it.

Then just recently I heard the next one is in Seattle, which is where I live. Not going to miss it this time, no way. :)

Let me get the money thing out of the way: use the discount code PHBLOGS when registering to save $100. Register before March 12 for early-bird pricing to save another $200. That’s $300 total, ’kay?

About the conference

It’s put on by our friends at Addison-Wesley — the idea is that the speakers are the folks who literally wrote the books on iPhone development. Folks like Aaron Hillegass, Jonathan Wolf Rentzsch, Kevin Avila, and more.

Aaron Hillegass, by the way, taught a ton of people Cocoa programming in person — and the ones he didn’t teach in person he taught by way of his Cocoa book. Me included. Aaron is responsible for just about everything and everybody.

I’ve seen him talk — and he’s so damn good.

About Seattle

Have you been to Seattle? It’s beautiful. Green. Lush. Alive. Waters fresh and salty plus two nearby mountain ranges plus a view of the tallest mountain in the lower 48 states.

The conference is on the waterfront at the Bell Harbor conference center. I’ve been to two Gnomedex conferences twice at that same location — it’s very nice, with a view of downtown, the waterfront, Elliot Bay, and the Olympic mountains.

The conference hotel is the Edgewater. The Beatles stayed there in like 1964 or something. In those days they used to give you fishing poles so you could fish from your hotel room.

About the Seattle Cocoa community

I don’t know who all of the locals are going. (I hope they all are.)

You might even wonder if there are any locals. Seattle’s gotta be a Microsoft town, right?

But here in the shadow of Mordor we’ve got a pretty hot bunch of developers. By way of proof I could just mention the magic kingdom of Omni and be done.

But there’s also the cool cats at Rogue Sheep, the amazing Flying Meat (Gus Mueller), Joe Heck (ringleader), some Cocoa-y Google folks, the madmen at Black Pixel Luminance, the unclassifiably hip Corporation Unknown, Professor Hal Mueller, the artisans at Zumobi, and plenty more.

After hours

I’m thinking pinball at Shorty’s. (Not like it’s the only place in town. But don’t us geeks love pinball? And it’s walking distance — just up the hill from the Edgewater.)

Anyway, that’s the scoop. Come to my neck of the woods for a change, wouldja?

360 iDev iPhone/iPad Conference - San Jose, April

Going to the 360 iDev iPhone conference?

I am. I’m speaking, even, on the topic of content-based apps. (On feeds, XML parsing, performance, networking, the beauty of NSOperationQueue, image caching and scaling, SQLite and Core Data, etc.)

I went last year to the San Jose conference and then to 360 iDev in Denver. Had a great time both times and totally look forward to this conference.

The iPad should actually be out by then, which is cool — I don’t know exactly what iPad content is lined up, but I have to figure lots of speakers will incorporate iPad into their presentations. And I bet lots of folks will bring iPads with them. I want to go just to see a whole bunch of people’s iPad apps and ideas.

I think it’s the first iPad conference with actual iPads. That’s kind of like Woodstock, right? I don’t want you to say you were there when you weren’t really there — you should actually be there.

Not enough? Here are some other randomly-jotted notes, then...

  • It’s near Cupertino. One night we all go over to Steve Jobs’s house for barbeque and Cuban cigars. It’s totally chill. (Wait. Okay. Maybe not. But it is near Cupertino, though.)

  • Last year I almost got in a fight at Linda’s Light Rail Lounge. Lesson learned: don’t ever joke about NASCAR with somebody who cares. He said he liked so-and-so as a great driver, I said something about Jeff Gordon (the only driver’s name I know) — and apparently the guy thought Jeff Gordon was a sissy or something. I had to talk fast. It’s possible that I bought him a drink, or maybe Joe or Dan handled it and saved me.

    What I mean is this: you and me, we should hang out.

  • My friends Brad Ellis and Dave Wiskus — great designers — are doing a co-presentation on something designer-y-ish.

    Don’t know who they are?

    Check out some apps Brad’s worked on: Apple Design Award winner Postage, Word Spin, and SnoGlobe.

    And check out apps Dave’s worked on: Coathangr and the wickedly addicting Typewar.

  • Oh, whatever, shut up, at least it’s always open.

  • Marcus Zarra will be there. And speaking. Don’t miss your chance to hear from the double-fisted wizard of Cores both Data and Animation.

  • Colin Donnell will be speaking. He has great hair and brains to match.

  • I sometimes judge a conference by who I met and keep in touch with. Last year I met Ryan Nielsen and the previously-mentioned Dan Burcaw.

  • In fact, the conference has such great mojo that last year Dave Wiskus was in town — for other reasons, only coincidentally in San Jose — but he hung out with us at night. And a year later he’s a successful indie iPhone developer and a speaker at 360 iDev. Imagine if he had actually attended the conference. He’d probably be an App Store billionaire by now. (Which is just $150 million in regular money, but still nothing to sneeze at.)

    (That App Store money thing didn’t make any sense. Sorry.)

    What I’m saying is: this conference has the power to launch entire careers!

  • Check out the entire list of speakers. Julio will be there! Joe-with-the-hat! More! It’s a great line-up.

  • TapLynx is a sponsor. That means, yes, we’re paying for some of your fun and awesome learnings. :)

  • It’ll be months before WWDC. Hard to wait, right? Especially with all this cool iPad grooviness.

Anyway... you should go to the website and check it out. Here’s the registration page.

Also, they might set up Rock Band again. You haven’t lived till you’ve seen Joe Pezzillo do No Sleep Till Brooklyn. You can hope.

Kevin

I got email asking me if I was doing some kind of public shunning of Kevin Ballard. By no means! He’s a totally great guy — smart, friendly, funny, way more interesting than I was at his age — and I’m sorry I didn’t see him at Macworld. It’s just that 1) he’s fun to tease and 2) he’s a super-good sport.

My kind of fella.

Now pretend you didn’t read all of the above. :)

Email sent to a developer on supporting 10.6 and up

The below is an email I sent to a fellow Mac developer on reasons to support 10.6 and up on his next major release. (It’s barely edited: I just changed a couple sentences that would have made identification easy.)

Here are the reasons to go with 10.6 and up:

1. Millions of people are using 10.6. Every new Mac since last September or whatever is running 10.6. Apple is selling lots of Macs. Lots of people have upgraded to 10.6.

2. People who don’t upgrade their OS are, in general, the kind of people who just don’t buy software anyway. (Particularly in the case of 10.6, given how inexpensive the upgrade price was.)

3. Every second you spend dealing with 10.5 (in terms of testing, code, whatever) is a disservice to your customers and your software. It’s very nearly irresponsible.

4. Quality is the most important aspect of your software. Quality drives sales. Dropping 10.5 support means you can spend more time on polish; it means you can use 10.6-only features that make your app better and easier to maintain. Continuing with 10.5 support means that your software is not as good as it could be.

5. Rule of thumb: don’t ever code for a shrinking OS version.

Wayne Gretzky: "I skate to where the puck is going to be, not where it has been."

There will be n 10.5 users on the day the next version of your app is released. The next day there will be n minus some number. A month later it will be n minus some big number — but you’ll still be supporting 10.5, you’ll still be writing software for people who don’t buy software anyway.

There are x 10.6 users today. Tomorrow there will be x plus some number. A month from now it will be x plus some big number.

6. Current users of your app still on 10.5 have a perfectly awesome piece of software to use.

7. If you don’t drop 10.5 now, when do you drop it? On a major release is when it’s easiest, and you don’t want to wait for the major release after this next one.

Advice to new developers on networking

This is for folks new to the Mac, iPhone, and iPad development community who are going to their first conference...

You might wonder if this “networking” thing you’ve heard about is really a thing. “I’ve got Xcode,” you think. “Do I really have to, you know, meet people and stuff? Isn’t networking something my Dad did? What about the meritocracy?”

While you’re busy asking yourself questions, other people are having a good time.

Here’s the deal: you don’t actually need to know anybody else to be successful. You totally don’t. It’s fine.

But it helps.

It’s not really networking, anyway. Or, at least, I’ve never gone into a bar or a party thinking I’ll advance my career or my software. That would be weird and yucky.

Rather, there’s a great community of developers and journalists and bloggers, and they’re roughly in your age range, and you have some interests in common, and almost everybody is nice, and — hey, it sounds like kindergarten, I know — but you can make friends.

That’s all there is to it. It’s not networking: get that dumb word out of your head.

Okay, here’s some practical advice.

Two types of geeks

The first type is exactly what you’d expect: they’re the technologists, the guys who would invent computers if they didn’t already exist. On their nth beer they can discuss the fine points of objc_msgSend_stret().

While they’re talking to you they’re also, in their heads, optimizing the queueing algorithm at the bar, writing their first quantum computing application, and stepping through the code they wrote just an hour ago.

The second type is tech-inflected liberal arts types. (I sometimes wonder if this surprises new computer science graduates.) Journalists and bloggers are often of this type — but a perhaps-surprising number of developers are too. They’d rather discuss Gogol and Gaga, Kafka and Kubrick, Borges and Black Eyed Peas.

What both types have in common, though, is Apple products. “Hey, how ’bout that iPad, huh?”

Both types also love well-designed software.

Some things not to do

Remember that everyone sits at their desk most of the time working on hard things. But not at the moment you’re talking to them. At that moment it’s time to have fun, take a little break from the hard things.

I think many of the technologists can deal with bug reports and feature requests in person. For others it’s too much like being back at the desk. (For me it is, anyway.)

I don’t know anybody who likes being cornered or monopolized, or who can stop what they’re doing to spend 30 minutes looking at a demo.

What you should do

Remember that all geeks are shy, just like you. Even the boisterous ones. Or especially. The word “shy” is so universally applicable among geeks that it means nothing: it’s no excuse for you or anybody else. (What do you think beer is for? It’s not just a FIFO stack.)

But if someone ever seems stand-off-ish or awkward — take it as shyness. That’s all it is. (Countless times I’ve heard people say “so-and-so doesn’t like me, I think” — when it’s always just that geek social skills are a little rough-edged. Mine included. In some cases these people have become best friends.)

So, yes, remember that they’re all people, just folks, not different from you in some fundamental way.

Though I will caution you not to stare directly into my third eye, or make fun of the extra head on the side of John Gruber’s regular head, or try to grab Wolf’s tail.

And if you think you actually just saw Thor himself, well, yes, but we call him bbum (the Norse god of Tequila).

(And, one more time, though it should go without saying by now — if you find yourself anywhere near Kevin Ballard, just slowly back away and move to the other side of the bar. Don’t move too fast — his eyes are freakishly sensitive to motion. You’ll be okay. Eventually. I know it burns.)

Last

This is the last Macworld Expo where you won’t see iPads. I’m nostalgic already.

Super-quick guide to Macworld Expo

The Expo and trade show is during the day. Walk the floor. Pay special attention to the smaller companies — that’s where you’ll usually find the most interesting things. I tend to avoid any exhibit with a big video screen, and that’s served me well.

In the evening is when you get more of a chance to meet people and talk. It’s easy to find out where to be:

  1. Consult the Hess Memorial party list.

  2. If you’re an indie developer, be at the Chaat Café 6 pm Thursday.

  3. Follow Mac developers and journalists on Twitter — they usually make themselves pretty easy to find. (And you’ll often find that a bunch of them are at the same place.) (They’re very needy people. It’s sad, really. At that hour what they usually need is a beer.)

  4. Trust your iPhone Map program. Search works.

Also, as always, drink plenty of water, remember to eat and sleep, and see some of San Francisco if you’re new to the city. And, for the sake of all that’s good and right in the world, do not engage Kevin Ballard in conversation.

On the benefits of thin-server RSS syncing

I’ve had a bunch of people ask me about the thin-server RSS syncing system I talked about yesterday.

The main question: what are the benefits?

First let’s define things a little. A thick-server RSS syncing system is something like Google Reader, NewsGator, Bloglines — where the server actually downloads the feeds, and client apps talk to the server rather than to the original sources.

There are lots of benefits to this kind of system. There’s every reason for this to be widely used — it’s the right choice for lots of people, probably for most.

A thin-server syncing system doesn’t read the feeds: it only knows about users, subscriptions lists, and the status of news items. No actual feed content. Loosely coupled to the actual RSS readers.

Here are some of the benefits of a hypothetical thin-server system (in no particular order):

No latency

The thick-server systems have to read millions of feeds. So they don’t usually get updates the moment they happen — they check a feed once an hour or whatever. (Maybe it’s every 15 minutes or whatever for popular feeds.)

This means that news gets to the client apps a little less quickly than it would otherwise.

With the thin-server, the clients read the feeds directly, so they get exactly what’s available at that time.

Security

Say you read a password-protected feed. A thick-server system would have to support that, and you’d have to send your credentials to that system. That system would have to store the content: it would treat it like any other feed it reads.

It’s not economical for thick-server systems to handle password-protected feeds, since each one can’t be re-used. It’s one copy per username/password pair.

With a thin-server system, you never transmit your username and password. It never sees the feed data, just the URL of the feed and IDs of news items. No problem syncing password-protected feeds.

Reachability

Say you read feeds from a local intranet that a thick-server can’t reach. You can’t sync these, since the thick server can’t read the feeds.

But, again, a thin server doesn’t care. All it sees are feed URLs and IDs of news items. No problem syncing intranet-only feeds.

(This also applies to things like script subscriptions. A thick server isn’t going to run an AppleScript, for example, but multiple clients might run the same script. The news items status would still be syncable.) (But not the script! No way would I want to sync executable code.)

Server downtime doesn’t prevent you from getting your feeds

If a thick-server system goes down, you can’t get your feeds. (Unless you turn off syncing.)

With a thin-server system, you still get your feeds. The clients wait to sync up.

Decentralized

So far, all the thick-server systems are on one big (conceptual) server. This means one point of failure for everyone who uses that system. Downtime is a big issue.

It’s conceivable that you could write a thick-server system that can run anywhere. Something open source, something easy to install. But it would use so much resources and bandwidth (reading the feeds every hour, returning entire feeds to client apps) that it would be prohibitive for many people. You couldn’t just install it on your account at your web provider and hope to get away with it. (Well, depending on lots of factors, of course. If it was just for you, and you didn’t have too many feeds, it’s probably okay.)

A thin-server system, on the other hand, would be easy to run. Minimal bandwidth, no content system where it downloads and stores feeds. It should be easier to set up and run than WordPress.

Easier to move from synced to non-synced and back

The thick-server systems rewrite the feeds, and usually substitute their own unique ID for whatever was in the feed. (Though, in the case of Google Reader, it also provides the original unique ID, if there was one.)

Because the feeds are rewritten, it can be very difficult to match up a non-synced item with its synced equivalent. This can make turning on or off syncing very rough, as you end up with duplicates.

Longer limits on news item status

This isn’t inherent, but it’s practical. Thick-server systems tend to serve a ton of people, so they have to have limits on the length of time news items status data will be stored.

For instance, NewsGator’s was two weeks or 200 items, whichever was first. (If I recall correctly.) Google Reader’s is, I believe, roughly twice that (but with some special cases, like when you do a mark-all-read in Google Reader and when you first subscribe to a feed).

But a thin system can afford to keep news item status data longer. Make it six months or a year.

No data loss

Because thick-server systems rewrite the feeds, they’ll often toss out parts of the original feed that they don’t care about.

Again, this doesn’t have to be inherent, but for practical reasons it’s often done this way.

With the thin server, you read the feeds directly, so you miss nothing.

Twitter and other feed-like things

This system would work for anything feed-like: it just needs a URL and individual item IDs.

Imagine pointing not just your RSS readers but also your Twitter clients at the server — your Twitter clients could know which items you’ve already read. Want that? I do. :)

Your data in your control

You could use someone else’s server, if they allowed it. Maybe there’d be inexpensive for-pay services.

But, at least conceptually, you could run it yourself, and control all your data yourself. The opportunity would be there, at any rate.

Anyway

That’s all I have in my head at the moment. There are more benefits, surely, but I think the above is plenty.

Jens on RSS syncing

Here’s Jens Alfke on RSS syncing with a couple interesting suggestions. (Jens has a lot of experience in this area.)

Idea for alternative RSS syncing system

Google Reader is an RSS reader that can be used for RSS syncing. Bloglines used to have a very basic syncing API (maybe it still does). NewsGator had a syncing API. FeedSync is a way to use feeds to sync other stuff (as far as I can tell). Sync Services (MobileMe syncing API) is a generalized syncing system that might be able to do RSS, but works just between Macs.

WebDAV is cool. DropBox is super-amazingly-cool. But these are storage systems, not syncing systems for things like RSS.

Not one of the above is a really great system for just syncing RSS between apps.

(For people who remember NewsGator’s system with fondness: it had drawbacks too, including that it was limited to the past two weeks or 200 items for each feed.)

Now, that said, I think Google Reader is cool (both as app and API), and I’m very glad we can use it, and I totally appreciate the help we’ve had from Google. But I do also hear from NetNewsWire users who’d like an alternative.

With everything else I have to do, I don’t have time to create an alternative. But I could make NetNewsWire work with an alternative, if one existed and was worthwhile.

What is a worthwhile alternative?

There are a few criteria to meet, in no particular order:

  1. It would have to be more like email servers — that is, not just one big server or cluster of servers somewhere, but the kind of thing people can run on any server. My web service provider might run one for me, the same way it runs an email server. (Or I might run one on my LAN. Or I might install it myself on my own website.)

  2. It would have to be free and open source, so that it could be everywhere, so that it would be developed by people who just want to make RSS syncing work. (I’m a capitalist, totally, but there are times when free and open source makes sense.)

  3. It would have to work over http and https. REST API.

  4. It would have to not require that clients download the feeds themselves from the server. (This is the way Google Reader, NewsGator, Bloglines and others worked. Status info was added to the rewritten feeds. I’m saying this system should not work that way: clients would download feeds directly from their sources, just as they do when not syncing.)

  5. The API should be as simple as possible and still get the job done. The job would be defined as syncing subscriptions lists and status of individual news items in RSS and Atom feeds. (And nothing else!)

  6. It would have to be easy to configure an RSS reader to use a given server. No more than URL of server, username, and password should be required. (Less, if possible.)

  7. It should not be limited to the last 14 or 30 days (like some systems) — it should have a much larger limit, like a year.

  8. The server itself wouldn’t ever read any RSS feeds. It wouldn’t have to — it’s entirely just about syncing data between apps. It would only ever talk to client apps.

  9. It should use as little bandwidth as possible, and be as fast as possible.

  10. Authentication would use standard HTTP authentication. (Not cookies or anything else.)

  11. There should probably a PHP + MySQL version, just so it can be deployed as widely as possible. (Though I know you’re thinking Rails.)

  12. Despite its being open source, if someone did want to offer it as a for-pay service, they should be allowed to.

Notes about the API

There are some obvious things. Get subscriptions list as OPML-with-folders. (Feeds could live in multiple folders, which means folders are just like tags, so call them tags if you want to.)

API calls would support conditional GET, so getting a subscription list would usually result in a 304.

You’d probably add, delete, edit subscriptions by addressing into the tree. (That way you could delete one instance of a feed that appears multiple times. You could add/remove folders that way too.)

The other half is the status of news items. Most have a unique ID (always in the case of Atom) or a guid (usually, in the case of RSS). For items that don’t, an agreed-upon way of constructing a unique ID would have to be developed. (Pick things that don’t usually change but are enough to identify an item: pubDate in a specific format + link + feed URL, as a UTF-8 string, then MD5-hashed. Maybe. Something like that, something that would be largely reliable.)

You’d sync status incrementally: get all the status changes since a certain date (the last time you made the call). Status would probably be read, unread, deleted, starred, and saved. To set status, it would be great to address each item in individual calls, very RESTfully — but that would be a giant bandwidth waste. Better a single call that takes a structure of some kind (XML, JSON, whatever) with item IDs and status/value pairs, where you can update a bunch of items all at once.

As you can see, the server doesn’t have to do that much. It stores some small bits of data with timestamps. I don’t think it needs any cron jobs (at least not conceptually) — it just responds to requests. It doesn’t even have any idea what this data is about.

You probably have the database schema mapped out in your head already plus more specific ideas about the API.

(I don’t recall if I’ve said before, but a couple years ago we found that the average NetNewsWire user had 26 subscriptions. That should give you an idea of the storage requirements this would need. Obviously some people have hundreds or thousands, of course, but not most.)

Most of the time clients are just getting the subs list (usually getting a 304 back), and getting/setting news item status changes.

In other words, none of this sounds that hard. And it doesn’t sound like a taxing job for a server.

Invitation

If I had time, I would have written this years ago, offered it for free, made it open source, had NetNewsWire support it, and I’d have tried to get other RSS readers to support it too.

But I didn’t and don’t have time to write it.

However, if there are people who are interested in writing this, I can help. I have client apps, and I’ve been thinking about this for years, and I’ve written to several sync APIs.

If you’re seriously interested in writing some software that could end up deployed far and wide, and that would solve a real problem for real people, get in touch with me.

P.S. Here’s the business case

So you might want to make some money. That’s cool. Two business ideas:

  1. Charge people money to use the service.

  2. Collect information about popular feeds and popular news items. You could provide a real-time view into what people (in the aggregate) are reading. This might be interesting to sell, or it might be interesting as a website itself (where you could display ads). Given all the metadata in feeds, plus your user’s folders/tags, you might even be able to figure out categorization. You might even be able to provide trends, too. Certain topics are gaining/losing ground. Certain feeds are getting more or less popular. Etc. Don’t forget the pretty graphs! All of that stuff would be an add-on, of course, something you’d be able to build because you have the sync system underneath.

Compelling? I don’t know. Just what I thought of off the top of my head.

Archive