Fri Feb 26 2010

Core Data post follow-up notes

(This is a follow-up to the previous post.)

I like Wolf’s NSManagedObjectOperation idea.

I’m checking out Aaron Hillegass’s BNRPersistence. I ran across Tokyo Cabinet a couple months ago when building mutt, and it sounded pretty cool, though I didn’t try it any projects.

Justin asked me if it was worth doing this optimization for the extreme case — 10,000 unread items on a first-generation iPod Touch. It’s not always worth optimizing for extreme cases, not at the expense of other things. But there are several reasons why it was right this time:

  • High unread-items-count isn’t the only cause of performance degradation.

  • Lots of people still have original iPhones and iPods Touch.

  • The performance difference is significant even in a more typical case, say 1,000 unread on an iPhone 3GS.

  • The performance difference allows me to do other things that people are asking for.

Update 5:45 pm: I just looked, and the Tokyo Cabinet license is LGPL. Even with the leading L that means I can’t use it, because I don’t want to be a legal test case. (If it came pre-installed on Macs and iPhones, I would feel okay about using it.)

On switching away from Core Data

A lot of the work I’ve been doing the last several months is optimizing performance for NetNewsWire for iPhone. The changes haven’t shipped yet, because I’m not quite finished. But one part of this might be interesting to other developers, so I figured I’d write it up.

I optimized as much as I could, spent tons of time in Shark, went all multi-threaded with Core Data, switched away from my own queuing system to NSOperationQueue, optimized the XML parsing, etc. But performance and memory use on my first-generation iPod Touch (my development test device) was still not nearly good enough with a big unread count (of around 10,000 items).

At that point, having done everything else, the remaining issue was clearly Core Data. So I tried more things, re-read everything I could about Core Data performance (for the nth time), ran experiments, spent tons more time in Shark. Trying to get it good. No go.

Finally I realized I had to switch away from Core Data and use SQLite more directly. Not completely directly — I use FMDB, a lightweight Objective-C interface that works on Macs and iPhones. Gus wrote it. It’s good.

That meant a bunch more work — it’s not like Core Data and FMDB are similar or meant to be similar. So it was no drop-in replacement. Not intended to be.

But why?

I bet Core Data is the right way to go 95% of the time. Or more. It’s easy to work with. It’s fast (in most cases). It has schema upgrade tools.

The important thing to know, though, is that it’s not a database. It’s an object graph and persistence manager. (Check out the post on Cocoa with Love that goes into detail.)

But surely you’re using objects

The difference between Core Data and a database was never that clear to me — until I found concrete examples.

After all, under the hood, in the code, every news item in a feed is an object. Why wouldn’t I use an object persistence framework for that? They’re objects, and I want to persist them. Duh. Seems like I should use Core Data.

So here are some concrete examples where direct database access made more sense than using Core Data.

1. Marking lots of news items as read or unread

The app gets from the Google Reader API a big list of item IDs that have been marked read or unread.

In Core Data, I had to loop through the list, change the status for each individual item. The list could be up to 10,000 items long. Not a good idea.

This is a very database-y operation. With one query the app can set the status for a whole bunch of items at once, without having to instantiate them as objects: update newsItems set read = 1 where...

2. Deleting lots of items

Similar to #1 above — from time to time the app deletes old, read, non-starred items from storage. We can’t just let storage grow forever, especially not on an iPhone or iPod Touch.

With Core Data, I ran a query to figure out what items to delete. Then ran a loop that deleted them. Expensive.

With SQLite access, I just did a sinqle query: delete from newsItems where...

3. Dealing with unique IDs from outside system

Core Data does uniquing, but that’s not what this is. The news items have an assigned unique ID that comes from another database.

When refreshing feeds, it’s common to see news items that the app has seen before. They might have been downloaded previously or they might have changed. (We try to avoid the former, of course.)

This means that for each item in a feed, before it’s saved, the app first has to get the existing news item. This is slow. (I tried various techniques: pre-fetching, fetching as needed, fetching only IDs of existing items for a feed, storing existing IDs in a set or dictionary, etc. Nothing helped much. Usually the solution was worse than the original problem.)

Because many thousands of items may come in during a refresh session, and every item has to be checked to see if it exists already, this was a huge performance hit. Better not to do the fetch, right?

With more-direct access, I could just do a insert or replace into newsItems... and it would add the item or replace the existing item. Fast.

4. Testing for the existence of an item

Sometimes the app just needs to know if something exists in the database. With Core Data, it’s a fetch.

With SQLite, here’s one of my favorite tricks: select 1 from someTable where uniqueID = whatever.

In theory it should hit the index only, since it doesn’t actually retrieve anything from the table itself. It’s fast, at any rate.

My favorite magic

Once I had the above (and everything else) working, there was still more optimization to do.

I had created a set of indexes that I thought would do the trick — but there’s nothing like actually seeing what will happen when a query runs. With direct access, with control over the indexes, I could test and iterate until I got the right set of indexes.

The magic is SQLite’s explain query plan command. It tells you what indexes will be used.

In the end

I didn’t entirely switch away from Core Data. Feeds and folders are still Core Data objects. Since there was no performance gain to be had by switching those over, I left them as-is.

It’s just news items that got switched — but that’s almost all the data.

Making the switch did mean I had to do some things manually that Core Data would have done for me: keeping any in-memory items synced with the database storage, mostly.

But, still, in the end, the new version of the system was less code than the Core Data version. That will not be the case for most apps. I took it as further indication that this was the right move for this particular app.

Warning

This isn’t about being a hardcore low-level developer or some crap like that. I like Core Data a ton. (I recommend Marcus Zarra’s book, by the way, which I read twice.) If I could have stuck with Core Data for everything, I would have. (Rule: always work at the highest level possible.)

But how do you know when you might be better off with FMDB or other more-direct SQLite access? I think it goes like this, at least based on my experience:

  1. Is performance good? Then stick with Core Data. (That should cover 95% or more of data-driven apps right there.)

  2. Is Core Data really the cause of your performance problems? Can you optimize other things? Can you optimize your use of Core Data? Will going multi-threaded do the trick? Try. If you can get performance good, then stick with Core Data.

  3. Are your remaining performance problems really database-y things? In other words, are you doing things like setting one or a few properties across a big range of items; deleting lots of items based on a condition; or having to handle unique IDs from another database, and so you’re constantly doing fetches? In other words, can you benefit by not treating your data as objects sometimes? If switching to direct database access won’t help, then stick with Core Data.

My warning: you probably don’t need to switch away from Core Data. It’s the right answer almost every time.

(By the way, were this a Mac app only, Core Data would probably have been fine. But it runs on iPhones too, and that’s where performance optimization becomes so much more critical.)

Anyway: Core Data is the right answer, except when it’s not, and hopefully I’ve made it a little easier to figure out when it’s not the right answer.

Mon Feb 22 2010

Voices that Matter iPhone conference - Seattle, late April

Join me at Voices that Matter iPhone Developers Conference

I missed the Voices that Matter iPhone developers conference last year in Boston — but I heard great things about it, and I ended up being sad I missed it.

Then just recently I heard the next one is in Seattle, which is where I live. Not going to miss it this time, no way. :)

Let me get the money thing out of the way: use the discount code PHBLOGS when registering to save $100. Register before March 12 for early-bird pricing to save another $200. That’s $300 total, ’kay?

About the conference

It’s put on by our friends at Addison-Wesley — the idea is that the speakers are the folks who literally wrote the books on iPhone development. Folks like Aaron Hillegass, Jonathan Wolf Rentzsch, Kevin Avila, and more.

Aaron Hillegass, by the way, taught a ton of people Cocoa programming in person — and the ones he didn’t teach in person he taught by way of his Cocoa book. Me included. Aaron is responsible for just about everything and everybody.

I’ve seen him talk — and he’s so damn good.

About Seattle

Have you been to Seattle? It’s beautiful. Green. Lush. Alive. Waters fresh and salty plus two nearby mountain ranges plus a view of the tallest mountain in the lower 48 states.

The conference is on the waterfront at the Bell Harbor conference center. I’ve been to two Gnomedex conferences twice at that same location — it’s very nice, with a view of downtown, the waterfront, Elliot Bay, and the Olympic mountains.

The conference hotel is the Edgewater. The Beatles stayed there in like 1964 or something. In those days they used to give you fishing poles so you could fish from your hotel room.

About the Seattle Cocoa community

I don’t know who all of the locals are going. (I hope they all are.)

You might even wonder if there are any locals. Seattle’s gotta be a Microsoft town, right?

But here in the shadow of Mordor we’ve got a pretty hot bunch of developers. By way of proof I could just mention the magic kingdom of Omni and be done.

But there’s also the cool cats at Rogue Sheep, the amazing Flying Meat (Gus Mueller), Joe Heck (ringleader), some Cocoa-y Google folks, the madmen at Black Pixel Luminance, the unclassifiably hip Corporation Unknown, Professor Hal Mueller, the artisans at Zumobi, and plenty more.

After hours

I’m thinking pinball at Shorty’s. (Not like it’s the only place in town. But don’t us geeks love pinball? And it’s walking distance — just up the hill from the Edgewater.)

Anyway, that’s the scoop. Come to my neck of the woods for a change, wouldja?

Sun Feb 21 2010

360 iDev iPhone/iPad Conference - San Jose, April

Going to the 360 iDev iPhone conference?

I am. I’m speaking, even, on the topic of content-based apps. (On feeds, XML parsing, performance, networking, the beauty of NSOperationQueue, image caching and scaling, SQLite and Core Data, etc.)

I went last year to the San Jose conference and then to 360 iDev in Denver. Had a great time both times and totally look forward to this conference.

The iPad should actually be out by then, which is cool — I don’t know exactly what iPad content is lined up, but I have to figure lots of speakers will incorporate iPad into their presentations. And I bet lots of folks will bring iPads with them. I want to go just to see a whole bunch of people’s iPad apps and ideas.

I think it’s the first iPad conference with actual iPads. That’s kind of like Woodstock, right? I don’t want you to say you were there when you weren’t really there — you should actually be there.

Not enough? Here are some other randomly-jotted notes, then...

  • It’s near Cupertino. One night we all go over to Steve Jobs’s house for barbeque and Cuban cigars. It’s totally chill. (Wait. Okay. Maybe not. But it is near Cupertino, though.)

  • Last year I almost got in a fight at Linda’s Light Rail Lounge. Lesson learned: don’t ever joke about NASCAR with somebody who cares. He said he liked so-and-so as a great driver, I said something about Jeff Gordon (the only driver’s name I know) — and apparently the guy thought Jeff Gordon was a sissy or something. I had to talk fast. It’s possible that I bought him a drink, or maybe Joe or Dan handled it and saved me.

    What I mean is this: you and me, we should hang out.

  • My friends Brad Ellis and Dave Wiskus — great designers — are doing a co-presentation on something designer-y-ish.

    Don’t know who they are?

    Check out some apps Brad’s worked on: Apple Design Award winner Postage, Word Spin, and SnoGlobe.

    And check out apps Dave’s worked on: Coathangr and the wickedly addicting Typewar.

  • Oh, whatever, shut up, at least it’s always open.

  • Marcus Zarra will be there. And speaking. Don’t miss your chance to hear from the double-fisted wizard of Cores both Data and Animation.

  • Colin Donnell will be speaking. He has great hair and brains to match.

  • I sometimes judge a conference by who I met and keep in touch with. Last year I met Ryan Nielsen and the previously-mentioned Dan Burcaw.

  • In fact, the conference has such great mojo that last year Dave Wiskus was in town — for other reasons, only coincidentally in San Jose — but he hung out with us at night. And a year later he’s a successful indie iPhone developer and a speaker at 360 iDev. Imagine if he had actually attended the conference. He’d probably be an App Store billionaire by now. (Which is just $150 million in regular money, but still nothing to sneeze at.)

    (That App Store money thing didn’t make any sense. Sorry.)

    What I’m saying is: this conference has the power to launch entire careers!

  • Check out the entire list of speakers. Julio will be there! Joe-with-the-hat! More! It’s a great line-up.

  • TapLynx is a sponsor. That means, yes, we’re paying for some of your fun and awesome learnings. :)

  • It’ll be months before WWDC. Hard to wait, right? Especially with all this cool iPad grooviness.

Anyway... you should go to the website and check it out. Here’s the registration page.

Also, they might set up Rock Band again. You haven’t lived till you’ve seen Joe Pezzillo do No Sleep Till Brooklyn. You can hope.

Tue Feb 16 2010

Kevin

I got email asking me if I was doing some kind of public shunning of Kevin Ballard. By no means! He’s a totally great guy — smart, friendly, funny, way more interesting than I was at his age — and I’m sorry I didn’t see him at Macworld. It’s just that 1) he’s fun to tease and 2) he’s a super-good sport.

My kind of fella.

Now pretend you didn’t read all of the above. :)

Sun Feb 14 2010

Email sent to a developer on supporting 10.6 and up

The below is an email I sent to a fellow Mac developer on reasons to support 10.6 and up on his next major release. (It’s barely edited: I just changed a couple sentences that would have made identification easy.)

Here are the reasons to go with 10.6 and up:

1. Millions of people are using 10.6. Every new Mac since last September or whatever is running 10.6. Apple is selling lots of Macs. Lots of people have upgraded to 10.6.

2. People who don’t upgrade their OS are, in general, the kind of people who just don’t buy software anyway. (Particularly in the case of 10.6, given how inexpensive the upgrade price was.)

3. Every second you spend dealing with 10.5 (in terms of testing, code, whatever) is a disservice to your customers and your software. It’s very nearly irresponsible.

4. Quality is the most important aspect of your software. Quality drives sales. Dropping 10.5 support means you can spend more time on polish; it means you can use 10.6-only features that make your app better and easier to maintain. Continuing with 10.5 support means that your software is not as good as it could be.

5. Rule of thumb: don’t ever code for a shrinking OS version.

Wayne Gretzky: "I skate to where the puck is going to be, not where it has been."

There will be n 10.5 users on the day the next version of your app is released. The next day there will be n minus some number. A month later it will be n minus some big number — but you’ll still be supporting 10.5, you’ll still be writing software for people who don’t buy software anyway.

There are x 10.6 users today. Tomorrow there will be x plus some number. A month from now it will be x plus some big number.

6. Current users of your app still on 10.5 have a perfectly awesome piece of software to use.

7. If you don’t drop 10.5 now, when do you drop it? On a major release is when it’s easiest, and you don’t want to wait for the major release after this next one.

Wed Feb 10 2010

Advice to new developers on networking

This is for folks new to the Mac, iPhone, and iPad development community who are going to their first conference...

You might wonder if this “networking” thing you’ve heard about is really a thing. “I’ve got Xcode,” you think. “Do I really have to, you know, meet people and stuff? Isn’t networking something my Dad did? What about the meritocracy?”

While you’re busy asking yourself questions, other people are having a good time.

Here’s the deal: you don’t actually need to know anybody else to be successful. You totally don’t. It’s fine.

But it helps.

It’s not really networking, anyway. Or, at least, I’ve never gone into a bar or a party thinking I’ll advance my career or my software. That would be weird and yucky.

Rather, there’s a great community of developers and journalists and bloggers, and they’re roughly in your age range, and you have some interests in common, and almost everybody is nice, and — hey, it sounds like kindergarten, I know — but you can make friends.

That’s all there is to it. It’s not networking: get that dumb word out of your head.

Okay, here’s some practical advice.

Two types of geeks

The first type is exactly what you’d expect: they’re the technologists, the guys who would invent computers if they didn’t already exist. On their nth beer they can discuss the fine points of objc_msgSend_stret().

While they’re talking to you they’re also, in their heads, optimizing the queueing algorithm at the bar, writing their first quantum computing application, and stepping through the code they wrote just an hour ago.

The second type is tech-inflected liberal arts types. (I sometimes wonder if this surprises new computer science graduates.) Journalists and bloggers are often of this type — but a perhaps-surprising number of developers are too. They’d rather discuss Gogol and Gaga, Kafka and Kubrick, Borges and Black Eyed Peas.

What both types have in common, though, is Apple products. “Hey, how ’bout that iPad, huh?”

Both types also love well-designed software.

Some things not to do

Remember that everyone sits at their desk most of the time working on hard things. But not at the moment you’re talking to them. At that moment it’s time to have fun, take a little break from the hard things.

I think many of the technologists can deal with bug reports and feature requests in person. For others it’s too much like being back at the desk. (For me it is, anyway.)

I don’t know anybody who likes being cornered or monopolized, or who can stop what they’re doing to spend 30 minutes looking at a demo.

What you should do

Remember that all geeks are shy, just like you. Even the boisterous ones. Or especially. The word “shy” is so universally applicable among geeks that it means nothing: it’s no excuse for you or anybody else. (What do you think beer is for? It’s not just a FIFO stack.)

But if someone ever seems stand-off-ish or awkward — take it as shyness. That’s all it is. (Countless times I’ve heard people say “so-and-so doesn’t like me, I think” — when it’s always just that geek social skills are a little rough-edged. Mine included. In some cases these people have become best friends.)

So, yes, remember that they’re all people, just folks, not different from you in some fundamental way.

Though I will caution you not to stare directly into my third eye, or make fun of the extra head on the side of John Gruber’s regular head, or try to grab Wolf’s tail.

And if you think you actually just saw Thor himself, well, yes, but we call him bbum (the Norse god of Tequila).

(And, one more time, though it should go without saying by now — if you find yourself anywhere near Kevin Ballard, just slowly back away and move to the other side of the bar. Don’t move too fast — his eyes are freakishly sensitive to motion. You’ll be okay. Eventually. I know it burns.)

Last

This is the last Macworld Expo where you won’t see iPads. I’m nostalgic already.

Super-quick guide to Macworld Expo

The Expo and trade show is during the day. Walk the floor. Pay special attention to the smaller companies — that’s where you’ll usually find the most interesting things. I tend to avoid any exhibit with a big video screen, and that’s served me well.

In the evening is when you get more of a chance to meet people and talk. It’s easy to find out where to be:

  1. Consult the Hess Memorial party list.

  2. If you’re an indie developer, be at the Chaat Café 6 pm Thursday.

  3. Follow Mac developers and journalists on Twitter — they usually make themselves pretty easy to find. (And you’ll often find that a bunch of them are at the same place.) (They’re very needy people. It’s sad, really. At that hour what they usually need is a beer.)

  4. Trust your iPhone Map program. Search works.

Also, as always, drink plenty of water, remember to eat and sleep, and see some of San Francisco if you’re new to the city. And, for the sake of all that’s good and right in the world, do not engage Kevin Ballard in conversation.

Tue Feb 09 2010

On the benefits of thin-server RSS syncing

I’ve had a bunch of people ask me about the thin-server RSS syncing system I talked about yesterday.

The main question: what are the benefits?

First let’s define things a little. A thick-server RSS syncing system is something like Google Reader, NewsGator, Bloglines — where the server actually downloads the feeds, and client apps talk to the server rather than to the original sources.

There are lots of benefits to this kind of system. There’s every reason for this to be widely used — it’s the right choice for lots of people, probably for most.

A thin-server syncing system doesn’t read the feeds: it only knows about users, subscriptions lists, and the status of news items. No actual feed content. Loosely coupled to the actual RSS readers.

Here are some of the benefits of a hypothetical thin-server system (in no particular order):

No latency

The thick-server systems have to read millions of feeds. So they don’t usually get updates the moment they happen — they check a feed once an hour or whatever. (Maybe it’s every 15 minutes or whatever for popular feeds.)

This means that news gets to the client apps a little less quickly than it would otherwise.

With the thin-server, the clients read the feeds directly, so they get exactly what’s available at that time.

Security

Say you read a password-protected feed. A thick-server system would have to support that, and you’d have to send your credentials to that system. That system would have to store the content: it would treat it like any other feed it reads.

It’s not economical for thick-server systems to handle password-protected feeds, since each one can’t be re-used. It’s one copy per username/password pair.

With a thin-server system, you never transmit your username and password. It never sees the feed data, just the URL of the feed and IDs of news items. No problem syncing password-protected feeds.

Reachability

Say you read feeds from a local intranet that a thick-server can’t reach. You can’t sync these, since the thick server can’t read the feeds.

But, again, a thin server doesn’t care. All it sees are feed URLs and IDs of news items. No problem syncing intranet-only feeds.

(This also applies to things like script subscriptions. A thick server isn’t going to run an AppleScript, for example, but multiple clients might run the same script. The news items status would still be syncable.) (But not the script! No way would I want to sync executable code.)

Server downtime doesn’t prevent you from getting your feeds

If a thick-server system goes down, you can’t get your feeds. (Unless you turn off syncing.)

With a thin-server system, you still get your feeds. The clients wait to sync up.

Decentralized

So far, all the thick-server systems are on one big (conceptual) server. This means one point of failure for everyone who uses that system. Downtime is a big issue.

It’s conceivable that you could write a thick-server system that can run anywhere. Something open source, something easy to install. But it would use so much resources and bandwidth (reading the feeds every hour, returning entire feeds to client apps) that it would be prohibitive for many people. You couldn’t just install it on your account at your web provider and hope to get away with it. (Well, depending on lots of factors, of course. If it was just for you, and you didn’t have too many feeds, it’s probably okay.)

A thin-server system, on the other hand, would be easy to run. Minimal bandwidth, no content system where it downloads and stores feeds. It should be easier to set up and run than WordPress.

Easier to move from synced to non-synced and back

The thick-server systems rewrite the feeds, and usually substitute their own unique ID for whatever was in the feed. (Though, in the case of Google Reader, it also provides the original unique ID, if there was one.)

Because the feeds are rewritten, it can be very difficult to match up a non-synced item with its synced equivalent. This can make turning on or off syncing very rough, as you end up with duplicates.

Longer limits on news item status

This isn’t inherent, but it’s practical. Thick-server systems tend to serve a ton of people, so they have to have limits on the length of time news items status data will be stored.

For instance, NewsGator’s was two weeks or 200 items, whichever was first. (If I recall correctly.) Google Reader’s is, I believe, roughly twice that (but with some special cases, like when you do a mark-all-read in Google Reader and when you first subscribe to a feed).

But a thin system can afford to keep news item status data longer. Make it six months or a year.

No data loss

Because thick-server systems rewrite the feeds, they’ll often toss out parts of the original feed that they don’t care about.

Again, this doesn’t have to be inherent, but for practical reasons it’s often done this way.

With the thin server, you read the feeds directly, so you miss nothing.

Twitter and other feed-like things

This system would work for anything feed-like: it just needs a URL and individual item IDs.

Imagine pointing not just your RSS readers but also your Twitter clients at the server — your Twitter clients could know which items you’ve already read. Want that? I do. :)

Your data in your control

You could use someone else’s server, if they allowed it. Maybe there’d be inexpensive for-pay services.

But, at least conceptually, you could run it yourself, and control all your data yourself. The opportunity would be there, at any rate.

Anyway

That’s all I have in my head at the moment. There are more benefits, surely, but I think the above is plenty.

Archive

© 1995-2010 Brent Simmons