Fri Feb 26 2010

Core Data post follow-up notes

(This is a follow-up to the previous post.)

I like Wolf’s NSManagedObjectOperation idea.

I’m checking out Aaron Hillegass’s BNRPersistence.

I ran across Tokyo Cabinet a couple months ago when building mutt, and it sounded pretty cool, though I didn’t try it in any projects.

Justin asked me if it was worth doing this optimization for the extreme case — 10,000 unread items on a first-generation iPod Touch. It’s not always worth optimizing for extreme cases, not at the expense of other things. But there are several reasons why it was right this time:
Update 5:45 pm: I just looked, and the Tokyo Cabinet license is LGPL. Even with the leading L that means I can’t use it, because I don’t want to be a legal test case. (If it came pre-installed on Macs and iPhones, I would feel okay about using it.)

On switching away from Core Data

A lot of the work I’ve been doing the last several months is optimizing performance for NetNewsWire for iPhone. The changes haven’t shipped yet, because I’m not quite finished. But one part of this might be interesting to other developers, so I figured I’d write it up.

I optimized as much as I could, spent tons of time in Shark, went all multi-threaded with Core Data, switched away from my own queuing system to NSOperationQueue, optimized the XML parsing, etc. But performance and memory use on my first-generation iPod Touch (my development test device) were still not nearly good enough with a big unread count (of around 10,000 items).

At that point, having done everything else, the remaining issue was clearly Core Data. So I tried more things, re-read everything I could about Core Data performance (for the nth time), ran experiments, spent tons more time in Shark. Trying to get it good. No go.

Finally I realized I had to switch away from Core Data and use SQLite more directly. Not completely directly — I use FMDB, a lightweight Objective-C interface that works on Macs and iPhones. Gus wrote it. It’s good.

That meant a bunch more work — it’s not like Core Data and FMDB are similar or meant to be similar. So it was no drop-in replacement. Not intended to be.

But why?

I bet Core Data is the right way to go 95% of the time. Or more. It’s easy to work with. It’s fast (in most cases). It has schema upgrade tools. The important thing to know, though, is that it’s not a database. It’s an object graph and persistence manager. (Check out the post on Cocoa with Love that goes into detail.)
But surely you’re using objects

The difference between Core Data and a database was never that clear to me — until I found concrete examples. After all, under the hood, in the code, every news item in a feed is an object. Why wouldn’t I use an object persistence framework for that? They’re objects, and I want to persist them. Duh. Seems like I should use Core Data.

So here are some concrete examples where direct database access made more sense than using Core Data.

1. Marking lots of news items as read or unread

The app gets from the Google Reader API a big list of item IDs that have been marked read or unread. In Core Data, I had to loop through the list and change the status for each individual item. The list could be up to 10,000 items long. Not a good idea.

This is a very database-y operation. With one query the app can set the status for a whole bunch of items at once, without having to instantiate them as objects:

2. Deleting lots of items

Similar to #1 above — from time to time the app deletes old, read, non-starred items from storage. We can’t just let storage grow forever, especially not on an iPhone or iPod Touch.

With Core Data, I ran a query to figure out what items to delete. Then ran a loop that deleted them. Expensive. With SQLite access, I just did a single query:

3. Dealing with unique IDs from an outside system

Core Data does uniquing, but that’s not what this is. The news items have an assigned unique ID that comes from another database.

When refreshing feeds, it’s common to see news items that the app has seen before. They might have been downloaded previously or they might have changed. (We try to avoid the former, of course.)

This means that for each item in a feed, before it’s saved, the app first has to get the existing news item. This is slow. (I tried various techniques: pre-fetching, fetching as needed, fetching only IDs of existing items for a feed, storing existing IDs in a set or dictionary, etc. Nothing helped much.
Usually the solution was worse than the original problem.)

Because many thousands of items may come in during a refresh session, and every item has to be checked to see if it exists already, this was a huge performance hit. Better not to do the fetch, right? With more-direct access, I could just do a

4. Testing for the existence of an item

Sometimes the app just needs to know if something exists in the database. With Core Data, it’s a fetch. With SQLite, here’s one of my favorite tricks:

In theory it should hit the index only, since it doesn’t actually retrieve anything from the table itself. It’s fast, at any rate.

My favorite magic

Once I had the above (and everything else) working, there was still more optimization to do. I had created a set of indexes that I thought would do the trick — but there’s nothing like actually seeing what will happen when a query runs.

With direct access, with control over the indexes, I could test and iterate until I got the right set of indexes. The magic is SQLite’s explain query plan command. It tells you what indexes will be used.

In the end

I didn’t entirely switch away from Core Data. Feeds and folders are still Core Data objects. Since there was no performance gain to be had by switching those over, I left them as-is. It’s just news items that got switched — but that’s almost all the data.

Making the switch did mean I had to do some things manually that Core Data would have done for me: keeping any in-memory items synced with the database storage, mostly.

But, still, in the end, the new version of the system was less code than the Core Data version. That will not be the case for most apps. I took it as further indication that this was the right move for this particular app.

Warning

This isn’t about being a hardcore low-level developer or some crap like that. I like Core Data a ton. (I recommend Marcus Zarra’s book, by the way, which I read twice.) If I could have stuck with Core Data for everything, I would have.
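As an aside, here is a rough sketch of the database-style operations from the numbered examples above (bulk status update, bulk delete, existence check), plus explain query plan. It is not the NetNewsWire code: it uses Python's built-in sqlite3 module rather than FMDB, and the table name news_items, its columns, and all the function names are made up for illustration. The chunk size of 500 is also an assumption, chosen to stay under SQLite's bound-parameter limit, which defaults to 999 in older builds.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE news_items (
            item_id   TEXT PRIMARY KEY,   -- unique ID assigned by the outside system
            is_read   INTEGER NOT NULL DEFAULT 0,
            starred   INTEGER NOT NULL DEFAULT 0,
            date_seen INTEGER NOT NULL    -- e.g. a Unix timestamp
        )
        """
    )
    conn.executemany(
        "INSERT INTO news_items VALUES (?, ?, ?, ?)",
        [("a", 0, 0, 100), ("b", 0, 0, 200), ("c", 1, 1, 50), ("d", 1, 0, 50)],
    )
    conn.commit()
    return conn


def mark_read(conn, item_ids, is_read=True, chunk_size=500):
    """Set the read status of many items without loading any of them as objects."""
    item_ids = list(item_ids)
    for start in range(0, len(item_ids), chunk_size):
        chunk = item_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            f"UPDATE news_items SET is_read = ? WHERE item_id IN ({placeholders})",
            [int(is_read), *chunk],
        )
    conn.commit()


def delete_old_items(conn, cutoff):
    """Delete old, read, non-starred items in one statement; return rows removed."""
    cur = conn.execute(
        "DELETE FROM news_items WHERE is_read = 1 AND starred = 0 AND date_seen < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def item_exists(conn, item_id):
    """Existence check that selects a constant instead of any column data."""
    row = conn.execute(
        "SELECT 1 FROM news_items WHERE item_id = ? LIMIT 1", (item_id,)
    ).fetchone()
    return row is not None


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


if __name__ == "__main__":
    db = make_db()
    mark_read(db, ["a", "b"])
    print("deleted:", delete_old_items(db, cutoff=150))
    print("b exists:", item_exists(db, "b"))
    for line in query_plan(
        db, "SELECT 1 FROM news_items WHERE item_id = ? LIMIT 1", ("b",)
    ):
        print(line)
```

The plan text printed at the end varies between SQLite versions, so treat it as something to read by eye rather than parse. The upsert-style statement from example 3 is left out because the original statement is not shown.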
(Rule: always work at the highest level possible.) But how do you know when you might be better off with FMDB or other more-direct SQLite access? I think it goes like this, at least based on my experience:
My warning: you probably don’t need to switch away from Core Data. It’s the right answer almost every time. (By the way, were this a Mac app only, Core Data would probably have been fine. But it runs on iPhones too, and that’s where performance optimization becomes so much more critical.)

Anyway: Core Data is the right answer, except when it’s not, and hopefully I’ve made it a little easier to figure out when it’s not the right answer.

Mon Feb 22 2010

Voices that Matter iPhone conference - Seattle, late April

I missed the Voices that Matter iPhone developers conference last year in Boston — but I heard great things about it, and I ended up being sad I missed it. Then just recently I heard the next one is in Seattle, which is where I live. Not going to miss it this time, no way. :)

Let me get the money thing out of the way: use the discount code PHBLOGS when registering to save $100. Register before March 12 for early-bird pricing to save another $200. That’s $300 total, ’kay?

About the conference

It’s put on by our friends at Addison-Wesley — the idea is that the speakers are the folks who literally wrote the books on iPhone development. Folks like Aaron Hillegass, Jonathan Wolf Rentzsch, Kevin Avila, and more.

Aaron Hillegass, by the way, taught a ton of people Cocoa programming in person — and the ones he didn’t teach in person he taught by way of his Cocoa book. Me included. Aaron is responsible for just about everything and everybody. I’ve seen him talk — and he’s so damn good.

About Seattle

Have you been to Seattle? It’s beautiful. Green. Lush. Alive. Waters fresh and salty plus two nearby mountain ranges plus a view of one of the tallest mountains in the lower 48 states.

The conference is on the waterfront at the Bell Harbor conference center. I’ve been to two Gnomedex conferences at that same location — it’s very nice, with a view of downtown, the waterfront, Elliott Bay, and the Olympic mountains.

The conference hotel is the Edgewater.
The Beatles stayed there in like 1964 or something. In those days they used to give you fishing poles so you could fish from your hotel room.

About the Seattle Cocoa community

I don’t know which of the locals are going. (I hope they all are.) You might even wonder if there are any locals. Seattle’s gotta be a Microsoft town, right? But here in the shadow of Mordor we’ve got a pretty hot bunch of developers.

By way of proof I could just mention the magic kingdom of Omni and be done. But there’s also the cool cats at Rogue Sheep, the amazing Flying Meat (Gus Mueller), Joe Heck (ringleader), some Cocoa-y Google folks, the madmen at Black Pixel Luminance, the unclassifiably hip Corporation Unknown, Professor Hal Mueller, the artisans at Zumobi, and plenty more.

After hours

I’m thinking pinball at Shorty’s. (Not like it’s the only place in town. But don’t us geeks love pinball? And it’s walking distance — just up the hill from the Edgewater.)

Anyway, that’s the scoop. Come to my neck of the woods for a change, wouldja?

Sun Feb 21 2010

360 iDev iPhone/iPad Conference - San Jose, April

Going to the 360 iDev iPhone conference? I am. I’m speaking, even, on the topic of content-based apps. (On feeds, XML parsing, performance, networking, the beauty of NSOperationQueue, image caching and scaling, SQLite and Core Data, etc.)

I went last year to the San Jose conference and then to 360 iDev in Denver. Had a great time both times and totally look forward to this conference.

The iPad should actually be out by then, which is cool — I don’t know exactly what iPad content is lined up, but I have to figure lots of speakers will incorporate iPad into their presentations. And I bet lots of folks will bring iPads with them. I want to go just to see a whole bunch of people’s iPad apps and ideas.

I think it’s the first iPad conference with actual iPads. That’s kind of like Woodstock, right? I don’t want you to say you were there when you weren’t really there — you should actually be there.
Not enough? Here are some other randomly-jotted notes, then...
Anyway... you should go to the website and check it out. Here’s the registration page.

Also, they might set up Rock Band again. You haven’t lived till you’ve seen Joe Pezzillo do No Sleep Till Brooklyn. You can hope.

Tue Feb 16 2010

Kevin

I got email asking me if I was doing some kind of public shunning of Kevin Ballard. By no means! He’s a totally great guy — smart, friendly, funny, way more interesting than I was at his age — and I’m sorry I didn’t see him at Macworld.

It’s just that 1) he’s fun to tease and 2) he’s a super-good sport. My kind of fella.

Now pretend you didn’t read all of the above. :)

Sun Feb 14 2010

Email sent to a developer on supporting 10.6 and up

The below is an email I sent to a fellow Mac developer on reasons to support 10.6 and up on his next major release. (It’s barely edited: I just changed a couple sentences that would have made identification easy.)

Here are the reasons to go with 10.6 and up:

1. Millions of people are using 10.6. Every new Mac since last September or whatever is running 10.6. Apple is selling lots of Macs. Lots of people have upgraded to 10.6.

2. People who don’t upgrade their OS are, in general, the kind of people who just don’t buy software anyway. (Particularly in the case of 10.6, given how inexpensive the upgrade price was.)

3. Every second you spend dealing with 10.5 (in terms of testing, code, whatever) is a disservice to your customers and your software. It’s very nearly irresponsible.

4. Quality is the most important aspect of your software. Quality drives sales. Dropping 10.5 support means you can spend more time on polish; it means you can use 10.6-only features that make your app better and easier to maintain. Continuing with 10.5 support means that your software is not as good as it could be.

5. Rule of thumb: don’t ever code for a shrinking OS version. Wayne Gretzky: "I skate to where the puck is going to be, not where it has been."
There will be n 10.5 users on the day the next version of your app is released. The next day there will be n minus some number. A month later it will be n minus some big number — but you’ll still be supporting 10.5, you’ll still be writing software for people who don’t buy software anyway. There are x 10.6 users today. Tomorrow there will be x plus some number. A month from now it will be x plus some big number.

6. Current users of your app still on 10.5 have a perfectly awesome piece of software to use.

7. If you don’t drop 10.5 now, when do you drop it? On a major release is when it’s easiest, and you don’t want to wait for the major release after this next one.

Wed Feb 10 2010

Advice to new developers on networking

This is for folks new to the Mac, iPhone, and iPad development community who are going to their first conference...

You might wonder if this “networking” thing you’ve heard about is really a thing. “I’ve got Xcode,” you think. “Do I really have to, you know, meet people and stuff? Isn’t networking something my Dad did? What about the meritocracy?”

While you’re busy asking yourself questions, other people are having a good time.

Here’s the deal: you don’t actually need to know anybody else to be successful. You totally don’t. It’s fine. But it helps.

It’s not really networking, anyway. Or, at least, I’ve never gone into a bar or a party thinking I’ll advance my career or my software. That would be weird and yucky.

Rather, there’s a great community of developers and journalists and bloggers, and they’re roughly in your age range, and you have some interests in common, and almost everybody is nice, and — hey, it sounds like kindergarten, I know — but you can make friends.

That’s all there is to it. It’s not networking: get that dumb word out of your head.

Okay, here’s some practical advice.

Two types of geeks

The first type is exactly what you’d expect: they’re the technologists, the guys who would invent computers if they didn’t already exist.
On their nth beer they can discuss the fine points of objc_msgSend_stret(). While they’re talking to you they’re also, in their heads, optimizing the queueing algorithm at the bar, writing their first quantum computing application, and stepping through the code they wrote just an hour ago.

The second type is tech-inflected liberal arts types. (I sometimes wonder if this surprises new computer science graduates.) Journalists and bloggers are often of this type — but a perhaps-surprising number of developers are too. They’d rather discuss Gogol and Gaga, Kafka and Kubrick, Borges and Black Eyed Peas.

What both types have in common, though, is Apple products. “Hey, how ’bout that iPad, huh?” Both types also love well-designed software.

Some things not to do

Remember that everyone sits at their desk most of the time working on hard things. But not at the moment you’re talking to them. At that moment it’s time to have fun, take a little break from the hard things.

I think many of the technologists can deal with bug reports and feature requests in person. For others it’s too much like being back at the desk. (For me it is, anyway.)

I don’t know anybody who likes being cornered or monopolized, or who can stop what they’re doing to spend 30 minutes looking at a demo.

What you should do

Remember that all geeks are shy, just like you. Even the boisterous ones. Or especially.

The word “shy” is so universally applicable among geeks that it means nothing: it’s no excuse for you or anybody else. (What do you think beer is for? It’s not just a FIFO stack.)

But if someone ever seems stand-off-ish or awkward — take it as shyness. That’s all it is. (Countless times I’ve heard people say “so-and-so doesn’t like me, I think” — when it’s always just that geek social skills are a little rough-edged. Mine included. In some cases these people have become best friends.)

So, yes, remember that they’re all people, just folks, not different from you in some fundamental way.
Though I will caution you not to stare directly into my third eye, or make fun of the extra head on the side of John Gruber’s regular head, or try to grab Wolf’s tail. And if you think you actually just saw Thor himself, well, yes, but we call him bbum (the Norse god of Tequila).

(And, one more time, though it should go without saying by now — if you find yourself anywhere near Kevin Ballard, just slowly back away and move to the other side of the bar. Don’t move too fast — his eyes are freakishly sensitive to motion. You’ll be okay. Eventually. I know it burns.)

Last

This is the last Macworld Expo where you won’t see iPads. I’m nostalgic already.

Super-quick guide to Macworld Expo

The Expo and trade show is during the day. Walk the floor. Pay special attention to the smaller companies — that’s where you’ll usually find the most interesting things. I tend to avoid any exhibit with a big video screen, and that’s served me well.

In the evening is when you get more of a chance to meet people and talk. It’s easy to find out where to be:
Also, as always, drink plenty of water, remember to eat and sleep, and see some of San Francisco if you’re new to the city.

And, for the sake of all that’s good and right in the world, do not engage Kevin Ballard in conversation.

Tue Feb 09 2010

On the benefits of thin-server RSS syncing

I’ve had a bunch of people ask me about the thin-server RSS syncing system I talked about yesterday. The main question: what are the benefits?

First let’s define things a little. A thick-server RSS syncing system is something like Google Reader, NewsGator, Bloglines — where the server actually downloads the feeds, and client apps talk to the server rather than to the original sources.

There are lots of benefits to this kind of system. There’s every reason for this to be widely used — it’s the right choice for lots of people, probably for most.

A thin-server syncing system doesn’t read the feeds: it only knows about users, subscription lists, and the status of news items. No actual feed content. Loosely coupled to the actual RSS readers.

Here are some of the benefits of a hypothetical thin-server system (in no particular order):

No latency

The thick-server systems have to read millions of feeds. So they don’t usually get updates the moment they happen — they check a feed once an hour or whatever. (Maybe it’s every 15 minutes or whatever for popular feeds.) This means that news gets to the client apps a little less quickly than it would otherwise.

With the thin-server, the clients read the feeds directly, so they get exactly what’s available at that time.

Security

Say you read a password-protected feed. A thick-server system would have to support that, and you’d have to send your credentials to that system. That system would have to store the content: it would treat it like any other feed it reads.

It’s not economical for thick-server systems to handle password-protected feeds, since each one can’t be re-used. It’s one copy per username/password pair.
With a thin-server system, you never transmit your username and password. It never sees the feed data, just the URL of the feed and IDs of news items. No problem syncing password-protected feeds.

Reachability

Say you read feeds from a local intranet that a thick-server can’t reach. You can’t sync these, since the thick server can’t read the feeds.

But, again, a thin server doesn’t care. All it sees are feed URLs and IDs of news items. No problem syncing intranet-only feeds.

(This also applies to things like script subscriptions. A thick server isn’t going to run an AppleScript, for example, but multiple clients might run the same script. The news item status would still be syncable.) (But not the script! No way would I want to sync executable code.)

Server downtime doesn’t prevent you from getting your feeds

If a thick-server system goes down, you can’t get your feeds. (Unless you turn off syncing.) With a thin-server system, you still get your feeds. The clients wait to sync up.

Decentralized

So far, all the thick-server systems are on one big (conceptual) server. This means one point of failure for everyone who uses that system. Downtime is a big issue.

It’s conceivable that you could write a thick-server system that can run anywhere. Something open source, something easy to install. But it would use so many resources and so much bandwidth (reading the feeds every hour, returning entire feeds to client apps) that it would be prohibitive for many people. You couldn’t just install it on your account at your web provider and hope to get away with it. (Well, depending on lots of factors, of course. If it was just for you, and you didn’t have too many feeds, it’s probably okay.)

A thin-server system, on the other hand, would be easy to run. Minimal bandwidth, no content system where it downloads and stores feeds. It should be easier to set up and run than WordPress.
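To make the idea a little more concrete, here is a tiny sketch of what such a server would need to hold. The system itself is hypothetical, so everything in it (the ThinSyncStore class, its method names, the example URL) is invented for illustration and kept in memory only. The point is just that the stored data is a user, a feed URL, an item ID, and a status, with no feed content and no feed credentials.

```python
from dataclasses import dataclass, field


@dataclass
class ThinSyncStore:
    """Hypothetical thin-server state: read status only, never feed content."""

    # Maps a user name to the set of (feed_url, item_id) pairs marked read.
    read_items: dict = field(default_factory=dict)

    def mark(self, user, feed_url, item_id, is_read=True):
        """Record that a client marked one item read or unread."""
        items = self.read_items.setdefault(user, set())
        key = (feed_url, item_id)
        if is_read:
            items.add(key)
        else:
            items.discard(key)

    def read_ids(self, user, feed_url):
        """Return the item IDs this user has read in one feed."""
        return {
            item_id
            for url, item_id in self.read_items.get(user, set())
            if url == feed_url
        }


if __name__ == "__main__":
    store = ThinSyncStore()
    # The server never fetches this URL, so an intranet-only feed is no problem.
    store.mark("brent", "http://intranet.example/feed.xml", "item-1")
    print(store.read_ids("brent", "http://intranet.example/feed.xml"))
```

A real service would need persistence, authentication, and some expiry policy for old status data, none of which is shown here.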
Easier to move from synced to non-synced and back

The thick-server systems rewrite the feeds, and usually substitute their own unique ID for whatever was in the feed. (Though, in the case of Google Reader, it also provides the original unique ID, if there was one.)

Because the feeds are rewritten, it can be very difficult to match up a non-synced item with its synced equivalent. This can make turning on or off syncing very rough, as you end up with duplicates.

Longer limits on news item status

This isn’t inherent, but it’s practical. Thick-server systems tend to serve a ton of people, so they have to have limits on the length of time news item status data will be stored.

For instance, NewsGator’s was two weeks or 200 items, whichever was first. (If I recall correctly.) Google Reader’s is, I believe, roughly twice that (but with some special cases, like when you do a mark-all-read in Google Reader and when you first subscribe to a feed).

But a thin system can afford to keep news item status data longer. Make it six months or a year.

No data loss

Because thick-server systems rewrite the feeds, they’ll often toss out parts of the original feed that they don’t care about. Again, this doesn’t have to be inherent, but for practical reasons it’s often done this way.

With the thin server, you read the feeds directly, so you miss nothing.

Twitter and other feed-like things

This system would work for anything feed-like: it just needs a URL and individual item IDs. Imagine pointing not just your RSS readers but also your Twitter clients at the server — your Twitter clients could know which items you’ve already read. Want that? I do. :)

Your data in your control

You could use someone else’s server, if they allowed it. Maybe there’d be inexpensive for-pay services. But, at least conceptually, you could run it yourself, and control all your data yourself. The opportunity would be there, at any rate.

Anyway

That’s all I have in my head at the moment.
There are more benefits, surely, but I think the above is plenty.

© 1995-2010 Brent Simmons
