Jaanus, Chaining execution:
Sometimes I’ve seen these animations chained by timing: the engineer calculates how long each animation takes, then delays the next one by that amount. Not only is this brittle and ugly in code, but it can also introduce crash bugs, since the context may change during execution: the objects may go away without knowing it themselves, and happily try to run an animation when it’s no longer valid to do so.
The solution to both of these things is to use block-based APIs.
Over the years I’ve come to like subclassing less and less.
You can’t get away from subclassing NSObject, of course, and if you’re writing a view controller it has to subclass UIViewController. That’s how the frameworks work, and I don’t fight or even complain about this.
But in my own designs I’ve been stepping farther and farther away from subclassing. I like the flexibility, and the lessened brain-hurtiness, of protocols over subclasses.
Which brings me to a related topic: sometimes, at night, when I’m just reading random stuff on my iPad, I like to read about Go (golang). I haven’t used it for anything, but I find it fascinating.
(I’d most likely use it for server-side work, but it’s interesting to imagine as a client-side language.)
Here’s Rob Pike, one of the Go designers, in a talk from June 2012:
My late friend Alain Fournier once told me that he considered the lowest form of academic work to be taxonomy. And you know what? Type hierarchies are just taxonomy. You need to decide what piece goes in what box, every type’s parent, whether A inherits from B or B from A. Is a sortable array an array that sorts or a sorter represented by an array? If you believe that types address all design issues you must make that decision.
I believe that’s a preposterous way to think about programming. What matters isn’t the ancestor relations between things but what they can do for you.
That, of course, is where interfaces come into Go. But they’re part of a bigger picture, the true Go philosophy.
If C++ and Java are about type hierarchies and the taxonomy of types, Go is about composition.
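Pike’s point is easy to see in a few lines of Go. Here’s a toy sketch (my own, not from the talk): the Named interface is satisfied implicitly by anything with the right method, with no declared ancestor relationship, and Note picks up behavior by embedding a Tag rather than inheriting from one:

```go
package main

import (
	"fmt"
	"sort"
)

// Named is satisfied implicitly: no type ever declares "implements Named".
type Named interface {
	Name() string
}

type Tag struct{ name string }

func (t Tag) Name() string { return t.name }

// Composition via embedding: Note gets Tag's Name method without subclassing.
type Note struct {
	Tag
	Body string
}

func describe(n Named) string { return "named: " + n.Name() }

func main() {
	tags := []string{"work", "home", "errands"}
	sort.Strings(tags) // sort itself works through an interface, not a base class
	fmt.Println(describe(Note{Tag: Tag{name: tags[0]}, Body: "hi"}))
	// prints "named: errands"
}
```

What matters is what Note can do (it has a Name), not where it sits in a hierarchy.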
One of the things I like about Cocoa is that it seems (am I right?) less subclass-happy than Java and C++. With protocols, the delegate pattern, and now blocks, we do tend to prefer composition over inheritance more than many object-oriented systems.
(But note that we still do subclass NSView, UIViewController, NSManagedObject, NSOperation, and so on.)
There are a bunch of other cool things about Go. It’s worth reading about, even if it only stretches your software developer’s muscles a little bit.
Another interesting article: Go at Google: Language Design in the Service of Software Engineering, which is a text version of another Rob Pike talk from 2012:
Type hierarchies result in brittle code. The hierarchy must be designed early, often as the first step of designing the program, and early decisions can be difficult to change once the program is written. As a consequence, the model encourages early overdesign as the programmer tries to predict every possible use the software might require, adding layers of type and abstraction just in case. This is upside down. The way pieces of a system interact should adapt as it grows, not be fixed at the dawn of time.
Tim Bray talks about his new Mac setup. My favorite line:
I’m too old now to not use Unix.
Jonathan Grynspan writes on Twitter:
In the general case, requiring single-threadedness is a code smell or worse.
He was reacting to my previous post mentioning the requirement that an API is main-thread-only.
I disagree with Jonathan. I’ll describe why and what I do.
The ideal Cocoa app
In the best-case scenario, which exists only in our dreams, everything runs on the main thread. We don’t need queues or threads because everything is so fast.
In this ideal world we never have to think about concurrency because there’s no such thing.
I’ve never written an app this way and I’m sure I never will. (As computers and devices get faster, apps will be expected to do more.)
But it’s still worth keeping this ideal in mind.
The UI runs on the main thread
There’s no escaping this. The main thread has gravity — code paths tend to start there and end up there.
There’s nothing wrong with recognizing the special-ness of the main thread.
Thread-safety is difficult
You can use a mix of queues, immutable data, and locking, and still get it wrong. Thread-safety is notoriously difficult.
The way to deal with concurrency is not to make everything thread-safe. (That may not be true for server apps, but it’s true for client apps.)
Making everything thread-safe is a lot of effort, and it’s easy to make mistakes. Due to the nature of concurrency bugs, some of those mistakes will show up only as intermittent bugs that are hard to diagnose. The developer may not be able to reproduce them.
What I do
I start with the ideal assumption that everything will run on the main thread.
Once I find that a queue is needed, I keep that queue private to the object that uses it. That object’s public API is main-thread-only, even though internally it uses a queue.
That object’s API may take completion callbacks, and those tend to be called on the main thread.
(I make an exception for objects that work very closely together. That’s fairly rare.)
A typical example:
- (void)notesWithUniqueIDs:(NSArray *)uniqueIDs fetchResultsBlock:(QSFetchResultsBlock)fetchResultsBlock;
The method triggers a fetch from the database on a background serial queue. Once complete, it calls fetchResultsBlock(notes) on the main thread.
Behind the scenes that object has to deal with concurrency: it fetches notes and updates a cache of uniqued notes. But those concurrency issues are small, well-defined, and limited to the scope of that object — and the caller never, ever has to think about it.
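The shape of that pattern is language-agnostic. Here’s a rough sketch in Go rather than Objective-C (the names and types are mine, standing in for the real code): a single worker goroutine plays the role of the private serial queue, and the result comes back to the caller over a channel, the moral equivalent of a completion block called on the main thread:

```go
package main

import "fmt"

// NoteStore keeps its worker channel private; callers never touch it.
type NoteStore struct {
	work chan func()
}

func NewNoteStore() *NoteStore {
	s := &NoteStore{work: make(chan func(), 16)}
	go func() { // the "background serial queue": one goroutine, jobs run in order
		for f := range s.work {
			f()
		}
	}()
	return s
}

// NotesWithUniqueIDs fetches in the background and delivers the result on
// the done channel, which the caller drains on its own goroutine.
func (s *NoteStore) NotesWithUniqueIDs(ids []string, done chan<- []string) {
	s.work <- func() {
		notes := make([]string, 0, len(ids))
		for _, id := range ids {
			notes = append(notes, "note-"+id) // stand-in for a database fetch
		}
		done <- notes
	}
}

func main() {
	store := NewNoteStore()
	done := make(chan []string)
	store.NotesWithUniqueIDs([]string{"1", "2"}, done)
	fmt.Println(<-done) // prints "[note-1 note-2]"
}
```

The caller never sees the queue, only the request and the callback, which is the whole point.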
This system works wonderfully well. It doesn’t block the main thread because it does use background queues. And it makes dealing with concurrency as mistake-free as possible because most of the code can assume, correctly, that it’s running on the main thread.
I try to avoid the following pattern, but it’s the best call from time to time.
Foo does some startup in a background serial queue. Parts of Foo’s public API don’t work until that startup has completed, so I want those methods to block callers until startup has completed.
I have not found a great way of dealing with this.
I do something like this:
- init
    create_lock()
    lock()
    startupOnSerialQueue() completion: unlock()

- someMethod
    lock()
    do_the_thing()
    unlock()
With this method, someMethod is blocked while startupOnSerialQueue() is happening. This is good.

someMethod also gets a lock all the rest of the time. That’s not a big deal, because, in practice, someMethod is super-fast. But it wouldn’t actually have to take that lock except right at startup.
This feels inelegant to me. It’s simple enough, and it works, but I don’t like it.
Do you know of a better solution?
My first thought is that dispatch_once would help — but it wouldn’t. (Not that I can see.) It would make sure startupOnSerialQueue is called once, but it doesn’t solve the problem of blocking use of the API until startupOnSerialQueue has completed.
someMethod could put do_the_thing() inside a dispatch_sync block that runs on the background serial queue. But that also seems inelegant. Needlessly complicated.
This reminds me of the old joke. Man: “Doc, it hurts when I go like this!” Doctor: “Well don’t go like that!”
Except that I will go like that when everything else about the design is perfect. Since I can keep the async startup internal to Foo, it’s able to implement its public API without infecting anything else.
So my only question is if there’s a better pattern for handling this.
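As an aside, since Go comes up earlier in these posts: Go’s sync.Once has roughly the blocking behavior wanted here. When startup runs synchronously inside Do, every concurrent caller blocks until the first startup completes, and later calls cost almost nothing. A toy sketch, names mine:

```go
package main

import (
	"fmt"
	"sync"
)

// Foo blocks every public method until startup has completed, via sync.Once:
// concurrent callers of once.Do all wait for the first call's startup() to
// finish, and subsequent calls are nearly free.
type Foo struct {
	once sync.Once
}

func (f *Foo) startup() {
	fmt.Println("starting up")
}

func (f *Foo) SomeMethod() string {
	f.once.Do(f.startup) // blocks only until startup completes, once
	return "the thing"
}

func main() {
	f := &Foo{}
	fmt.Println(f.SomeMethod())
	fmt.Println(f.SomeMethod()) // startup is not run again
}
```

The catch, of course, is that this runs startup synchronously in the first caller, which is close in spirit to the dispatch_sync approach the posts below end up with.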
Update 4:30 pm: To be clear: I believe in serial queues. Absolutely love ’em.
I’ve had varied feedback on Twitter and email.
In the end, this is what I’m doing.
Using an NSLock property — self.startupLock — instead of a pthread_mutex_t. (I always forget about NSLock. But it’s easy and cleaner-looking than pthread mutexes.)
Not just unlocking the lock upon startup completion but nilling it out (as Justin Miller and others have suggested) since it’s no longer needed once unlocked. (In my specific case all access is main thread only, so this is fine.)
In the end this still isn’t beautiful, but it’s simpler and less code than everything else and it solves the problem of making sure that 1) startupOnSerialQueue() doesn’t block the main thread, and 2) someMethod blocks until startupOnSerialQueue() completes, and is fast the rest of the time.
The most interesting of the feedback suggested that this is a job for continuations and futures. So: something to learn. I’ll start with Mike Ash’s article on futures.
Update 10:30 am the next day: Here’s what I really ended up doing.
Note that the previous solution had a deadlock. Sheesh. It was dumb anyway.
Instead I’m just doing startupOnSerialQueue via dispatch_sync. No lock needed.
So it blocks the main thread, but it’s not noticeable. (It’s about as fast as reading a 5K binary plist from disk, which I hadn’t realized at first. That’s not what it’s doing, but performance is comparable.)
Were startupOnSerialQueue slower, I’d have to think of something else. But since it’s fast enough, it’s fine.
So it looks like this:

- init
    dispatch_sync(serialQueue, startup)

- someMethod
    do_the_thing()
Issue #10 of the awesome objc.io is all about Syncing Data.
Which means it’s Christmas for a certain type of masochistic geek. (Well, me.)
It’s true; other platforms do suffer from not having quite a few elegant solutions present in Cocoa. (Oh, to have every object solve key-based archiving/coding abstractly, so that you can get universal coverage with your own serialization engine; NSCoding is the most underappreciated piece of design in all of Cocoa.) And it is very neat when it all works. But what about when it doesn’t?
I should explain why I chose not to use Core Data this time. (Note that I have shipped apps that use Core Data.)
This time it wasn’t the pathological cases like marking 10,000 items as read. Instead it’s two things.
The first is concurrency. I’ve been working with multi-threaded database systems since the mid ’90s, and I understand the issues very well.
I understand the issues well enough to know that every opportunity I can take to simplify concurrency issues to the point where concurrency isn’t an issue is worth taking. (As long as it performs well and doesn’t harm the user experience.)
This becomes even more important when you add something complex — such as syncing — to the mix.
The second reason has to do with my enduring love of plain-ol’ Cocoa. I like regular Cocoa objects. I like being able to implement hash, and design objects that can be created with a simple init (when possible and sensible). I especially like being able to do those things with model objects. (Which totally makes sense.)
And I prefer APIs like this…
- (VSTag *)existingTagWithName:(NSString *)name;
- (VSTag *)tagWithName:(NSString *)name;
…to APIs like this:
- (VSTag *)existingTagWithName:(NSString *)name context:(NSManagedObjectContext *)context;
- (VSTag *)tagWithName:(NSString *)name context:(NSManagedObjectContext *)context error:(NSError **)error;
Implementing the MetaWeblog API shouldn’t be difficult — but I’ve heard from some people that it can be.
I mentioned that I write for my static blog using MarsEdit, via a MetaWeblog API implementation. (That runs only on my local machine, via WEBrick.)
I just dug up the script. It’s insane Ruby, clearly written by a newbie — but it has also worked with zero issues for five years.
It’s also not something you can copy-and-paste, since it works only with my personal system.
But if it’s enough to help anyone else do a MetaWeblog API implementation, then cool. (It’s really not difficult.)
I’ve written three blogging systems. At UserLand I wrote Manila, which powered editthispage.com and a few other sites way back in the ’90s and early ’00s.
I’ve written two more just for my own use. The first, PHP/MySQL-based, powered this site for seven years. The second, which I still use, is a Ruby static blog generator. I’ve been using it for five years.
I’m a big fan of static sites. In 2011 I wrote A plea for baked weblogs.
But lately I’ve been writing apps in Node.js, which I like, and I can’t help but wonder how I’d do a blog system. (Yes, I’m aware of Ghost. It’s probably quite cool, but I’m too impatient to sit through a video, so I don’t know.)
A Node site could out-perform a static site — in theory, at least.
Here’s how I’d do it:
Blog posts would be stored in a calendar-like folder structure on disk. Files would be markdown files. I would need some convention for storing metadata at the top of each file. (My current blog system uses lines starting with a @ character at the top. Maybe MultiMarkdown has something better for this.)
When a page is requested, the server would render it (using Express, Sass, Jade, whatever: standard stuff) and return it. It would also cache the rendered page in memory, in a dictionary — and so the next time the page is requested it would be served from memory.
Apache and Nginx have to hit the file system for each request to a static site. (Correct? I would think, at least, that they have to hit the file system to check if a file has changed before returning from its own cache. I could be wrong.)
But this Node blog could return a rendered page from memory almost all of the time.
The programming for this is so simple. It begs me to try it. (But I’m resisting.)
One potential problem is not having enough memory to hold all the cached pages. I doubt that would be an issue since usually it’s just recent posts that get hits. Still, though, you’d have to monitor memory use and adjust as needed.
Another possibility would be to use a Least-Recently-Used cache for the pages. Limit the cache to 20 pages, say, and you’d never have memory issues, but the cache would be a little more expensive to maintain.
Another issue is static assets. Node can deal with those okay, but that’s not what it’s best at. I’d want to put images and similar files on S3 instead. Complicates things a little bit, but with good tools and scripts dealing with S3 is fine.
I wouldn’t even bother with an editing UI — instead I’d support the MetaWeblog API so I could use MarsEdit. (I sure wish for MarsEdit for iOS. But perhaps there is, at least, something on iOS that works with this API.)
(My static blog system has no editing UI. I write in MarsEdit. I run a small webserver on my machine that implements the MetaWeblog API.)
I’d also want all the posts to be managed by SCM. (Preferably Mercurial, but Git is okay.) On adding or editing a post it would have to commit the changes — I have to assume there’s a Node module for this somewhere. I’d keep local copies of the repository on my machines at home. (Also means I could use my local copies as a staging server, which is nice.)
When a post is edited or added, the cache would potentially (probably) contain one or more pages that need to be re-rendered. Rather than figure out a dependency system, I’d just wipe the entire cache. Editing and adding pages doesn’t happen so often that this would be a problem. (Sometimes the easy way to solve a hard problem is the best answer.)
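The core of this design (render once, cache the result in memory, wipe the whole cache on any edit) is only a few lines in any server language. Here’s a sketch, in Go rather than Node since the shape is identical; all names are mine:

```go
package main

import (
	"fmt"
	"sync"
)

// pageCache caches rendered pages by path; any edit wipes everything,
// rather than tracking which pages depend on which posts.
type pageCache struct {
	mu    sync.Mutex
	pages map[string]string
}

func newPageCache() *pageCache { return &pageCache{pages: map[string]string{}} }

func (c *pageCache) get(path string, render func(string) string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if html, ok := c.pages[path]; ok {
		return html // served from memory, no disk hit
	}
	html := render(path)
	c.pages[path] = html
	return html
}

// wipe is called whenever a post is added or edited.
func (c *pageCache) wipe() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pages = map[string]string{}
}

func main() {
	renders := 0
	render := func(p string) string { renders++; return "<html>" + p + "</html>" }
	cache := newPageCache()
	cache.get("/2014/01/post", render)
	cache.get("/2014/01/post", render) // cached; render not called again
	fmt.Println(renders)               // prints 1
	cache.wipe()
	cache.get("/2014/01/post", render)
	fmt.Println(renders) // prints 2
}
```

A real server would hang this off the HTTP handler and re-render markdown from disk inside render, but the cache-and-wipe logic is the whole trick.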
What I like about this
Again, in theory this would be fast. Given a reasonable host and no dumb programming mistakes, it should stand up to a Fireballing at least as well as a static site.
I also like that the system would be portable. In the old days the only really portable thing was static sites — you could zip up a folder and move it somewhere else. Easy. (PHP is portable too, since it’s widely-deployed. But often with a PHP site you have the issue of moving a MySQL database, which is a bit of a pain.)
But these days there are so many different Node hosts (Joyent, Heroku, Azure, Nodejitsu, etc.), and the systems for deploying and running Node are so standardized, that you could consider it very portable. Not as portable as a static site, but portable to an important degree.
So now I’m not using Core Data with Vesper. I hope the people who (quite rightly) like Core Data are not disappointed. I like Core Data too and recommend it.
Consider the below not as criticism of Core Data but as a description of what I personally like. Also consider all the things I’m giving up: faulting, NSFetchedResultsController, the Core Data modeler (I’m using a plist instead), and plenty more.
Designing for what I want
Starting over means I could think about what’s important to me in my persistence layer. At a high level, in order, it’s correctness, performance, simple concurrency, low memory use, ease of programming, and flexibility.
The main goal of the design, in other words, is to make it impossible for me to screw up the data. I’ve spent the last few days (since Saturday) writing a new system. This is how it works:
Main thread model objects, background serial queue database
Model objects live on the main thread. This makes it easy to use VSNote, VSTag, and so on in view controllers and in syncing.
There is one exception: you can create a “detached” copy of a model object to use with API calls. A detached model object exists on one thread of execution only, is short-lived, and is disconnected from the database. Detached objects aren’t a factor when it comes to concurrency.
When a model object is added, changed, or deleted, updates to the database are placed in a background serial queue.
Similarly, all fetches happen in that same background serial queue.
This way the model objects and database are always in sync, though the database lags slightly behind until its queue is caught up.
Concurrency is therefore never an issue, and I never, ever have to worry about the main thread being blocked for database access.
The implicit merge policy is always the same: main thread wins.
Why such a simple concurrency model?
Because sync is hard. There’s a lot of data-merging going on, on the clients and in the web app. Merging is the awful part of sync, but it’s unavoidable.
Since I can avoid yet another case of merging (merging across threads), I will. It’s an entire area that can’t have bugs because it doesn’t exist.
Multiple object types per table
Vesper is typical of sidebar/timeline/detail apps in that a timeline view object needs only a subset of what the detail view needs.
So I have two objects — VSTimelineNote and VSNote — which both come from the same notes table.
VSTimelineNote has five properties, while VSNote has 14 properties and two relationships.
This is all specified in the data model. (Here’s a screen shot of the data model.)
One requirement: each model object must have a uniqueID property. It can be a 64-bit integer, NSNumber, or NSString.
That uniqueID is also the primary key (unique, not null) for the corresponding table. I vastly prefer this to a system where objects have a local primary key that’s separate from their uniqueID.
The problem with systems like that is that duplicates are too easy to create. I want to make it impossible.
An example: a tag’s uniqueID is the lower-case version of its name. (A tag’s name can be edited case-wise, but if it changes otherwise it’s actually a separate tag.)
There’s no chance of creating duplicate tags because their uniqueIDs would be the same, and since uniqueID is also the primary key, I wouldn’t be able to insert that duplicate tag.
Another example: a note is assigned a 64-bit integer uniqueID on creation on your day phone. That same integer is its primary key. On the sync server, that note’s primary key is compound (uniqueID, userID). When the note later syncs to your night phone, it still has that same uniqueID, and the night phone uses it as the primary key. So it can’t create duplicate notes.
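The tag example is easy to sketch. Here a plain map stands in for the SQLite table and its primary-key constraint (an illustration, not the app’s actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// insertTag mimics a primary-key constraint on uniqueID: inserting "Home"
// and then "home" is rejected as a duplicate, because a tag's uniqueID is
// the lower-cased version of its name.
func insertTag(table map[string]string, name string) bool {
	uniqueID := strings.ToLower(name)
	if _, exists := table[uniqueID]; exists {
		return false // primary-key violation: duplicate tag
	}
	table[uniqueID] = name
	return true
}

func main() {
	tags := map[string]string{}
	fmt.Println(insertTag(tags, "Home")) // prints true
	fmt.Println(insertTag(tags, "home")) // prints false — same uniqueID
}
```

With a real table, the same guarantee falls out of declaring uniqueID as the primary key, so the duplicate insert fails at the database level.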
All relationships are ordered. (Via a lookup table with parentID, childID, ix, where ix is the order.)
My system doesn’t do inverse relationships, but for my uses that’s not an issue. VSNote has a to-many relationship to VSTag and to VSAttachment, but to get all notes for a tag I have to do a fetch. Which I don’t mind.
When objects are fetched, their related objects are also fetched. This is done as efficiently as possible: for instance, if three notes are related to a given tag, that tag is fetched just once. (If the tag is already cached, it’s not even fetched once. And, in fact, I’m caching all the tags at startup on purpose, since they’re small and there aren’t many of them.)
Some objects are uniqued and some aren’t. Not using uniquing is a performance benefit, except when I could end up with lots of instances of what should be the same object.
In Vesper, VSNote and VSTags are uniqued, but VSAttachments aren’t, because a tag can be related to many notes while an attachment can be related to only one note, so it’s unlikely that a single attachment would have multiple copies at once.
(The data model has a per-object setting for this.)
Uniqued objects are cached in an NSMapTable with weak references. That means a note is cached as long as there’s a reference to it outside the cache.
Optionally, an object can be cached permanently, and all objects can be fetched and cached on startup. (There are keys in the data model for this. I do both with tags.)
Deleting n objects (of the same class) takes one SQL call. If the object has relationships, the entries in the lookup tables are also deleted. To delete 10,000 objects, where that object has two relationships, takes 3 SQL calls. (One for its table and one for each relationship’s lookup table.)
Note that this can lead to orphans — there’s nothing like Core Data’s cascading deletes. Orphans are my responsibility.
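For illustration, the single-statement bulk delete might be built like this; the table and column names are my guesses, not the app’s actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// buildDeleteSQL builds one statement that deletes n objects at once.
// Assumes n >= 1.
func buildDeleteSQL(table string, n int) string {
	placeholders := strings.Repeat("?, ", n-1) + "?"
	return fmt.Sprintf("DELETE FROM %s WHERE uniqueID IN (%s);", table, placeholders)
}

func main() {
	// Deleting notes with two relationships is three such statements,
	// regardless of how many notes there are:
	fmt.Println(buildDeleteSQL("notes", 3))
	fmt.Println(buildDeleteSQL("noteTagsLookup", 3))
	fmt.Println(buildDeleteSQL("noteAttachmentsLookup", 3))
}
```

The lookup-table deletes are what keep relationships consistent; anything else left dangling is the orphan problem mentioned above.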
Most of syncing happens off the main thread. Networking doesn’t block the main thread, and parsing the JSON return data also happens off the main thread. Fetching data to send to the server happens in the database’s background queue.
Merging data happens on the main thread, but the database fetches to get that data and the database updates to save that data happen in the background.
So, even though model objects are updated on the main thread, the impact is low. It’s all fast, in-memory operations. If I find later that the impact is not low, I can optimize by merging only cached objects on the main thread and otherwise merging data in the background queue. (I doubt this would be necessary.)
Model objects implement the QSDataObject protocol. They must have a uniqueID property and may optionally implement awakeFromFetch. They should be init-able via init. They should actually have the properties claimed in the data model.
Otherwise they can be whatever. They are not subclasses of some model object class. Creating an object doesn’t hit the database on the main thread, and I can create an object without ever saving it to the database.
Though there is a data model defined in the plist, table and index creation is done via hand-specified SQL. This allows me to exactly tune how the database is constructed. (I especially like being able to specify which things are unique.)
And it allows me to add tables that don’t appear in the data model and that aren’t used by this system. (An example is the deletedNotes table, which is just a single-column table of uniqueIDs that isn’t backed by a model object.)
Finally, because it’s all just FMDB and SQLite, I can do things that step outside of the basic system. I can solve the “RSS reader problem” — I can mark 10,000 items as read all at once with one SQL call. (The only caveat is that cached objects also have to be updated. Which is easy: just a few lines of code. I could easily add built-in support for this kind of thing to the system if needed.)
I mentioned that the RSS reader problem isn’t an issue with this system: bulk deletes are built-in, and bulk property changes are easy to do.
While I don’t have plans for another app, I like knowing that the system would work well for these two scenarios:
User-created and synced data.
Web data with some user-created and synced data. (RSS, podcast, and Twitter clients, for instance. Apps like Glassboard and MarsEdit.)
All the apps I’m likely to ever create fall into these two categories.