Scaling and Performance

In the recent Accidental Tech Podcast John Siracusa pointed out that I had talked about scaling when I was really talking about performance.

John’s right, and I know better.

In the vernacular sense of scaling the two things are related, but I should use more precise language.

So I’ll be more precise: my main goal is to maximize performance and minimize the amount of resources used. My second goal is to design so that I have scaling options if needed.

(Performance is related to scaling only in the sense that it lessens the need to scale, but it doesn’t solve the problem of scaling itself.)

Here’s what I’m doing for actual scaling:

  • Using a system that allows for multiple web servers to be automatically created as needed.

  • Using a SQL Server database with a maximum size of 150 GB.

  • Using blob storage for binaries (images), which can scale forever (presumably).

The weakest link here is, I think, the database. If I have to, I can split up the data into separate databases. There are just four tables — accounts, deletednotes, notes, and tags — and each of those could be moved into a separate database. (There are no joins and no foreign key constraints.)

I don’t expect to ever come anywhere near doing that, so I’m not actually planning the migration steps. But it’s in the back of my head that there’s a non-zero probability.

And if I have to go even further, I can. The biggest of the tables, by far, is the notes table. I can break that table out into separate databases, separated by userID. (Which is an integer. Each database would store notes for a range of userIDs.)

And, if that’s not enough, the remaining tables (accounts, deletednotes, and tags) could also be broken out the same way.
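
As a rough illustration of splitting by userID range, here is a minimal sketch in Python (used for brevity; the real server is Node.js). The shard boundaries, the database names, and the `database_for_user` helper are all hypothetical; the only thing taken from the plan above is that each database stores notes for a contiguous range of integer userIDs.

```python
from bisect import bisect_right

# Hypothetical layout: (first userID in the range, database name).
# Both the boundaries and the names are made up for illustration.
SHARDS = [
    (1, "notes_db_0"),
    (100_000, "notes_db_1"),
    (200_000, "notes_db_2"),
]
_STARTS = [start for start, _ in SHARDS]


def database_for_user(user_id: int) -> str:
    """Return the name of the database that holds notes for user_id."""
    if user_id < _STARTS[0]:
        raise ValueError(f"no shard covers userID {user_id}")
    # Find the last shard whose starting userID is <= user_id.
    index = bisect_right(_STARTS, user_id) - 1
    return SHARDS[index][1]
```

With a lookup like this, the code change stays small: every notes query picks its connection by userID first, and adding a database means adding one entry to the list.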

I don’t think I’ll ever have to do any of this — but I can, if I need to, with only small code changes.

The web server may also be a weak link. The way to solve this one — not that I think I’ll need to, because I can run a bunch of instances of the server — is to create separate API servers. Each endpoint could be a separate server with its own set of instances. This would require a small code change on the client, but I’d see this coming and make sure it gets done in plenty of time.

PROCEDURE

Check out the code listings in More Macintosh Toolbox (for instance) for a reminder of the role Pascal used to play in our world.

Pascal’s cool.

Craig on X 10.10

Craig thinks Helvetica Neue will be the next Mac system font, and suggests you start testing your apps with it now.

IAC on iOS

Check out GCDWebServer. (Via iOS Dev Weekly, which you should subscribe to.)

GCDWebServer is an embeddable and lightweight http server for Mac and iOS. On iOS it runs as a background task — it keeps running even when your app isn’t in the foreground.

Picture this: app X wants to send some data to app Y.

App Y is running GCDWebServer, so app X wraps up the data as JSON and talks to app Y via HTTP. Just as if it were talking to some server on the web, only it’s a local app.

Update 4 pm: Or not. I’m not sure it’s possible, or maybe just not allowed, to keep a network server running when the app is in the background.

Quartz Composer + Snapping Scroll

Two tutorials from Dawid Woldu fascinate me. I keep wanting to make time to get into Quartz Composer.

The Science Behind Snapping Scroll – Part I: Dragging

The Science Behind Snapping Scroll – Part II: Animation & Logic

Quartz Composer isn’t included in Xcode these days, but you can still download it from Apple. (Find “Graphics Tools for Xcode.”)

Node + MongoDB + iOS

A two-part tutorial from Michael Katz is a good place to get started writing services. Node makes a great API server. And it’s fun.

How To Write A Simple Node.js/MongoDB Web Service for an iOS App

How to Write An iOS App that Uses a Node.js/MongoDB Web Service

The tutorials use MongoDB. I haven’t had a good reason for a NoSQL database myself lately — but my early career, back in the ’90s, was all about schema-less databases, and I have a major soft spot for them.

(I’m digressing now.)

Frontier’s database was a hierarchy of tables. Each table could contain anything, including other tables — including even your scripts.

To run a script named myScript inside the bar table which was inside the foo table, you’d write foo.bar.myScript(params).

If that script took a string as a parameter, say, you could use a local variable or reference any string anywhere in the database: myApp.data.settings.username, for example. This was all presented with a user interface, navigable and editable.
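
For readers who never used Frontier, the addressing idea can be approximated with nested dictionaries. This is only a sketch in Python, not how Frontier worked internally; the `resolve` helper, the `greet` script, and the stored username value are all invented for illustration.

```python
def resolve(root: dict, path: str):
    """Walk a dotted path such as 'foo.bar.myScript' through nested tables."""
    node = root
    for name in path.split("."):
        node = node[name]
    return node


def greet(name: str) -> str:
    # Stand-in for a script stored inside the database.
    return f"Hello, {name}"


# A hierarchy of tables: tables hold values, other tables, and scripts.
database = {
    "foo": {"bar": {"myScript": greet}},
    "myApp": {"data": {"settings": {"username": "example-user"}}},
}

# Rough equivalent of foo.bar.myScript(myApp.data.settings.username)
script = resolve(database, "foo.bar.myScript")
result = script(resolve(database, "myApp.data.settings.username"))
```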

I haven’t seen a database like that anywhere else since then. So easy and intuitive. Great for productivity. (It was within this laboratory that such things as templated and scripted websites, blogs, RSS, OPML, and XML-RPC were invented and/or fleshed out.)

Web, Money; App Store, No Money

Subvert: Why we chose to build our core software business on the open web instead of on a closed app store platform:

More so, unless you have huge sales volumes, it’s near impossible for a company selling $2.99 apps (our lowest cost product and a fee which, in the app store world, is considered wildly exorbitant) to make enough money to support their team, pay for office space, keep up to taxes, cover fees and look after all of the financial requirements that it takes to run a real business.

What is making money for them? A web app.

We’ve done next to zero promotion of this software-as-a-service (SaaS) product since we started building, using and supporting it two years ago. Instead, we’ve been slowly adding to the system, improving the platform and signing up paying customers on a regular basis. As a result, the product has been making money — real money — and long ago surpassed the combined revenues of our other commercial software applications in a big way.

I’m not sure that the choice is between web apps and app store apps — I think it’s actually about standalone apps versus apps-plus-services. But, still, read the post.

That’s No Button

Manton Reece explains how tint color is misused:

The problem is the implementation in apps that use tint color anytime they want to highlight something, whether it is tappable or not.

Skala Color

It’s free. I’ve spent a zillion hours with their app Skala Preview. Bjango does good work.

App Camp For Girls Pitch Sessions

There’s a video. Hearing that Jean MacDonald has gone full-time with App Camp made me glad.

Glassboard Board for The Record

The board is for questions, feedback, and Mac nerd nostalgia. Join with the invitation code APUME.

Vesper Sync Diary #14 - Keys

Here’s the initial design: the text of notes is encrypted in the database. The key is not stored in the source code. (The source code could get out and you wouldn’t be able to decrypt notes.)

That design is a no-brainer, and I thought I was finished at that point. But then I did a design review with some security folks, and they suggested I revise it like this:

The text of notes is encrypted in the database. The key is not stored in the source code. The key should change from time to time. To make this work:

  • Always encrypt using the latest key.

  • Add some token to the text that lets me know if it’s been successfully decrypted.

  • Try decrypting using the latest key first. If the token doesn’t appear in the right place, use the previous key. Repeat if necessary until decrypted.

  • Add a new key regularly. (Twice a year, for example.)

  • Be prepared with a script that re-encrypts all note text with the latest key, in case there are any security concerns at all.
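
The steps above can be sketched roughly as follows, in Python for brevity. Everything here is an assumption made for illustration: the `TOKEN` value, the function names, and especially `_keystream_xor`, which is a toy stand-in so the example runs with only the standard library. It is not secure and a real cipher would take its place. The point is the rotation logic: encrypt with the latest key, try keys newest-first, and use the token to tell whether decryption worked.

```python
import hashlib
from itertools import count

TOKEN = b"NOTE1:"  # hypothetical marker prepended to the text before encryption


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher for illustration only. NOT secure; swap in a real cipher."""
    stream = bytearray()
    for block in count():
        if len(stream) >= len(data):
            break
        stream += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
    # XOR is symmetric, so the same function encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(text: str, keys: list[bytes]) -> bytes:
    """Always encrypt with the latest key (the last one in the list)."""
    return _keystream_xor(keys[-1], TOKEN + text.encode("utf-8"))


def decrypt(blob: bytes, keys: list[bytes]) -> str:
    """Try the latest key first, then each previous key in turn."""
    for key in reversed(keys):
        plain = _keystream_xor(key, blob)
        if plain.startswith(TOKEN):  # token in the right place: decrypted
            return plain[len(TOKEN):].decode("utf-8")
    raise ValueError("no known key decrypts this note")


def reencrypt(blob: bytes, keys: list[bytes]) -> bytes:
    """Re-encrypt one note with the latest key (the core of the bulk script)."""
    return encrypt(decrypt(blob, keys), keys)
```

Adding a key is then just appending to the list, and the re-encryption script is a loop over all notes that calls `reencrypt` on each one.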

This is sensible. I can’t think of any reason not to do it this way. Is there anything I’m missing?

Clangalyzer

Keith Harrison shows how to run custom Clang analyzer builds:

The open source build is updated frequently with bug fixes and extra checks. This makes it more likely to spot an error in your code that would go undetected with the Xcode bundled version.

Got @inessential

I went through Twitter’s impersonation process and was able to get the @inessential username.

The way it worked is that Twitter renamed @inessential_com, which I created a few days ago, to @inessential.

So if you followed @inessential_com, you’re now following @inessential, and there’s nothing you need to do.

Justin Refactors

Justin Williams: Refactoring in the Cloud:

Given that I’m not a fan of the C# threading model currently implemented, the amount of code used to power something that’s relatively simple, and the homegrown nature of many of the push wrappers we’re using, I made the decision to rewrite this portion of Glassboard’s architecture using Node.

Justin’s use of Azure is quite different from mine. Glassboard is more complex than Vesper syncing, which should be no surprise.

(If you think about it, you realize that Glassboard does syncing, but it’s silly to call it that because it’s a natural part of what’s expected of a group messaging app.)

Vesper’s architecture is smaller. There are two main components: an API server (Node.js Mobile Services app) and a reset-password site (Azure web site, also Node.js). There are no architecture-level refactorings for us to do at this point.

We’re at about 2,000 lines of server-side code. Small.

Premature Optimization and Servers

More than one person has suggested I’m guilty of violating the law of premature optimization when it comes to my server work.

Here’s the thing, though: when it comes to database schema, I really, really want to get it right before shipping.

Making code changes in a client app is normal. Making database schema changes in a client app is a pain, but not the worst thing.

Making code changes on the server is normal too, though a little hairy. But the hairiest of all is database schema changes on the server. I’m designing so that I don’t ever need to do that. (I may not reach that goal. Time will tell.)

Even though Brian Reischl wrote up how to do data migration, and so I have a good plan if I ever need to go there, I just don’t ever want to go there.

In other words: getting the server-side database schema right, right now, isn’t premature. It’s exactly the right time.

More on UUIDs and Clustered Indexes

My SQL Server genius pointed me to this article by Kimberly Tripp about the problem with GUIDs as primary and/or clustering key. (GUIDs and UUIDs are the same thing in this context. Microsoft folks often call them GUIDs.)

Another article suggests this isn’t a big deal with Azure SQL because a network write is slower than a page split anyway.

But the advice still seems to be that UUIDs don’t make the best clustering key. You want something narrow to keep down index size.

So I’ve slept on it. Do I still like this layout for the notes table?

id - auto-incrementing integer clustering primary key
noteID - UUID, unique
userID - int

Yes, I like it.
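
To make the layout concrete, here is a small runnable sketch using Python and an in-memory SQLite database. SQLite is only a stand-in for Azure SQL: it has no clustered-index setting, but a table with an INTEGER PRIMARY KEY is stored in order of that key, which is roughly the same idea. The `add_note` helper is hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notes (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- narrow surrogate key
        noteID TEXT    NOT NULL UNIQUE,            -- UUID, unique on its own
        userID INTEGER NOT NULL
    )
    """
)


def add_note(user_id: int) -> str:
    """Insert a note row for user_id and return its new UUID."""
    note_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO notes (noteID, userID) VALUES (?, ?)",
        (note_id, user_id),
    )
    return note_id
```

New rows always land at the end of the table because the integer key only grows, while the UUID stays available for lookups and for keeping clients and server in agreement about which note is which.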

It’s the API?

Cesare Rocchi argues that the Heartbleed problem isn’t C — it’s the API.

My counter-argument: people make mistakes. People make dumb APIs. With C, combine a dumb API and a mistake and you get Heartbleed.

That’s far less likely with another language.

Here’s the thing: we will always have dumb ideas and mistakes. We can and should do our best to eliminate them, but we’ll never succeed entirely. Because we know that, we’re negligent if we don’t do our best to minimize their consequences.

Book Idea

Graham Lee notes that computing turns 100 probably within your lifetime and proposes a book about programming that I want.

Vesper Sync Diary #13 part 3 - Thinking Too Much

There are two pieces of advice I’ve been getting:

One is that I’m thinking too much about this. It’ll be fine if I have a properly normalized schema and I use the appropriate indexes. After that, don’t worry.

(I admit that I’m prone to going down every performance rabbit-hole I can find.)

The other, partly related, is that the right way to deal with the notes table is to create a clustered primary key on userID + noteID. This way all notes by a given user will be together.

And this is the default behavior. It’s good. Smarter people than I am have thought about this.

And I can drop the integer identity column.

Update a few hours later: No. Wait. The best database guy I know tells me to do it the way I was thinking: surrogate key integer identity column as clustering key.

He also suggests I don’t need to add a unique constraint for noteID + userID, since noteID is a UUID. A unique constraint on noteID is all that’s needed.
