inessential by Brent Simmons

libxml2 + xmlTextReader on Macs

I’ve used a few different XML parsers on Mac OS X—including CoreFoundation’s XML parser and NSXMLDocument.

But recently, for technical reasons, I couldn’t use any of the XML parsers I was already using. And, furthermore, I had reason to want to use a stream-based parser rather than a parser that builds a tree. (For better performance and lower memory use.)

I figured that probably meant using a SAX-ish API. (SAX == Simple API for XML) But I’ve never wanted to deal with SAX because it meant writing a bunch of code to deal with state, and that’s just a pain. (Honestly. No matter what Gus says.)
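To give a rough idea of the state I mean, here is a minimal sketch in Python rather than C. It uses the standard library's xml.sax as a stand-in, not libxml2's SAX2 module, and the sample feed and the names (StatusHandler, parse_with_sax) are made up for illustration.

```python
import xml.sax

# Made-up sample feed, for illustration only.
SAMPLE_XML = """<statuses>
  <status><id>1</id><text>Hello</text></status>
  <status><id>2</id><text>World</text></status>
</statuses>"""


class StatusHandler(xml.sax.ContentHandler):
    """Collects each <status> element into a dict via SAX callbacks."""

    def __init__(self):
        super().__init__()
        self.statuses = []
        # All of this is state the handler has to carry between callbacks.
        self.current = None   # dict for the <status> being read, or None
        self.field = None     # name of the child element being read, or None
        self.buffer = []      # text pieces seen so far for that child

    def startElement(self, name, attrs):
        if name == "status":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.buffer = []

    def characters(self, content):
        # Text can arrive in several chunks, so accumulate it.
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "status":
            self.statuses.append(self.current)
            self.current = None
        elif name == self.field:
            self.current[name] = "".join(self.buffer)
            self.field = None


def parse_with_sax(xml_string):
    handler = StatusHandler()
    xml.sax.parseString(xml_string.encode("utf-8"), handler)
    return handler.statuses
```

Three instance variables just to remember where you are in the document, and that's for a feed with two fields.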

So I found my way to libxml2 and its SAX2 module. Eh, okay, I’ll do this, I guess. Maybe it’ll even be fun! (Really thinking, probably not fun.)

Then somehow I ran across the xmlreader module. It turns out to be exactly what I wanted—stream-based and fast—without being a big pain like SAX.

(It’s modeled on the XmlTextReader interface from .NET. It’s possible that it’s very commonly used in the Windows world.)

xmlTextReader works like this:

loop until done
	GetTheNextBitOfXML
	DoSomethingWithItIfYouWant

Right. No callback functions (as in SAX). Just loop through the XML until you’re done.
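The same loop can be sketched in runnable form. This is Python, not C, and it uses the standard library's xml.dom.pulldom as a stand-in for xmlTextReader, so treat it as an illustration of the pull pattern rather than of the libxml2 API. The sample feed and the function name parse_statuses are made up.

```python
from xml.dom import pulldom

# Made-up sample feed, for illustration only.
SAMPLE_XML = """<statuses>
  <status><id>1</id><text>Hello</text></status>
  <status><id>2</id><text>World</text></status>
</statuses>"""


def parse_statuses(xml_string):
    """Pull events one at a time and collect each <status> as a dict."""
    statuses = []
    events = pulldom.parseString(xml_string)
    for event, node in events:  # get the next bit of XML
        # Do something with it if you want: here, only <status> matters.
        if event == pulldom.START_ELEMENT and node.tagName == "status":
            # Read just this element's subtree into memory, nothing more.
            events.expandNode(node)
            status = {}
            for child in node.childNodes:
                if child.nodeType == child.ELEMENT_NODE:
                    status[child.tagName] = "".join(
                        t.data
                        for t in child.childNodes
                        if t.nodeType == t.TEXT_NODE
                    )
            statuses.append(status)
    return statuses
```

The caller drives the loop and decides what to skip, so there are no handler objects and nothing to remember between callbacks. Calling parse_statuses(SAMPLE_XML) gives back a list of dictionaries, one per status.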

See the Libxml2 XmlTextReader Interface Tutorial.

And here’s a demo project (BSTweetParser) that downloads the Twitter public timeline and parses it into Cocoa objects, an array of dictionaries. (Twitter stuff makes for great sample code.)