I spent the weekend making a bunch of progress on the compiler. It has two pieces: a tokenizer, which I created by rewriting the original C code (langscan.c) in Swift, and a parser.
The parser in OrigFrontier was generated by MacYacc, which is similar to Yacc, which is similar to Bison, which is on my Mac. The thing about the parser is that it’s C code, and the rest of the app is Swift.
How do you bridge the two worlds? Easy answer: with Objective-C, which is a superset of C and which plays nicely (enough) with Swift.
So I renamed langparser.y — the rules file that the parser generator uses — to langparser.ym so that Xcode would know to treat the generated parser source as Objective-C. I edited it slightly, not to change the grammar rules but to change how nodes are created (as return values rather than via inout).
I also made my CodeTreeNode class, written in Swift, an Objective-C class so that it would be visible to my Objective-C code.
And then, finally, I started a build…
…and then it stopped with an error because the parser places my CodeTreeNode
in a C union, which isn’t allowed in ARC.
Crushed.
* * *
I think I have three options:
- Go down the rabbit hole of figuring out how to get the parser to work with ARC.
- Go with the flow: have the parser generate nodes that are, as in OrigFrontier, C structs. The last compilation step would be Objective-C code that translates that tree of C structs into a tree of
CodeTreeNode
objects, and then disposes the C-struct-node-tree.
- Write the parser by hand, in Swift.
My thinking:
I could waste a ton of time on #1, and bending tools in that way can be pretty frustrating work when they refuse to bend.
With #2 I’d feel a bit weird about the redundancy: building a tree and then building a copy of that tree with a different type of object.
My heart tells me #3 is the answer. After all, I’ve already done the tokenizer. How hard would it be to parse those tokens into a code tree? I could skip C and Objective-C altogether and stay in Swift. And it would be so fun. (Because that’s precisely the style of weirdo I am.)
* * *
But the real answer is #2. Writing a parser by hand would take way longer than I think. Given enough tests, it shouldn’t be a huge source of bugs, but still.
The thing about #2 is that yes, it’s redundant, it’s doing more work than it needs to, ideally — but my bet is that it would still be so fast that you wouldn’t be able to tell the difference. Computers are so good at this kind of thing. It’s not like reading files or networking; it’s just in-memory traversal and creating/releasing things.
You remember in Indiana Jones that guy with the twirling swords, and Indy gives that look and then just shoots him? The second option is the Indiana Jones solution.
Update 2:05 pm: Two people have already written me to recommend ANTLR. So I will definitely give that a look. It might be exactly what I need.