Alcides Fonseca

40.197958, -8.408312

Writing a compiler using Python, Lex, Yacc and LLVM

I found a good post on how to build your own toy compiler using Flex, Bison and LLVM. I saw one disadvantage right in the beginning: you had to use C++. If I were just prototyping a compiler, I wouldn’t use C++ but rather a dynamic language. And last semester for the Compilers course that’s what I did.

Students were assigned to build a Pascal compiler (actually a subset, but not that small) and the tools suggested were Lex, Yacc (using the C language) and compiling the code into C. I took a different approach and decided to do the project in Python (I actually tried ruby first, but the ruby-lex and ruby-yacc projects didn’t pass my basic tests).

I wrote the language grammar using PLY (the lex and yacc DSLs for python) and it was pretty simple. As for the AST generation, I had only a class Node that accepted an type and a list of arguments while my colleagues using C had to make 1001 structs for each kind of node. Not that it wasn’t impossible using C, but dynamic languages make the code simpler and more clear.

For the code generation, I decided to go with LLVM. It is a very promising project. Just take a look at google’s unladen-swallow or macruby, even parrot is planing on using llvm for their JIT.

For writing the code in Python, I had to use the llvm-py which I may say it’s in a early stages and lacks documentation. That was my major problem using. I had only three resources: the official guide, a presentation in japanese with some source code, and the actual source of the project (in C and C++).

Since every time I got an error in the llvm code generation it crashed the program, I had to dig into the source code of the project and find that error message and reverse engineer what was wrong with my code (usually I was giving values or pointers instead of references and vice-versa). So if you are doing something more complex, you actually need some C++ reading skills.

The project however worked, and I’m making it available so anyone may use the code as an example until better resources are published.