Hooking up MPC & LLVM
I’ve been tinkering around with mpc and LLVM recently - just to satisfy a few simple questions:
- How easy is it to use mpc?
- How easy is the LLVM C API to use?
- How easy is it to connect a parser combinator library like mpc to an LLVM Module?
So, given the above aims, I’ve embarked on creating my own stupid little language, which I’m calling ‘neil’ - Not Exactly an Intermediate Language. It is utterly by chance that the name ‘neil’ happens to be my own name, if you believe that gratuitous lie!
A ‘neil’ program currently consists of one allowed type (a 32-bit integer named i32), function definitions, function calls, and the ability to return the result of one function from another. In short, it is (currently, and most likely permanently) a really stupid language.
An example legal ‘neil’ program is:
So let’s get right into how we construct this grammar in mpc. The mpc grammar for our ‘neil’ language is as follows:
It’s a very simple grammar; I looked at mpc’s smallc example to work out what to do. To parse using mpc, I have the following function:
With this, we can then iterate through a successfully parsed ‘neil’ input using the mpc_ast_t type that mpc provides. The struct elements are as follows:
- tag - the string name of the ast node’s type
- contents - what the ast node is pointing at in the original source input
- state - the line information for the ast node
- children_num - the length of the children array
- children - an array of AST nodes that are children of the current node
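To make the shape of that tree concrete, here is a rough sketch of a recursive walk that counts nodes by tag. It deliberately uses a stand-in struct, `ast_node` (a hypothetical name), that mirrors only the fields listed above, so it builds without mpc; the helper `count_tagged` is also my own invention, not something mpc provides. I’m assuming tags are matched by substring, since mpc tags are strings such as "procedure|>".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for mpc's AST node, reduced to the fields described above.
   The real mpc type also carries a `state` member with position info. */
typedef struct ast_node {
    const char *tag;            /* name of the node's type */
    const char *contents;       /* matched source text */
    int children_num;           /* length of `children` */
    struct ast_node **children; /* child nodes */
} ast_node;

/* Recursively count the nodes whose tag contains `needle`. */
static int count_tagged(const ast_node *node, const char *needle) {
    assert(node != NULL && needle != NULL);
    int total = (strstr(node->tag, needle) != NULL) ? 1 : 0;
    for (int i = 0; i < node->children_num; i++) {
        total += count_tagged(node->children[i], needle);
    }
    return total;
}
```

Calling `count_tagged(root, "procedure")` on a tree shaped like this would give the number of procedure nodes, which is the same traversal pattern needed to visit each procedure in turn.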
Next up, we need to create what we need from LLVM to produce a module for the current file. I’m using the LLVM C API because I’ve never used it before, having only ever used the C++ API in my day-to-day stuff at work, so I thought it would be useful to have a look at it.
First up we need an LLVM Module - this is an encapsulation of a ‘neil’ input file for our uses. I use:
To get a module to work with. Then, for each AST node that is a procedure, I create a corresponding LLVM function with:
(Note: I’m cheating for now because I know my functions return i32 and take no params. Don’t do this in production code!)
Then, since I have no control flow within my functions, I can create one basic block to hold the body of the function, and an IR builder to help us make the instructions within the basic block, with:
And with this we can begin to parse the body of the functions!
I parse the return statement (the only allowed statement within our functions), and check if it returns a literal or the result of a call to a function:
And the result is:
From the example ‘neil’ file I gave above.
TL;DR: mpc is pretty cool, the LLVM C API is very easy to use, and I’ll be fleshing out my ‘neil’ language in future blog posts once I try handling a more complicated input grammar.
PS the full source for the example is below: