This is the 12th article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.
In this article, we add our first control flow construct, the if-then-else conditional.
We extend the AST with a new If
node containing the condition, the branch to execute if the condition is true and the one to execute if the position is false.
We parse if-then-else constructs in a very straightforward manner. In this first approach, we consider the else
block mandatory. The direct consequence is that there is no dangling else problem. A second point that makes our implementation strategy trivial is that we cheat by abusing the precedence system, rather than significantly altering our grammar.
If-then-else constructs are expressions in OCaml and can appear in any place where an expression can appear. This gives rise to a lot of conflicts if we extend the expr
rule naively.
expr ... | IF expr THEN expr ELSE expr { label 0 (If($2, $4, $6)) }
For example, how should if c then t else e + 42
be parsed? As (if c then t else e) + 42
or as if c then t else (e + 42)
? The later parsing is the OCaml one. We parse the expr
after the ELSE
in a greedy manner. This can be expressed in a LALR(1) grammar. However, it would degrade grammar legibility in our opinion. We prefer to keep the grammar simpler, adding a precedence rule to specify that else
binds less tightly than operators.
%nonassoc ELSE %left LOWER LOWER_EQUAL EQUAL NOT_EQUAL GREATER_EQUAL GREATER ...
Typing is no more complicated. The condition should be bool
while the branches can be any type as long as their types match.
let rec annotate_ast type_env ast = match ast.node with ... | If(cond, then_, else_) -> let cond = annotate_ast type_env cond in assert_type cond Type.Bool; let then_ = annotate_ast type_env then_ in let then_type = type_of then_ in let else_ = annotate_ast type_env else_ in assert_type else_ then_type; { label = (ast.label, then_type); node = If(cond, then_, else_) }
We could provide explicit error messages stating that the condition should be bool
, or that both branches of the conditional should have matching type, rather than relying on Mismatch
, which will just report that some type has been inferred when another one was expected. We will, however, defer these kinds of refinement to keep our typer simpler until our compiler is a bit more advanced.
This section is not as self-contained as we initially intended. If you are not familiar with SSA form (and in a lesser extent, by its handling in LLVM), your are strongly encouraged to read the corresponding part of the LLVM tutorial if you do not grasp the material presented here.
Compiling conditionals is a bit more involved because LLVM requires them to be in static single assignment form (SSA). In SSA form, each variable is assigned exactly once, whatever the code path may be. It means we cannot compile if c then t else e
as
%c = ... ; compiled c br i1 %c, label %then, label %else then: %result = ... ; compiled t br label %endif else: %result = ... ; compiled e br label %endif endif:
because %result
would be assigned twice, once in the then
block and once in the else
block.
We solve this problem with the phi
instruction. This instruction magically knows from which block the control comes from and can select a register to use accordingly. The LLVM IR for our example becomes (t
is the LLVM type for the result)
%c = ... ; compiled c br i1 %c, label %then, label %else then: %res_t = ... ; compiled t br label %endif else: %res_e = ... ; compiled e br label %endif endif: %result = phi t [%res_t, %then], [%res_e, %else]
phi
will assign to %result
either %res_t
or %res_e
, depending whether control comes from %then
or from %else
(respectively).
We could cheat by using a memory reference to store the result. Memory references are not affected by SSA as they are assigned to indirectly via store
operations. Moreover, the LLVM optimizer has a pass which quite effectively remove needless memory references to replace them by code using SSA registers (the traditional variables prefixed by a %
in LLVM IR). However, in this simple case, we chose the more direct approach with the phi
instruction.
We proceed as follows. First, we compile the condition. We then create three blocks: a then
block, an else
block and a continuation block. Then, we branch either to the then
block if the condition is true
or to the else
block if it is false.
The then
and else
blocks follow the same pattern. We position the builder at the start of the block, we compile its expression and we record the current block. This is necessary because we may have changed the current block when compiling the expression, and we will need the block were the result lies for the phi
instruction. We then branch unconditionally to the continuation block (there is now fall-through in LLVM IR, the branch is mandatory even if the block that directly follows is the continuation block itself).
The continuation block simply merges the then
and else
branches with a phi
instruction.
let rec compile comp ast = match ast.node with ... | If(cond, then_, else_) -> compile_if comp cond then_ else_ and compile_if comp cond then_ else_ = let cond = compile comp cond in (* Create then, else and continuation block *) let f = block_parent (insertion_block comp.builder) in let then_block = append_block "then" f in let else_block = append_block "else" f in let cont_block = append_block "cont" f in (* Conditional branch *) ignore (build_cond_br cond then_block else_block comp.builder); (* Then block *) position_at_end then_block comp.builder; let then_ = compile comp then_ in (* Current block may be different from then_block, save it for phi *) let then_block = insertion_block comp.builder in ignore (build_br cont_block comp.builder); (* Else block *) position_at_end else_block comp.builder; let else_ = compile comp else_ in let else_block = insertion_block comp.builder in ignore (build_br cont_block comp.builder); (* Continue, merge the two branches *) position_at_end cont_block comp.builder; build_phi [(then_, then_block); (else_, else_block)] "if" comp.builder
Our approach is a bit simpler than the one presented in the LLVM tutorial. It may potentially decrease performance as blocks are generated in a different order (think of nested if-then-else). We have no indication, however, that our approach is intrinsically worse (it could as well be perfectly equivalent, or even better as far as we know). We will keep the simpler version for now, deferring the performance analysis to a (much) later time.
Now that we are equipped with arithmetic operators, comparison operators and conditionals, we can start writing interesting recursive functions. We add the following naive recursive implementations of the factorial and Fibonacci series to our test cases.
let rec fac (n : int) : int = if n < 0 then - (fac (-n)) else if n < 2 then 1 else n * fac (n - 1);; let fac1 = fac 1;; let fac12 = fac 12;; let rec fib (n : int) : int = if n < 0 then - (fib (-n)) else if n < 2 then 1 else fib (n - 2) + fib (n - 1);; let fib1 = fib 1;; let fib45 = fib 45;;
Passing by, although we did not yet investigate performance, we note that the previous test takes a bit more than 10 s with the Photon interpreter while it takes around 85 s with the OCaml one (our our 2.67 GHz Core i7 laptop). While it is too soon to draw hasty conclusions, it is still an encouraging result that plead for JIT compilation rather than full interpretation.
In the next installment, we will do some clean-up.
The code accompanying this article is available in archive photon-tut-12.tar.xz or through the git repository:
git clone http://git.legiasoft.com/photon.git cd photon git checkout 12-conditionals
Discussion