This is the 11th article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.
In this article, we add comparison operators to our language.
We add tokens for lower (<
), lower or equal (<=
), greater (>
), greater or equal (>=
) and not equal (<>
). We reuse equal (=
) for structural equality and we ignore physical equality for now.
We add constructors for these operators to our bin_op
type.
type bin_op = Cmp_eq (** [=] *) | Cmp_ge (** [>=] *) | Cmp_gt (** [>] *) | Cmp_le (** [<=] *) | Cmp_lt (** [<] *) | Cmp_ne (** [<>] *) ...
These operators binds less strongly than arithmetic operators meaning that a + b = c + d
is interpreted as (a + b) = (c + d)
. We put all the comparison operators on the same level. We make them left-associative. It is a bit dubious but that is the way OCaml handles them. Wrong associativity assumptions will more often than not be reported as type errors anyway.
%left LOWER LOWER_EQUAL EQUAL NOT_EQUAL GREATER_EQUAL GREATER %left PLUS PLUS_DOT MINUS MINUS_DOT ...
Finally, we add expr
rules for the newly defined operators.
expr ... | expr EQUAL expr { label 0 (Bin_op($1, Cmp_eq, $3)) } | expr GREATER expr { label 0 (Bin_op($1, Cmp_gt, $3)) } | expr GREATER_EQUAL expr { label 0 (Bin_op($1, Cmp_ge, $3)) } | expr LOWER expr { label 0 (Bin_op($1, Cmp_lt, $3)) } | expr LOWER_EQUAL expr { label 0 (Bin_op($1, Cmp_le, $3)) } | expr NOT_EQUAL expr { label 0 (Bin_op($1, Cmp_ne, $3)) }
Comparisons operators are polymorphic in OCaml, i.e. they work on any type. Operands to a comparison are only required to have the same type. The resulting type of the comparison is of course bool
.
In Photon, we will be a little more demanding, ruling out more invalid comparisons during typing at compile type. In OCaml, one can compile a program where two functions are compared. The program compiles fine but throw an exception at runtime:
Fatal error: exception Invalid_argument("equal: functional value")
The same program in Photon gives the following error message at compile time:
<stdin>:1.1: cannot perform equality tests on type int -> int
More specifically, we distinguish:
char
, float
, int
and string
(lexicographic order);bool
1) and unit
;
We use the following two helpers to check type comparability. Each one throws a newly defined error in case the given type does not support the requested operation. Their names should be familiar to Haskell programmers2)
(* [assert_is_eq ast] asserts that [ast] has a type suitable for equality * tests. *) let assert_is_eq ast = match type_of ast with (Fun(_, _) | Ref _ ) as t -> raise (Error(pos_of ast, Not_eq t)) | (Bool | Char | Float | Int | String | Unit) -> () (* [assert_is_ord ast] asserts that [ast] has an ordered type. *) let assert_is_ord ast = match type_of ast with (Bool | Fun(_, _) | Ref _ | Unit) as t -> raise (Error(pos_of ast, Not_ord t)) | (Char | Float | Int | String) -> ()
The type annotation of binary operations is now as follows.
let rec annotate_ast type_env ast = match ast.node with Bin_op(lhs, op, rhs) -> let lhs = annotate_ast type_env lhs in let rhs = annotate_ast type_env rhs in let ret_type = match op with (Cmp_eq | Cmp_ne ) -> assert_is_eq lhs; Type.Bool | (F_add | F_div | F_mul | F_pow | F_sub) -> assert_type lhs Type.Float; Type.Float | (I_add | I_div | I_mul | I_sub) -> assert_type lhs Type.Int; Type.Int | (Cmp_ge | Cmp_gt | Cmp_le | Cmp_lt) -> assert_is_ord lhs; Type.Bool in assert_type rhs (type_of lhs); { label = (ast.label, ret_type); node = Bin_op(lhs, op, rhs) } ...
To keep compile
short, we delegate the compilation of binary operations to a separate function compile_bin_op
.
let rec compile comp ast = match ast.node with Bin_op(lhs, op, rhs) -> let t = type_of_ast lhs in let lhs = compile comp lhs in let rhs = compile comp rhs in compile_bin_op comp op t lhs rhs ...
compile_bin_op
compiles arithmetic operations as given previously, and itself delegates the compilation of comparisons to a compile_comparison
function.
let compile_bin_op comp op t lhs rhs = match op with (Cmp_eq | Cmp_ge | Cmp_gt | Cmp_le | Cmp_lt | Cmp_ne) -> compile_comparison comp op t lhs rhs | F_add -> build_fadd lhs rhs "sum" comp.builder ...
Compiling comparisons proceeds as follows. We first match on type, and then on the operator. Types supporting only equality tests only match Cmp_eq
and Cmp_ne
while ordered types match all Cmp_*
operators.
Integer type (bool
, char
, int
) comparisons are evaluated by the icmp
instruction. Our first idea was to do the exact same comparisons for all integer types. We however changed our mind for the following reasons:
bool
should have been ruled out by the Typer
, we prefer to keep it separated in the Compiler
as a redundant check.char
comparisons should be unsigned. LLVM integer types have no signedness. The operations specify how to interpret the types (signed division sdiv
vs unsigned division udiv
, icmp uge
vs icmp sge
, …). While our integers are signed and should be compared with the signed operators, we expect '\000
' to be lower than '\255
' so characters should be compared unsigned.
We thus have three separate cases for bool
, char
and int
, although these are very similar.
let rec compile_comparison comp op t lhs rhs = match t with Bool -> begin match op with Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder | _ -> failwith "compile_comparison: Bool" end | Char -> begin match op with Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder | Cmp_ge -> build_icmp Icmp.Uge lhs rhs "ige" comp.builder | Cmp_gt -> build_icmp Icmp.Ugt lhs rhs "igt" comp.builder | Cmp_le -> build_icmp Icmp.Ule lhs rhs "ile" comp.builder | Cmp_lt -> build_icmp Icmp.Ult lhs rhs "ilt" comp.builder | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder | _ -> failwith "compile_comparison: (Bool | Char | Int)" end | Int -> begin match op with Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder | Cmp_ge -> build_icmp Icmp.Sge lhs rhs "ige" comp.builder | Cmp_gt -> build_icmp Icmp.Sgt lhs rhs "igt" comp.builder | Cmp_le -> build_icmp Icmp.Sle lhs rhs "ile" comp.builder | Cmp_lt -> build_icmp Icmp.Slt lhs rhs "ilt" comp.builder | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder | _ -> failwith "compile_comparison: (Bool | Char | Int)" end
Floating-point number comparisons are very similar to integer comparisons, except that the fcmp
instruction is used in place of icmp
. There is no signedness considerations with fcmp
, but we have to choose between what LLVM calls ordered and unordered modes. Ordered mode rules out not-a-number operands and it is the one we chose.
| Float -> begin match op with Cmp_eq -> build_fcmp Fcmp.Oeq lhs rhs "feq" comp.builder | Cmp_ge -> build_fcmp Fcmp.Oge lhs rhs "fge" comp.builder | Cmp_gt -> build_fcmp Fcmp.Ogt lhs rhs "fgt" comp.builder | Cmp_le -> build_fcmp Fcmp.Ole lhs rhs "fle" comp.builder | Cmp_lt -> build_fcmp Fcmp.Olt lhs rhs "flt" comp.builder | Cmp_ne -> build_fcmp Fcmp.One lhs rhs "fne" comp.builder | _ -> failwith "compile_comparison: Float" end
Unit comparisons requires no instruction at all. The unit type containing the single element ()
, it is always equal to itself. We thus always return true for =
and false for <>
. One may wonder why we would like to compare units. It may arise when using generic code to be introduced later.
| Unit -> begin match op with Cmp_eq -> const_true | Cmp_ne -> const_false | _ -> failwith "compile_comparison: Unit" end
Finally, strings are compared lexicographically by first calling Ord_string_compare
. This function compares the two given strings. It returns an integer less than, equal to, or greater than zero if the first string is found, respectively, to be less than, to match, or be greater than the second one.
The obtained number is then compared to zero to determine the result of the comparison. Ord_string_compare
is defined in create
as an alias to strcmp
.
| String -> let cmp = Hashtbl.find comp.globals "Ord_compare_string" in let i = build_call cmp [|lhs; rhs|] "cmp" comp.builder in compile_comparison comp op Int i (const_int int_type 0) | _ -> failwith "compile_comparison: unsupported type"
We only casually tested our compiler through interactive interpreter sessions up to now. Our compiler has reached a state where we can no longer avoid to test it, at least a little bit. We need to gain a little confidence that our compiler does at least some things right and we need to be able to check that future modifications will not break what used to work in previous versions.
We thus add a tests
directory to our source tree. As Photon is a subset of OCaml, we intend to do most tests by comparing the results of our interpreter (or compiled program) to the results of the OCaml interpreter (or program compiled with the OCaml compiler).
The idea is as follows:
For comparisons, the two output files will be equal if we remove the version notice in the first two lines of OCaml interpreter output.
Rather than loading a comparisons.ml
file directly, with hard-coded comparisons which would be boring to write and would not cover many cases, we will do it in a QuickCheck-like way. We write a comparison generator in comparisons.ml
and use the generator to feed us with test cases in comparisons.input
.
input=comparisons.input ocamlOut=comparisons.ocaml.out photonOut=comparisons.photon.out rm -f $input $ocamlOut $photonOut ocaml comparisons.ml $n > $input ocaml < $input | tail -n +3 > $ocamlOut ../photon.native < $input > $photonOut if diff $photonOut $ocamlOut >/dev/null; then echo "Comparisons: PASS" rm -f $input $ocamlOut $photonOut else echo "Comparisons: FAILED!" fi
comparisons.ml
itself is rather simple. It chooses randomly one type, then it generates two random elements of that type and chooses randomly one operator supported by that type. It then writes the comparisons operand1 operator operand2;;
to stdout
. The program generates as many comparisons as requested in its first argument (if no argument is provided, 1000 comparisons are generated). We do not copy the code here, see source code for details.
One limitation of our approach is that there is neither a correlation between the provided parameter and the size of generated elements nor limit test cases. We will do with it for now. The lack of overloading would make an OCaml QuickCheck implementation very unfriendly, but we may add limit test cases manually later (if you do, send them to devmusings@legiasoft.com).
Running the tests, we discovered that we were not handling character comparisons properly. Our initial version was using the same comparisons for char
as for int
, causing the signedness problem we explained in compiler.
Our interpreter is now equipped with basic comparison operators. However, these are of little use in the absence of control flow instructions. We will add an if-then-else construct in the next installment, which will allow us to finally define interesting recursive functions.
The code accompanying this article is available in archive photon-tut-11.tar.xz or through the git repository:
git clone http://git.legiasoft.com/photon.git cd photon git checkout 11-comparison_operators
Discussion