User Tools

Site Tools


blog:2011:02:03:comparison_operators

Photon Compiler Development: Comparison Operators

This is the 11th article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.

In this article, we add comparison operators to our language.

Parsing

We add tokens for lower (<), lower or equal (<=), greater (>), greater or equal (>=) and not equal (<>). We reuse equal (=) for structural equality and we ignore physical equality for now.

We add constructors for these operators to our bin_op type.

type bin_op =
     Cmp_eq (** [=] *)
   | Cmp_ge (** [>=] *)
   | Cmp_gt (** [>] *)
   | Cmp_le (** [<=] *)
   | Cmp_lt (** [<] *)
   | Cmp_ne (** [<>] *)
   ...

These operators binds less strongly than arithmetic operators meaning that a + b = c + d is interpreted as (a + b) = (c + d). We put all the comparison operators on the same level. We make them left-associative. It is a bit dubious but that is the way OCaml handles them. Wrong associativity assumptions will more often than not be reported as type errors anyway.

%left LOWER LOWER_EQUAL EQUAL NOT_EQUAL GREATER_EQUAL GREATER
%left PLUS PLUS_DOT MINUS MINUS_DOT
...

Finally, we add expr rules for the newly defined operators.

expr
   ...
   | expr EQUAL expr       { label 0 (Bin_op($1, Cmp_eq, $3)) }
   | expr GREATER expr     { label 0 (Bin_op($1, Cmp_gt, $3)) }
   | expr GREATER_EQUAL expr { label 0 (Bin_op($1, Cmp_ge, $3)) }
   | expr LOWER expr       { label 0 (Bin_op($1, Cmp_lt, $3)) }
   | expr LOWER_EQUAL expr { label 0 (Bin_op($1, Cmp_le, $3)) }
   | expr NOT_EQUAL expr   { label 0 (Bin_op($1, Cmp_ne, $3)) }

Typing

Comparisons operators are polymorphic in OCaml, i.e. they work on any type. Operands to a comparison are only required to have the same type. The resulting type of the comparison is of course bool.

In Photon, we will be a little more demanding, ruling out more invalid comparisons during typing at compile type. In OCaml, one can compile a program where two functions are compared. The program compiles fine but throw an exception at runtime:

Fatal error: exception Invalid_argument("equal: functional value")

The same program in Photon gives the following error message at compile time:

<stdin>:1.1: cannot perform equality tests on type int -> int

More specifically, we distinguish:

  • ordered types, i.e. types for which the notion of order has a meaning and which support all comparison operators: char, float, int and string (lexicographic order);
  • types that supports only equality tests, but for which there is no notion of order: bool1) and unit;
  • types that do not even support equality tests, either because it has no meaning or because it is simply not implemented. At the moment, functions fall in this category.

We use the following two helpers to check type comparability. Each one throws a newly defined error in case the given type does not support the requested operation. Their names should be familiar to Haskell programmers2) ;-)

(* [assert_is_eq ast] asserts that [ast] has a type suitable for equality
 * tests. *)
let assert_is_eq ast =
   match type_of ast with
     (Fun(_, _) | Ref _ ) as t ->
        raise (Error(pos_of ast, Not_eq t))
   | (Bool | Char | Float | Int | String | Unit) -> ()
 
(* [assert_is_ord ast] asserts that [ast] has an ordered type. *)
let assert_is_ord ast =
   match type_of ast with
     (Bool | Fun(_, _) | Ref _ | Unit) as t ->
        raise (Error(pos_of ast, Not_ord t))
   | (Char | Float | Int | String) -> ()

The type annotation of binary operations is now as follows.

let rec annotate_ast type_env ast =
     match ast.node with
       Bin_op(lhs, op, rhs) ->
          let lhs = annotate_ast type_env lhs in
          let rhs = annotate_ast type_env rhs in
          let ret_type =
             match op with
               (Cmp_eq | Cmp_ne ) ->
                  assert_is_eq lhs;
                  Type.Bool
             | (F_add | F_div | F_mul | F_pow | F_sub) ->
                  assert_type lhs Type.Float;
                  Type.Float
             | (I_add | I_div | I_mul | I_sub) ->
                  assert_type lhs Type.Int;
                  Type.Int
             | (Cmp_ge | Cmp_gt | Cmp_le | Cmp_lt) ->
                  assert_is_ord lhs;
                  Type.Bool in
          assert_type rhs (type_of lhs);
          { label = (ast.label, ret_type); node = Bin_op(lhs, op, rhs) }
      ...

Compiler

To keep compile short, we delegate the compilation of binary operations to a separate function compile_bin_op.

let rec compile comp ast =
   match ast.node with
     Bin_op(lhs, op, rhs) ->
        let t = type_of_ast lhs in
        let lhs = compile comp lhs in
        let rhs = compile comp rhs in
        compile_bin_op comp op t lhs rhs
   ...

compile_bin_op compiles arithmetic operations as given previously, and itself delegates the compilation of comparisons to a compile_comparison function.

let compile_bin_op comp op t lhs rhs =
   match op with
     (Cmp_eq | Cmp_ge | Cmp_gt | Cmp_le | Cmp_lt | Cmp_ne) ->
        compile_comparison comp op t lhs rhs
   | F_add -> build_fadd lhs rhs "sum" comp.builder
   ...

Compiling comparisons proceeds as follows. We first match on type, and then on the operator. Types supporting only equality tests only match Cmp_eq and Cmp_ne while ordered types match all Cmp_* operators.

Integer type (bool, char, int) comparisons are evaluated by the icmp instruction. Our first idea was to do the exact same comparisons for all integer types. We however changed our mind for the following reasons:

  • Although order comparisons on bool should have been ruled out by the Typer, we prefer to keep it separated in the Compiler as a redundant check.
  • Our tests have demonstrated that char comparisons should be unsigned. LLVM integer types have no signedness. The operations specify how to interpret the types (signed division sdiv vs unsigned division udiv, icmp uge vs icmp sge, …). While our integers are signed and should be compared with the signed operators, we expect '\000' to be lower than '\255' so characters should be compared unsigned.

We thus have three separate cases for bool, char and int, although these are very similar.

let rec compile_comparison comp op t lhs rhs =
   match t with
     Bool ->
        begin match op with
          Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder
        | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder
        | _ -> failwith "compile_comparison: Bool"
        end
   | Char ->
        begin match op with
          Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder
        | Cmp_ge -> build_icmp Icmp.Uge lhs rhs "ige" comp.builder
        | Cmp_gt -> build_icmp Icmp.Ugt lhs rhs "igt" comp.builder
        | Cmp_le -> build_icmp Icmp.Ule lhs rhs "ile" comp.builder
        | Cmp_lt -> build_icmp Icmp.Ult lhs rhs "ilt" comp.builder
        | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder
        | _ -> failwith "compile_comparison: (Bool | Char | Int)"
        end
   | Int ->
        begin match op with
          Cmp_eq -> build_icmp Icmp.Eq lhs rhs "ieq" comp.builder
        | Cmp_ge -> build_icmp Icmp.Sge lhs rhs "ige" comp.builder
        | Cmp_gt -> build_icmp Icmp.Sgt lhs rhs "igt" comp.builder
        | Cmp_le -> build_icmp Icmp.Sle lhs rhs "ile" comp.builder
        | Cmp_lt -> build_icmp Icmp.Slt lhs rhs "ilt" comp.builder
        | Cmp_ne -> build_icmp Icmp.Ne lhs rhs "ine" comp.builder
        | _ -> failwith "compile_comparison: (Bool | Char | Int)"
        end

Floating-point number comparisons are very similar to integer comparisons, except that the fcmp instruction is used in place of icmp. There is no signedness considerations with fcmp, but we have to choose between what LLVM calls ordered and unordered modes. Ordered mode rules out not-a-number operands and it is the one we chose.

   | Float ->
        begin match op with
          Cmp_eq -> build_fcmp Fcmp.Oeq lhs rhs "feq" comp.builder
        | Cmp_ge -> build_fcmp Fcmp.Oge lhs rhs "fge" comp.builder
        | Cmp_gt -> build_fcmp Fcmp.Ogt lhs rhs "fgt" comp.builder
        | Cmp_le -> build_fcmp Fcmp.Ole lhs rhs "fle" comp.builder
        | Cmp_lt -> build_fcmp Fcmp.Olt lhs rhs "flt" comp.builder
        | Cmp_ne -> build_fcmp Fcmp.One lhs rhs "fne" comp.builder
        | _ -> failwith "compile_comparison: Float"
        end

Unit comparisons requires no instruction at all. The unit type containing the single element (), it is always equal to itself. We thus always return true for = and false for <>. One may wonder why we would like to compare units. It may arise when using generic code to be introduced later.

   | Unit ->
        begin match op with
          Cmp_eq -> const_true
        | Cmp_ne -> const_false
        | _ -> failwith "compile_comparison: Unit"
        end

Finally, strings are compared lexicographically by first calling Ord_string_compare. This function compares the two given strings. It returns an integer less than, equal to, or greater than zero if the first string is found, respectively, to be less than, to match, or be greater than the second one.

The obtained number is then compared to zero to determine the result of the comparison. Ord_string_compare is defined in create as an alias to strcmp.

   | String ->
        let cmp = Hashtbl.find comp.globals "Ord_compare_string" in
        let i = build_call cmp [|lhs; rhs|] "cmp" comp.builder in
        compile_comparison comp op Int i (const_int int_type 0)
   | _ -> failwith "compile_comparison: unsupported type"

Tests

We only casually tested our compiler through interactive interpreter sessions up to now. Our compiler has reached a state where we can no longer avoid to test it, at least a little bit. We need to gain a little confidence that our compiler does at least some things right and we need to be able to check that future modifications will not break what used to work in previous versions.

We thus add a tests directory to our source tree. As Photon is a subset of OCaml, we intend to do most tests by comparing the results of our interpreter (or compiled program) to the results of the OCaml interpreter (or program compiled with the OCaml compiler).

The idea is as follows:

  1. dumps output of Photon interpreter into a file;
  2. dumps output of OCaml interpreter into another file;
  3. compare the two files: they should be equivalent.

For comparisons, the two output files will be equal if we remove the version notice in the first two lines of OCaml interpreter output.

Rather than loading a comparisons.ml file directly, with hard-coded comparisons which would be boring to write and would not cover many cases, we will do it in a QuickCheck-like way. We write a comparison generator in comparisons.ml and use the generator to feed us with test cases in comparisons.input.

input=comparisons.input
ocamlOut=comparisons.ocaml.out
photonOut=comparisons.photon.out
rm -f $input $ocamlOut $photonOut
ocaml comparisons.ml $n > $input
ocaml < $input | tail -n +3 > $ocamlOut
../photon.native < $input > $photonOut
if diff $photonOut $ocamlOut >/dev/null; then
   echo "Comparisons: PASS"
   rm -f $input $ocamlOut $photonOut
else
   echo "Comparisons: FAILED!"
fi

comparisons.ml itself is rather simple. It chooses randomly one type, then it generates two random elements of that type and chooses randomly one operator supported by that type. It then writes the comparisons operand1 operator operand2;; to stdout. The program generates as many comparisons as requested in its first argument (if no argument is provided, 1000 comparisons are generated). We do not copy the code here, see source code for details.

One limitation of our approach is that there is neither a correlation between the provided parameter and the size of generated elements nor limit test cases. We will do with it for now. The lack of overloading would make an OCaml QuickCheck implementation very unfriendly, but we may add limit test cases manually later (if you do, send them to devmusings@legiasoft.com).

Running the tests, we discovered that we were not handling character comparisons properly. Our initial version was using the same comparisons for char as for int, causing the signedness problem we explained in compiler.

Conclusion

Our interpreter is now equipped with basic comparison operators. However, these are of little use in the absence of control flow instructions. We will add an if-then-else construct in the next installment, which will allow us to finally define interesting recursive functions.

Source Code

The code accompanying this article is available in archive photon-tut-11.tar.xz or through the git repository:

git clone http://git.legiasoft.com/photon.git
cd photon
git checkout 11-comparison_operators
1) One can order booleans, but it is not obvious which order to choose, so we ruled out boolean ordering altogether.
2) Their names recall the Haskell Eq and Ord type classes.

Discussion

Enter your comment. Wiki syntax is allowed:
 
blog/2011/02/03/comparison_operators.txt · Last modified: 2011/02/20 20:28 (external edit)