User Tools

Site Tools


blog:2011:02:02:function_definitions

Photon Compiler Development: Function Definitions

This is the tenth article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.

In this article, we add function definitions to our language.

Parsing

We adds a new top-level Bind_fun to record function bindings. The approach may be generalized later when we will introduce anonymous functions but it will do for now. A function bindings records whether or not it is recursive, the function name, a list of arguments, the return type and function body.

Arguments are pairs of an identifier and a type.

In this first version, we will only accepts fully type-annotated function definitions. We will add proper type inference later. We also don't accept patterns as arguments, they should be identifiers (we will not implement patterns in function arguments in our OCaml subset).

type arg = string * Type.t
 
type 'a top_level =
     Bind_fun of bool * string * arg list * Type.t * 'a t
   | Bind_val of string * 'a t
   ...

The parser is updated as follows.

%token REC
 
interactive
   ...
   | LET IDENT args COLON type_ EQUAL expr SEMI_SEMI
                           { Bind_fun(false, $2, List.rev $3, $5, $7) }
   | LET REC IDENT args COLON type_ EQUAL expr SEMI_SEMI
                           { Bind_fun(true, $3, List.rev $4, $6, $8) }
 
args
   : arg                   { [$1] }
   | args arg              { $2 :: $1 }
 
arg
   : LPAR IDENT COLON type_ RPAR { ($2, $4) }

Typing

In the typing of function bindings, the arguments (and, if it is recursive, the function itself) must be in the type environment when typing the body. However, arguments should not be left permanently in the type environment.

We see four possible solutions:

  • Add a second type environment with arguments to calls to annotate_ast. When typing a variable binding, we look first in the argument type environment and only if not found in the global type environment. This solution augments the number of arguments to pass around and we must be careful to look-up variables in the right type environment. Moreover, this solution does not scale. How could we handle inner functions or let-bindings with this scheme?
  • Push arguments in the type environment before typing the body, and pops them afterward. The underlying Hashtbl.t representation is well-suited to such an usage. Hashtbl.add does not erase previous binding which can be recovered after a call to Hashtbl.remove. However, we would need to be very careful to restore the environment even in the case of errors. This would require us to catch all exceptions only to re-throw them after having restored the environment.
  • Do not add arguments to the provided type environment. Add them to a copy of the type environment and use this extended copy to type the body. This solution is simple but requires copying a potentially large type environment each time a function is typed (to look-up only a handful of variables).
  • The functional approach: use an immutable type environment. Adding a binding returns a new type environment, letting the old type environment untouched.

We chose the fourth approach and re-implemented Type_env with Map instead of Hashtbl. OCaml Map is implemented with a functor (i.e. a parametric module taking other modules as arguments). We obtain a string map with the following code.

module StringMap = Map.Make(String)

We will not support functors in Photon. We will thus need to update this part of the code to make our compiler self-compiling (at worst, we could use an association list).

To avoid threading the current global type environment around, we define it with a mutable reference in the driver. annotate will update this reference directly with persistent bindings.

Equipped with our new type environment, the typing of function bindings proceeds as follows. We replace type references by concrete types in argument types and return type. We then type the body in an extended type environment. We then check the body type matches the provided return type. We could have better error messages by typing the body with the expected return type in mind, but we will consider this issue when implementing type inference. Finally, we update the type environment with the function binding.

let annotate type_env = function
     Bind_fun(is_rec, id, args, ret_type, body) ->
        (* Replaces refs *)
        let replace_arg (id, t) =
           let t = replace_refs !type_env t in
           (id, t) in
        let args = List.map replace_arg args in
        let ret_type = replace_refs !type_env ret_type in
        let t = Type.Fun(ret_type, List.map snd args) in
        (* Builds updated type environment *)
        let add_arg type_env (id, t) = Type_env.add_var type_env id t in
        let ext_type_env = List.fold_left add_arg !type_env args in
        let ext_type_env =
           if is_rec then Type_env.add_var ext_type_env id t
           else ext_type_env in
        (* Type body *)
        let body = annotate_ast ext_type_env body in
        assert_type body ret_type;
        type_env := Type_env.add_var !type_env id t;
        (t, Bind_fun(is_rec, id, args, ret_type, body))
   ...

Compiler

The problem of temporary, local bindings also appears when we compile a function. We need to be able to access arguments in and only in our function body.

If the problem is the same as for typing, the context is, however, different. Here, we will have no inner functions and the compilation process should never fail, if we implemented it properly.

We chose to store arguments separately into an auxiliary hash table in Compiler.t. This table is populated with arguments before typing function body and is cleared once it is done. Variable look-up first tries the argument table, and only falls back to the global table if the variable is not found in argument table.

We define a new function define_fun which takes a Compiler.t, a flag telling whether or not the definition is recursive, the function name, its arguments, its full type (i.e. a function type, not just the return type) and function body AST.

We first declare the function. If it is recursive, we also add it to globals.

We then setup arguments. All arguments are pushed in the argument hash table. Non-unit arguments refer to the corresponding LLVM function parameter while unit arguments refer directly to the unit constant. We set LLVM function argument names with the corresponding Photon names.

Once arguments are ready, we compile the function body, returning its value as the function return value (if any).

Finally, we empty the argument hash table and add our function to globals if it is not already done (i.e. if it is not recursive).

let define_fun comp is_rec id args t body =
   (* Declare function *)
   let t = element_type (llvm_type_of t) in
   let f = define_function id t comp.module_ in
   if is_rec then Hashtbl.add comp.globals id f;
   (* Set argument names and put them in scope *)
   let (unit_args, non_unit_args) =
      List.partition (fun (_, t) -> t = Unit) args in
   let non_unit_args = Array.of_list (List.map fst non_unit_args) in
   let unit_args = List.map fst unit_args in
   let add_arg i p =
      set_value_name non_unit_args.(i) p;
      Hashtbl.add comp.args non_unit_args.(i) p in
   Array.iteri add_arg (params f);
   let u = unsafe_lookup_global ".unit" comp.module_ in
   List.iter (fun id -> Hashtbl.add comp.args id u) unit_args;
   (* Compile body *)
   position_at_end (entry_block f) comp.builder;
   let value = compile comp body in
   begin match classify_type (return_type t) with
     TypeKind.Void -> ignore (build_ret_void comp.builder)
   | _ -> ignore (build_ret value comp.builder)
   end;
   (* Clear arguments and add function if not already done *)
   Hashtbl.clear comp.args;
   if not is_rec then Hashtbl.add comp.globals id f;
   (* Return *)
   assert_valid_function f;
   f

We also modify variable look-up to check arguments first.

let lookup_var comp id =
   try Hashtbl.find comp.args id
   with Not_found ->
      let g = Hashtbl.find comp.globals id in
      if is_function g then g
      else
         if is_global_constant g then global_initializer g
         else build_load g id comp.builder

Conclusion

We can now define simple functions directly in Photon.

# let inc (x : int) : int = x + 1;;
val inc : int -> int = <fun>
# let twice (f : int -> int) (x : int) : int = f (f x);;
val twice : (int -> int) -> int -> int = <fun>
# twice inc 40;;
- : int = 42
# twice inc;;
<stdin>:1.1-5: function is given 1 arguments but expects 2 arguments

However, as the last phrase of our example show, function are still far from being first-class (no partial application yet). The absence of type inference is also a hindrance.

Before turning our attention to these interesting topics, we however have more urgent needs. How would you define any interesting and terminating recursive function with the language defined so far? In the next installments, we will add comparison operators and basic control flow to the language.

Source Code

The code accompanying this article is available in archive photon-tut-10.tar.xz or through the git repository:

git clone http://git.legiasoft.com/photon.git
cd photon
git checkout 10-function_definitions

Discussion

Enter your comment. Wiki syntax is allowed:
 
blog/2011/02/02/function_definitions.txt · Last modified: 2012/01/26 21:19 by csoldani