This is the tenth article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.
In this article, we add function definitions to our language.
We adds a new top-level Bind_fun
to record function bindings. The approach may be generalized later when we will introduce anonymous functions but it will do for now. A function bindings records whether or not it is recursive, the function name, a list of arguments, the return type and function body.
Arguments are pairs of an identifier and a type.
In this first version, we will only accepts fully type-annotated function definitions. We will add proper type inference later. We also don't accept patterns as arguments, they should be identifiers (we will not implement patterns in function arguments in our OCaml subset).
type arg = string * Type.t type 'a top_level = Bind_fun of bool * string * arg list * Type.t * 'a t | Bind_val of string * 'a t ...
The parser is updated as follows.
%token REC interactive ... | LET IDENT args COLON type_ EQUAL expr SEMI_SEMI { Bind_fun(false, $2, List.rev $3, $5, $7) } | LET REC IDENT args COLON type_ EQUAL expr SEMI_SEMI { Bind_fun(true, $3, List.rev $4, $6, $8) } args : arg { [$1] } | args arg { $2 :: $1 } arg : LPAR IDENT COLON type_ RPAR { ($2, $4) }
In the typing of function bindings, the arguments (and, if it is recursive, the function itself) must be in the type environment when typing the body. However, arguments should not be left permanently in the type environment.
We see four possible solutions:
annotate_ast
. When typing a variable binding, we look first in the argument type environment and only if not found in the global type environment. This solution augments the number of arguments to pass around and we must be careful to look-up variables in the right type environment. Moreover, this solution does not scale. How could we handle inner functions or let-bindings with this scheme?Hashtbl.t
representation is well-suited to such an usage. Hashtbl.add
does not erase previous binding which can be recovered after a call to Hashtbl.remove
. However, we would need to be very careful to restore the environment even in the case of errors. This would require us to catch all exceptions only to re-throw them after having restored the environment.
We chose the fourth approach and re-implemented Type_env
with Map
instead of Hashtbl
. OCaml Map
is implemented with a functor (i.e. a parametric module taking other modules as arguments). We obtain a string map with the following code.
module StringMap = Map.Make(String)
We will not support functors in Photon. We will thus need to update this part of the code to make our compiler self-compiling (at worst, we could use an association list).
To avoid threading the current global type environment around, we define it with a mutable reference in the driver. annotate
will update this reference directly with persistent bindings.
Equipped with our new type environment, the typing of function bindings proceeds as follows. We replace type references by concrete types in argument types and return type. We then type the body in an extended type environment. We then check the body type matches the provided return type. We could have better error messages by typing the body with the expected return type in mind, but we will consider this issue when implementing type inference. Finally, we update the type environment with the function binding.
let annotate type_env = function Bind_fun(is_rec, id, args, ret_type, body) -> (* Replaces refs *) let replace_arg (id, t) = let t = replace_refs !type_env t in (id, t) in let args = List.map replace_arg args in let ret_type = replace_refs !type_env ret_type in let t = Type.Fun(ret_type, List.map snd args) in (* Builds updated type environment *) let add_arg type_env (id, t) = Type_env.add_var type_env id t in let ext_type_env = List.fold_left add_arg !type_env args in let ext_type_env = if is_rec then Type_env.add_var ext_type_env id t else ext_type_env in (* Type body *) let body = annotate_ast ext_type_env body in assert_type body ret_type; type_env := Type_env.add_var !type_env id t; (t, Bind_fun(is_rec, id, args, ret_type, body)) ...
The problem of temporary, local bindings also appears when we compile a function. We need to be able to access arguments in and only in our function body.
If the problem is the same as for typing, the context is, however, different. Here, we will have no inner functions and the compilation process should never fail, if we implemented it properly.
We chose to store arguments separately into an auxiliary hash table in Compiler.t
. This table is populated with arguments before typing function body and is cleared once it is done. Variable look-up first tries the argument table, and only falls back to the global table if the variable is not found in argument table.
We define a new function define_fun
which takes a Compiler.t
, a flag telling whether or not the definition is recursive, the function name, its arguments, its full type (i.e. a function type, not just the return type) and function body AST.
We first declare the function. If it is recursive, we also add it to globals.
We then setup arguments. All arguments are pushed in the argument hash table. Non-unit arguments refer to the corresponding LLVM function parameter while unit arguments refer directly to the unit constant. We set LLVM function argument names with the corresponding Photon names.
Once arguments are ready, we compile the function body, returning its value as the function return value (if any).
Finally, we empty the argument hash table and add our function to globals if it is not already done (i.e. if it is not recursive).
let define_fun comp is_rec id args t body = (* Declare function *) let t = element_type (llvm_type_of t) in let f = define_function id t comp.module_ in if is_rec then Hashtbl.add comp.globals id f; (* Set argument names and put them in scope *) let (unit_args, non_unit_args) = List.partition (fun (_, t) -> t = Unit) args in let non_unit_args = Array.of_list (List.map fst non_unit_args) in let unit_args = List.map fst unit_args in let add_arg i p = set_value_name non_unit_args.(i) p; Hashtbl.add comp.args non_unit_args.(i) p in Array.iteri add_arg (params f); let u = unsafe_lookup_global ".unit" comp.module_ in List.iter (fun id -> Hashtbl.add comp.args id u) unit_args; (* Compile body *) position_at_end (entry_block f) comp.builder; let value = compile comp body in begin match classify_type (return_type t) with TypeKind.Void -> ignore (build_ret_void comp.builder) | _ -> ignore (build_ret value comp.builder) end; (* Clear arguments and add function if not already done *) Hashtbl.clear comp.args; if not is_rec then Hashtbl.add comp.globals id f; (* Return *) assert_valid_function f; f
We also modify variable look-up to check arguments first.
let lookup_var comp id = try Hashtbl.find comp.args id with Not_found -> let g = Hashtbl.find comp.globals id in if is_function g then g else if is_global_constant g then global_initializer g else build_load g id comp.builder
We can now define simple functions directly in Photon.
# let inc (x : int) : int = x + 1;; val inc : int -> int = <fun> # let twice (f : int -> int) (x : int) : int = f (f x);; val twice : (int -> int) -> int -> int = <fun> # twice inc 40;; - : int = 42 # twice inc;; <stdin>:1.1-5: function is given 1 arguments but expects 2 arguments
However, as the last phrase of our example show, function are still far from being first-class (no partial application yet). The absence of type inference is also a hindrance.
Before turning our attention to these interesting topics, we however have more urgent needs. How would you define any interesting and terminating recursive function with the language defined so far? In the next installments, we will add comparison operators and basic control flow to the language.
The code accompanying this article is available in archive photon-tut-10.tar.xz or through the git repository:
git clone http://git.legiasoft.com/photon.git cd photon git checkout 10-function_definitions
Discussion