This is the ninth article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.
In this article, we de-clutter the output by adding command line arguments and top-level directives to control the desired level of debugging information. We also make some cosmetic changes.
We will handle the following command-line arguments:
Argument | Meaning |
---|---|
-dast | Dump AST after each top-level phrase |
-dir | Dump IR after each top-level phrase |
-dtype | Dump type after each top-level phrase |
-help | Display the list of options |
Dumps occur as soon as possible so that they are printed even if an exception is raised in a later compilation phase. ASTs are dumped directly after parsing, types directly after typing, and IR directly after compilation (before execution).
We handle command-line arguments using the `Arg` module from the OCaml standard library.
We begin by defining three mutable flags for the three dump states.
    let should_dump_ast = ref false
    let should_dump_type = ref false
    let should_dump_ir = ref false
We then specify our arguments. `Arg.Set` means that the given flag is set to `true` when the corresponding argument is encountered. `-help` is built automatically by the library.
    let args_spec = [
      "-dast", Arg.Set should_dump_ast, "dump AST after each top-level phrase";
      "-dtype", Arg.Set should_dump_type, "dump type after each top-level phrase";
      "-dir", Arg.Set should_dump_ir, "dump IR after each top-level phrase";
    ]
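`Arg.parse` also takes a function that it calls for every anonymous argument, i.e. every argument that is not an option. We accept none, so we raise the predefined `Arg.Bad` exception to reject it. Unknown options are rejected by `Arg` itself.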
    let bad_argument arg =
      raise (Arg.Bad ("invalid argument '" ^ arg ^ "'"))
We begin `main` by parsing arguments.
    let main () =
      Arg.parse args_spec bad_argument "photon [OPTIONS]";
      ...
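To see the specification at work without going through the real command line, here is a minimal sketch that feeds explicit argument vectors to `Arg.parse_argv`. The `parse` and `rejected` helpers and the sample argument vectors are assumptions made for this illustration; they are not part of the interpreter.

```ocaml
let should_dump_ast = ref false
let should_dump_type = ref false
let should_dump_ir = ref false

let args_spec = [
  "-dast", Arg.Set should_dump_ast, "dump AST after each top-level phrase";
  "-dtype", Arg.Set should_dump_type, "dump type after each top-level phrase";
  "-dir", Arg.Set should_dump_ir, "dump IR after each top-level phrase";
]

let bad_argument arg = raise (Arg.Bad ("invalid argument '" ^ arg ^ "'"))

(* Hypothetical helper: parse an explicit argument vector instead of
   Sys.argv. Element 0 plays the role of the program name. *)
let parse argv =
  Arg.parse_argv ~current:(ref 0) argv args_spec bad_argument "photon [OPTIONS]"

(* Hypothetical helper: true if parsing [argv] is rejected with Arg.Bad. *)
let rejected argv =
  try parse argv; false with Arg.Bad _ -> true

let () =
  parse [| "photon"; "-dast"; "-dir" |];
  Printf.printf "ast=%b type=%b ir=%b\n"
    !should_dump_ast !should_dump_type !should_dump_ir;
  Printf.printf "anonymous argument rejected: %b\n"
    (rejected [| "photon"; "file.ml" |])
```

Unlike `Arg.parse`, `Arg.parse_argv` does not print the usage message and exit on error; it raises `Arg.Bad`, which makes it easier to experiment with.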
In every place we dumped debugging information before, we now check that the corresponding dump flag is set first. For example,
    let top_level = Parser.interactive Lexer.token lexbuf in
    if !should_dump_ast then
      print_endline (string_of_top_level top_level);
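The point of dumping as early as possible can be shown with a small sketch. Everything here is a toy stand-in chosen for the illustration: `parse` and `infer` are hypothetical phases (the second one always fails), and dumps go to a buffer rather than to standard output so that their order can be inspected.

```ocaml
(* Hypothetical stand-ins for the article's flags. *)
let should_dump_ast = ref true
let should_dump_type = ref true

(* Dumps go to a buffer here so that their order can be inspected. *)
let log = Buffer.create 64
let dump flag text = if !flag then Buffer.add_string log (text ^ "\n")

(* Toy phases: parsing succeeds, typing always fails. *)
let parse (src : string) : string = "AST(" ^ src ^ ")"
let infer (_ast : string) : string = failwith "type error"

let process src =
  let ast = parse src in
  dump should_dump_ast ast;      (* dumped before typing gets a chance to fail *)
  let ty = infer ast in
  dump should_dump_type ty       (* never reached in this example *)

let () =
  (try process "1 + true"
   with Failure msg -> Buffer.add_string log ("error: " ^ msg ^ "\n"));
  print_string (Buffer.contents log)
```

The AST dump appears before the error line. Had all dumps been grouped at the end of `process`, the failure in `infer` would have suppressed them.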
Rather than having to set debugging flags once and for all at interpreter start-up, it is convenient to be able to enable or disable them during an interactive session. It is also more convenient not to have to exit the interpreter to dump the content of the LLVM module.
We handle this through OCaml-like directives, i.e. commands intended for the interactive interpreter. As in OCaml, these directives begin with `#`, but our directives are different (except for `#quit`).
We support the following directives:
Directive | Meaning |
---|---|
dump_module | Dumps current LLVM module |
quit | Exits the interpreter |
enable FLAG | Enables the given debugging facility |
disable FLAG | Disables the given debugging facility |
Possible debugging facilities are:

- `dump_ast`, which sets/clears `should_dump_ast`;
- `dump_ir`, which sets/clears `should_dump_ir`;
- `dump_type`, which sets/clears `should_dump_type`.
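The mapping from facility names to flags can also be written as a lookup table. This is only a sketch of an alternative formulation, not the code used in the interpreter: the `facilities` list and the `set_facility` function are names invented for the illustration.

```ocaml
let should_dump_ast = ref false
let should_dump_ir = ref false
let should_dump_type = ref false

(* Hypothetical table associating each facility name with its flag. *)
let facilities = [
  "dump_ast", should_dump_ast;
  "dump_ir", should_dump_ir;
  "dump_type", should_dump_type;
]

(* [set_facility name enabled] sets or clears the flag called [name].
   It returns [false] if [name] is not a known facility. *)
let set_facility name enabled =
  match List.assoc_opt name facilities with
  | Some flag -> flag := enabled; true
  | None -> false

let () =
  let known = set_facility "dump_ir" true in
  let unknown = set_facility "dump_llvm" true in
  Printf.printf "dump_ir accepted: %b, dump_llvm accepted: %b, should_dump_ir: %b\n"
    known unknown !should_dump_ir
```

With a table, adding a facility is a one-line change; with a `match`, each facility stays visible at the point where the directive is handled.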
To implement the directives, we introduce a new `#` token and a new `Directive` constructor in top-level phrases. A directive contains its position, the directive string and an optional string argument (also with its position).
    type 'a top_level =
      ...
    | Directive of Position.t * string * (Position.t * string) option
The parser is adapted accordingly.
    interactive
      ...
    | directive SEMI_SEMI { $1 }

    directive
    : SHARP IDENT { Directive(pos 2, $2, None) }
    | SHARP IDENT IDENT { Directive(pos 2, $2, Some (pos 3, $3)) }
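As an illustration, here is a sketch of the values these two grammar rules would build for `#quit;;` and `#enable dump_ast;;`. The `position` record is a hypothetical stand-in for `Position.t`, whose real definition is not shown in this article, the 1-based columns are an assumption, and `describe` exists only for the example.

```ocaml
(* Hypothetical stand-in for the article's Position.t. *)
type position = { line : int; column : int }

(* Only the Directive constructor is reproduced here. *)
type top_level =
  | Directive of position * string * (position * string) option

let string_of_pos p = Printf.sprintf "%d:%d" p.line p.column

(* Renders a directive with the position of its name and of its argument. *)
let describe (Directive (pos, name, arg)) =
  match arg with
  | None -> Printf.sprintf "%s: #%s" (string_of_pos pos) name
  | Some (arg_pos, arg) ->
    Printf.sprintf "%s: #%s with argument %s at %s"
      (string_of_pos pos) name arg (string_of_pos arg_pos)

(* "#quit;;" and "#enable dump_ast;;", each typed on line 1.
   The directive position is that of the identifier after '#'. *)
let quit = Directive ({ line = 1; column = 2 }, "quit", None)
let enable =
  Directive ({ line = 1; column = 2 }, "enable",
             Some ({ line = 1; column = 9 }, "dump_ast"))

let () =
  print_endline (describe quit);
  print_endline (describe enable)
```

Keeping a separate position for the argument lets an error message point at the argument itself rather than at the directive name.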
Finally, the driver code is adapted to also handle directives. In case of error, we simply print an error message on standard error; any remaining input is discarded when the REPL resumes.
    (* [handle_directive comp type_env pos dir arg] executes the interactive
     * top-level directive [dir] occurring at position [pos]. [comp] is the
     * current module compiler and [type_env] the current type environment. *)
    let handle_directive comp type_env pos dir arg =
      match dir with
        "dump_module" -> Llvm.dump_module (Compiler.llvm_module comp)
      | "quit" -> raise End_of_file
      | ("enable" | "disable") ->
          begin match arg with
            None ->
              let pos = string_of_pos pos in
              prerr_endline (pos ^ ": " ^ dir ^ " directive expects an argument")
          | Some (pos, arg) ->
              (* [b] is the new value of the flag. *)
              let b = dir = "enable" in
              match arg with
                "dump_ast" -> should_dump_ast := b
              | "dump_ir" -> should_dump_ir := b
              | "dump_type" -> should_dump_type := b
              | _ ->
                  let pos = string_of_pos pos in
                  prerr_endline (pos ^ ": unknown action '" ^ arg ^ "'")
          end
      | _ ->
          let pos = string_of_pos pos in
          prerr_endline (pos ^ ": unknown directive '" ^ dir ^ "'")

    let main () =
      ...
      try
        while true do
          ...
          try
            ...
            match top_level with
              ...
            | Directive(pos, dir, arg) ->
                handle_directive comp type_env pos dir arg
            ...
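The control flow of the two nested `try` blocks can be sketched in isolation. In this toy version, which is an assumption-laden model rather than the real driver, phrases are plain strings, `"boom"` stands for a phrase whose compilation fails, and `run` returns a log instead of printing.

```ocaml
(* [run phrases] feeds [phrases] to a toy REPL and returns what it logged. *)
let run phrases =
  let pending = ref phrases in
  let log = ref [] in
  (* Hypothetical input source: End_of_file when nothing is left. *)
  let next () =
    match !pending with
    | [] -> raise End_of_file
    | p :: rest -> pending := rest; p
  in
  (try
     while true do
       (* Inner handler: report the error, then resume the loop. *)
       try
         begin match next () with
           | "#quit" -> raise End_of_file
           | "boom" -> failwith "compilation failed"
           | p -> log := ("ok: " ^ p) :: !log
         end
       with Failure msg -> log := ("error: " ^ msg) :: !log
     done
   with End_of_file -> ());   (* outer handler: leave the REPL *)
  List.rev !log

let () =
  List.iter print_endline (run ["1"; "boom"; "2"; "#quit"; "3"])
```

Raising `End_of_file` for `quit` means that the directive and a real end of input leave the loop through the same path, while errors in a single phrase never escape the inner handler.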
We had defined the size of `int` to be 32 bits on 32-bit machines and 64 bits on 64-bit machines. However, we also assumed a correspondence between Photon `int` and C `int`. This correspondence does not necessarily hold, and often does not, since C `int` is often only 32 bits wide on 64-bit machines (see 64-bit and Data-Size Neutrality).
We changed the definition of `int` to match the size of C `int`. We used a new C binding in `llvm_utils.ml` and `llvm_fix.c`.
    (** Size of C [int] type in bits. *)
    external c_int_size : unit -> int = "c_int_size"

    (** Integer type with the same size as a C [int] on the same machine. *)
    let int_type =
      match c_int_size () with
        32 -> i32_type
      | 64 -> i64_type
      | n -> failwith ("int_type: unsupported C int size " ^ string_of_int n)
    /* Returns the number of bits in a C int. */
    CAMLprim value c_int_size(value unit)
    {
        return Val_int(sizeof (int) * 8);
    }
We can thus freely use `int` in type signatures for external C functions taking C `int` arguments or returning C `int`.
We could have defined new `int32` and `int64` types for external function type signatures, but our bindings would then have become machine dependent. For example, a binding to a function `int f(int)` would have been `external f : int32 -> int32` or `external f : int64 -> int64` depending on the machine. It would also have implied more type conversions between Photon and C code. We may change our mind in the future, however.
We used the LLVM function type to denote the type of functions. However, functions are always referred to through a pointer: the `llvalue` returned by `declare_function`, `define_function` or `lookup_function` is a function pointer.
Moreover, functions are not first-class in LLVM, which is, as its name implies, low-level. We cannot pass a function as an argument or return one.
We accordingly changed our `Fun` type representation to be a function pointer.
The single quote was not handled by our `print_escaped_char` function. This resulted in `'\''` being printed as `'''`, which is not a valid character.
We fixed this and also improved the printing routine to avoid escaping single or double quotes when it is not necessary (depending on whether they are printed as a single character or as part of a character string).
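The context-dependent escaping can be sketched as follows. This is not the interpreter's `print_escaped_char`: the `context` type and the `escape_char`, `char_literal` and `string_literal` functions are hypothetical names, and only a handful of escapes are covered.

```ocaml
(* Hypothetical context: is the character printed alone or within a string? *)
type context = In_char | In_string

(* Escapes [c] for the given context. A single quote only needs escaping in
   a character literal, a double quote only in a string literal. *)
let escape_char ctx c =
  match c, ctx with
  | '\'', In_char -> "\\'"
  | '"', In_string -> "\\\""
  | '\\', _ -> "\\\\"
  | '\n', _ -> "\\n"
  | '\t', _ -> "\\t"
  | c, _ -> String.make 1 c

let char_literal c = "'" ^ escape_char In_char c ^ "'"

let string_literal s =
  let buf = Buffer.create (String.length s + 2) in
  Buffer.add_char buf '"';
  String.iter (fun c -> Buffer.add_string buf (escape_char In_string c)) s;
  Buffer.add_char buf '"';
  Buffer.contents buf

let () =
  print_endline (char_literal '\'');
  print_endline (char_literal '"');
  print_endline (string_literal "it's \"ok\"")
```

The single quote is escaped in the first line of output only; in the string it is printed as-is, while the double quotes are escaped there and not in the character literal.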
In addition to the aforementioned changes, we also did a bit of refactoring. The general idea was to improve legibility.
Main changes:

- […] `Photon` to `Typer`. `Typer` now works on top-level phrases and is responsible for updating the type environment. For convenience, it returns the inferred type to `Photon`, which still dumps it (if requested).
- […] `builder` into `Compiler.t`. This prevents parallel compilations in the same module, however:
  - […] `llmodule` would be thread-safe anyway;
  - […] `Compiler.t` is nearly always needed in addition to an `llbuilder`. Fusing the two lightens the code.
- […] `ocaml` one.
- […] `Printable_print_<type>`. Keeping the `print_<type>` idiom would have clashed with the standard library (e.g. `print_string` should print its argument without enclosing double quotes and without escaping characters).
- Renamed `add` and `find` respectively to `add_var` and `find_var` in `Type_env` to avoid confusion with `add_type` and `find_type`.
- […] `true` and `false` constants, rather than hard-coding them.

Our interpreter is now much less cluttered, both in its output and its source code.
In the next installment, we will provide for function definitions.
The code accompanying this article is available in archive photon-tut-09.tar.xz or through the git repository:
    git clone http://git.legiasoft.com/photon.git
    cd photon
    git checkout 9-controlling_output.1