
Photon Compiler Development: Controlling Output

This is the ninth article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.

In this article, we de-clutter the output by adding command line arguments and top-level directives to control the desired level of debugging information. We also make some cosmetic changes.

Command-Line Arguments

We will handle the following command-line arguments:

Argument   Meaning
-dast      Dump AST after each top-level phrase
-dir       Dump IR after each top-level phrase
-dtype     Dump type after each top-level phrase
-help      Display the list of options

Dumps occur as early as possible so that they are printed even if an exception is raised in a later compilation phase. ASTs are dumped directly after parsing, types directly after typing, and IR directly after compilation (before execution).

We handle command-line arguments using the Arg module from the OCaml standard library.

We begin by defining three mutable flags for the three dump states.

let should_dump_ast = ref false
 
let should_dump_type = ref false
 
let should_dump_ir = ref false

We then specify our arguments. Set (Arg.Set) means that the given flag is set to true when the corresponding argument is encountered. The library builds -help automatically.

(* Each entry is (option, action, help text shown by -help). *)
let args_spec = [
   "-dast", Arg.Set should_dump_ast, "dump AST after each top-level phrase";
   "-dtype", Arg.Set should_dump_type, "dump type after each top-level phrase";
   "-dir", Arg.Set should_dump_ir, "dump IR after each top-level phrase";
]

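Arg calls a fallback function on every anonymous argument, i.e. every argument that is not an option. Photon takes none, so we raise the predefined Arg.Bad exception to reject it. Unknown options (starting with -) are rejected by Arg itself.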

let bad_argument arg = raise (Arg.Bad ("invalid argument '" ^ arg ^ "'"))

We begin main by parsing arguments.

let main () =
   Arg.parse args_spec bad_argument "photon [OPTIONS]";
   ...
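
To try the specification without launching the interpreter, Arg.parse_argv accepts an explicit argument array and raises Arg.Bad or Arg.Help instead of exiting. The sketch below is only an illustration: the accepts helper and the sample command lines are assumptions, not part of Photon.

```ocaml
(* Dump flags and specification, as in the article. *)
let should_dump_ast = ref false
let should_dump_type = ref false
let should_dump_ir = ref false

let args_spec = [
   "-dast", Arg.Set should_dump_ast, "dump AST after each top-level phrase";
   "-dtype", Arg.Set should_dump_type, "dump type after each top-level phrase";
   "-dir", Arg.Set should_dump_ir, "dump IR after each top-level phrase";
]

let bad_argument arg = raise (Arg.Bad ("invalid argument '" ^ arg ^ "'"))

(* [accepts argv] is a hypothetical helper: it parses [argv] and tells whether
 * the command line was accepted. A fresh [current] index keeps successive
 * calls independent of each other. *)
let accepts argv =
   try
      Arg.parse_argv ~current:(ref 0) argv args_spec bad_argument
         "photon [OPTIONS]";
      true
   with Arg.Bad _ | Arg.Help _ -> false

let () =
   (* Known options set their flag; the other flags keep their default. *)
   let ok = accepts [| "photon"; "-dast"; "-dir" |] in
   Printf.printf "accepted: %b, ast: %b, type: %b, ir: %b\n"
      ok !should_dump_ast !should_dump_type !should_dump_ir;
   (* Anonymous arguments and unknown options are both rejected. *)
   Printf.printf "file.ml accepted: %b\n" (accepts [| "photon"; "file.ml" |]);
   Printf.printf "-bogus accepted: %b\n" (accepts [| "photon"; "-bogus" |])
```

Arg.parse, used in main, behaves the same way but prints the error and usage message and exits instead of raising.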

In every place we dumped debugging information before, we now check that the corresponding dump flag is set first. For example,

            let top_level = Parser.interactive Lexer.token lexbuf in
            if !should_dump_ast then
               print_endline (string_of_top_level top_level);
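
Since the same guard is repeated at every dump site, it could be factored into a small helper. This is a sketch under assumptions: dump_if and its optional out parameter are hypothetical names, and Photon itself writes the if inline as shown above.

```ocaml
(* [dump_if ?out flag render] calls [out (render ())] only when [!flag] is
 * set. [render] is a thunk, so the string is not built when the dump is
 * disabled. [out] defaults to printing on standard output. *)
let dump_if ?(out = print_endline) flag render =
   if !flag then out (render ())

let should_dump_ast = ref false

let () =
   (* Flag cleared: nothing is rendered or printed. *)
   dump_if should_dump_ast (fun () -> "not printed");
   (* Flag set: the thunk runs and its result is printed. *)
   should_dump_ast := true;
   dump_if should_dump_ast (fun () -> "printed once the flag is set")
```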

Directives

Rather than having to set debugging flags once and for all at interpreter start-up, it is convenient to be able to enable or disable them during an interactive session. It is also more convenient not to have to exit the interpreter to dump the content of the LLVM module.

We handle this through OCaml-like directives, i.e. commands addressed to the interactive interpreter. As in OCaml, these directives begin with #, but our directives are different (except for #quit).

We support the following directives:

Directive      Meaning
dump_module    Dumps current LLVM module
quit           Exits the interpreter
enable FLAG    Enables the given debugging facility
disable FLAG   Disables the given debugging facility

Possible debugging facilities are:

  • dump_ast which sets/clears should_dump_ast;
  • dump_ir which sets/clears should_dump_ir;
  • dump_type which sets/clears should_dump_type.

To implement the directives, we introduce a new # token and a new Directive constructor in top-level phrases. A directive contains the directive string, its position and an optional string argument (also with position).

type 'a top_level =
     ...
   | Directive of Position.t * string * (Position.t * string) option

The parser is adapted accordingly.

interactive
   ...
   | directive SEMI_SEMI   { $1 }
 
directive
   : SHARP IDENT           { Directive(pos 2, $2, None) }
   | SHARP IDENT IDENT     { Directive(pos 2, $2, Some (pos 3, $3)) }
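
The two grammar rules can be mimicked on a plain token list to see which Directive value each input produces. This is a simplified sketch, not the generated parser: the token type, the int stand-in for Position.t and the parse_directive function are assumptions made for illustration.

```ocaml
(* Stand-in for Photon's Position.t: here, simply a token index. *)
type position = int

type token = SHARP | IDENT of string | SEMI_SEMI

(* Only the constructor discussed in the article is reproduced. *)
type top_level =
   | Directive of position * string * (position * string) option

(* [parse_directive tokens] recognises [# name ;;] and [# name arg ;;].
 * Positions are 1-based token indices, like [pos 2] and [pos 3] in the
 * grammar. Anything else yields [None]. *)
let parse_directive tokens =
   match tokens with
   | [SHARP; IDENT dir; SEMI_SEMI] -> Some (Directive (2, dir, None))
   | [SHARP; IDENT dir; IDENT arg; SEMI_SEMI] ->
        Some (Directive (2, dir, Some (3, arg)))
   | _ -> None

let () =
   (* Tokens for the input: #enable dump_ir;; *)
   match parse_directive [SHARP; IDENT "enable"; IDENT "dump_ir"; SEMI_SEMI] with
   | Some (Directive (_, dir, Some (_, arg))) ->
        Printf.printf "directive %s with argument %s\n" dir arg
   | Some (Directive (_, dir, None)) ->
        Printf.printf "directive %s without argument\n" dir
   | None -> print_endline "not a directive"
```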

Finally, the driver code is adapted to also handle directives. In case of error, we simply print a message on standard error. Any remaining input is discarded when the REPL resumes.

(* [handle_directive comp type_env pos dir arg] executes interactive top-level
 * directive [dir] occurring at position [pos]. [comp] is the current module
 * compiler and [type_env] the current type environment. *)
let handle_directive comp type_env pos dir arg =
   match dir with
     "dump_module" -> Llvm.dump_module (Compiler.llvm_module comp)
   | "quit" -> raise End_of_file
   | ("enable" | "disable") ->
        begin match arg with
          None ->
             let pos = string_of_pos pos in
             prerr_endline (pos ^ ": " ^ dir ^ " directive expects an argument")
        | Some (pos, arg) ->
             let b = dir = "enable" in
             match arg with
               "dump_ast" -> should_dump_ast := b
             | "dump_ir" -> should_dump_ir := b
             | "dump_type" -> should_dump_type := b
             | _ ->
                  let pos = string_of_pos pos in
                  prerr_endline (pos ^ ": unknown action '" ^ arg ^ "'")
        end
   | _ ->
        let pos = string_of_pos pos in
        prerr_endline (pos ^ ": unknown directive '" ^ dir ^ "'")
 
let main () =
   ...
   try
      while true do
         ...
         try
            ...
            match top_level with
              ...
            | Directive(pos, dir, arg) ->
                 handle_directive comp type_env pos dir arg
            ...
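
As the number of debugging facilities grows, the inner match on the facility name could be replaced by a lookup table. The sketch below shows that alternative in isolation. It is not the code from the repository: the facilities table and set_facility are hypothetical names, and errors are returned rather than printed so that the sketch stays self-contained.

```ocaml
let should_dump_ast = ref false
let should_dump_type = ref false
let should_dump_ir = ref false

(* Facility name -> flag. Adding a facility means adding one line here. *)
let facilities = [
   "dump_ast", should_dump_ast;
   "dump_ir", should_dump_ir;
   "dump_type", should_dump_type;
]

(* [set_facility dir arg] handles [#enable arg] and [#disable arg]. It
 * returns [Error message] instead of printing, so the caller decides where
 * the message goes. *)
let set_facility dir arg =
   match dir, List.assoc_opt arg facilities with
   | "enable", Some flag -> flag := true; Ok ()
   | "disable", Some flag -> flag := false; Ok ()
   | ("enable" | "disable"), None -> Error ("unknown facility '" ^ arg ^ "'")
   | _ -> Error ("unknown directive '" ^ dir ^ "'")

let () =
   match set_facility "enable" "dump_ir" with
   | Ok () -> Printf.printf "dump_ir enabled: %b\n" !should_dump_ir
   | Error msg -> prerr_endline msg
```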

Bug Fixes

Integer Size

We defined the size of int as 32 bits on 32-bit machines and 64 bits on 64-bit machines. However, we also assumed a correspondence between Photon int and C int. This correspondence does not necessarily hold, and often does not: C int is frequently only 32 bits long on 64-bit machines (see 64-bit and Data-Size Neutrality).

We changed the definition of int to match C int size. We used a new C binding in llvm_utils.ml and llvm_fix.c.

(** Size of C [int] type in bits. *)
external c_int_size : unit -> int = "c_int_size"
 
(** Integer type with the same size as a C [int] on the same machine. *)
let int_type =
   match c_int_size () with
     32 -> i32_type
   | 64 -> i64_type
   | n -> failwith ("int_type: unsupported C int size " ^ string_of_int n)

/* Returns the number of bits in a C int. */
CAMLprim value c_int_size(value unit)
{
   return Val_int(sizeof (int) * 8);
}

We can thus freely use int in type signatures for external C functions taking C int arguments or returning C int.
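
The selection logic can be exercised without LLVM or the C stub. In this sketch the int_ty variant stands in for the LLVM types and the bit width is passed as a plain argument. Both are assumptions made for illustration, as is the name int_ty_of_bits.

```ocaml
(* Hypothetical stand-in for the LLVM integer types used in the article. *)
type int_ty = I32 | I64

(* [int_ty_of_bits n] picks the integer type that is [n] bits wide and fails
 * loudly on any width we do not support. *)
let int_ty_of_bits = function
   | 32 -> I32
   | 64 -> I64
   | n -> failwith ("int_ty_of_bits: unsupported C int size " ^ string_of_int n)

let () =
   (* [Sys.word_size] is the machine word size seen by OCaml (32 or 64). It is
    * not necessarily the size of a C int, hence the dedicated C binding. *)
   Printf.printf "word size: %d bits\n" Sys.word_size;
   (* A C int is typically 32 bits wide, even on 64-bit machines. *)
   match int_ty_of_bits 32 with
   | I32 -> print_endline "int maps to a 32-bit integer"
   | I64 -> print_endline "int maps to a 64-bit integer"
```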

We could have defined new int32 and int64 types for external function type signatures, but our bindings would then have become machine dependent. For example, a binding to a function int f(int) would have been external f : int32 → int32 or external f : int64 → int64 depending on the context. It would also have implied more type conversions between Photon and C code. We may change our mind in the future, however.

Function Type

We used the LLVM function type to denote the type of functions. However, functions are always referred to through a pointer. The llvalue returned by declare_function, define_function or lookup_function is a function pointer.

Moreover, functions are not first-class in LLVM, which is, as its name implies, low-level. We cannot pass a function itself as an argument or return one.

We accordingly changed our Fun type representation to be a function pointer.

Character Printing

The single quote was not handled by our print_escaped_char function. This resulted in '\'' being printed as ''', which is not a valid character literal.

We fixed this and also improved the printing routine to avoid escaping single or double quotes when it is not necessary (depending on whether they are printed as a single character or as part of a character string).
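
The context-dependent rule can be sketched in a few lines of OCaml. This is an illustration of the idea only, not Photon's print_escaped_char: the context type and the escape, char_literal and string_literal functions are hypothetical, and all other characters are delegated to Char.escaped from the standard library.

```ocaml
(* Where the character is being printed. *)
type context = In_char | In_string

(* [escape ctx c] escapes a quote only when it would otherwise close the
 * enclosing literal: ' inside a character literal, " inside a string. *)
let escape ctx c =
   match c, ctx with
   | '\'', In_char -> "\\'"
   | '"', In_string -> "\\\""
   | ('\'' | '"'), _ -> String.make 1 c
   | _ -> Char.escaped c

(* [char_literal c] renders [c] as a character literal. *)
let char_literal c = "'" ^ escape In_char c ^ "'"

(* [string_literal s] renders [s] as a string literal. *)
let string_literal s =
   let b = Buffer.create (String.length s + 2) in
   Buffer.add_char b '"';
   String.iter (fun c -> Buffer.add_string b (escape In_string c)) s;
   Buffer.add_char b '"';
   Buffer.contents b

let () =
   print_endline (char_literal '\'');
   print_endline (char_literal '"');
   print_endline (string_literal "it's \"quoted\"")
```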

Other Changes

In addition to the aforementioned changes, we also did a bit of refactoring. The general idea was to improve legibility.

Main changes:

  • We moved type handling from Photon to Typer. Typer now works on top-level phrases and is responsible for updating the type environment. It returns the inferred type for convenience to Photon which still dumps it (if requested).
  • We internalized builder into Compiler.t. This prevents parallel compilation within a single module, but:
    • Nothing guarantees that parallel operations on a single llmodule would be thread-safe anyway;
    • We can still compile multiple modules in parallel (or at least, we should be able to at some point in the future);
    • A Compiler.t is nearly always needed in addition to a llbuilder. Fusing the two lightens the code;
    • We avoid building a lot of short-lived builders. We reuse the same one, simply repositioning it.
  • We simplified type parsing.
  • We now flush output explicitly on the LLVM side. That way, our interpreter does not depend on buffering settings, and its output when fed by redirection from a file is closer to that of ocaml.
  • We removed the unused single-quote token.
  • We changed the names of basic type printers to Printable_print_<type>. Keeping the print_<type> idiom would have clashed with the standard library (e.g. print_string should print its argument without enclosing double quotes and without escaping characters).
  • We renamed add and find respectively to add_var and find_var in Type_env to avoid confusion with add_type and find_type.
  • We added true and false constants, rather than hard-coding them.

Conclusion

Our interpreter is now much less cluttered, both in its output and its source code.

In the next installment, we will provide for function definitions.

Source Code

The code accompanying this article is available in archive photon-tut-09.tar.xz or through the git repository:

git clone http://git.legiasoft.com/photon.git
cd photon
git checkout 9-controlling_output.1

blog/2011/01/31/controlling_output.txt · Last modified: 2011/02/20 20:28 (external edit)