This is the 15th article of a series (table of contents) about compiler development with LLVM using OCaml. We intend to develop a compiler for a subset of OCaml large enough to allow our compiler to compile itself.
In this article, we implement a batch mode, i.e. the possibility to call the photon interpreter on files, rather than interactively.
To allow batch processing, we allow file names to be given on the command line. When present, we parse, type, compile and execute them rather than starting the interactive interpreter.
When batch-processing a file, we will be less verbose than when in interactive mode.
Finally, we will clean up a bit after ourselves. As a great number of files can be given on the command line, we need to reclaim resources as we go, rather than waiting for the end of compiler execution.
Files are not parsed a single top-level statement at a time. Indeed, the ;; is optional between top-level statements when the separation between them is not ambiguous, and most OCaml files do not contain a single ;;. The ;; is mostly used in the interactive interpreter, where it signals that the user has finished their input and that the interpreter can parse, type and evaluate it. Indeed, even in the interpreter, OCaml allows multiple statements in one go:
# let a = 1 let b = 2;;
val a : int = 1
val b : int = 2
Our current interpreter would report a syntax error when seeing the second let where it expects a ;;.
The OCaml interpreter reads past a ;; when parsing, but not when typing. The following file, which contains a syntax error on its second line, does not print the message when evaluated:
print_endline "Hello";;
let let

% ocaml syntax_error.ml
File "syntax_error.ml", line 2, characters 4-7:
Error: Syntax error
However, the following code, which contains a type error, is partly executed:
print_endline "Hello";;
1 + 2.0

% ocaml type_error.ml
Hello
File "type_error.ml", line 2, characters 4-7:
Error: This expression has type float but an expression was expected of type int
I don't know exactly how the OCaml interpreter buffers and processes file input, but we will do something simpler. We parse, type and execute until a ;; or end-of-file is found. We can use this process both for interactive input and for file input.
For files, we could have considered parsing and typing the whole file in a single step. While this would prevent partial execution, it would also prevent calling our interpreter on a named pipe (our interpreter would block until end-of-file is reached, defeating the purpose of a named pipe in most cases). Our approach is also a bit more memory efficient, as it does not have to build the whole-file AST in memory. Of course, if the given file contains no ;;, the behavior will be the same as whole-file processing.
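The block-at-a-time loop described above can be sketched as follows. This is only an illustration under simplifying assumptions: `read_block` and `process_blocks` are hypothetical names, and `read_block` is a naive scan for `;;` in a string, not photon's lexer or parser (it would be fooled by a `;;` inside a string literal or a comment). The `handle` callback stands in for typing, compiling and executing one block.

```ocaml
(* Hypothetical sketch, not photon's actual code. *)

(* Return the text of the next block and the position just after it,
   or None when the input is exhausted. A block ends at the next ";;"
   or at the end of the input. *)
let read_block (src : string) (pos : int) : (string * int) option =
  let len = String.length src in
  if pos >= len then None
  else
    let rec find i =
      if i + 1 >= len then None
      else if src.[i] = ';' && src.[i + 1] = ';' then Some i
      else find (i + 1)
    in
    match find pos with
    | Some i -> Some (String.sub src pos (i - pos), i + 2)
    | None -> Some (String.sub src pos (len - pos), len)

(* Handle blocks one at a time, in order. If [handle] raises on some
   block, the blocks before it have already been handled. *)
let process_blocks (handle : string -> unit) (src : string) : unit =
  let rec loop pos =
    match read_block src pos with
    | None -> ()
    | Some (block, next) ->
      let block = String.trim block in
      if block <> "" then handle block;
      loop next
  in
  loop 0

(* Prints the two blocks "let a = 1 let b = 2" and "a + b", one per line. *)
let () = process_blocks print_endline "let a = 1 let b = 2;; a + b"
```

Because each block is handled before the next one is read, an error in a later block does not undo the effects of earlier ones, which matches the partial execution seen with `type_error.ml`.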
Practically, we rename Parser.interactive to top_levels and we make it return a list of top-level statements, instead of just one. The grammar is modified accordingly. Here is a relevant excerpt:
...
%start top_levels
%type<Position.t Ast.top_level list> top_levels
%%
decl_or_dirs :
    decl_or_dir { [$1] }
  | decl_or_dirs decl_or_dir { $2 :: $1 }
top_levels :
    EOF { raise End_of_file }
  | decl_or_dirs SEMI_SEMI { List.rev $1 }
  | decl_or_dirs EOF { List.rev $1 }
  | expr SEMI_SEMI { [Eval $1] }
  | expr EOF { [Eval $1] }
decl_or_dir :
    LET IDENT EQUAL expr { Bind_val($2, $4) }
  | LET IDENT args COLON type_ EQUAL expr { Bind_fun(false, $2, List.rev $3, $5, $7) }
  | LET REC IDENT args COLON type_ EQUAL expr { Bind_fun(true, $3, List.rev $4, $6, $8) }
  | EXTERN IDENT COLON type_ EQUAL STRING { External($2, $4, $6) }
  | directive { $1 }
  | error { raise (Error(pos 1, "syntax error")) }
...
We simply compile top-level statements in sequence, using the same machinery as before. The only difference is that we do not systematically print the evaluated results on standard output. While this is desirable in interactive mode, it is not for batch processing. I thus added a must_print argument in relevant places to control whether result printing should be compiled in.
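To illustrate the idea of such a flag, here is a minimal sketch. It is a simplification: the `value` type, `string_of_value` and `eval_top_level` are hypothetical, and the flag is consulted when the result is available, whereas photon decides at code generation time whether the printing code is emitted at all.

```ocaml
(* Hypothetical stand-in for the result of a top-level expression. *)
type value = Int of int | Unit

let string_of_value = function
  | Int n -> Printf.sprintf "- : int = %d" n
  | Unit -> "- : unit = ()"

(* Evaluate [thunk]; write the result to [out] only when [must_print]
   is true (interactive mode), and stay silent otherwise (batch mode). *)
let eval_top_level ~must_print (out : Buffer.t) (thunk : unit -> value) : value =
  let v = thunk () in
  if must_print then begin
    Buffer.add_string out (string_of_value v);
    Buffer.add_char out '\n'
  end;
  v
```

In this sketch, the interactive loop would pass `~must_print:true` and the batch driver `~must_print:false`; side effects of the evaluated code itself (such as `print_endline`) are unaffected by the flag.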
A second change is the addition of a dispose function to reclaim resources associated with the compiler. It simply releases the underlying module and manually clears the tables.
...
let dispose comp =
  dispose_module comp.module_;
  Hashtbl.clear comp.globals;
  Hashtbl.clear comp.args
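One possible way to make sure dispose is always called, even when processing raises an exception, is to wrap the work in `Fun.protect` (available in the standard library from OCaml 4.08). The sketch below is an assumption about how this could be done, not what photon does: the `compiler` record is a stand-in in which a boolean replaces the LLVM module, and `create` and `with_compiler` are hypothetical helpers.

```ocaml
(* Hypothetical stand-in for the compiler record: the LLVM module is
   replaced by a boolean flag so that the sketch needs no LLVM bindings. *)
type compiler = {
  mutable module_live : bool;
  globals : (string, int) Hashtbl.t;
  args : (string, int) Hashtbl.t;
}

let create () : compiler =
  { module_live = true; globals = Hashtbl.create 16; args = Hashtbl.create 16 }

let dispose (comp : compiler) : unit =
  comp.module_live <- false;      (* stands in for dispose_module *)
  Hashtbl.clear comp.globals;
  Hashtbl.clear comp.args

(* Run [f] on a fresh compiler and dispose of it afterwards,
   whether [f] returns normally or raises. *)
let with_compiler (f : compiler -> 'a) : 'a =
  let comp = create () in
  Fun.protect ~finally:(fun () -> dispose comp) (fun () -> f comp)
```

With such a helper, each file gets its own compiler and the resources are reclaimed as soon as that file is done.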
Command-line parsing has been modified to accept arguments in addition to options. Each argument is verified to be an existing file, and then queued for processing.
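As a rough sketch of this step using the standard `Arg` module: the anonymous-argument callback checks that each argument names an existing file and queues it. The names `files`, `queue_file` and `parse_command_line`, the usage string and the empty option list are assumptions made for the example; photon's real options are not reproduced here.

```ocaml
(* Files queued for batch processing, in command-line order. *)
let files : string Queue.t = Queue.create ()

(* Called by Arg for each non-option argument. *)
let queue_file (name : string) : unit =
  if Sys.file_exists name && not (Sys.is_directory name) then
    Queue.add name files
  else
    raise (Arg.Bad (Printf.sprintf "%s: no such file" name))

(* No options in this sketch. *)
let specs : (Arg.key * Arg.spec * Arg.doc) list = []

let usage = "usage: photon [options] [file ...]"

(* Parse [argv]; raises Arg.Bad if an argument is not an existing file.
   A fresh [current] counter makes the function safe to call repeatedly. *)
let parse_command_line (argv : string array) : unit =
  Arg.parse_argv ~current:(ref 0) argv specs queue_file usage
```

A real entry point would typically call `parse_command_line Sys.argv` and report `Arg.Bad` messages on standard error before exiting.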
When command-line parsing is done, we check whether there are files in the queue. If there are, we process them one at a time. Each file is processed in a fresh environment (compiler, lexing buffer, type environment). Once done with a file, we call dispose on its compiler to release the underlying LLVM module.
When processing a file, we stop at the first encountered error, processing neither additional tokens from the faulty file, nor additional files still queued for processing.
When no file has been queued for processing, we start the interpreter as before.
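The control flow described above (process queued files in order, stop at the first error, fall back to the interactive interpreter when nothing is queued) can be sketched as follows. The `outcome` type and the `run` function are hypothetical; `process_file` stands in for parsing, typing, compiling and executing one file in a fresh environment, and `interactive` for the interactive loop.

```ocaml
(* Hypothetical summary of what the driver did. *)
type outcome =
  | Ran_files of int          (* all queued files processed successfully *)
  | Failed of string * exn    (* file that failed, and why *)
  | Interactive               (* no file queued: interactive mode was run *)

let run ~(process_file : string -> unit) ~(interactive : unit -> unit)
    (files : string list) : outcome =
  match files with
  | [] -> interactive (); Interactive
  | _ ->
    let rec go count = function
      | [] -> Ran_files count
      | file :: rest ->
        (match process_file file with
         | () -> go (count + 1) rest
         (* Stop here: files still in [rest] are not processed. *)
         | exception e -> Failed (file, e))
    in
    go 0 files
```

Returning a value rather than exiting directly is a choice made for this sketch; it keeps the function easy to test and leaves the decision about exit codes to the caller.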
As we use the same ;;-delimited block approach both for interactive and file processing, most of the logic is shared between the two cases and has been accordingly shared in the code.
photon.ml has been significantly refactored. The modifications are numerous but simple, so I refer you directly to the code for details.
The code accompanying this article is available in archive photon-tut-15.tar.xz or through the git repository:
git clone http://git.legiasoft.com/photon.git
cd photon
git checkout 15-batch_processing