Several days ago I wrote a post about writing a new JavaScript compiler for Parrot in JavaScript itself. It seems reasonable to me that JavaScript hackers would rather write their compiler in JavaScript than in C or PIR or NQP or whatever. With that idea in the back of my head, and with some of the IMCC and Packfile changes that I’ve been working on recently, I started formulating an interesting idea about what to do with Parrot’s frontend in the coming months.
For the basic PIR compiler frontend (better known as the current Parrot executable) with the task of compiling a program written in PIR down to PBC and executing it, we have a few basic tasks that we need to do:
- Create the interpreter. This includes some basic command-line parsing for a handful of parameters that must be set during interpreter initialization. Specifically: --hash-seed, --gc, and --gc-threshold.
- Parse the rest of the flags, separating out the arguments into three sets: the settings for the interpreter itself (warning and debug flags to activate, a few other settings), the settings for IMCC (the format of the input and output files, if any), and the arguments to pass to the PIR program.
- Invoke IMCC to parse the input file into a proper Packfile, passing it any relevant command-line arguments.
- Wrap the arguments for the user program up into an array PMC.
- Execute the program with the wrapped argument PMC.
- Destroy the interpreter and exit the program.
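To make the "three sets" step concrete, here is a minimal, language-neutral sketch in Python rather than C or PIR. The flag names in the two sets are placeholders picked for illustration, not a statement of Parrot's actual option table, and `partition_args` is a hypothetical helper, not an existing Parrot function.

```python
# Illustrative flag sets only; the real frontend's option table will differ.
INTERP_FLAGS = {"--warnings", "--debug"}
COMPILER_FLAGS = {"--output", "--run-pbc"}


def partition_args(argv):
    """Split argv into (interp_opts, compiler_opts, program_file, program_args).

    Flags before the program file go to the interpreter or the compiler.
    Everything after the program file belongs to the user program untouched.
    """
    interp_opts, compiler_opts = [], []
    for i, arg in enumerate(argv):
        name = arg.split("=", 1)[0]
        if name in INTERP_FLAGS:
            interp_opts.append(arg)
        elif name in COMPILER_FLAGS:
            compiler_opts.append(arg)
        elif arg.startswith("-"):
            raise ValueError("unknown option: " + arg)
        else:
            # First non-flag argument is the program file.
            return interp_opts, compiler_opts, arg, argv[i + 1:]
    return interp_opts, compiler_opts, None, []
```

Note that anything after the program file, even something that looks like a frontend flag, is handed to the user program as-is.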
This is a pretty simple list of things. My idea starts off with an equally simple premise: What if we did less startup work in C, and jumped into a quick PIR “prefix” to do the rest? We would need to make a few modifications to IMCC, maybe add a function or two to the embedding API, and change a few things in the makefile, but it shouldn’t be too hard. The sequence would look like this:
In C:
- Create the interpreter, parsing the necessary subset of commandline arguments.
- Wrap up the command-line arguments into an array PMC (or even a Hash PMC).
- Get a reference to the prefix bytecode (compiled into the binary, like a pbc_to_exe program), load it into the interpreter and jump directly to it, passing all remaining arguments.
In PBC:
- Create a new PIR compiler PMC. Register it with compreg.
- Parse out the remaining commandline arguments, separating out the ones that go to the user program, and using the rest to set parameters on the interp and the PIR compiler PMC.
- Use the new PIR compiler PMC to compile the executable into a PackFile PMC.
- Get a reference to the :main function from the packfile and execute it, passing in user arguments.
- Do any necessary cleanup and exit the prefix program.
In C Again:
- Destroy the interpreter and exit the program.
This doesn’t necessarily look any simpler, but the reality is that we end up with much cleaner code. All the code in PIR for instance can be wrapped in a single exception handler, instead of having to check the output of every single C API call for success. We also gain the ability to perform many tasks in PBC through the runloop which really want to be done that way: argument processing (PCC is much more powerful than longopt) and the ability to write fundamental PMC types needed for bootstrapping in PIR (the new IMCC compiler PMC being the perfect example). PIR is also the most natural place to be creating PMCs like the PIR compiler PMC, registering it, and calling methods on it. Plus, we really get a jump on the idea of rewriting portions of Parrot in Lorito.
Believe it or not, there could actually be a performance boost from doing this, although it would probably be negligible in size.
The new PIR compiler PMC that I’ve been talking about will probably be written in C initially, but with a PIR frontend we can write the compiler wrapper itself in PIR, and register it there. Also, we can start to do really cool things, like:
    .sub main :main :multi(...)
        .param pmc help :named("--help")
        # Print usage info here
    .end

    .sub main :main :multi(...)
        .param pmc args
        # Run the program here
    .end
So a packfile can define a multisub as its main entry point, and the frontend can process the necessary arguments and dispatch to it. This is just a short example and obviously it raises more questions than it answers, but it is interesting to think about. The Rakudo guys have been doing something like this in their compiler, but Perl 6 has defined dispatch semantics for the program entrypoint routine, and Parrot needs to be flexible enough to support multiple schemes.
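As a rough illustration of the dispatch idea only, here is a toy sketch in Python. It is not Parrot's multidispatch algorithm. `dispatch_main`, `main_help`, and `main_run` are hypothetical names, and the rule "first candidate whose named flag is present wins" is an assumption made to keep the example small.

```python
def main_help(help_flag):
    """Entry point variant chosen when --help is present."""
    return "usage: prog [options] [args]"


def main_run(args):
    """Fallback entry point that receives the positional arguments."""
    return "running with " + repr(args)


def dispatch_main(candidates, fallback, argv):
    """Pick an entry point based on which named flags appear in argv.

    candidates is a list of (flag, handler) pairs tried in order.
    If no flag matches, the fallback gets the whole argument list.
    """
    present = {arg for arg in argv if arg.startswith("--")}
    for flag, handler in candidates:
        if flag in present:
            return handler(True)
    return fallback(argv)
```

A call such as `dispatch_main([("--help", main_help)], main_run, argv)` then selects the usage variant whenever `--help` appears, and the normal variant otherwise. A real scheme would have to define what happens when several candidates match, which is exactly the kind of question each HLL may answer differently.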
As a short aside, I do believe it would be far better if we processed Parrot’s command-line arguments into a Hash PMC instead of an Array PMC, because we suddenly gain O(1) access to our arguments instead of needing an O(N) search for each one we care about.
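The difference is easy to see in a small sketch. This is Python rather than PIR, with a list standing in for an Array PMC and a dict standing in for a Hash PMC; both helper names are hypothetical.

```python
def find_in_array(argv, name):
    """O(N) per lookup: scan the list for `name` or `name=value`."""
    prefix = name + "="
    for arg in argv:
        if arg == name:
            return True
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


def to_hash(argv):
    """One O(N) pass up front; every later lookup is O(1) on average."""
    opts = {}
    for arg in argv:
        name, sep, value = arg.partition("=")
        # A bare flag is stored as True, a name=value pair as its value.
        opts[name] = value if sep else True
    return opts
```

With the hash form, the frontend pays for one pass over the arguments and each later check such as `opts.get("--hash-seed")` no longer depends on how many arguments were passed.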
Anyway, I’ve gotten off topic. I may expand on this more later.
Here is a short example of what a basic entry-point routine in Parrot would look like, after some of the proposed changes to the PIR compiler and Exception PMC backtrace improvements:
    .sub __parrot_entry_point :anon :main
        .param pmc args

        .local pmc interp
        interp = getinterp

        # Everything below runs under a single exception handler
        push_eh __global_ex_handler

        # Create the PIR compiler and register it
        .local pmc pir_compiler
        pir_compiler = new ['PIRCompiler']
        compreg "PIR", pir_compiler

        # Split the arguments and hand each set to its owner
        .local pmc compiler_args
        .local pmc program_args
        .local string program_file
        .local pmc interp_args
        (program_file, compiler_args, program_args, interp_args) = '__parse_args'(args)
        interp.'set_options'(interp_args)
        pir_compiler.'set_options'(compiler_args)

        # Compile the user program and run its :main
        .local pmc packfile
        .local pmc program_main
        packfile = pir_compiler.'compile_file'(program_file)
        packfile.'run_init_functions'()
        program_main = packfile.'get_main'()
        program_main(program_args)
        exit 0

      __global_ex_handler:
        .local pmc exception
        .local int exit_code
        .get_results(exception)
        finalize exception
        pop_eh
        '__print_exception_backtrace'(exception)
        exit_code = '__get_exception_exit_code'(exception)
        exit exit_code
    .end
This code is pretty straightforward, though it would get a little bit more complicated if we wanted to support additional options such as the -o commandline argument, which compiles the program to a .pbc file and writes it out but does not execute it, or the -r option which compiles to an output .pbc file like -o, but then immediately reads in from that .pbc file and executes it. However, even with these options added I suspect a prefix program like this would be less than 500 lines of well-written PIR. Depending on how much or how little we want to do here, it could be much less.
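The extra branching for -o and -r is small. Here is a sketch of just that control flow in Python, where the four callables (`compile_file`, `write_pbc`, `read_pbc`, `execute`) are hypothetical stand-ins for whatever the compiler PMC and packfile API end up providing.

```python
def frontend(source, mode, compile_file, write_pbc, read_pbc, execute, output=None):
    """Run one of three frontend modes.

    "run": compile and execute directly.
    "-o":  compile, write the .pbc to `output`, do not execute.
    "-r":  compile, write the .pbc, then read it back and execute that.
    """
    if mode not in ("run", "-o", "-r"):
        raise ValueError("unknown mode: " + mode)
    if mode != "run" and output is None:
        raise ValueError(mode + " requires an output file")

    packfile = compile_file(source)
    if mode == "run":
        return execute(packfile)

    write_pbc(packfile, output)
    if mode == "-o":
        return None
    # -r: execute what was actually written, not the in-memory packfile
    return execute(read_pbc(output))
```

The only subtle point is that -r deliberately round-trips through the file, so it exercises the packfile writer and reader as well as the compiler.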
By writing a frontend routine in PIR itself (or any Parrot language, for that matter), I think we gain a couple of things. First, we get Parrot processing power immediately, instead of having to do a bunch of operations in C, carefully converting types back and forth between C types and Parrot types, and having to check every single operation for a thrown exception.
Second, by registering the PIR compiler PMC from running PBC code and not directly from inside libparrot during interpreter initialization, we decouple the two systems and gain the ability to remove IMCC from libparrot and move it into its own library. In such a system, it’s easily conceivable that we never register a compiler for PIR if one isn’t needed by the user.
Third, we gain a lot more flexibility with command-line arguments, and can start talking about using multidispatch semantics for :main, among other things.
Fourth, we can start defining things like HLL type mappings and other semantic modifications before we ever enter the compiler. This will allow the compiler to automatically set up constant values using user-defined subtypes, or to register necessary utility pieces like an NCI call frame generator, meta object protocol types, multidispatch handlers, concurrency scheduler, and other pieces of a fully-pluggable Parrot before we ever enter into the compiler. Instead of having to set up :anon :init :load routines to register these types during compilation, we have them ready for us beforehand.
Really it’s a lot to take in and a lot to think about, but I predict we could have a PBC front-end routine for Parrot like what I’ve described above by the 3.3 release in April if we really want it. I think I do want it. What do other people think?