There has been a certain amount of confusion recently about the PIR subroutine
flags :init
, :main
, and :load
. Certain people, myself included, were
tryting to deprecate the :load
flag, while other people (specifically people
from the Rakudo project) were suggesting that perhaps we needed more. I’ve
been thinking about this a lot and have an idea for a nice flexible option
that should satisfy the needs of our users and help us clean and consolidate
the codebase at the same time.
The core idea of tags like :init
, :main
, and :load
is that we want to be
able to attach metadata to subroutines in PIR (and, by extension, other
languages), and be able to execute subs with the given tag in certain
situations. When we use the load_bytecode
op to load a bytecode file as a
library, for instance, we want to trigger all :load
functions, in any
order. When we load a bytecode file as an executable, we want to trigger all
the :init
functions in any order, followed by a single :main
function.
Fast forward a few months and assume that we now have a working PIR compiler PMC type, which we can attach methods and data attributes to. I’m going to execute this little snippet:
pircompiler = compreg "PIR"
$P0 = pircompiler.'compile'(source_filename)
What happens in this situation? Or, more importantly, what should happen?
We know that the $P0
PMC is going to be a PackFile PMC, but
we don’t know what the user intends to do with it. We don’t know if the
intention is to take that packfile and immediately (and without sideeffects)
write it out to a new .pbc file, or if the user intends to load it in to
Parrot’s interpreter and immediately start calling functions from it like a
library. Maybe the user wants to treat this as a program, and immeduately jump
into its :main
method. Or, maybe the user wants to introspect the packfile,
to calculate complexity metrics. Maybe the user wants to run an in-place
optimizer over the packfile. We simply don’t know what the user wants to do.
Assuming that the user wants to treat this like a library and trigger all the
:load
functions is going to be wrong in many cases. Assuming that the user
wants to treat this like an executable and trigger :init
and maybe :main
is frequently also going to be wrong. And when Parrot makes these kinds of
assumptions we’re going to have users who have difficulty with it and need to
implement ugly workarounds to get what they want without Parrot doing too much
without asking. It’s my opinion that Parrot should never do what you don’t
want it to do without asking first. It’s also my opinion that Parrot is tool,
a library for creating a dynamic language runtime, not the end program in
itself. The programmer uses it as a base to do what is needed; Parrot doesn’t
always know what that should be.
What we really need is a way for the user to specify exactly what they want
to happen and when. It’s not for Parrot to decide. :init
, :main
, and
:load
should be triggered on command, not automatically. Here’s an
example:
packfile = pir_compiler.'compile'(filename)
packfile.'trigger_load_functions'()
Or:
packfile = pir_compiler.'compile'(filename)
packfile.'trigger_init_functions'()
main_sub = packfile.'get_main'()
main_sub(main_args)
Suddenly users have control over what executes when. However, this really isn’t a great solution either. For starters, it’s ugly to have two methods to perform such similar actions. Further, it doesn’t make any sense to artificially limit the types of tags that we have. Some compilers need more. I know Rakudo does. So, maybe we want something like this, borrowing some terminology from Rakudo:
packfile = pir_compiler.'compile'(filename)
packfile.'trigger_phasors'("load")
But then, we want to have any arbitrary phasor attached to a sub. We should be
able to give it any name and attach as many as we want. We may have many
needs. Here we will create a little PIR frontend program
to load a bytecode file, trigger the :init
phasors, the :main
function
and a new :end
phasor for post-facto cleanup:
packfile = pir_compiler.'compile'(filename)
packfile.'trigger_phasors'("init")
main_sub = packfile.'get_main'()
main_sub(main_args)
packfile.'trigger_phasors'("end")
What do these new flags look like in PIR? Here’s one idea:
.sub foo :tag("init")
.end
Right now, Parrot has a really ugly system for dealing with :init
and
:load
flags. When we want to fire all of the :init
subs, for instance,
we loop over all PMCs in the packfile’s constant table looking for subs. When
we find a sub, we check its flags. If the flags match what we are looking for,
we execute the sub and then remove the flags. The current process for firing
:init
and :load
is destructive, which means that once you do it you can’t
do it again. This doesn’t come up much, but it’s still an ugly restriction.
Some people would probably argue that we don’t need to be adding a million new methods to the PackFile PMC. The [single responsibility][srp] of the PackFile PMC should be working with the PackFile structures and file format, not executing subroutines. Especially not assuming to know how those subroutines should be executed. In answer to that I suggest maybe we could do something like this:
$P0 = packfile.'get_tagged_subs'("init")
$P1 = iter $P0
trigger_init_top:
unless $P1 goto trigger_init_bottom
$P2 = shift $P1
$P2()
goto trigger_init_top
trigger_init_bottom:
This is a little bit more work, but not a whole lot. Also, it would be trivlal to encapsulate this logic into a subroutine and reuse it for all your phasors. The benefit of this situation is that the packfile is only responsible for providing read access to the list of subs, the user can decide whether to execute each and if so, how. For the common case, this code would be hidden away in the Parrot executable frontend.
We also start to get the idea that maybe these phasor subs could start to take arguments, or return results. I can think of at least a handful of reasons why I would like to have that feature available in a PIR-based Parrot frontend program like I was envisioning a few days ago. Other languages, such as Rakudo would have their own entrypoint routine and would be able to decide when, if and how to trigger each of it’s tagged sub types.
One criticism I can forsee is that different libraries and executables may use different sets of tags that are expected to be executed in different orders or with different parameters than the “normal” set. This is true, but I would counter to say that the way to load and initialize a library is part of that library’s API and should be well documented and understood by users before use.
The more I think about this idea the more I like it. This should help to clean up a lot of code in libparrot, especially some code which is very ugly and buggy. We free up a lot of ugly flag logic in IMCC and in the packfiles system and replace it with a simple keyword search (and maybe a hash for better performance). We should be able to resolve a handful of long-standing tickets and issues, and provide a lot of new flexibility to the users of libparrot.