I’ve complained a lot about Parrot’s packfiles and packfile-related subsystems. A few months ago I started writing about a plan to fix some of these issues. Now, after something of an extended hiatus from hardcore parrot hacking, I’ve decided to get started on doing all this work.
The biggest part of my plan, from a user-interface perspective was changing this:
.sub 'foo' :load
...
.end
to this:
.sub 'foo' :tag('load')
...
.end
Also, as part of this change, we want to change this:
load_bytecode "bar.pbc"
to this:
$P0 = load_bytecode "bar.pbc"
$P1 = $P0.'subs_by_tag'("load")
$I0 = $P1.'is_initialized'('load')
if $I0 goto done_initialization
$P2 = iter $P1
loop_top:
unless $P2 goto loop_bottom
$P3 = shift $P2
$P3()
goto loop_top
loop_bottom:
$P1.mark_initialized("load")
done_initialization:
I’ve discussed these changes before but I will repeat one important detail that
is well worth repeating: Despite the fact that the total amount of PIR code
increases, the overall performance profile improves. The second set of code
snippets are faster to execute and are much more flexible. The performance
improves as the number of :load
or :init
subs in your packfile increases.
You gain the ability to tag subroutines with any string name you want, use as
many different string names for subroutine groups as you want, get a list of
subs to execute in any order at any time, be able to initialize and reinitialize
libraries at any time as many times as you want, and do a few other cool and
empowering things. Actually, the system is set up to allow any arbitrary PMC
in the constants table to be tagged with a string name and accessed using that
name, but the only thing we have syntax for right now are the Subs.
Another point that I also feel like repeating is this: People shouldn’t write PIR code by hand. Don’t do it. You will write something friendly like Rakudo Perl 6, or NQP or Winxed, and those compilers will automatically generate the necessary PIR sequences for you. It will take some effort on my part to get the various compilers updated, but the end user shouldn’t see any differences. So long as you aren’t writing your code in PIR directly, you’ll be fine.
So what’s involved in making this change? The packfile system, especially the
packfile loader and the bits that are intertwined with IMCC are some of the
messier or at least most obscure parts of Parrot. This is one of the subsystems
of Parrot that is the least abstracted and requires the most low-level bit
twidling. The GC is a comparable beast in this regard. Things like memory
alignment, offsets, complicated nested struct
definitions and accesses, byte
ordering, and a variety of other similar concepts are very common in this
subsystem. Any changes must be made with extreme care.
At the time of this writing I’ve already made several major changes in my branch
and have several more to go. I’ve removed the :load
and :init
flag syntax
from the IMCC lexer and parser (they’ve been deprecated for a long time now and
a replacement syntax has been available for some months). I’ve removed the awful
do_sub_pragmas
and PackFile_fixup_subs
functions and rearranged significant
amounts of logic. I’ve also taken a shotgun approach to converting :load
and
:init
flags in the various PIR code libraries to :tag("load")
and
:tag("init")
respectively. Some of these conversions were automated and a
little bit over zealous, but that is to be expected at this stage of the game.
Code for handling :immediate
and :postcomp
flags have been moved into IMCC
where they belong. libparrot now has no direct knowledge of either of those two
things.
Coming up are several more big changes before this branch is even remotely close to being mergable:
- I need to clean up interp initialization logic that had been spread around in various places. The embedding API defines a central execution path for running bytecode programs, and the initialization logic needs to be moved higher up along that path.
- I need to set a reference to the owning PackfileView PMC in each loaded Sub PMC. Currently we anchor every single PackfileView PMC to prevent them from ever being collected by GC. By storing a reference to the Packfile in the Sub PMCs, we can mark reachable Subs, mark reachable Packfiles, and therefore allow old, unreferenced packfiles to be GC collected. This fixes an old and sizable memory leak (especially in programs that do a lot of dynamic compilation. If we do this in the right way in the right places, we can close the memory leak and reduce iteration over packfile constants looking for Subs and PMCs to mark and keep track of.
- I need to rewrite
Parrot_load_language
and friends. This is the function that implements theload_language
opcode behavior. A “language” in this context is a library package which usually contains a compiler object, a runtime, and any dynpmc and dynop libraries required by those. This codepath shared much logic withParrot_load_bytecode
(the guts behind the oldload_bytecode_s
opcode variant), and therefore shares many problems and misbehaviors. - I want to refactor the relationship between the
PackFile*
structure and thePackfileView
PMC. If the PMC is the GCable wrapper around the structure, the whole thing is more easily managed by GC if we can guarantee that the PMC always exists if the struct does. Refactors here can further clean up code in the packfile loader and elsewhere. - We need to rip out a lot of old, dead, crufty code that is no longer needed.
This includes things like
Parrot_compile_file
andParrot_compile_string
, which are unnecessary internally because we have easily accessible and easily usable compiler objects at the PIR level. This also includes various helper routines forParrot_load_bytecode
,Parrot_load_language
anddo_sub_pragmas
, helper routines for working with:load
and:init
flags - I need to update Winxed, NQP-rx, NQP, Rakudo and any other HLLs and libraries that still rely on the old behaviors.
So there are plenty of steps left before we can talk merger, and I don’t expect to get this work anywhere near master before 4.2. It’s a lot of work but it’s fun and will be rewarding when it’s all done. I’m looking forward to getting this all wrapped up in the coming weeks and months.