The biggest thing I’ve been working on for the past few release cycles in
Parrot world are the IMCC cleanups and refactors. I’ve done a hell of a lot of
work in several separate branches, the most recent of which is called
whiteknight/imcc_compreg_pmc
. First I’m going to talk a little bit about
what I’ve been doing in this and previous branches, and then I’m doing to
talk about what the current status is. Finally, I’m going to talk about what
the plans are from here, and how this work fits in with some of our longer
term goals.
Work So Far
The IMCC work proceeded in 6 general stages, thought they weren’t always cleanly sequential. These 6 stages were:
- General IMCC cleanup. I moved all the previous interface functions into a
single file (
compilers/imcc/main.c
) and started cleaning up the relevant code to be more readable. As I cleaned up, I started finding places where duplicated code could obviously be consolidated. I wrote about some of these areas in a previous post. - I removed the
imcc_info
field from the Parrot Interp structure. Instead of passing an interpreter reference as the first argument to every IMCC function, I changed it so that we were passing animc_info_t
structure instead. Theimcc_info_t
structure was modified to now contain a pointer to a parent interpreter (instead of the other way around). - I rewrote the public-facing interface functions, consolidating around a a single primary codepath. All compilations for files and strings now happens through a common sequence of functions, where only the top-most wrapper routines are different.
- I created a new IMCCompiler PMC type to use as the new PIR compreg. At the
moment it looks and acts a lot like the old NCI PMC we were using, but it
contains state such as the current
imc_info_t
structure. - I created a new embedding API for IMCC, similar in design to Parrot’s
embedding API. These new functions are located in the new file
(
compilers/imcc/api.c
), and are intended to be used in similar places and in similar ways to Parrot’s embedding API. - I changed the way IMCC handles errors internally. Now it uses normal Parrot exceptions instead of using a custom exception system. I’ve only scratched the surface on this project, there is a lot more work to be done once we’re ready.
A final result of all this work is that, besides the IMCCompiler PMC, there are no direct ties from libparrot to IMCC. If IMCCompiler becomes a dynpmc (and it will, eventually), libparrot and IMCC will be completely separate entities.
Part of the problem we were having with IMCC is that it made a number of assumptions about it being the only compiler in Parrot. It would use this special arrangement to write global data directly to the Parrot Interp struct, and Parrot, in turn, would expect to find these results there. This created an inequitable situation where we couldn’t easily have new compiler frontends used in place of IMCC because IMCC was required at all stages. By separating IMCC out and starting to erect proper encapsulation barriers between it and libparrot, we both identify the necessary interface for compiler frontends to be able to use (including new non-IMCC frontends) and we also can start creating utilities which do not require IMCC.
Where Are We Now?
Currently, the branch builds and runs on most platforms. On my machine it passes all tests, though some of the more intensive NQP-based tests eat up a lot of RAM. A lot as in “3 Gigabytes” of it. For parrot hackers on memory-constrained computers, these tests fail outright with memory panic.
Why?
This took me a little while to track down, and it’s a non-trivial though straight-forward fix. First, take a quick look at TT #1990 for some background information.
IMCC outputs a PackFile*
structure pointer. In my embedding API work, I
added wrapper routines to wrap those raw pointers into an UnManagedStruct
PMC. UnManagedStruct is basically a dumb wrapper object around a pointer, and
provides no real logic for introspecting or maintaining that pointer. One
thing in particular that UnManagedStruct does not provide is the ability to
mark its contents for GC.
Occasionally when you run Parrot, prior to the fix for TT #1990, GC would run
after the PackFile*
was returned from IMCC, but before it was loaded into
the interpreter for execution. In that short window, the packfile was
essentially invisible to the GC and it gets collected. When Parrot attempts to
load and execute the packfile, BLAMO!.
So, as part of the TT #1990 fix, we add in several instructions to block the GC from executing at certain parts of the program. The GC doesn’t run when the packfile is unprotected, and our data is safe from premature collection. However, this creates a new problem.
IMCC is not GC safe. To my knowledge, it never has been. This means that to use IMCC you must first disable GC, run the compilation, and then reenable GC.
- Block GC
- Run IMCC
- Unblock GC
Combine that with the TT #1990 fix, and you get this sequence:
- Block GC
- Block GC
- Run IMCC
- Unblock GC
- Unblock GC
Block and unblock operations increment and decrement a counter respectively, so they nest. If you block it two times, you need to unblock it two times before it will run again. This seems pretty straight forward, but consider now what happens if we throw an exception inside IMCC as a result of a syntax error or something: IMCC doesn’t know how many times the GC has been blocked on the way in, so it only unblocks GC once on the way back out. The GC block count is never decremented back to 0, the GC never runs again, and Parrot slowly consumes all available memory before running into a memory panic.
Awesome.
Now I’ve explained the problem, what is the solution? Luckily, while I’ve been
tooling around in this IMCC branch Peter Lobsinger has been kicking butt and
taking names in some other areas of Parrot. Recently he added several new
pointer-based PMC types to replace UnManagedStruct and its ilk. One such PMC
type is PtrObj
, which includes the ability to perform GC mark on its
contents.
As I mentioned above, the solution is straight-forward: Modify the IMCC code
so that we create a PtrObj PMC instead of an UnManagedStruct PMC to hold
the PackFile*
pointer returned from IMCC. With that in place, we can remove
all the extra GC block instructions added in TT #1990 and hopefully start
talking about a merger shortly thereafter.
Where To Go From Here
The IMCC work is a little bit tangential to some of our stated roadmap goals. However, I do believe that doing this work is going to provide a necessary boost, and help make some future work much easier. One of the areas where we need to focus much of our attention is on the packfile system. That system needs some radical improvements, the first step of which is properly encapsulating it behind a useful API. Since IMCC is one of the primary encapsulation violators of that system, cleaning IMCC up will help to push along that work.