The embedding API work wasn’t done in a vacuum. Getting a nice shiney new embedding API in place was a nice thing to do, but it was only one component of a larger goal: the removal of IMCC. Most of the embedding API work, such as the cleanup and encapsulation of various bits of logic, was required to start moving IMCC out of libparrot Core. Some of the details were just nice things to do, and I figure that if I am going to be taking on a project like this, I’m going to do my best to do it correctly. For some value of “correctly”.
Yesterday, I got started on the next phase of the IMCC work: Packfile PMCs. First let me describe some of the issues we’ve been having, and then I’ll talk about how I’m going about fixing it.
In Parrot master right now, IMCC compiles .pir
files to a PackFile*
structure. That structure is wrapped up in a generic PtrObj PMC and returned.
This is good because the PtrObj PMC is marked by the GC and so the underlying
PackFile*
structure (and all the PMCs it contains) are marked too. That’s
important. Also, we can pass this PtrObj PMC to various API routines which
know how to call VTABLE_get_pointer
and extract the underlying PackFile*
structure from it. The downsides to this approach are that the PtrObj PMC is
just a dumb wrapper that provides no user-visible interface for working with
the PackFile*
data. As far as the user is concerned the PtrObj is an opaque
data item that gets passed to things that know how to work with it. This is
suboptimal for a lot of reasons.
We have several other PMC types that work with PackFile structures too. The Eval PMC type works something like a wrapper around PackFile but it has some interesting behaviors: It inherits from Sub and pokes directly into the already tangled and un-encapsulated internals of that type. Eval works like an array of Subs, and integer-keyed indexing into Eval returns Sub PMCs from the PackFile constants table. However, Eval doesn’t provide any way to access other constants, such as string, floating-point, or non-Sub PMC constants. Eval also doesn’t really provide any tools for working with Packfiles in a more general sense: You can’t read into an Eval PMC from an existing .pbc file, or easily write the constants out to a file. You can’t read into it from a string of bytecode either.
We also have a series of Packfile PMCs that were created some time ago, mostly
to support the ability to create PBC files from PCT. These PackFile PMCs,
instead of providing an interface into existing PackFile*
structures like
those generated by IMCC, provide an alternate implementation of them. The
Packfile PMC doesn’t store a pointer to the PackFile*
structure. Instead,
it stores all the data a PackFile*
would store in PMC attributes, with each
sub-structure getting a separate PMC type to be loaded into. In essence, it’s
not an interface to PackFile*
, it’s a duplicate of it.
Right now, as I mentioned, IMCC was returning a PtrObj PMC. That’s lousy so I
wanted it to return something better. I don’t like Eval and I want it to
disappear, so I decided to try to use the Packfile PMC instead. However, the
PackFile PMC doesn’t have a set_pointer VTABLE for setting the PackFile*
structure coming out of IMCC. It also didn’t have a method for accessing the
:main
Sub from the packfile, or accessing Subs or other constants easily. I
tried adding some of this logic but quickly found that differences in the
design between the Packfile PMCs and the PackFile*
structure were hard to
get around. Also, there’s one other problem: inefficiency. Many of Parrot’s
internal packfile operations only work on a PackFile*
struct, not on any of
the available PMC types. So for those operations, the Packfile PMC needs to
create a temporary PackFile*
structure for those operations, and then
destroy it again. Also, once we create a PackFile*
structure in IMCC and
pass it to the Packfile PMC, we need to recurse through it and create PMCs to
mirror all the exact same data. That’s something of a huge waste to do every
time we want to compile something.
Let’s look at where we are heading, so we can understand why our current options don’t quite do what we needed and what I am hoping to do going forward.
I’ve mentioned before on this blog that I want to work towards rewriting the
Parrot executable frontend in PIR, or some other Parrot language. To do that,
we need fully-functional packfile PMCs. Look at some of the command-line
arguments and capabilities we are going to need to have to support: -o
to
output a file, the ability to load in a .pbc file, or load and compile a .pir
file, -c
to compile .pir to .pbc, -r
to compile, output to file, then
execute the .pbc file from disk, etc. That’s quite a lot of stuff but not
entirely outside the realm of the possibility of Eval and Packfile PMCs.
However, the Parrot frontend is not the only program that might benefit from
being rewritten in a Parrot language. pbc_merge
is another program I’ve
been looking at recently that would definitely benefit. That logic is quite
messy, and if we can hide enough of the messy logic behind API calls and then
expose those through PMC methods, we can rewrite much of that.
pbc_disassemble
and pbc_dump
programs too. Think about other programs that
we could create to open, analyze and modify packfiles, but we haven’t yet
because the interface is so obtuse. pbc_merge
alone needs a lot more stuff
than the parrot frontend needs: The ability to read in multiple packfiles and
add data from each to a single output packfile. This means we need to be able
to easily create a PMC from an existing .pbc file (which the Packfile PMC and
friends cannot really do well enough), we need to iterate over all the
contents of the resulting packfiles (which Eval cannot do), then combine
together all the data and write that output that to a file (which Eval cannot
do either).
Where do we go from here? Yesterday I started creating a new PMC type called
PackfileView
in the whiteknight/packfilewrapper
branch. The PackfileView
PMC acts like a thin wrapper around an existing PackFile*
structure, with
methods which are thin wrappers around packfile subsystem API functions. It
doesn’t inherit from Sub or anything else (so we avoid encapsulation-breaking
problems that Eval gets into), and uses the packfile subsystem API instead of
poking into those guts directly. PackfileView is basically read-only, although
some of the operations that happen on PackFile* necessarily modify its
internals. We can’t make it a perfectly read-only interface, especially not if
we want to do anything interesting or worthwhile with it. However, we can
avoid most of the operations which explicitly modify the contents in
irrepairable or dangerous ways.
Since PackfileView is a thin wrapper around PackFile*
, we can return it
directly from IMCC. Since it has a nice interface, we can interact with it
from PIR. So, PtrObj is out, PackfileView is in. All current code works
because PtrObj only had two inteface functions: get_pointer and set_pointer,
and PackfileView provides those the same way.
The IMCCompiler PMC replaced an old NCI PMC that served as the registered
compile for PIR. For backwards compatibility, IMCCompiler’s invoke VTABLE
returns an Eval PMC. However, it also adds methods .compile()
and
.compile_file()
which are now returning PackfileView PMCs in the branch.
This provides a clear and seamless upgrade path for users while we deprecate
Eval PMC (and eventually, IMCCompiler.invoke).
So that’s my plan for the Eval PMC and PackfileView. I hope it gets a green light and gets merged into master eventually. I need to do a heck of a lot of documenting and testing on the new PMC before we can start talking about making any kind of switch anyay. However, what’s the story with the existing Packfile PMCs?
The existing Packfile PMCs are fine. They were built for the particular
purpose of allowing PCT and other compilers to build .pbc files. This is an
important and highly desirable goal, and not something we want to get in the
way of. However, this goal is relatively specialized: There’s not much reason
why we would need to keep that functionality built into Parrot at all times.
Instead, I suggest that we move those PMCs out into a dynpmc library and make
it part of a large package of compiler-building tools. This is just a
suggestion, I don’t have a strong enough opion on the matter and the existing
Packfile PMCs aren’t really causing any harm where they are right now. Plus,
I’m anticipating the counter-argument that almost all of our users either are
HLL compiler projects, or are written in dynamic languages which support
runtime eval
and therefore require compiler objects to be around at all
times, so most usages are going to require those PMCs. I don’t have an answer
for that really and like I said this isn’t a strong opinion that I’m prepared
to fight about.
Right now I’m working on the PackfileView PMC, getting it working well and giving it enough functionality to completely replace Eval. Next step is to start seriously cleaning up the packfile subsystem API and unifying lots of bits of logic repeated throughout the codebase. After that, I’m going to do some cleanups on the IMCCompiler PMC, then start the final bit of work to get IMCC pulled out of libparrot and turned into a dynamically-loaded extension. That last part is probably going to depend on some of the GSoC projects going on this summer, so it certainly can’t happen before the end of the season.
Let me close out this post with a snippet of code which is working right now in my branch:
.sub main :main :anon
.param pmc args
.local string exe_name
.local string prog_name
exe_name = shift args
prog_name = shift_args
.local pmc pir_compiler
.local pmc packfileview
pir_compiler = compreg 'PIR'
packfileview = pir_compiler.compile_file(prog_name)
.local pmc main_sub
main_sub = packfileview.main_sub()
push_eh _handler
main_sub(args)
exit 0
_handler:
.local pmc ex
.get_results(ex)
say "Unhanded exception:"
$S0 = ex["message"]
say $S0
$I0 = ex["exit_code"]
exit $I0
.end
Add in a “load_language 'PIR'
” call, and all might be well with the
universe.