JIT Redesign
04 Sep 2009
The time has come. Parrot's JIT system is now in my crosshairs and phasers are set to kill.
bacek has spent a lot of his time in the past two weeks working on the Context PMC stuff. The last thing he was blocked on was a failure in the JIT system that only appeared when "make testj" was run. I, of course, couldn't even observe this failure because Parrot doesn't have JIT support on x86_64. The root problem for him was that the JIT system has a lot of hardcoded knowledge about the shape of some of Parrot's core data structures, and any change to those structures breaks what JIT thinks it knows. In response to this problem, and to some frustrations that have been building up over the past few months, I sent a mail to the list suggesting we deprecate the current JIT system and schedule it for removal in 2.0. Response there was mostly positive, but the ball really got moving when I mentioned the issue in this week's #parrotsketch meeting.
Why do I want to deprecate JIT? Isn't JIT the new hotness for virtual machines and interpreters? Isn't it the savior of performance for dynamic languages? The short answer is that I want to deprecate our current JIT implementation and replace it with something even better.
What JIT does is this: It takes information about the instructions to execute from the parser. This is going to be the compiler's Low-level Intermediate Representation (LIR). Parrot's LIR is the compiled bytecode that we are supposed to execute. JIT takes this information and converts it to machine code on the fly, so that it can be executed directly by the processor without needing to dispatch to op routines. In theory this can significantly reduce the overhead of opcode dispatch by making control flow more "natural" to the machine, which in turn helps with branch prediction and caching. If you combine this with the ability of some JIT engines to optimize the output machine code aggressively, you can end up with some major performance gains from a good JIT. I'll discuss the workings of JIT in more detail later this weekend.
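To make the dispatch-versus-JIT distinction concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the three opcodes, the register list, and the `interpret` and `jit_compile` names are invented, and the "JIT" emits Python source instead of machine code. It only shows the shape of the idea: the interpreter pays for a table lookup and a call on every op, while the JIT translates the whole program once into straight-line code.

```python
# Toy illustration only: the opcodes and program below are made up, and the
# "JIT" emits Python source rather than real machine code.

def op_set(regs, dst, value):
    regs[dst] = value

def op_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]

def op_mul(regs, dst, a, b):
    regs[dst] = regs[a] * regs[b]

# Dispatch table: opcode name -> op routine.
OP_TABLE = {"set": op_set, "add": op_add, "mul": op_mul}

# A tiny "bytecode" program: (opcode, operands...).
PROGRAM = [
    ("set", 0, 6),
    ("set", 1, 7),
    ("mul", 2, 0, 1),
    ("add", 2, 2, 0),
]

def interpret(program, nregs=4):
    """Classic dispatch loop: one table lookup and one call per op."""
    regs = [0] * nregs
    for opname, *args in program:
        OP_TABLE[opname](regs, *args)
    return regs

# Source templates standing in for per-op machine-code templates.
TEMPLATES = {
    "set": "regs[{0}] = {1}",
    "add": "regs[{0}] = regs[{1}] + regs[{2}]",
    "mul": "regs[{0}] = regs[{1}] * regs[{2}]",
}

def jit_compile(program, nregs=4):
    """Translate the whole program once into a single straight-line function."""
    lines = ["def compiled():", f"    regs = [0] * {nregs}"]
    for opname, *args in program:
        lines.append("    " + TEMPLATES[opname].format(*args))
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]

compiled = jit_compile(PROGRAM)
```

Calling `compiled()` gives the same register contents as `interpret(PROGRAM)`, but with no per-op dispatch left at run time. A real JIT does the same translation down to native instructions, which is where the branch prediction and caching benefits come from.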
In short, Parrot wants JIT. Parrot needs JIT. We simply aren't going to be a viable alternative to other VMs without it in the long term. Especially not when you start looking at some of the amazing performance improvements VMs have been making recently directly because of their JITs.
Parrot's current JIT just doesn't fit the bill though. It suffers from a number of terrible problems, which I enumerated to the list also:
- It doesn't really provide much performance improvement for most programs. It also doesn't have much opportunity to perform any optimizations at the machine-code level.
- It is too closely tied to Parrot's core, breaking all sorts of encapsulation barriers. As I said earlier, the JIT system needs to mirror certain algorithms and data structures used in Parrot core, which means a change made in the one needs to be faithfully mirrored in the other. When we can't do that because none of our current developers know the system well enough, we end up with a huge decrease in development momentum. Of course an argument could be made here that more people need to learn the JIT system.
- Parrot's JIT system is very platform specific. There simply is no implementation on most systems (only x86 and PPC have it), and there is no easy way to share code between platforms to give new platforms a head start. If I want to write a JIT for a different system (amd64, for example), I basically need to start completely from scratch.
- It is an absolute mess, and a maintainability nightmare. Nobody really knows what all it does or how it all works. It's also not documented well enough to bring any new developers up to speed on it. Top this all off with the overly complicated way that it is written. We simply can't make any improvements on the code, not without a herculean effort that honestly isn't worth the time.
- It is nowhere near some of the other existing JIT engines in terms of usability, quality of generated code, or capabilities. libJIT, nanoJIT and LLVM have entire dedicated teams working on them; we aren't ever going to match what they have without a similar expenditure of time and effort.
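The encapsulation problem in the list above (a code generator carrying hardcoded knowledge of a core structure's layout) can be sketched in a few lines. This is only an illustration under stated assumptions: `ContextV1`, `ContextV2` and their fields are hypothetical stand-ins, not Parrot's actual Context layout, and `read_u32` plays the role of generated code that reads memory at a fixed offset.

```python
import ctypes
import sys

# Hypothetical layouts, invented for this sketch.
class ContextV1(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint32),
                ("pc", ctypes.c_uint32)]

class ContextV2(ctypes.Structure):
    # A later refactor inserts a field ahead of pc.
    _fields_ = [("flags", ctypes.c_uint32),
                ("caller", ctypes.c_uint32),
                ("pc", ctypes.c_uint32)]

# What a code generator "knows" if the offset was baked in by hand.
HARDCODED_PC_OFFSET = 4  # only correct for ContextV1

def read_u32(obj, offset):
    """Read 4 bytes at a raw offset, the way emitted code would."""
    raw = bytes(obj)[offset:offset + 4]
    return int.from_bytes(raw, sys.byteorder)

v1 = ContextV1(flags=1, pc=100)
v2 = ContextV2(flags=1, caller=55, pc=100)

stale_v1 = read_u32(v1, HARDCODED_PC_OFFSET)   # reads pc, as intended
stale_v2 = read_u32(v2, HARDCODED_PC_OFFSET)   # silently reads caller instead
fresh_v2 = read_u32(v2, ContextV2.pc.offset)   # offset derived from the definition
```

With the hardcoded offset, the read against the new layout returns the `caller` field without any error. Deriving the offset from the structure definition (here via `ContextV2.pc.offset`) keeps the two in sync, which is the kind of coupling a generated JIT layer would get for free.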
So what would a replacement look like? Roughly this:
- We rewrite Parrot's PASM ops in terms of a new LIR. This could be Lorito (previously L1) if we want it to be backend-neutral, or it could be something that's specific to the particular JIT engine we use (like LLVM ops, for example).
- We need to be able to convert the LIR to C for direct execution, and to a JIT definition for indirect execution. There are a number of ways we could do this, depending on the skills and availability of our development team.
- The configure system will determine which JIT backends are available (if any), and generate the necessary code to support them during the build.
- We only need a minimal API for Parrot's core to interact with: JIT the incoming PBC, call into JIT'd code, output an executable, and maybe a select few other operations. Much of the code can be generated at build time.
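As a rough sketch of how the pieces in this list could fit together, the toy below defines each op once as a small LIR body and derives both a C rendering (for direct execution) and a callable (standing in for a JIT definition) from that single definition. All of it is hypothetical: the `ADD`/`SUB`/`MUL` instruction names are invented and are not Lorito, `emit_c`, `emit_jit` and `select_backend` are made-up names, and the "JIT" backend is just a Python closure where a real one would hand the LIR to an engine such as LLVM or libJIT.

```python
import operator

# Hypothetical op definitions: each body is a list of
# (lir_instruction, dst_register, src_register_a, src_register_b).
OP_DEFS = {
    "add_then_double": [("ADD", 0, 1, 2), ("ADD", 0, 0, 0)],
    "mul_sub": [("MUL", 0, 1, 2), ("SUB", 0, 0, 1)],
}

LIR_C_OPERATORS = {"ADD": "+", "SUB": "-", "MUL": "*"}
LIR_PY_OPERATORS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def emit_c(op_name, body):
    """Build-time path: render an op's LIR body as C source text."""
    lines = [f"void op_{op_name}(long *regs) {{"]
    for instr, dst, a, b in body:
        lines.append(f"    regs[{dst}] = regs[{a}] {LIR_C_OPERATORS[instr]} regs[{b}];")
    lines.append("}")
    return "\n".join(lines)

def emit_jit(body):
    """Run-time path: turn the same LIR body into a callable.

    A closure stands in for what a real JIT backend would build.
    """
    steps = [(LIR_PY_OPERATORS[instr], dst, a, b) for instr, dst, a, b in body]
    def run(regs):
        for fn, dst, a, b in steps:
            regs[dst] = fn(regs[a], regs[b])
        return regs
    return run

def select_backend(detected, preferences=("llvm", "libjit", "closure")):
    """Configure-time choice: first preferred backend that was detected, else None."""
    for name in preferences:
        if name in detected:
            return detected[name]
    return None  # no JIT available; fall back to plain op dispatch

# Pretend configure only found the toy backend on this platform.
backend = select_backend({"closure": emit_jit})
c_source = emit_c("mul_sub", OP_DEFS["mul_sub"])
jitted = backend(OP_DEFS["mul_sub"])
result = jitted([0, 3, 4])
```

The point of the design is that the op is written once. Adding a platform or swapping the JIT engine means adding another emitter, not rewriting every op by hand, and a platform with no backend at all still gets the C path.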
Obviously this is all very preliminary, and we will refine the design as we move forward and gather more information about all our options. I'm sure I'll be posting updates here as new things start happening.
This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.