ParrotTheory: JIT
05 Sep 2009Lots of talk about JIT recently, so it might be a good idea to talk about what JIT is, what it does, and why it's important. JIT, which stands for "just in time", is a compilation strategy typically used by interpreted languages and virtual machines. It can be used to provide major performance increases for some execution environments, and can provide other benefits as well.
Compare and Contrast
Let's take a look at how statically compiled programs work. We write source code in our favorite language. We call the compiler which converts our code, through a series of transformations and optimizations, into a target language. For the purposes of this example, the "target language" is native machine code. Finally, we execute the native machine code. Everybody knows this pattern, it is nearly as old as computing itself. Benefits: Great performance of code at runtime. Drawbacks: long compilation time, compilation produces platform-specific machine code. Code must be recompiled after changes.
Next consider the very far end of the spectrum: interpreted code. We write code in our favorite language and pass it to an interpreter program. The interpreter program reads the instructions in the code and performs the necessary actions, without converting anything to machine code. However, it may convert the code first to a platform-agnostic bytecode format and then execute that. We save time up front by not compiling all the way down to machine code, but we lose performance at runtime. Our program isn't moving data around, it's telling the interpreter to move the data around for us. That extra "telling the interpreter" step can be relatively expensive. Benefits: No compilation step. Fast turn around between code edits and code execution. Cross-platform, works anywhere we have an interpreter. Drawbacks: Slow runtime performance.
Now look at a middle ground: JIT. We write code in our favorite language and pass it to the interpreter. The interpreter reads the instructions and converts the code to some kind of intermediate representation. As it executes the code, it converts some (or all) of it into platform-specific machine code. Code that has already been converted can be cached for later use for additional speed improvements. Benefits: Fast turn around between code edits and code execution. Cross-platform, works anywhere we have a JIT-enabled interpreter. Fast execution speed of JIT'd code. JIT'ing bytecode to machine code is faster then compiling source code to machine code (we've already done significant work in the compilation to bytecode, which can be cached to disk and loaded later). Drawbacks: additional startup overhead to JIT code at runtime. It might not be beneficial to JIT all code, so some bytecode will be interpreted and we will need to switch contexts to execute it.
These are all some lousy generalizations, but they give an idea of the differences between the different approaches. JIT gives us the portability benefits of the interpreted languages, but gives us a performance boost that is closer to that of statically compiled programs. Keep in mind that the baggage of some languages (runtime libraries, semantics) may make some languages faster or slower to execute then others by nature. Dynamic languages, for example, will tend to have some performance penalties because of things like dynamic dispatch. Combine that with the fact that many dynamic languages are interpreted instead of statically compiled, and you can start to see a relatively large performance gap that becomes misattributed to a fundamental fault with dynamic languages.
JIT Performance Metrics
When talking about JIT there are a few performance metrics that need to be taken under consideration: Speed of code generation and speed of the executed code. It stands to reason that the more time we spend optimizing, the faster the generated code will be. Alternatively, the faster we produce code, the less optimized it will be. It does tend to be a bit of a tradeoff, and the onus is usually placed on the user to determine where in the spectrum they want program performance to be. Short-lived programs will waste more time optimizing then they would executing to completion. Whereas programs that will run for a long time will benefit by taking a little more time up front to optimize.
Trace-Based JIT
While the user is typically tasked with selecting how agressively they want to optimize, there are ways for the computer to make that determination automatically. Once great example of this is in Trace-Based JIT. In a Trace-Based JIT (TBJIT) system, the interpreter begins interpreting code immediately without invoking the JIT system. As it executes, it records individual control flow paths called "traces". When it determines that a particular trace is "hot", or commonly executed, it passes that trace to the JIT for optimization and conversion down into machine code. Any time thereafter that we need to execute the hot trace, we call the cached, optimized machine code version instead.
Trace-based JIT is a nice middle ground because the computer determines which code would benefit the most from JIT, and only spends the overhead compiling code when it knows it will get a good return on investment for it. Only compile the code when the time it takes to compile will be outweighed by the long term performance improvement.
It is my sincere hope that Parrot can integrate a trace-based JIT engine eventually, because it would be a big boon for our overall performance.
Ahead Of Time (AOT) Compilers
Another variation on the theme are called Ahead-Of-Time compilers (AOT). AOT is like JIT, where we compile an interpreters LIR into machine code. The difference being that we do this before starting execution. JIT brings an overhead performance penalty because we are doing the machine code translation at runtime. In AOT, we do the compilation in a separate step entirely, so we can load in pure machine code at runtime with no compilation overhead.
Ideally in Parrot we are going to want to have some sort of AOT option, so users can compile code down to binary executables ahead of time to be executed.
Conclusion: Parrot's Situation
JIT is a very interesting subject, and can provide some significant benefits for Parrot. We gain two potential speedups from JIT (and additional speedups from AOT as well, if we have it): First, we reduce op dispatch cost by converting our loop and dispatch mechanisms with a string of raw machine code. Second, we gain the ability to perform optimization stages on the code as we generate it. However, these two things might be limited in effectiveness by Parrot's current architecture.
In Parrot, some of our ops are relatively large and implement some complex algorithms. The more complicated ops are, more often control flow is located inside them and the less often control flow is involved in transitioning between them. If a relatively small amount of our time is spent on op dispatch, it's not going to provide us with much benefit to optimize dispatch. In other words, if we spend 5% of our time doing dispatch, the most we can possibly improve performance with a new dispatch mechanism would be 5%.
Of course by that same token, if the ops are complicated and opaque to the JIT engine, it won't be able to get information about the internal structure and perform any meaningful optimizations on it. This is why a solution based on a simple, atomic op set like Lorito is so attractive: Because we gain the ability to really start optimizing all the way down to the metal.
This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.
Comments
Aristotle Pagaltzis
9/6/2009 7:44:04 AM
Whiteknight
9/6/2009 8:02:17 AM
Anonymous
9/6/2009 6:56:29 PM
Aristotle Pagaltzis
9/10/2009 12:22:14 AM
A key point you didn’t mention explicitly is that there’s more information that can be used for optimisation at runtime. The static analysis a classic “compiler” can do is limited in this regard. Therefore, a complex JIT can actually outperform a classic compiler by a significant margin.
This is especially true for trace-based JIT: traces can be compiled down to straight-line code that does not contain any control flow even if the original recorded execution path crosses scopes and functions. In other words, trace-based JITs can beat hand-written assembler code! This level of performance is unattainable through any other approach.
Good catch Aristotle! Yes, I did leave out that important piece of information. I'll try to post an update to include it.
Aristotle, is that true for Parrot? Can you give an example for such information in the case of, let's say, Rakudo generated code?
The things I said apply to JIT in general. Whether or not they are (or will be) true for Parrot depends on what level of JIT support Parrot features.