Programming, Software and Code

Optimizing Parrot



Let's all face reality here: All the recent refactors of Parrot, including the ongoing PCC refactors, have had either a neutral effect or a negative effect on Parrot performance. dukeleto showed me an eye-opening comparison between several versions of Parrot and the most recent revision of pcc_reapply, which revealed a stunning 400% execution time increase for some benchmarks between the 0.9.0 release and the current branch revision. That's terrible.

The Context refactors added about a 200% execution time increase. The PCC refactors are currently adding about another 200%. The fact that these two systems overlap so closely is certainly only exacerbating the problems. So why are these two refactors responsible for so much slowdown? Let's look at each in detail.

The Context refactor converted the Parrot_Context structure, which was manually memory-managed using bit-twiddling reference counts, into a garbage-collectable PMC type. Had the changes stopped there, I strongly believe that this would have been a neutral change in terms of performance. If anything, I would suspect that this change would have been a small win overall. No matter how lousy and inefficient our garbage collector is, I have to believe that the tight loops and spatial locality of keeping PMCs together would be a good thing. Also, things like the memory allocator in the collector are pretty efficient (although they could definitely be improved). However, the refactors did not stop there. bacek also went through and properly encapsulated the Context PMC type so that there were fewer direct accesses of its internal members. All accesses now happen through API calls, which are significantly more expensive. Plus, every field access now adds an additional pointer dereference too. I don't really know how these things together could have added a 200% performance penalty, but it's pretty well documented that they did.
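To make the "API call plus extra dereference" cost concrete, here is a minimal C sketch. Every name in it (OldContext, ContextData, DemoPMC, old_get_depth, ctx_get_depth, the recursion_depth field) is made up for illustration; none of this is Parrot's real layout or API. It only assumes the general shape described above: a plain struct read directly before, and context data hanging off a generic object header and read through an accessor after.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Parrot's real types or functions. */

/* "Before": a plain struct whose fields callers read directly. */
typedef struct OldContext {
    int recursion_depth;
} OldContext;

/* "After": the context fields live behind a generic, GC-managed header. */
typedef struct ContextData {
    int recursion_depth;
} ContextData;

typedef struct DemoPMC {
    void *data;               /* points at a ContextData */
} DemoPMC;

/* Before: a single dereference (ctx -> field). */
int old_get_depth(const OldContext *ctx)
{
    return ctx->recursion_depth;
}

/* After: an accessor call plus two dereferences (pmc -> data -> field).
 * If this lives in another translation unit, the compiler usually cannot
 * inline it, so each field read also pays for a real function call. */
int ctx_get_depth(const DemoPMC *pmc)
{
    const ContextData *d = (const ContextData *)pmc->data;
    return d->recursion_depth;
}
```

Both functions return the same value; the difference is only in how much work each read does. One field read like this is cheap either way, so a cost of this kind would only show up when such reads sit on a very hot path, such as every sub call.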

The PCC refactor is adding a new abstraction step to the calling process. This makes the code prettier and much more maintainable. The cost is that some of the operations become a lot less efficient. Some of this is the fault of the new CallSignature PMC, which naively integrates several array and hash PMCs, inheriting from the oddly-named Capture PMC in the process. All strings, integers, and numbers are autoboxed into PMCs during the creation of the signature, and all return values get a CPointer PMC to hold the pointer. Add the fact that several of these values are only accessible through named lookups, and you start to see a major performance problem. We're creating a shittonne of new PMCs for every call, and then using relatively expensive VTABLE accesses to recursively access the data in them.
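Here is a rough C model of why boxing every argument hurts. It is a sketch under stated assumptions, not Parrot's CallSignature: BoxedInt, box_int, Arg, sum_boxed, sum_unboxed, MAX_ARGS and the g_allocs counter are all hypothetical names, and malloc stands in for PMC allocation. The tagged-union version is just one generic way to avoid boxing, not necessarily what the branch does or will do.

```c
#include <stdlib.h>

/* Hypothetical model; NOT Parrot's CallSignature or PMC machinery. */

#define MAX_ARGS 8

size_t g_allocs = 0;          /* counts heap allocations made for boxing */

typedef struct BoxedInt {
    long value;
} BoxedInt;

/* Box one integer on the heap (stand-in for creating a PMC). */
static BoxedInt *box_int(long v)
{
    BoxedInt *b = malloc(sizeof *b);
    if (b == NULL)
        abort();
    g_allocs++;
    b->value = v;
    return b;
}

/* Naive call path: every argument is boxed, read back, then freed. */
long sum_boxed(const long *args, size_t n)
{
    BoxedInt *boxes[MAX_ARGS];
    long total = 0;
    size_t i;

    if (n > MAX_ARGS)
        n = MAX_ARGS;
    for (i = 0; i < n; i++)
        boxes[i] = box_int(args[i]);
    for (i = 0; i < n; i++) {
        total += boxes[i]->value;
        free(boxes[i]);
    }
    return total;
}

/* Unboxed call path: arguments sit in a tagged union on the stack. */
typedef enum { ARG_INT, ARG_NUM } ArgTag;

typedef struct Arg {
    ArgTag tag;
    union { long i; double n; } u;
} Arg;

long sum_unboxed(const long *args, size_t n)
{
    Arg slots[MAX_ARGS];
    long total = 0;
    size_t i;

    if (n > MAX_ARGS)
        n = MAX_ARGS;
    for (i = 0; i < n; i++) {
        slots[i].tag = ARG_INT;
        slots[i].u.i = args[i];
    }
    for (i = 0; i < n; i++) {
        if (slots[i].tag == ARG_INT)
            total += slots[i].u.i;
    }
    return total;
}
```

With three integer arguments, sum_boxed performs three heap allocations per call and sum_unboxed performs none, while both return the same sum. In a real VM each of those allocations also adds work for the garbage collector, so the per-call cost is higher than this model suggests.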

chromatic has been doing some great work in the past few days rewriting the naive CallSignature implementation into a much more efficient form. Many of the operations can be more specialized for the task at hand, and the number of created PMCs can be reduced dramatically. bacek has also been doing some good work in this area, mostly fixing small bits and pieces as he finds them. So the situation isn't hopeless, but we are going to have to tighten several things up if we want Parrot to be as screaming fast as it was in 0.9.0 (if not faster).

Now that we have a very well-designed system, there are plenty of places where we will probably want to break encapsulation for a performance win. There are also a few places where we will want to write some ugly code instead of the current prettier code, again for a win. We'll just make sure to mark these places with lots of documentation.

I've been focusing my attentions on fixing bugs. This weekend chromatic and bacek have been looking into them also, and together we're down to 1 remaining failure at the time that I write this. The last few tests were all very tricky, but this last one is turning out to be the worst of the bunch. I think we'll be able to beat it this weekend, however. After that, we're focusing on optimization until we're ready to merge the branch, probably soon after the 1.7 release.

So that's where things stand in terms of performance. Parrot has a lot of ground to recover since 0.9.0, but I am convinced that there are plenty of opportunities to do just that. Some of it will be algorithmic improvements, some will be small intelligent tweaks to individual functions and data structures, and some will be real low-level bit-twiddly nonsense. However it happens, we really need to make Parrot faster, and I have very high confidence that we will be able to.

This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.