GC Gets Kick Start
01 Dec 2009Parrot's garbage collector is starting to really get a lions share of developer attention recently, especially after some very interesting benchmark statistics from chromatic went public. For the benchmark of building the new NQP-RX project, a whopping 80% of execution time is spend in the GC mark phase. Actually, the real statistic is that 80% of the execution time was spend only in the Capture PMC's mark routine (and the functions that it calls). That's quite a huge amount of time, even for a naive GC like ours to spend.
Let's take a quick recap about GCs, and what causes them to be so expensive: GC is used to find and automatically reclaim unused data items so that the storage space can be reused. A good GC system means that the system programmer does not need to manually free memory when it is done being used: The GC will detect and automatically deallocate the unused memory. A good GC system, in essence, will completely eliminate memory leaks, and help to make the codebase much more clean and succinct. To do this, GC needs to first find all unused ("dead") objects and then free them, two phases known as mark and sweep.
In a naive implementation of a mark and sweep algorithm, there are two aptly named phases: mark and sweep. The mark phase is charged with finding dead objects to reclaim. We typically do this in a reverse order, by first finding all objects that are in current use ("alive"), and then declaring all other objects to be reclaimable (or already free). Starting from a root set, such as the register sets and interpreter globals in Parrot, we can construct a graph of all objects by following pointers and marking each reached object as alive. It stands to reason that if there is no pointer to a particular object, it cannot be accessed, and anything that we do not access during the mark is presumed unreachable, which is the same as dead.
In the sweep phase, we must iterate over the pool of all objects, finding objects marked dead and freeing them. Freeing an object typically involves calling a custom destructor if one is provided, and making the memory available to the allocator so the memory can be reused the next time an allocation is made.
In every mark and sweep GC collection run for a naive collector, we must first trace the entire memory graph and then iterate over the entire object pool. This is very expensive, and the expense grows large as the memory use of the program grows large. What we need for Parrot is something a little bit less naive.
What we probably can not do is make huge conceptual improvements to the idea of mark and sweep: We will always need to detect alive objects, and we will always need to traverse and free the dead ones. The general idea is sound and that's not something we want to change. What we can do, however, is to impose heuristics on the system to decrease the number of items to mark and decrease the number of objects to sweep. This is where the bulk of GC performance improvements can be made, by being much smarter about how the GC is used.
Allison sent a nice email to the list the other day essentially saying that GC has become an officially-recognized pain point and that we as a team are going to be looking at improvements after 2.0 (if we don't manage to start before that). Very interesting discussion has already started on ways to improve it.
As I mentioned above, the bulk of GC performance improvements are made by applying heuristics to decrease the number of objects to mark and sweep. A secondary set of improvements can then be made, often at the code level, to make the GC's operations run faster. I'll call the first set of improvements "algorithmic", and the second set "implementation". So what we the Parrot developers need to do first is pick the right algorithms to use and then implement and optimize them.
Here is a general list of things we can do to improve GC performance in Parrot:
- Allocate fewer GCable objects. This is typically the result of user-level code optimization. So, Parrot needs optimizers that are GC-sympathetic. Parrot also allocates a number of STRINGs and PMCs for internal purposes, so we need to minimize that.
- Mark fewer objects. This comes from a good generational GC system where we segregate items based on how stable they are.
- Sweep fewer objects. I think chromatic's linked-list idea will help us significantly in this regard.
What I think we are leaning towards in Parrot is a system called a generational GC. A generational system uses the heuristic that items which have lived for a long time without being GC collected will tend to stay alive longer, and items which are recently allocated tend to die quickly. It's an acknowledgement that a lot of garbage is created for very short-term uses, and relatively few things stand the test of time. Here's a quick example using explicitly non-idiomatic Perl 5:
my @array = fill_array(100); # 100 items in the array
foreach my $item (@array) {
my $new_item = mangle($item);
say $new_item;
}
In this loop we create a lot of garbage. Every new instance of $new_item is a new collectible item which can be declared dead at the bottom of the loop and allocated anew at the top. Also, all the local variables used inside the mangle function follow the same life cycle. The only items that survive through the entire snippet are @array and it's contents.
Every time we mark, we have to mark @array and all it's contents, even though they are long-lived and will survive the entire loop. Every time we sweep we need to separate dead items $new_item and all the local variables created inside mangle() from @array and its persistently live set.
Generational GC works by saying that @array is long-lived and putting it into an older generation. Older generations contain objects which are, by definition, older and therefore less likely to die. If we aren't worried about the item dieing, then we don't need to explicitly mark it. At least, we don't need to mark it as often. We also don't need to sweep it, if we can find a good fast way to avoid that.
This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.