Shenandoah GC: An overview
June 18, 2013 2 Comments
Let me give you a quick overview how the Shenandoah GC works.
First of all, let’s look at the heap in Shenandoah. Shenandoah’s heap is basically (like all heaps) a big chunk of memory, and in Shenandoah it is divided into many regions. Consider a heap of 1 GB, and a region size of 1MB, then you would get 1000 regions. This is different from the classic mark-and-sweep collectors, that have no regions at all, or the classic copying collectors that usually divide the heap in 2, or only a few regions. The reason for doing this is that we want to selectively evacuate those regions with the most garbage with higher priority (you know this idea from Hotspot’s G1 collector). Additionally, it makes multithreading a little bit easier (you can have multiple threads working on separate regions without stepping on each other’s toes… well, in theory ).
Now how does it work now you ask? Shenandoah is a mark-copy collector. That already implies the two important phases of Shenandoah’s collection cycle. In the first phase, we mark all the live objects in the heap. Considering the ‘copy’ in its classification, this may seem unintuitive, because normally copying collectors don’t need a marking phase, that’s their point (they collect garbage by copying live objects from one region to another, and leave the garbage behind). The main reason for having a marking phase is that we need to calculate the amount of live objects in each region. This is most naturlally done by running a marking pass over the heap. Starting from the GC roots (stack maps, internal data structures, etc) we follow the object graph and mark each object that we encounter, and update the containing region’s live counter accordingly.
The second important phase is the evacuation phase. Here we first select the regions that we want to evacuate (from-regions) and corresponding regions that the objects are copied to (to-regions). This is what we need the liveness information for. We then simply scan the from-regions and copy all objects that are marked to appropriate (e.g. with enough empty space) to-regions. (Note that this is different than classic copying collectors, that traverse the heap like we do in the marking phase. Linear scanning of the heap is much more efficient because of more predictable memory access, and as an additional bonus it preserves the allocation order, which is often a good thing.) After evacuation, the from-regions only contain garbage objects and we can free them up for future allocations.
So far it all sounds very simple and straightforward, and in fact is not even new. What we do, however, (or plan to do) is to make all of the marking and evacuation concurrent with mutator threads (i.e. mark and evacuate regions while application threads are running and messing with the heap), and parallel (i.e. do marking and evacuation using potentially many GC threads). And this is where the complication begins. I will go into the details of the issues and how Shenandoah addresses them in the next posts of this little series. Stay tuned!