Shenandoah GC: Concurrent parallel marking

Yesterday, I gave a quick overview over the Shenandoah GC. (And Christine also started blogging about Shenandoah today, giving even more background information.) I mentioned that Shenandoah collects garbage in two main phases: marking and evacuation. All of that needs to happen concurrently with the mutator threads.

Let’s look at the marking phase first. We kick it off by briefly stopping the world in order to scan the GC roots (java stack references and such). Then we resume the world and start traversing the heap going from the objects we just marked. Concurrently with the mutator.

The problem when marking objects on the heap while running concurrently with the application/mutator threads is, of course, that the structure of the heap changes while we’re traversing it. There are a few solutions to this problem, and we did not invent anything new here. We decided that we’d go with the so-called snapshot-at-the-beginning (SATB). The idea is that we want to mark all the objects live that are live at the beginning of the marking phase (hence the name). We probably mark a few more objects as live (those that become live during marking) but we don’t care about those (they don’t hurt). The reason we want to do this is the following: suppose we have an object A that points to B. During marking, A becomes unreachable. But there comes a new object C into live that receives a reference to B. Now, it is possible that the concurrent marking misses B because it can’t reach it anymore through A, and it doesn’t reach it yet through C. We’d basically treat B as garbage even though it isn’t.

How does SATB work? During marking, we employ a write barrier, that is an additional hook that gets called whenever mutator threads write reference fields in objects (not to be confused with memory fences, which are also sometimes called barriers…). This write barriers logs the original contents of modified reference fields into a linked list. After the concurrent marking phase is done, we stop the world, and process this SATB-list by marking all objects in it, and traversing their references where necessary. When this is done, we resume the world, and we can be sure that all objects that have been live at the beginning of the marking phase are marked as live. And since we’re also counting liveness during marking, we then know how much live data each region contains.

I also said that we’d do the marking work using parallel GC threads. How does that work? Instead of having one thread traversing the graph of live objects in the heap, we use many threads. Each thread maintains a work-stealing-queue. Each item in the queue is a heap object. Initially, the queues get filled with objects referenced by the GC roots. Each GC thread then pops an object from the queue, marks it, increase the containing region’s liveness counter, then checks all reference fields of that object, and pushes each object that is referenced back to the thread’s queue. This happens in a loop. Eventually a thread will run out of objects. When that happens, the GC work thread attempts to steal from another thread’s queue. When all threads are done, i.e. no thread can steal from any other thread anymore, we’re done.

After the marking phase is done, we know for each region how much live data (and therefore garbage) it contains, and all reachable objects are marked as such. All of the above is already implemented in Shenandoah and works well. Next step is the actual evacuation of regions that have accumulated enough garbage. This is a topic for a followup post. Or more likely, several follow-up posts.



About Roman Kennke
JVM Hacker, Principal Software Engineer at Red Hat's OpenJDK team, Shenandoah GC project lead, Java Champion

6 Responses to Shenandoah GC: Concurrent parallel marking

  1. OK, I’ll bite.

    You conveniently failed to mention that the concepts and ideas you’re covering in your blog entries already exist in the HotSpot (and many other) GCs:

    “it is divided into many regions” (G1, also Train, Azul, etc.)
    “a region size of 1MB” (default G1 region size)
    “selectively evacuate those regions with the most garbage” (G1)
    “we need to calculate the amount of live objects in each region” (G1)
    “parallel marking” (ParallelOldGC/CMS/G1, and many others)
    “parallel evacuation” (ParallelGC/ParNew/G1, and many others)
    “SATB” (G1, also Metronome, etc.)
    “Each thread maintains a work-stealing-queue.” (all parallel GCs in HotSpot, and many others)

    So, I’m really curious: Did you just clone G1, re-use most of the existing code, and call it something different? Or are you really starting from scratch?

    An additional observation:

    “After the concurrent marking phase is done, we stop the world,”
    “in order to reduce pause times,”

    Didn’t you say the GC is pauseless?


    Tony Printezis

  2. Pingback: Shenandoah GC: Brooks pointers | Roman Kennke's Blog

  3. sirinath says:

    Perhaps you guys can get some insight from C4 also.

  4. Pingback: jClarity | Shenandoah – A new low pause Garbage Collection algorithm for the Java Hotspot JVM

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: