Introducing Shenandoah Traversal mode – part I

In this post I want to introduce a (not so very) new GC mode that we call “Traversal GC”. It all started over a year ago when I implemented the ‘partial’ mode. The major feature of the partial mode is that it can concurrently collect a part of the heap, without the need to traverse all live objects, hence the name partial GC. I will go into details of how that works in a later post. First I would like to explain another foundation of Shenandoah’s partial GC, which is the single-traversal-GC, or short Traversal GC.

Let me first show some pictures that explain how Shenandoah works (and in-fact, more or less how other collectors work):

Shenandoah usually runs in one of two modes (switched dynamically depending on automatic ergonomic decisions): First the 3-phase mode:

The cycles are:

  1. Concurrent mark: traversal all live objects, starting from roots, and mark each visited object as live.
  2. Concurrent evacuation: based on liveness information from concurrent marking, select a number of regions, compact all live objects in that region into fresh regions.
  3. Concurrent update-refs: scan all live objects and update their references to point to the new copies of the compacted objects.

Each concurrent phase is book-ended by a stop-the-world phase to safely scan GC roots (thread stacks) and do some housekeeping. That makes 4 (very short, but still measurable) pauses during which no Java thread can make progress.

When GC pressure is high, and GC cycles run close to back-to-back, Shenandoah switches to 2-phase operation. The idea is to skip concurrent update-refs phase, and instead piggy-back it on subsequent concurrent marking cycle:

In other words, we now have:

  1. Concurrent mark: traversal all live objects, starting from roots, and mark each visited object as live. At the same time, when encountering references to from-space, update the to point to the to-space copy.
  2. Concurrent evacuation: based on liveness information from concurrent marking, select a number of regions, compact all live objects in that region into fresh regions.

As before, we still have pauses before and after each phase, now totalling 3 stop-the-world pauses per cycle.

Can we do better? Turns out that we can:

Now we only have one concurrent phase during which we:

  1. Visit each live object, starting from roots
  2. When encountering an object that is in the collection-set, evacuate it to a fresh compaction region
  3. Update all references to point to the new copies.

The single concurrent phase is book-ended by 2 very short stop-the-world phases.

This probably sounds easy and obvious, but the devil lies, as usual, in some interesting details:

  • How to select the collection-set? We have no liveness information when we start traversing.
  • How to ensure consistencies:
    • Traversal consistency: how to deal with changing graph shape during traversal
    • Data consistency: how to ensure writes go to the correct copy of objects, how to ensure reads don’t read stale data, etc
    • Update consistency: how to avoid races between updating of references and ordinary field updates

I will go into those details in a later post.

If you’re interested in trying out the traversal mode, it’s all already in Shenandoah (jdk10, jdk11 and dev branches) and stable enough to use. Simply pass -XX:ShenandoahGCHeuristics=traversal in addition to the usual -XX:+UseShenandoahHeap on the command line. More information about how to get and run Shenandoah GC can be found in our Shenandoah wiki.

Advertisements

18 Responses to Introducing Shenandoah Traversal mode – part I

  1. Gregor says:

    I am testing ShenandoahGC for few week now and I find it very good for our application which needs to be as responsive as possible.

    I have tried traversal mode on our app. Application is quite intensive on memory and respawns few thousands of threads. I use 80GB heap on testing server. I have experienced crash of java immediately when I try to put it on heavy load.

    If I leave ShenandoahGCHeuristics set to default then all is fine, but I experience a hiccup in app for around 0.5ms in every few minutes, which is a no go for our application.

  2. Roman Kennke says:

    Gregor: sure, no problem, I am away myself, hanging out at the beach most of the time 😉 Cheers!

    • Gregor says:

      Hi Roman,

      I have put new logs on this share folder https://sync.fora.si/d/32a4805b03834ba7b4c5/
      I have used latest build openjdk-shenandoah-jdk11-b24-shenandoah-jdk-11+25-x86_64-release.tar.xz and traversal mode now works without crash, but I have found some other issues which I hope can be resolved with some additional settings.

      What changed from last tests is that now we have around 60% (before it was around 10%) of objects live for longer time (3 days). It looks like this now gives me an issue of consequent GC all the time.

      Regarding generic hiccup we are seeing. It happens with all GC types not just with Shenandoah. Can you explain me in details what you are searching for with monitor usage. We use a lot of threads in our app (between 5 to 10 thousand) and yes we are using it preventing simultaneous access to some objects from these threads.

      • Roman Kennke says:

        Thank you Gregor.
        Would it be possible to take the discussion to shenandoah-dev (http://mail.openjdk.java.net/mailman/listinfo/shenandoah-dev) ? The wordpress comment system is very crude for discussions, plus we’d have a chance that others may have useful input and/or find the information useful.

        We’ll check the heuristics to for the increased liveness issue.

        I was getting to monitor issue because the crash stacktrace indicated you’ve run into monitor deflation in the first cycle, something which often never happens at all, even throughout hundreds of cycles. The problem with monitors is, if they are contended, they will get inflated. At some point, they will need to get deflated, and this happens at a safepoint and takes some time. If you use many of them, you’ll notice a regular hiccup because of this. There is work in progress to make monitor deflation concurrent, but that’s not happening in jdk12 I guess. That being said, extreme monitor churn is an indication that something may be wrong. Objects used for synchronization are not usually allocated and freed all the time, and they should not really be so many to begin with. I cannot tell in your case without having looked at the code though. Watch out for objects that are used in synchronized() statements and look at reducing their count and allocation rate.

      • Roman Kennke says:

        You may want to experiment with changing the MonitorUsedDeflationThreshold (default to 90, accepts percentage between 0 (off) up to 100. You may also want to change GuaranteedSafepointInterval i.e. number of milliseconds to perform a guaranteed safepoint and check for monitors to deflate. Defaults to 1000 (i.e. every 1s).

      • > What changed from last tests is that now we have around 60% (before it was around 10%) of objects live for longer time (3 days). It looks like this now gives me an issue of consequent GC all the time.

        I think this is heuristics’ runaway. It locks free threshold at 70% (“GC(3) Adjusting free threshold to: 70% (46547M)”) and then no cycle is able to free up that much — it seems you have around 70% live heap (which is 30% free) after most cycles — so free threshold stays the same, and the next concurrent cycle triggers immediately. Meanwhile, you may want to cap the free threshold manually with e.g. -XX:ShenandoahMaxFreeThreshold=20. That is, make the free threshold below live set, but above the safety margin for allocations.

        We had the similar bug report recently, and there is work in progress to mitigate this. This free threshold mechanics does not look very reliable, and we consider better ways to control GC triggers.

      • Current {dev, 11, 10, 8} builds have better “adaptive” / “traversal” heuristics that should be immune to this runway. Gregor, please try again?

  3. shootingstar says:

    Just wondering, where did the project name Shenandoah come from? is it a place, person or thing? 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: