Efficient JNI programming, part I

I often hear the complaint that JNI is clunky and hard to use, that it carries a certain overhead, and that it makes writing native methods for Java hard and slow. Well, I agree that it’s somewhat clunky and hard to use, and the learning curve is probably steep (and the usual result of not understanding how JNI and a Java VM work is bad performance). This is why I want to write a small series of articles that explains a couple of rules and paradigms which will allow you to get the most out of JNI programming.

In this first part I want to discuss when and why to use JNI, and when not to. Off the top of my head I can think of the following reasons to implement a method in a native language (usually C or C++) rather than in Java:

  • To wrap an existing native library or syscall and provide a Java ‘frontend’ to it.
  • To access native resources (memory, etc) that wouldn’t be accessible from Java.
  • To improve performance of an algorithm.

The first two are completely valid reasons to me. The last one, however, is the one you should think hard about. Interestingly enough, it is also the one I hear most often, as in ‘couldn’t we improve the performance of this routine if we implemented it natively?’. The answer to this question depends on a couple of factors:

  • On the algorithm. Usually I would argue that it doesn’t matter whether you implement your algorithm in Java or in native code. The most popular VMs are pretty darn good at optimizing your code so that it runs at almost native speed. Implementing your algorithm in C usually squeezes out a couple of cycles, but it doesn’t magically improve your algorithm. It’s usually a much better approach to analyse the algorithm itself and see how it can be improved. There’s one notable exception to this rule. Algorithms that are heavy on memory access, especially large array manipulation (like image processing), can be significantly slower in Java, because the VM performs implicit bounds checking. I am not sure if and how JITs like HotSpot optimize that (any hints on that are very welcome!). I’d think that there’s still a significant performance improvement in native code here.
  • On the VM used. A good optimizing JIT will usually optimize the hell out of your Java code, and there’s a good chance that you will only see marginal performance improvements when you implement stuff in native code. Even worse, implementing methods in JNI can actually decrease performance, because a call to a native method can be a hard barrier to a JIT’s optimizations. However, if you’re targeting a pure interpreter, and you are sure that your program will only ever run on that, then it might be worth implementing a critical algorithm in C instead of Java.
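To make the image-processing case mentioned above concrete, here is a minimal pure-Java sketch of the kind of array-heavy loop in question. The class and method names are made up for illustration. Every `pixels[i]` access is subject to an implicit bounds check as far as the language is concerned; whether a particular JIT removes that check in such a simple counted loop is something to measure on your VM rather than assume.

```java
final class Brighten {
    // Hypothetical example: adds delta to every 8-bit grey value
    // and clamps the result to the range 0..255.
    static void brighten(int[] pixels, int delta) {
        // Simple counted loop over the whole array. Each read and write
        // of pixels[i] is bounds-checked by the VM unless the JIT can
        // prove the index is always in range.
        for (int i = 0; i < pixels.length; i++) {
            int v = pixels[i] + delta;
            pixels[i] = v < 0 ? 0 : (v > 255 ? 255 : v);
        }
    }
}
```

To get a rough feeling on a particular VM, one could time `Brighten.brighten` over a few million pixels with `System.nanoTime()`, after some warm-up iterations so that the JIT has had a chance to compile the loop. Only if that number really is the bottleneck does a native version become worth considering.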

You should also consider that implementing and improving an algorithm in C usually takes much longer and has more pitfalls to avoid. You should really only do this as a last resort, and when you’re pretty sure that your algorithm is already optimal.

These hints should give you a feeling for _if_ you want to implement a method natively. Let’s assume you decide that you do; then you should make up your mind about _what_ to implement natively. A complete algorithm? Or only the absolute minimum, with most of the algorithm in Java? Half-half? The answer here is: when you really need to implement an algorithm natively, implement it completely in native code. Avoid jumping between Java and JNI land too often. For example, you shouldn’t implement a loop in Java and call some native method inside that loop. There are two reasons for this: 1. As said above, calling a native method can prevent a JIT from optimizing the code around it. 2. Calling a JNI method can come with some calling overhead (although good VMs are pretty good at optimizing that).
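The difference between the two designs can be sketched on the Java side as follows. All names here are hypothetical, the native implementations are not shown, and loading the native library (via `System.loadLibrary`) is left out on purpose. `scaleAllReference` is only a pure-Java statement of what the native `scaleAll` is assumed to compute.

```java
final class PixelOps {
    // Chatty design (avoid): intended to be called once per element from
    // a Java loop, so every element costs one Java-to-native transition.
    static native int scaleOne(int pixel, int factor);

    // Chunky design (prefer): a single transition for the whole array;
    // the loop itself lives in the native implementation.
    static native void scaleAll(int[] pixels, int factor);

    // What a caller of the chatty design ends up writing: a Java loop
    // with a native call in its body.
    static void scaleChatty(int[] pixels, int factor) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = scaleOne(pixels[i], factor);
        }
    }

    // Pure-Java reference for the assumed behaviour of scaleAll:
    // multiply each value by factor and cap it at 255.
    static void scaleAllReference(int[] pixels, int factor) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = Math.min(255, pixels[i] * factor);
        }
    }
}
```

With the chunky design the caller writes `PixelOps.scaleAll(pixels, 2)` and crosses the boundary once, no matter how large the array is. Keeping a reference implementation like `scaleAllReference` around is also a cheap way to check the native code against known-good results.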

Feel free to add your comments; I’m pretty sure there’s more to say about this.

In the next installment I will discuss the dreaded fieldIDs and methodIDs.

7 Responses to Efficient JNI programming, part I

  1. Wayne Meissner says:

    Andrew Cowie covered a bit of JNI performance stuff in his LCA2007 talk as well. Basically, it boils down to:

    1) Do as much work on the other side of the boundary per boundary crossing as possible. Like you said, if you’re going to do a loop that calls a native function, then do it on the native side rather than in Java. This reduces the overhead of JNI calls.

    2) Whenever possible, pass primitives (int, long, etc) down to JNI rather than classes. JNI is optimised to handle these, plus it avoids calling back into java to pull out fields from a class.

    3) If you’re going to transfer bulk data from JNI to Java, use direct ByteBuffers. Not all architectures seem to support pinning of primitive arrays.

    For cases 1 + 2 (gaining access to existing libraries/native resources), use one of the JNI wrapper libraries like JNA or nlink – they allow you to avoid writing any JNI code at all.
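For reference, the Java side of point 3 might look roughly like the sketch below. The class name, the helper and the native method `fill` are hypothetical and the native side is not shown; there, the buffer's memory would typically be reached through the JNI function `GetDirectBufferAddress`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectBufferSketch {
    // Hypothetical native method (not implemented here) that would write
    // `count` ints straight into the buffer's off-heap memory.
    static native void fill(ByteBuffer buf, int count);

    // Allocates a direct buffer large enough for `ints` 32-bit values.
    // Native byte order is chosen so that C code and Java agree on the
    // layout of multi-byte values.
    static ByteBuffer newTransferBuffer(int ints) {
        return ByteBuffer.allocateDirect(ints * Integer.BYTES)
                         .order(ByteOrder.nativeOrder());
    }
}
```

On the Java side the data is then read with the absolute accessors, for example `buf.getInt(4 * i)` for the i-th value.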

  2. roman says:

    Wayne, thank you for your comments. I’d like to add my notes:

    re 2) I don’t think it’s as easy as that. I doubt that JNI is optimized specifically for this. With JNI you can pass objects to native code just as easily as primitives, and it makes no difference (runtime-performance-wise) whether you access a field in Java or in JNI. However, it’s harder to do in JNI and easier to shoot yourself in the foot (I’ll discuss that in part II).

    re 3) I don’t agree here either. All in all there are 3 or 4 different ways to implement bulk data transfer, and each of them has its advantages and disadvantages. None of them is the one perfect solution for all cases (for example, accessing ByteBuffers in Java is relatively slow compared to Java arrays). Which way to choose depends a lot on the actual scenario. I’ll discuss that in part III of this mini series.

  3. Wayne Meissner says:

    Well, you now have more fodder for a “performance myths and fallacies” section 🙂

    Should be interesting if you benchmark the different use-cases, and the different ways of sending data (primitive vs object with JNI access, etc)

    Of course, I suspect that any differences are largely washed away due to the overhead of a JNI call.

  4. roman says:

    I wouldn’t over-emphasize the ‘overhead of a JNI call’. Granted, some VMs aren’t really smart about this. On the other hand, many VMs are really damn smart and have no overhead for JNI calls at all. A good JIT, for example, would inline all the calling stuff into the generated machine code and make a JNI call exactly as fast as any native function call. Even the less optimized VMs that I know aren’t that bad in this regard. I think this is mostly a myth that stems from the other problems with JNI (field access, method access and array access).

  5. Pingback: This note’s for you » Blog Archive » Efficient JNI programming II: Field and method access

  6. Pingback: This note’s for you » Blog Archive » Efficient JNI programming III: Array access

  7. Pingback: This note’s for you » Blog Archive » Efficient JNI programming IV: Wrapping native data objects
