Efficient JNI programming IV: Wrapping native data objects

Every time I write an article about JNI, some more stuff comes to mind. This time it’s about wrapping native data structures in a Java object. This is something very basic that almost every JNI program has to deal with, so I’ll have a look at a couple of approaches.

Casting pointers to long

The most straightforward, easy and probably most common approach to this problem is to simply cast the pointer to the data object to a jint and store it in a field of the wrapper object. In order to stay compatible with 64 bit pointers on newer systems, you really want to cast to a jlong instead. This is certainly the approach with the least overhead and the lowest storage demands. But it has some (slight) disadvantages that you might want to consider. Let’s look at the alternatives.

Wrapping pointers in special objects

Some people would argue that exposing a pointer as a long field (even if it’s private) is a little dangerous, because this means that the actual address could be accidentally changed from the Java side. This is why we have a bunch of special classes plus helper code in GNU Classpath. There, the pointer is stored in an instance of gnu.classpath.Pointer, an abstract class with two concrete subclasses, gnu.classpath.Pointer32 and gnu.classpath.Pointer64, which implement a 32 bit and a 64 bit pointer respectively. This is accompanied by a couple of native helper functions to allow easy access to the actual pointer. This way it is possible to store a native pointer in an opaque way on the Java side. The tradeoff is a slightly higher memory demand (+1 Java object) and slightly more overhead to access the pointer. But it doesn’t hurt that much either (all are small O(1) operations).
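To make the idea of an opaque pointer object a bit more concrete, here is a rough Java-only sketch. It is not the GNU Classpath source: the names OpaquePointer, OpaquePointer32 and OpaquePointer64 are made up for illustration, and the static wrap() method merely stands in for the native helper that would normally create the wrapper. The point is only that the address sits in a final field with no Java accessor, so Java code can hold and pass the object around but cannot change the address.

```java
// Illustrative only: hypothetical classes, not the GNU Classpath implementation.
abstract class OpaquePointer {
    OpaquePointer() {}

    // Stand-in for the native helper that would normally create the wrapper.
    static OpaquePointer wrap(long address, int pointerBits) {
        if (pointerBits == 32) {
            return new OpaquePointer32((int) address);
        }
        if (pointerBits == 64) {
            return new OpaquePointer64(address);
        }
        throw new IllegalArgumentException("unsupported pointer width: " + pointerBits);
    }
}

final class OpaquePointer32 extends OpaquePointer {
    private final int data; // would be read by native code only; no Java accessor

    OpaquePointer32(int data) {
        this.data = data;
    }
}

final class OpaquePointer64 extends OpaquePointer {
    private final long data; // would be read by native code only; no Java accessor

    OpaquePointer64(long data) {
        this.data = data;
    }
}

final class OpaquePointerDemo {
    public static void main(String[] args) {
        OpaquePointer p = OpaquePointer.wrap(0x1000L, 64);
        System.out.println(p.getClass().getSimpleName());
    }
}
```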

Direct ByteBuffer

This is a cool trick that I learned from JOGL. From JDK 1.4 onwards you can quickly forget the last paragraph, because you don’t need to add any new classes or helper functions: it’s all right there in the JDK. It might not be obvious at first glance, but the direct ByteBuffer serves the exact same purpose, and does even more. Maybe the name ByteBuffer is a little misleading for this use case, but it’s basically a native pointer plus some bookkeeping information. JNI even provides the required helper functions: NewDirectByteBuffer(), GetDirectBufferAddress() and GetDirectBufferCapacity(). Suppose you have a native data structure of type MyNativeStruct and want to wrap it as a direct ByteBuffer, you’d do:

MyNativeStruct* data; // Initialized elsewhere.
jobject bb = (*env)->NewDirectByteBuffer(env, (void*) data, sizeof(MyNativeStruct));

Later when you need to access the pointer, you can do this:

jobject bb; // Initialized elsewhere.
MyNativeStruct* data = (MyNativeStruct*) (*env)->GetDirectBufferAddress(env, bb);

Easy, isn’t it? But it’s even better. Using this approach it is possible to actually access the native data structure from Java. You need to know the data layout of the native data and can then access it from Java using the ByteBuffer methods.

Suppose you have such a data structure:

typedef struct {
  int exampleInt;
  short exampleShort;
} MyNativeStruct;

You could provide accessor methods on the Java side like in the following example, where bb is the direct ByteBuffer that wraps the struct.

public int getExampleInt() {
  return bb.getInt(0);
}

public short getExampleShort() {
  return bb.getShort(4);
}

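The ByteBuffer code can even sort out endianness problems for you, as long as you tell it to: a new ByteBuffer is always big-endian, so call bb.order(ByteOrder.nativeOrder()) once before reading native data. Neat, isn’t it?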
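Here is a small self-contained sketch of such a wrapper. The class name MyNativeStructView and the setters are my additions for illustration, and ByteBuffer.allocateDirect() stands in for a buffer that native code would normally create with NewDirectByteBuffer(). The offsets 0 and 4 and the size 8 assume a 4 byte int, a 2 byte short and the usual trailing padding; check this against your compiler before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical Java-side view of: typedef struct { int exampleInt; short exampleShort; } MyNativeStruct;
final class MyNativeStructView {
    private static final int OFFSET_EXAMPLE_INT = 0;
    private static final int OFFSET_EXAMPLE_SHORT = 4;
    static final int SIZE = 8; // assumed sizeof(MyNativeStruct), including trailing padding

    private final ByteBuffer bb;

    MyNativeStructView(ByteBuffer bb) {
        // duplicate() shares the memory; the byte order is set on our own view only.
        this.bb = bb.duplicate().order(ByteOrder.nativeOrder());
    }

    int getExampleInt() {
        return bb.getInt(OFFSET_EXAMPLE_INT);
    }

    short getExampleShort() {
        return bb.getShort(OFFSET_EXAMPLE_SHORT);
    }

    void setExampleInt(int value) {
        bb.putInt(OFFSET_EXAMPLE_INT, value);
    }

    void setExampleShort(short value) {
        bb.putShort(OFFSET_EXAMPLE_SHORT, value);
    }

    public static void main(String[] args) {
        // Stand-in for a buffer handed over from native code.
        ByteBuffer raw = ByteBuffer.allocateDirect(SIZE);
        MyNativeStructView view = new MyNativeStructView(raw);
        view.setExampleInt(123456);
        view.setExampleShort((short) -7);
        System.out.println(view.getExampleInt() + " " + view.getExampleShort());
    }
}
```

Using absolute get and put methods with fixed offsets keeps the buffer’s position untouched, so several accessors can share the same view.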

Also check out part I, part II and part III of this series.

US weapons in the Middle East?

Smells like a bad deal. Isn’t the Middle East explosive enough? Or is it just that some US politicians want to make some more money via their favorite weapon manufacturer? Corrupt bastards…

In other news, NATO proposes smaller bombs in Afghanistan to get more support from the people. That almost sounds like a joke. ‘Hey see, we are your friendly NATO bombers. We throw smaller bombs on your cities now.’

On a similar note, I remember a German politician (I don’t remember who it was) saying that what the terrorists really fear isn’t more soldiers, bombs and weapons (in fact, they welcome all this, because it feeds their propaganda quite well), but teachers and schools. There’s nothing I can add to this.

Efficient JNI programming III: Array access

The last two parts of this small series dealt with considerations when to implement stuff in a native JNI method and how to best implement field and method access in JNI. Now I will take a close look at another common cause of headaches, array access in JNI.

Let me start with a small example of how array access is usually implemented in JNI:

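#include <jni.h>

JNIEXPORT void JNICALL Java_JNIStuff_myNativeMethod(JNIEnv* env, jobject obj, jintArray myArray) {
  // NULL as last argument: we don't ask whether the VM made a copy.
  jint* buf = (*env)->GetIntArrayElements(env, myArray, NULL);
  if (buf == NULL) return; // An OutOfMemoryError is pending.
  // Do something with buf.
  (*env)->ReleaseIntArrayElements(env, myArray, buf, 0);
}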

Ok, what does this function do and what could be wrong with it? Basically, it takes a Java int array and creates a C jint array from it. If you are really lucky, and the data structure for Java int arrays in the VM is the same as a C jint array, and the Java VM supports pinning of arrays, this is just fine. However, if not all of the above is true, then the VM will copy the Java array contents into an equivalent jint*. And we wouldn’t know this, because we supplied NULL as the last argument. We could have supplied a jboolean*, into which the VM would have put a flag telling us whether it has made a copy of the array. So let me explain what could lead to array copying:

  • The VM can’t ‘pin’ the array. Many VMs can move around Java arrays, for example when compacting the heap. When a JNI method grabs such an array, the array needs to be pinned, so that the garbage collector can’t move it around in the meantime. This usually hinders or even halts the GC in its work, so this is not a desirable state for a longer time. In an effort to prevent GC blocking, a VM can decide to copy the array.
  • The VM doesn’t store Java arrays as contiguous C arrays. Examples are realtime VMs, which split arrays up in order to guarantee realtime behaviour.

So, depending on the VM and the actual implementation of the rest of the method, the above code snippet might be perfectly OK to go with, or it can prove to be a serious performance bottleneck. Imagine you pass a large array to the native method and only read a couple of bytes out of it, and the VM needs to copy the whole array for this.

So, let me discuss the different ways to handle arrays in JNI.

Get<XYZ>ArrayElements()

This is what is used in the above example. This may or may not copy the Java array, but if in doubt, it will more likely make a copy. Use this when you do longer processing of a Java array on the native side. Don’t forget to call Release<XYZ>ArrayElements() when you are done with the array.

Get<XYZ>ArrayRegion()

This always makes a copy, but only of a region of the Java array. When you have a large Java array and only need a small portion of it, then use this. If you make modifications, you can use Set<XYZ>ArrayRegion().

GetPrimitiveArrayCritical()

This was introduced in JDK 1.2 and is semantically similar to Get<XYZ>ArrayElements(), but the VM tries really hard not to make a copy. It gives you the real thing even if that blocks the GC. For this reason, use this function only when you can release the array quickly, and don’t call other JNI functions or block while you hold the array. As with Get<XYZ>ArrayElements(), don’t ever forget to call ReleasePrimitiveArrayCritical().

Direct ByteBuffer

As of JDK 1.4, you can use NIO direct ByteBuffers to transfer bulk data to native code. Direct byte buffers reference a native array directly, thus accessing this on the JNI side doesn’t cause much headache. Use GetDirectBufferAddress() to get a pointer to the memory region, and GetDirectBufferCapacity() to get the length of the buffer in bytes. You can access this region like you would access any other C array without the need to release the array and without the need to think about possible copies. The tradeoff is that ByteBuffers are usually slower to access from the Java side, because this involves a bunch of method calls and bookkeeping. Use a direct ByteBuffer when you access the data mainly on the JNI side, and rarely or never on the Java side. You should also consider that direct byte buffers live outside the Java heap, so it is a little hard to measure (e.g. for profiling tools) how much memory a direct ByteBuffer actually uses.
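For the Java side of this, a minimal sketch could look like the following. The class and method names are made up for illustration, and the native method that would receive the buffer is left out. It fills a direct buffer from an int array with one bulk put() instead of one call per element, which keeps the Java-side overhead mentioned above small.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: prepares bulk data for a native method that takes a direct ByteBuffer.
final class DirectBufferTransfer {
    // Copies the ints into a freshly allocated direct buffer in one bulk operation.
    static ByteBuffer toDirectBuffer(int[] values) {
        ByteBuffer bb = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                                  .order(ByteOrder.nativeOrder());
        // The int view has its own position, so bb's position stays at 0.
        bb.asIntBuffer().put(values);
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = toDirectBuffer(new int[] {1, 2, 3, 4});
        System.out.println(bb.isDirect() + " " + bb.capacity() + " " + bb.getInt(8));
    }
}
```

Setting the native byte order before creating the int view means the C side can read the memory as a plain array of 4 byte integers.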

Efficient JNI programming II: Field and method access

In my last installment I discussed some topics that help you decide if you need to implement a native method at all, and if yes, then what there is to consider. This time I want to talk about one of the most common pitfalls in JNI programming, field and method access.

Suppose you need to access a field in a Java object. The straightforward naive approach to this would look like:

#include <jni.h>
#include <stdio.h>

JNIEXPORT void JNICALL Java_JNIStuff_myNativeMethod(JNIEnv* env, jobject obj) {
  // Looks up the class and the field ID on every single call (slow).
  jclass objClass = (*env)->GetObjectClass(env, obj);
  jfieldID myFieldID = (*env)->GetFieldID(env, objClass, "myField", "I");
  jint fieldVal = (*env)->GetIntField(env, obj, myFieldID);
  printf("myField has value: %d\n", (int) fieldVal);
}

This is not only overly verbose for something as simple as getting a field (other languages do this like obj->myField), it also has a glaring problem: the GetFieldID() function is quite a performance hole. In the worst case it has to process the strings and look up the field ID in the corresponding class data. And even in the normal case (when the field ID might already be stored somewhere) it needs to process the strings and do some hash lookup.

So the general advice you often hear is to not do field access (or method access) at all in JNI. You might ask, how is that possible? You could simply avoid dealing with objects in JNI at all, pass all the primitive data you need to the native method, and do the field access on the Java side. That’s actually not the worst idea, but there are also other approaches which I’d like to discuss.

If GetFieldID() is so slow, it probably makes you wonder why it has been included in JNI at all. A native interface which is inherently slow doesn’t make much sense, does it? Well, the answer is that GetFieldID() simply should never be used as shown above. The idea is that native code should always cache its field IDs and method IDs. A common approach for this would look like this:

jfieldID myFieldID = NULL;

JNIEXPORT void JNICALL Java_JNIStuff_myNativeMethod(JNIEnv* env, jobject obj) {
  jint fieldVal;
  jclass objClass;

  // Look up the field ID on the first call only, then reuse it.
  if (myFieldID == NULL) {
    objClass = (*env)->GetObjectClass(env, obj);
    myFieldID = (*env)->GetFieldID(env, objClass, "myField", "I");
  }

  fieldVal = (*env)->GetIntField(env, obj, myFieldID);
  printf("myField has value: %d\n", (int) fieldVal);
}

However, this approach has two flaws. First and less important, it has to perform the if-check on each execution. This is probably a minor performance hit compared to GetFieldID(), but still. The other more important problem is the lifetime of the field ID. According to the specification, the field ID is guaranteed to be valid only as long as the class is loaded. So, if for some reason the class gets unloaded, and then loaded again, a former field ID is probably invalid. You can work around that by storing the jclass reference as a global reference somewhere, thus preventing the GC from unloading the class.

The clean and correct approach makes use of the lifetime of the field (or method) ID. The idea is to always cache the IDs when the class gets loaded. In order to do this, we add a static native method initIDs(), which is called from the static initializer of the class and fetches the IDs that we need like this:

jfieldID myFieldID;

JNIEXPORT void JNICALL Java_JNIStuff_initIDs(JNIEnv* env, jclass cls) {
  myFieldID = (*env)->GetFieldID(env, cls, "myField", "I");
}

JNIEXPORT void JNICALL Java_JNIStuff_myNativeMethod(JNIEnv* env, jobject obj) {
  jint fieldVal = (*env)->GetIntField(env, obj, myFieldID);
  printf("myField has value: %d\n", (int) fieldVal);
}

Looks better, eh? The good solution is always the nicest looking 😉 Now, when the class gets unloaded and re-loaded, the IDs get initialized again, and you wouldn’t shoot yourself in the foot with class unloading.

However, sometimes it’s not that easy. What if you need to access a field that’s in another class, for which you don’t have access to the source code to add a class initializer? I think that in such a case it’s time to re-think your design. In most such cases it’s possible to refactor things so that you only need to access fields or methods in your own class, possibly by wrapping the ‘alien’ class somehow. If you need to access a non-visible field in that alien class, my answer is simply DON’T DO THIS.

Another important aspect of the last solution is, you always need to initialize the IDs for one class in the initializer of that class, and not in the initializer of another class. Otherwise you subvert the whole point of avoiding class unloading badness.

Summary: Initialize all your field IDs in the class initializer of the class to which the fields belong. This whole discussion applies in the very same way to method IDs.

In the next installment I’ll discuss various ways to transfer and access array data from Java to native and back.

Efficient JNI programming, part I

I often hear the complaint about JNI, that it’s clunky, hard to use, that it has certain overhead, and that it makes programming of native methods for Java hard and slow. Well, I agree that it’s somewhat clunky and hard to use, and the learning curve is probably steep (and the effect of not understanding how JNI and a Java VM works is usually bad performance). This is why I want to write a small series of articles which explains a couple of rules and paradigms which will allow you to get most out of JNI programming.

In this first part I want to discuss when and why to use JNI, and when not. Off the top of my head I can think of the following reasons to implement a method in a native language (usually C or C++), rather than in Java:

  • To wrap an existing native library or syscall and provide a Java ‘frontend’ to it.
  • To access native resources (memory, etc) that wouldn’t be possible from Java.
  • To improve performance of an algorithm.

The first two are completely valid reasons to me. However, the last one is the one you should think about. Interestingly enough, this is also the one that I hear most often, like ‘couldn’t we improve the performance of this routine if we implement it natively?’. The answer to this question depends on a couple of factors:

  • On the algorithm. Usually I would argue that it doesn’t matter if you implement your algorithm in Java or native code. The most popular VMs are pretty darn good at optimizing your code so that it runs at almost native speed. Implementing your algorithm in C usually squeezes out a couple of cycles, but it doesn’t magically improve your algorithm. It’s usually a much better approach to analyse your algorithm and see how it can be improved in itself. There’s one notable exception to this rule. Algorithms that are heavy on memory access, especially large array mangling (like image processing), are usually significantly slower in Java, because the VM performs implicit bounds checking. I am not sure if and how JITs like Hotspot optimize that (any hints on that are very welcome!). I’d think that there’s still a significant performance improvement in native code here.
  • On the VM used. A good optimizing JIT will usually optimize the hell out of your Java code and there’s a good chance that you will only see marginal performance improvements when you implement stuff in native code. Even worse, implementing methods in JNI can even decrease performance, because a call to a native method can be a hard barrier to the optimization of a JIT. However, if you’re targeting a pure interpreter, and you are sure that your program will only be run on that, then it might be worth implementing a critical algorithm in C instead of Java.

You should also consider that implementing and improving an algorithm in C usually takes much longer and has more pitfalls to avoid. You really should only do this as a last resort and when you’re pretty sure that your algorithm is optimal already.

These hints should give you a feeling for _if_ you want to implement a method natively. Let’s assume you decide to do so; then you should make up your mind about _what_ to implement natively. A complete algorithm? Or only the absolute minimum with most of the algorithm in Java? Half-half? The answer here is, when you really need to implement an algorithm natively, then you should implement it completely in native code. Avoid jumping around between Java and JNI land too often. For example, you shouldn’t implement a loop in Java and call some native method inside that loop. There are two reasons for this: 1. As said above, calling a native method can prevent a JIT from optimizing code around it. 2. Calling a JNI method can come with some calling overhead (although good VMs are pretty good at optimizing that).

Feel free to add your comments, I’m pretty sure there’s more to say about that.

In the next installment I will discuss the dreaded field IDs and method IDs.

Escher peers progress

Yesterday I did the antialiasing, which is needed especially for TrueType font rendering to look nice. Today I fought a little with window manager hints. I found it a little surprising that there’s apparently no standard way to open undecorated windows. Most toolkits seem to rely on Motif proprietary hints. There are two other options: you can make a window ‘override-redirect’, which creates a window without decorations, but also one that sticks in front of all other windows, or you can use one of the logical hints in the Extended Window Manager Hints spec. In this case I declare the window as TOOLBAR, which isn’t semantically correct, but does the job, at least under GNOME’s window manager.

Anyway, all this makes Swing apps look much better now (except for icons):

And it’s even quite snappy, considering that almost everything is done in Java.

BTW, I am considering attending the X developer summit 2007, and probably doing a presentation about Escher. Not sure if I can make it though.

Colored Anti-Aliasing

Today I hacked up the Escher peers some more; this time I implemented the actual pixelization routine for the antialiasing rasterizer. Before, I only converted the coverage information to a grayscale between black and white, which of course looked stupid on a colored background. Now it is doing the correct compositing of the foreground and background pixels. This is not exactly nice to implement on X, because (plain) X itself doesn’t support transparent colors or anything like that. This means that for every pixel that is neither fully opaque nor fully transparent I have to fetch the current surface pixel and do the math myself. The outcome can be seen below. Interesting to note is that the linear RGB scale doesn’t seem to do the job perfectly when interpolating linearly. The cyan foreground on red background seems to produce a slightly too dark transition. But for 99.99% of the usual use cases this approach is more than good enough.
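To give an idea of ‘the math’, here is a minimal sketch of a linear per-channel blend with the coverage value used as alpha. This is not the Escher code: the class and method names, the 0 to 255 coverage range and the packed 0xRRGGBB pixel format are assumptions for illustration.

```java
// Hypothetical sketch of blending a foreground pixel over a background pixel by coverage.
final class CoverageBlend {
    // Blends one 8 bit channel: coverage 0 keeps the background, 255 gives the foreground.
    static int blendChannel(int fg, int bg, int coverage) {
        // Adding 127 before the division rounds to the nearest integer.
        return (fg * coverage + bg * (255 - coverage) + 127) / 255;
    }

    // Blends two packed 0xRRGGBB pixels channel by channel.
    static int blendRgb(int fg, int bg, int coverage) {
        int r = blendChannel((fg >> 16) & 0xFF, (bg >> 16) & 0xFF, coverage);
        int g = blendChannel((fg >> 8) & 0xFF, (bg >> 8) & 0xFF, coverage);
        int b = blendChannel(fg & 0xFF, bg & 0xFF, coverage);
        return (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // Cyan on red at roughly half coverage.
        int mixed = blendRgb(0x00FFFF, 0xFF0000, 128);
        System.out.println(Integer.toHexString(mixed)); // prints 7f8080
    }
}
```

With these numbers, cyan on red at half coverage comes out as a medium gray, which fits the observation that the transition between the two looks darker than either color.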

Colored Anti-Aliasing

Another feature I added to the Escher peers is configurable font mappings through a properties file. This could be improved by adding support for fontconfig stuff. I have to think how to best implement it. IIRC, fontconfig works via a bunch of XML configuration files. Maybe I’ll add a small Java-only fontconfig parser and library and get my font info from there (as an alternative to the fonts.properties approach).
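Just to sketch what a properties-based font mapping can look like in Java: the following reads mappings with java.util.Properties and falls back to a default when a logical font name is not mapped. The key names, the font paths and the FontMappings class are invented for this example and do not reflect the actual format of the Escher fonts.properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical lookup of font files by logical font name, backed by a properties text.
final class FontMappings {
    private final Properties mappings = new Properties();

    FontMappings(String propertiesText) {
        try {
            mappings.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable font mappings", e);
        }
    }

    // Returns the mapped font file for a logical name, or the fallback if unmapped.
    String fontFileFor(String logicalName, String fallback) {
        return mappings.getProperty(logicalName, fallback);
    }

    public static void main(String[] args) {
        // Made-up keys and paths, for illustration only.
        FontMappings m = new FontMappings(
            "serif=/fonts/FreeSerif.ttf\nmonospaced=/fonts/FreeMono.ttf\n");
        System.out.println(m.fontFileFor("serif", "none"));
        System.out.println(m.fontFileFor("dialog", "none"));
    }
}
```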