Happy interpreter hacking

The last couple of weeks I got my hands dirty (really dirty) with VM hacking, namely the Java interpreter of the JamaicaVM. I’m applying a couple of well known optimization techniques (besides loads of cleanup). The most promising so far (I’ve done only prototype-like implementations so far) are:

  • Superinstructions. Combines frequent ‘cheap’ instruction sequences into bigger units. For instance, in order to add two values, a common sequence is iload* iload* add istore*. Of course, there are variations (like only one load or without store and many more). Such a sequence would be combined into a new opcode which would do all this without accessing the operand stack. The net effect is that the overhead of moving stuff onto and from the stack is saved as well as the dispatch overhead between these instructions (which is quite significant as it is at least as expensive as the operations themselves in these cheap cases). Theoretically it would be possible to eliminate operand stack completely (this is called stack folding and implemented in one of the java processors iirc — quite amazing stuff), but this would unnecessarily bloat up the binary. Instead I’m doing some statistics to find the most frequent patterns and implement these.
  • Threaded dispatching. So far the interpreter has been implemented in a straightforward while-switch fashion. This creates a certain overhead, because we have to jump into the bytecodehandler, out of the handler and back to the loop condition. That’s pretty bad, especially considering that many opcodes are really cheap, and it tends to intercept the processor pipeline badly. The better way to do this is to jump directly to the next bytecode handler at the end of the previous one. That’s called threaded dispatching. Unfortunately, you can’t really implement this with ANSI C. So I did quite some macro magic and utilized a GNU C specific feature (label pointers). The outcome was quite impressive, ~15% improved performance in one go. After having gone through all this I was told by Ian Rogers that GCC has a -funswitch-loop optimization flag which makes a threaded dispatcher out of a while-switch dispatcher. Gotta compare it to my homegrown implementation. I really hope that mine does better than GCC’s one. I don’t like wasted weekends very much 😉

I’d like to implement some more intersting stuff, like direct dispatching or a primitive JIT, but that’d require to be able to get hold of some contiguous memory, which isn’t exactly easy in Jamaica (due to the hard realtime nature of the GC). Let’s see how this works out in the end.
On a related note, I’ve been digging into the Hotspot and JikesRVM sourcecode lately, mostly out of curiosity (until now). I’m quite excited by the idea of a VM implemented mostly in Java itself, even hosting itself. It requires some magic, but in my experience, beeing able to do things in Java without needing to touch C makes development more rapid and tends to produce better and more readable code. And if you think a VM implemented in Java itself must be slow, think again.

About Roman Kennke
JVM Hacker, Principal Software Engineer at Red Hat's OpenJDK team, Shenandoah GC project lead, Java Champion

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: