May 4, 2007
I have almost finished the first step in my attempt to make Escher more efficient. I suppose this was by far the biggest part: I had to restructure almost all of the Escher classes to use the new protocol implementation.
What I did was this: the old protocol implementation created a new Request object for each little request (except when several similar requests, such as many lines one after the other, got collapsed into one), each with its own buffer. This buffer then got copied to the socket. You can imagine that creating and disposing of buffers during heavy rendering (lots of simple things like rectangles and lines) put quite a heavy load on the GC. The new implementation uses only one buffer (ok, two: one for input and one for output), and the protocol implementation writes into this one buffer. There is no (de-)allocation at all now.
However, once the core of Escher had been adjusted this way, I needed to go through all the classes and adjust every bit of Escher to the new implementation. This is now finished, and SVN trunk compiles completely again (it didn’t for a couple of months).
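To make the single-buffer idea concrete, here is a minimal sketch. It is not Escher's actual code: the class name `RequestBufferSketch`, the opcode value and the 12-byte "draw line" layout are all made up for illustration, and the real X11 requests carry more fields. The point is only that every request is written straight into one preallocated array, which is flushed to the stream when it fills up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical single output buffer shared by all requests (not Escher's real class). */
final class RequestBufferSketch {
    static final int DRAW_LINE_OPCODE = 66;  // made-up opcode for this sketch
    static final int DRAW_LINE_LENGTH = 12;  // made-up request size in bytes

    private final byte[] buf;       // allocated once, reused for every request
    private final OutputStream out; // stands in for the socket's output stream
    private int pos;                // number of bytes waiting to be sent

    RequestBufferSketch(OutputStream out, int capacity) {
        if (capacity < DRAW_LINE_LENGTH) {
            throw new IllegalArgumentException("buffer too small for one request");
        }
        this.out = out;
        this.buf = new byte[capacity];
    }

    /** Encodes one request directly into the shared buffer; no per-request object. */
    void drawLine(int x1, int y1, int x2, int y2) {
        ensureRoom(DRAW_LINE_LENGTH);
        writeInt8(DRAW_LINE_OPCODE);
        writeInt8(0);                      // padding
        writeInt16(DRAW_LINE_LENGTH / 4);  // request length in 4-byte units
        writeInt16(x1);
        writeInt16(y1);
        writeInt16(x2);
        writeInt16(y2);
    }

    /** Flushes first if the next request would not fit. */
    private void ensureRoom(int length) {
        if (pos + length > buf.length) {
            flush();
        }
    }

    private void writeInt8(int v) {
        buf[pos++] = (byte) v;
    }

    /** Big-endian byte order is an assumption of this sketch. */
    private void writeInt16(int v) {
        buf[pos++] = (byte) (v >> 8);
        buf[pos++] = (byte) v;
    }

    /** Sends everything written so far and rewinds the buffer for reuse. */
    void flush() {
        try {
            out.write(buf, 0, pos);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pos = 0;
    }

    int pending() {
        return pos;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        RequestBufferSketch requests = new RequestBufferSketch(sink, 32);
        for (int i = 0; i < 3; i++) {
            requests.drawLine(0, i, 100, i);
        }
        requests.flush();
        System.out.println("bytes sent: " + sink.size());
    }
}
```

After construction, drawing allocates nothing: the only array is the one created in the constructor, so the GC has nothing to collect however many lines are drawn.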
The next step is to shake out any bugs that sneaked in with this rewrite. After this, I might release a new version of Escher.
Things I am considering implementing to improve performance further:
- Remove synchronization. The idea is that programs that don’t do multithreading shouldn’t be punished by the synchronization. And, after all, applications can usually do this for themselves much more effectively (for example, locking per ‘frame’ rather than for each graphics primitive). I am not sure whether I should ditch the synchronization altogether or put in lock() and unlock() methods, which would be no-ops when no locking is required and would lock via java.util.concurrent.locks if an application chooses to do so. But my guess is that this is not worth the effort, or is even counterproductive on non-optimizing VMs, where a method call is usually more expensive than a monitor enter/exit.
- Implement the communication using NIO Channels and Buffers. This way, the buffer described above would be a direct ByteBuffer, which could be sent directly to the X server without the hidden copy that is usually made by the SocketOutputStream. This would require extending the LocalSocket implementation in Classpath to implement SocketChannel. I think this shouldn’t be too hard. Communicating with the X server over a local socket channel is almost or exactly like using a shared memory area. (My guess is that the buffer is passed through the kernel directly to the receiving process to read from, but I might be wrong here.) Either way, this should certainly maximize rendering throughput for OpenGL.
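The NIO variant could look roughly like the sketch below. Again this is an assumption-laden illustration rather than Escher code: `DirectBufferSketch`, the opcode and the 12-byte request layout are invented, and the channel is just any `WritableByteChannel`. In the scenario described above it would be a SocketChannel backed by a local socket, which does not exist yet in Classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

/** Hypothetical request writer on top of one direct ByteBuffer (not Escher's real class). */
final class DirectBufferSketch {
    static final int DRAW_LINE_OPCODE = 66;  // made-up opcode for this sketch
    static final int DRAW_LINE_LENGTH = 12;  // made-up request size in bytes

    private final ByteBuffer buf;              // allocated once, outside the Java heap
    private final WritableByteChannel channel; // would be a SocketChannel in practice

    DirectBufferSketch(WritableByteChannel channel, int capacity) {
        if (capacity < DRAW_LINE_LENGTH) {
            throw new IllegalArgumentException("buffer too small for one request");
        }
        this.channel = channel;
        // Big-endian byte order is an assumption of this sketch.
        this.buf = ByteBuffer.allocateDirect(capacity).order(ByteOrder.BIG_ENDIAN);
    }

    /** Encodes one request directly into the shared direct buffer. */
    void drawLine(int x1, int y1, int x2, int y2) {
        if (buf.remaining() < DRAW_LINE_LENGTH) {
            flush();
        }
        buf.put((byte) DRAW_LINE_OPCODE)
           .put((byte) 0)                              // padding
           .putShort((short) (DRAW_LINE_LENGTH / 4))   // length in 4-byte units
           .putShort((short) x1)
           .putShort((short) y1)
           .putShort((short) x2)
           .putShort((short) y2);
    }

    /** Hands the filled part of the buffer to the channel, then resets it for reuse. */
    void flush() {
        buf.flip();
        try {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.clear();
    }

    int pending() {
        return buf.position();
    }

    public static void main(String[] args) {
        // An in-memory channel stands in for the connection to the X server.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DirectBufferSketch requests = new DirectBufferSketch(Channels.newChannel(sink), 32);
        for (int i = 0; i < 3; i++) {
            requests.drawLine(0, i, 100, i);
        }
        requests.flush();
        System.out.println("bytes sent: " + sink.size());
    }
}
```

Whether the extra copy really disappears depends on the channel implementation. The in-memory channel used in `main` copies the data itself, so the sketch only shows the shape of the API, not the expected speedup.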