Threading in ANSI-C

Thanks to Kannan Goundan, I think I have now found a solution to the threading-in-ANSI-C problem that I had. Basically, this approach ends every operation with a tail call to the next one and ‘hopes’ that the compiler detects this and turns the call into a jump (avoiding a stack overflow). You had better check carefully whether that is the case with your particular compiler, though. Check out the 1st comment on the above linked posting for a prototype. Below you will find some snippets of the assembly code generated by GCC and Microsoft’s compiler (no, I won’t spam the planets with that 😉 ).
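To make the idea concrete, here is a minimal sketch of what such tail-call threading can look like. It is not Kannan's prototype: the Cell union, the Halt operation and run_threaded_demo are names made up for illustration, and the instruction layout (operand first, then the next handler) is only inferred from the assembly listings in this post.

```c
/* Hypothetical sketch of tail-call threading; not the original prototype. */

typedef union Cell Cell;
typedef void (*Handler)(Cell *pc);

/* One slot of the instruction stream: either a handler or an operand.
 * A union avoids casting function pointers to void*, which strict
 * ANSI C does not guarantee to work. */
union Cell {
    Handler op;
    int     arg;
};

static int acc;

/* Add: pc points at this operation's operand. */
static void Add(Cell *pc)
{
    acc += pc[0].arg;
    /* Dispatch the next operation. This is the tail call that the
     * compiler is hoped to turn into an indirect jump. */
    pc[1].op(pc + 2);
}

/* Halt: ends the chain simply by returning (assumed terminator). */
static void Halt(Cell *pc)
{
    (void)pc;
}

int run_threaded_demo(void)
{
    Cell program[5];

    program[0].op  = Add;
    program[1].arg = 40;
    program[2].op  = Add;
    program[3].arg = 2;
    program[4].op  = Halt;

    acc = 0;
    program[0].op(program + 1);  /* enter the first operation */
    return acc;
}
```

Calling run_threaded_demo() should return 42. Whether the dispatch inside Add() compiles to a jump or to a real call depends entirely on the compiler and its optimization settings, which is why the assembly output is worth inspecting.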

First, let’s have a look at what GCC generates for the Add() function in Kannan’s example:

pushl %ebp
// the add operation..
movl %esp, %ebp
movl 8(%ebp), %edx
movl (%edx), %eax
addl %eax, acc
// the indirect jump
leal 8(%edx), %eax
movl %eax, 8(%ebp)
movl 4(%edx), %ecx
popl %ebp
jmp *%ecx

This is not bad. However, there are two superfluous operations in here (at least I can’t find any sense in them), namely the pushl %ebp and popl %ebp. It’s still much better than a switch-based dispatcher, though. Any recommendations on why these pushes and pops are needed there and how to eliminate them are welcome, of course.
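For comparison, a switch-based dispatcher for the same kind of add-only instruction stream might look like the sketch below. The opcode names and the functions run_switch and run_switch_demo are invented for illustration. The point is only that every operation goes back through the loop and the switch before the next one starts, instead of jumping straight to its successor.

```c
/* Hypothetical switch-based dispatcher, for comparison only. */

enum { OP_ADD, OP_HALT };

int run_switch(const int *code)
{
    int acc = 0;

    for (;;) {
        switch (*code++) {      /* fetch and decode the opcode */
        case OP_ADD:
            acc += *code++;     /* consume the operand */
            break;              /* back to the top of the loop */
        case OP_HALT:
            return acc;
        default:
            return -1;          /* unknown opcode (arbitrary error value) */
        }
    }
}

int run_switch_demo(void)
{
    static const int code[] = { OP_ADD, 40, OP_ADD, 2, OP_HALT };
    return run_switch(code);
}
```

Calling run_switch_demo() should return 42.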

Now let’s look at what MS’s compiler does with that function:

; File c:\users\proetel\vs2005_workspace\threading\threading\threading.c
; Line 14
mov eax, DWORD PTR _pc$[esp-4]
mov ecx, DWORD PTR [eax]
add DWORD PTR _acc, ecx
add eax, 4
; Line 15
lea edx, DWORD PTR [eax+4]
mov eax, DWORD PTR [eax]
mov DWORD PTR _pc$[esp-4], edx
jmp eax

Hmm. Not bad, eh? No pushes and pops at least. The remaining bits look more or less identical and can’t really be optimized much further (AFAICS).

Update: Here is the assembly from the same compiler (MSCC), but for ARM:

|Add| PROC
; File c:\users\proetel\vs2005_workspace\threading\threading\threading.c
; Line 13
stmdb sp!, {r4, lr}
; Line 14
ldr r4, [pc, #0x20]
ldr r1, [r0], #4
; Line 15
ldr r3, [r4]
ldr r2, [r0]
add r3, r3, r1
str r3, [r4]
add r0, r0, #4
mov lr, pc
mov pc, r2
; Line 16
ldmia sp!, {r4, pc}
DCD |acc|

ENDP ; |Add|

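Interestingly, this contains the function boilerplate (stmdb and ldmia), so it is comparable to the GCC output above. Also interesting is that the ldmia comes after the mov pc, r2 (jmp in ARM lingo). Note that the preceding mov lr, pc stores a return address, so the pair acts as a call rather than a plain jump, and the ldmia only runs if the next operation ever returns. Hmmmmm (Thanks Twisti for helping me out with ARM assembly syntax).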

I still need to test this on OS9. My guess is that the compiler there will also do pretty well. Needs proof, though…


2 Responses to Threading in ANSI-C

  1. Robert Lougher says:

    Hi Roman,

    Of course, in the MSCC ARM example, as the ldmia is never reached you’ll quickly blow your stack…


  2. roman says:

    Oh right. Should I say ‘duh’! Or even DUH! 😉
