May 6, 2007
When you get a segfault in a function, all input arguments look OK, the state of the program looks OK, and it probably even works in some circumstances, just not in _this_ one (for example, with different libraries, different drivers or different Java VMs). In short, the segfault is completely mysterious. It may well be that the call stack has overflowed. This can be caused by really deeply nested function calls, by passing large data structures by value, or simply by a stack that is too small. Another hint is when gdb points you to a source line that contains a function signature: the crash then happens while the new stack frame is being set up, before the function body even runs.
You can probably solve the problem by making sure that you don't pass large data structures around in function calls (pass pointers to them instead!) or, if that is not the cause, by enlarging the stack. This can be achieved either with a compiler or linker flag (in ld it's --stack, IIRC), or in the program itself when creating the thread.
My case was especially annoying: the program segfaulted when using the proprietary nvidia driver, but ran fine with X.org's ati and intel chipset drivers. I could trace the problem exactly to the segfaulting OpenGL call (glXCreateContext); the incoming parameters all looked OK and I just couldn't figure out what went wrong. Unfortunately I couldn't even look deeper into the stack, because the nvidia driver ships with no source and no debug information. It was a different program (which suffered from the same problem) that put me on the stack-size track. Apparently the nvidia driver calls deeper into the stack and/or passes some bigger data structures around. (And yes, the program had been compiled with a smaller default stack size than usual.)