dlsym
I used to think that the subtle bits of the POSIX API dlsym were the lookup ordering rules, and the interactions with shared libraries. That’s still true, but I’ve found a new thing to be thumped by.
For a very quick refresher for you readers, dlsym lets you look up a symbol name in your load image’s symbol tables. It returns the address of the symbol’s code/data if it finds the symbol, and NULL otherwise. Here’s a link: http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
I thought about trying to describe this chronologically, as I encountered it, but I decided I was just trying to get people to share the pain, so I changed my mind. I think there are basically three relevant chunks or categories of information to convey here. Let’s start with the OS X implementation of dlsym.
In the Mac OS X ABI Dynamic Loader Reference there is a sentence in the dlsym doc: "Unlike in the NS... functions, the symbol parameter doesn’t require a leading underscore to be part of the symbol name." Hunh.
The man page for dlsym is more strongly worded: "Unlike other dyld API’s, the symbol name passed to dlsym() must NOT be prepended with an under-score."
And just so you know how they really feel about it, here’s the dlsym source (dyldAPIs.cpp), severely edited to bring it straight to your door in technicolor:
void* dlsym(void* handle, const char* symbolName) {
    ...
    // dlsym() assumes symbolName passed in is same as in C source code
    // dyld assumes all symbol names have an underscore prefix
    char underscoredName[strlen(symbolName)+2];
    underscoredName[0] = '_';
    strcpy(&underscoredName[1], symbolName);
OK, now we should circle back and look at gcc and C code, and a fine point of symbol tables. Consider the following little POSIX ‘compliant’ program:
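#define _GNU_SOURCE   /* glibc exposes RTLD_DEFAULT only when this is defined */
#include <stdio.h>
#include <dlfcn.h>

void foo(void) {}

int main(void) {
    /* Look up foo by its C source name in the default search scope. */
    void *p = dlsym(RTLD_DEFAULT, "foo");
    printf("%p\n", p);
    return 0;
}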
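Now if you compile and run this on Linux and OS X with gcc, both versions will print out an address for ‘foo’. (On Linux you may need to link with -ldl, and with -rdynamic so that ‘foo’ is exported to the dynamic symbol table; otherwise the lookup can come back NULL.) But take a look at the symbol tables. Do this on OS X: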
gcc dlsymtest.c; nm a.out | grep foo
You can do the same thing on Linux. On OS X, you’ll get this:
00001faa T _foo
On Linux, you’ll get a different address, and the symbol will not be prepended with an underscore. Now, I always interpreted dlsym as being dependent on the literal names in the symbol tables, not the literal names in the source code. This is particularly necessary if you are used to the world of C++ code, where the literal names in the source code have pretty much nothing to do with the literal names in the symbol table because of name mangling. You have to have some (non-portable) situational awareness when you are using dlsym.
The OS X implementors, however, have given us something new here. On OS X, dlsym cannot be used to look up a name unless that name starts with an underscore in the symbol table. I can speculate as to why they chose to do this. I can see a reading of the POSIX spec examples that implies that it should work this way. Personally, however, I think it was misguided. They may have enabled portability for some bits of code (which I don’t think should have been considered entirely portable in the first place), but they added a DWIM (Do What I Mean) layer onto what is, to my mind, a very raw API. That layer adds a pointless requirement to names in symbol tables: they really have to start with an underscore.
Now, I should be clear - there are alternatives to dlsym, and they are documented in the same ABI document that I cited above (the NS* APIs). However, the same document discourages their use, and explicitly directs you to the dl* APIs. The NS* APIs don’t do this dlsym reinterpretation. Something else interesting: the dynamic loader doesn’t use the dlsym logic when binding shared libraries. I know this because when I first wired up the Pascal linker to target OS X, I used existing external function declarations, which didn’t put an underscore on the imported names from libc. Thus I had literal names like ‘malloc’ in my symbol tables. The loader complained. I changed them to ‘_malloc’, and the binding succeeded.
From my point of view, the POSIX dlfcn.h APIs should be interpreted to specify the names of the APIs and types involved, and describe the lookup ordering semantics, but should leave out any interpretation of the literal strings fed into the APIs for names. That’s why I wrapped the word ‘compliant’ in quotes when I described my little C program above. It’s the API types and semantics that are the really nasty portability issues, not the names. Using ifdefs to deal with literal name changes from one compiler to the next is trivial compared to dealing with yet another custom dynamic name lookup API from one platform to the next. Maybe someone can give me a strong rationale for why the OS X implementors did the right thing. I think they goofed.
Posted by Eli Boling on June 15th, 2009 under C_Builder, Delphi |


June 16th, 2009 at 2:22 am
Does this post mean moving Delphi/C++Builder to cross-platform, as this article describes:
Ex-Borland’s Delphi owner re-ignites cross-platform dream
http://www.theregister.co.uk/2009/06/12/embarcadero_codegear_tools_future/
June 16th, 2009 at 4:08 am
@yoyochen: yes.
June 17th, 2009 at 12:26 am
I’m no machine code genius, but it sounds like a lot of work being done "under the hood".
My (unfounded?) impression here is that you’ll need some more systems engineers to get something useful accomplished before the next glacial era,
not because of you, but since - as I said - it sounds like an enormous amount of work
Andrew
July 16th, 2009 at 9:06 pm
Eli,
I am so Happy to hear that You returned!
Yours,
Daniel