Mac OS X uses the Mach-O format for binaries. The spec for this is well documented, and there’s a brief summary of it on Wikipedia: http://en.wikipedia.org/wiki/Mach-O. Mostly I have no objections to the file format. You always have to read over the specs carefully to discover all the gotchas. There’s one bit that caused me some headaches: by convention, the header information appears in the first physical segment in the image. This is typically the code segment, and so the header info ends up offsetting the code a bit. The header info size depends on lots of things about your link, so it’s hard to determine its size up front. You end up with a bit of a chicken and egg problem if you want to conform. That’s no big deal, really; I just made a point of conforming because there’s no point in looking for trouble with the loader. In reality you always deal with two specs: there’s what’s printed on paper, and there’s what the loader does. Anyway, that’s not really what this post is about. It’s about something that irritated me a little.
The Mach-O format looks like it supports lots and lots of sections. Early adopters (not Mac OS X) used a limited number of sections (e.g. .text, .data and .bss, and that’s all). That was too limiting, so someone made a change to the Mach-O format for Apple. They changed the symbol table entries so that they could include a section index, so you could have the data for a given symbol appear in an arbitrary section. That’s good, because it’s really not enough to just have .text, .data and .bss for your image sections nowadays. Now the bit that bugs me is that they just reused an unused field in the symbol structure (struct nlist, in nlist.h), and that field is only a byte. So you can have lots and lots of sections, provided that you don’t have any more than 255. The nlist structure is the only structure that causes this limitation. Now for executable images, it’s really unlikely that in a 32-bit flat image you’ll ever want anything close to that number, but the same file format is used for object files, and this raises a little issue for C++. C++ compilers emit tons of duplicate function definitions, and different platforms have different ways of dealing with this. In ELF, the GNU tools emit one section per function. Each of these sections has one symbol pointing at it, and everything is marked WEAK, so things are only linked in if they are referenced. This is bulky, but really quite nice in other respects, because it’s easy to segregate the data. Because of the choice of a byte field for the section index in the symbol structure, however, this is not viable for C++ object file output on Mac OS X with Mach-O. Indeed, if you look at the output of g++, you will see all the duplicate functions being tossed into one section, leaving the linker to tease the bits apart.
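To make the one-byte limit concrete, here is a minimal sketch in C. It does not include Apple’s headers; `struct nlist_sketch` is a hand-written mirror of what I understand the 32-bit `struct nlist` layout to be, and the `SKETCH_` constants and the `set_symbol_section` helper are hypothetical names invented for illustration only.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hand-written mirror of the 32-bit struct nlist (assumed layout),
   so this sketch compiles without <mach-o/nlist.h>. */
struct nlist_sketch {
    uint32_t n_strx;   /* offset of the symbol name in the string table */
    uint8_t  n_type;   /* type flags */
    uint8_t  n_sect;   /* section ordinal: the one-byte field in question */
    int16_t  n_desc;   /* extra description bits */
    uint32_t n_value;  /* symbol value, e.g. an address */
};

/* Stand-ins for the header's NO_SECT and MAX_SECT values. */
enum { SKETCH_NO_SECT = 0, SKETCH_MAX_SECT = 255 };

static_assert(sizeof(struct nlist_sketch) == 12,
              "a 32-bit symbol entry is 12 bytes");
static_assert(sizeof(((struct nlist_sketch *)0)->n_sect) == 1,
              "the section index is a single byte");

/* Hypothetical helper: record which section a symbol lives in.
   Returns 0 on success, -1 if the ordinal cannot be stored in n_sect. */
static int set_symbol_section(struct nlist_sketch *sym, unsigned ordinal)
{
    if (ordinal < 1 || ordinal > SKETCH_MAX_SECT)
        return -1;     /* 0 means "no section"; anything above 255 will not fit */
    sym->n_sect = (uint8_t)ordinal;
    return 0;
}
```

A tool that wanted one section per function would hit the `-1` path as soon as an object file needed a 256th section, which is why that scheme can’t be carried over as-is.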
Either way would be fine with me; I just wish both binary platforms used the same paradigm, so that the various bits of code in the tools could all be very generic. I don’t like the choice being forced because someone was parsimonious with the symbol table layout. Spilt milk.

Posted by Eli Boling on May 26th, 2009 under C_Builder, Delphi, OS X