
Eli Boling


Minor Emacs Utility

I use Emacs quite a bit.  It’s very common for me to be looking at large dumps of data that are tabular in nature.  Symbol table information, usually, but it could be lots of things.  Usually it’s the output of one or another command line utility for dumping some file format.  In any case, it’s also very common for me to get a few hundred lines down into some table, be looking at an entry, and forget which column is which.  And so I go back to the header, and look again, then back to the entry.  It’s pretty irritating.  So I whined about this, and someone told me about Emacs Header Lines, and I was very pleased.  So I stuck this little function into my .emacs file, and now I can just go to the header line in the file, toggle my Emacs header line for that buffer, and scroll down to my heart’s content, and there’s that nice header line right at the top of the window, occupying minimal space.  When I’m done, I toggle it, and it’s gone.  It’s just one of those little things that makes life easier to deal with.  Less pain, more gain.

Here’s the code:

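(defun freeze-line-as-header ()
  "Toggle the current line as this buffer's header line."
  (interactive)                        ; makes it callable from a key binding
  (if header-line-format
      (setq header-line-format nil)    ; becomes buffer-local when set
    (setq header-line-format           ; "%" doubled so it isn't a %-construct
          (concat " " (replace-regexp-in-string
                       "%" "%%" (buffer-substring (line-beginning-position)
                                                  (line-end-position)))))))
(define-key global-map [(control ?!)] 'freeze-line-as-header)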
Posted by Eli Boling on October 25th, 2012 under Uncategorized

Bad API Design

So I almost got burned by a function.  Partly my fault, because I didn’t read the online documentation on it.  I looked at the API in the header file, and it had very limited documentation, but mostly enough for me to figure it out.  I also happened to be looking at the implementation, and noticed that I was hanging onto a live wire.  So the API looks like this, with the names changed to keep this general:

char *RenderSomeStringConversion(const char *in, char *out, size_t *out_size);

Now, the out_size parameter should have warned me a little, since it’s a pointer to the size. Maybe I should have looked further there.  Anyway, the API takes the input string, does translation on it, and dumps the result in the output buffer, and returns the output buffer.  C programmers are used to functions like this returning the output buffer, so that’s not a really good warning about what this thing can do to you.  OK, so the landmine this API sets out for you is that the function will attempt the transformation, and if it decides that it can’t fit the result in your output buffer, it will call realloc on your pointer.  IOW, there’s no way to use this method without involving yourself with malloc/free.  And worse, most of the time, if you’ve got a static buffer that’s a good size, this API will work just fine, but instead of failing when it isn’t big enough, it will just burn you if you run into the edge case.  And it will be so much more fun to debug if you were to, say, pass in the address of a local that’s too small, because then it will call realloc on a pointer into your stack, and off you’ll go someday, launched into some interesting area of memory to execute god knows what bits.

Now, in the online docs, this is actually well documented, but this is an API that’s part of a permanent API broadly available to people, and it’s just got this awful nasty sticky thing on it.  A little more time could have produced an API that didn’t have this dependency or trap.

Posted by Eli Boling on March 15th, 2012 under C_Builder

Well, _that_ was painful: #1

The title of posts like this is intended to indicate events that were pretty painful, and where the fault was all mine.

So, shortening up the damage chain, on IA-32, if you do this:

FSTP [some location]

…

FILD [some location]

and the FPU stack was empty at the FSTP, you’ll die on the FILD instruction.  The FSTP just sets some status bits and fails silently; if the invalid-operation exception is unmasked, the x87 doesn’t report it until the next floating-point instruction executes, which here is the FILD.  So, in this case, the ‘…’, of course, was really a long long distance.  Specifically, it was in either Move or FillChar in the Pascal RTL, which use FILD for optimization for larger moves.

Either Move or FillChar?  Well, yeah, because I started hunting around with writeln to bracket things, and it moved things around.  Actually, it moved things around considerably, because I was bracketing front and back, and that sometimes moved around the primary cause of pain.

The primary cause: why did I do FSTP with an empty stack?  Well, I had this asm function, one of whose parameters was a Boolean, and I loaded it like this:


The caller had loaded the boolean like this:

MOV AL, BYTE PTR [someplace]

MOV AL only writes the low byte of EAX; the other 24 bits keep whatever was left there.  So, depending on what sort of garbage was in EAX at the time in the caller, my boolean, which I tested with "OR EAX, EAX", was pretty variable.  And I decided to pop the FP stack based on it.

One thing that helped in a funny way was that as I added more writelns, it actually moved me closer and closer to the original fault, but only because it was sometimes calling the RTL Move function.  If I’d been smarter, I’d have been bracketing using FILD instructions, not writelns.  Or, I’d have been a little more careful writing the original ASM code that read the boolean.

Oooh, that one took me a while to find.

Posted by Eli Boling on March 5th, 2011 under Delphi, Uncategorized

Safari, Flash and _MEMORY_

This is a little bit of a dead horse beating, but it’s still relevant, so what the heck, why not.  RAM is extremely important to me as a developer for the obvious reasons, so every once in a while, I look around at my processes, and see who’s being an oinker.  I was a little surprised at the results.

First of all, a reminder about my dev system:  Macbook Pro, running VMWare 3, which is constantly running Windows 7 Ultimate (32bit just now).  I keep a number of Mac apps up pretty much all the time.  These include Adium, Skype, Mail, iCal and Safari.  Safari typically has just one window open, pointing at gmail.  Throughout the day, I use Safari a lot, researching technical topics, researching software, reading news.  I may have a lot of windows/tabs active during a given session, but I generally close them all down after a while, and go back to the one.  For quite a while, I was just using plain old Safari out of the box, no extensions, no nothing.

So, one day, I go looking at my machine, looking for oinkers (BTW, oinker is slang for pig, in this case referring to any process that is being a memory hog), and what do I find?  Not VMWare, which is running a whole OS under the covers.  No, I find Safari and Flash, at the number one and number two spots.  And they are WAY ahead.  Both are using around 600MB Real Mem, and a similar amount of Virtual Mem.  And this is with one gmail window open.  And that Flash Player process will just never die, so long as Safari is up and running.  Well, I found that to be more than offensive, and at that level of usage, it’s actually getting in my way, so I spent some time reading up.  Lots of people said pretty much the same thing, but there was some variability.  Some people really didn’t see the same problem at all.  Eventually, I found ClickToFlash, which is a Safari WebKit plugin that stops Flash controls from executing unless you click on them.

So I gave ClickToFlash a try.  The Flash process disappeared from the memory profile altogether, which wasn’t too surprising.  What was interesting was that when I invoked Flash on places like YouTube, and other sites that have Flash content that I want to see, the Flash process was well behaved, and put itself to bed after the Safari tab in question closed.  Safari memory usage still crept up, but it seemed like it wasn’t quite so quick.

So, then Lee pointed me to the ad blocker and script blocker that he uses for Safari:  AdBlock, and JavaScript Blacklist.  I’ve used ad blockers on and off, but I have sometimes had troubles with sites that won’t work properly with them, so I’ve generally just trained myself to ignore ads.  Sometimes my kids will see something in an ad on a page I’m looking at, and mention it, and I’ll ask "What?  Where?"  They’ll have to point, because I just don’t look at or process those regions of the page.  The one exception is when a site has one of those expand-down ads.  I really can’t stand those, because they scroll the page contents down, and then back up, and you have to wear out your knuckle a bit more on the scroll wheel.  I still don’t mentally process the content.  Anyway, I installed both these extensions, to see what happens.

Well, I’ll say that AdBlock seems to work pretty nicely.  A lot of pages I regularly use got very quiet, visually. Safari’s memory usage still creeps up, but again, not as quickly.  I’m still using ClickToFlash at this point.

So, I decided to try a regression, to test out a hypothesis:  does Flash memory oinkage (ok, now we’re making slang up.  Roll with it.)  come about because there are sloppy or maladjusted or malevolent Flash applets in some of the ads?  So I kept AdBlock enabled, and disabled ClickToFlash.  What do you know, the Flash process stayed pretty well behaved.  I do see it floating around more, and I’ll probably re-enable ClickToFlash, just because there are some sites that use Flash in irritating ways that are not ads, but are still not something I want to see.  In any case, it’s my thinking, now, that Flash isn’t entirely to blame for being a memory hog, but I’m still keeping it on a tight leash.

Now, back to Safari.  It’s still a piggy.  With the three extensions above enabled, the Real Memory usage still drifts up and up during an 8 hour period until it sits, generally, for me, at about 650MB.  It usually plateaus there.  There’s an additional 600MB of Virtual Memory in use.  This past weekend, I spent a bunch of time playing around with Network Attached Storage.  I ended up doing a lot of searching on the topic, in conjunction with Time Machine.  That will be the topic of another blog post, I can tell you.  During that session, which was primarily on blogs and forums, and a few product pages, Safari’s Real Memory usage went to 1.6G, and the Virtual Memory to 1.7G.  When I closed down all those pages and went back to my one gmail page, Real Memory drifted back down to 1G.  I left it overnight, and it was still at 1G.  I’m sorry, it’s just a hog.  I’ll be fair:  I haven’t done the same experiment on Firefox, or Chrome.  I should, and I probably will.  Right now, I feel like I’ve got back 600MB of Real Memory by nuking the Flash hoarding, and I’ve got things to do, so I’m not going to dig into Firefox or Chrome just yet.  If anyone has numbers on those, I’d love to hear them.

Last item: I didn’t say anything about JavaScript Blacklist.  It’s probably not relevant to the memory thing, but it fits into the ancillary category of ‘stomping on things that irritate me’.  JavaScript Blacklist stops things like IntelliTXT.  I’ve nothing at all good to say about IntelliTXT, or any of the other flyover-get-in-your-face parasites.  In the past, if you had a page that used these, I would never return to it.  Now, I’ll never know.  One aside, though: JavaScript Blacklist also blocks the scripts of a copy-attribution service that, up until a short time ago, I’d never heard of.  I did some searching (oink oink), and saw people really ranting about it.  Basically, a site that uses its scripts intercepts copy operations, and if you copy text off the site, it inserts an attribution URL.  Places like major newspapers use this.  This actually seems reasonable to me.  I mean, if you are copying the content from a site like that, well, you should be making an attribution, at the least.  If you don’t like the link, you can strip it out when you paste.  Seems to me that people are a little too free with copying content that’s actually not theirs to copy.  It’s called plagiarism, or copyright infringement.

The three nifty little products that I’ve now got for Safari are ClickToFlash, AdBlock, and JavaScript Blacklist.

Posted by Eli Boling on February 24th, 2011 under OS X, Uncategorized

OS X nslookup

Yesterday I finally switched from a Cisco add-on VPN to the built-in VPN support in Snow Leopard.  The built in stuff is much nicer.  Easy to use, performant.  I had, however, a little hitch along the way.

I have a Windows VM running on the Mac on which I do all my dev builds.  My dev machine is a Macbook Pro. I run VMWare Fusion, in unity mode, and life is generally good; it’s a very nice cross targeting solution.  The hitch was that when I switched VPNs, DNS no longer worked from the Windows VM for names inside the VPN target domain(s).  Lee is my go-to guy for problems like this, and he and I spent quite a while poking around at it.  We learned interesting things about the differences between how the two different VPN clients effected their support.  They were quite different in how they altered the lookup flow.  I won’t go into it, because it’s irrelevant to the posting, really.  In the end it turned out that my DNS problems were due to being on an older version of VMWare (version 2).  I’d been meaning to upgrade for a long time, but held off, because I never like to upgrade when I’m in the middle of things, lest my world come unglued. Unfortunately, I’m always in the middle of things.  Anyway, I upgraded, and things were good again.

Now, about nslookup, which is in the title of this post, in case you hadn’t noticed.  Well, Lee and I use nslookup pretty commonly on various platforms to make sure that things are resolving the way they should.  We spent quite a bit of time using nslookup yesterday, and we kept being really confused by the results.  Eventually, Lee strolled across the street to the coffee shop (I prefer Peet’s coffee, and this wasn’t Peet’s) to poke around from an external network to more closely match my setup (I’m in Massachusetts, he’s in California).  Around the same time, we both started being really suspicious of the results from nslookup.  And the punchline of the whole thing comes from the man entry for nslookup on OS X:

       The nslookup command does not use the host name and address resolution
       or the DNS query routing mechanisms used by other processes running on
       Mac OS X.  The results of name or address queries printed by nslookup
       may differ from those found by other processes that use the Mac OS X
       native name and address resolution mechanisms.  The results of DNS
       queries may also differ from queries that use the Mac OS X DNS routing

When they say the results may be different from other processes, we’re talking things like ‘ping’.  Sigh.

Mind yourselves out there, there’s gears and stuff you can catch your clothes on.

Posted by Eli Boling on February 11th, 2011 under OS X


So, I’m sitting in the Denver airport, on my way to Massachusetts, on a longer layover.  I went to some crappy sports bar, and they had some very unpleasant food, so I found a cantina upstairs that I thought might be able to wash away the sports bar.  Corn Poblano soup and a reasonable local wheat beer, and I was feeling better.  I’m reading my book at the bar as the soup arrives; I pull the soup in so I don’t dribble, the book goes in front, and I lift it up a bit so I can see, and the bartender (younger guy) promptly says "So, what is the trouble with physics?"  The book I’m reading just now is "The Trouble With Physics", by Lee Smolin.

Mind you I don’t always read stuff like this.  My reading range is pretty broad, and included in there are murder mysteries, and Terry Pratchett books, and other non-fiction, and just kind of all over the place. So I’m not always carrying around a book on theoretical physics.

Now, that question he asked is potentially kind of a stock question, so I shrugged, and said I wasn’t so sure yet, and he asked if I hadn’t got far enough in the book to know yet (halfway), and I opened up a bit more, and told him that basically the theme was that the author felt that nothing had really been figured out in a while.  Just so y’all know, the book, at least so far, goes through some of the history of physics, on through quantum mechanics, and then into post-modern physics (if you will).  This includes string theory, and super string theory, and could possibly be about to get into philosophy.  Not sure yet.  The starting theme of the book is that we’ve not made revolutionary discoveries in the last generation or so, and it’s frustrating a lot of people, including the author, and he’d like to tell people about everything that’s happened lately.

Getting back to the bartender, who, I will swear on whatever icon you require, responds to my extremely brief summary with "You mean, string theory doesn’t work, or M-theory or whatever they call it now-a-days?"  Now, I had not heard about M-theory until an hour or so before, because that’s where the book had got up to.  Blink…blink.  "Yes," I said, "you are exactly right."  From there, we got into a discussion about the LHC (Large Hadron Collider), and the Higgs boson, and the standard model, and it should be an interesting couple of years, because everyone is waiting for the Higgs boson.  I said if they did find the Higgs, they were still stuck, because of all the existing questions about string theory, and he said, yes, but if they didn’t find the Higgs, then that kind of tosses the entire standard model.  He asked for the publishing date and the author, because he said that all the stuff he’s reading now is saying pretty much the same stuff over and over again.

Yes, I know, operating under the presumption that the barkeep doesn’t know theoretical physics is a sort of prejudice, and I do feel a little bad about that.  I have to say that I was really pleased to be having the conversation with him.  The only other fellow who asked me about the book was a professor of physics on the plane out, and when I tried to engage him in conversation, I got a lot less out of it than I got out of the bartender.

Anyway, I had to share.

Posted by Eli Boling on October 23rd, 2010 under Uncategorized

Mac Gripe: the CD/DVD drive

I love my Mac.  I hate the CD drive in my Mac.  I’ve hated all the CD drives in the Macs for a while.  What I can’t stand about them is the lack of a physical manual eject capability.  Others have had this complaint.  I don’t know why Apple did this thing, but they did it a while ago, and they continue with it.  It’s the conversion of the Mac to a little CD/DVD ATM machine.  You put the thing in, and if it isn’t quite the right thing, you really can’t be sure you are going to get it back, and once it’s in there, the drive is useless to you.

I have two scenarios that I’ve run into that really ticked me off.  Most recently was about an hour ago, when I was trying to dredge up some old data and I found a CD that might have the archived bits I wanted on it.  So I stuck the thing in the drive to see what was on it, and it spun up and spun down and *gulp* that was it.  Now, I admit I did something dumb here:  I tried to cajole the drive.  I had a Windows VM running, so I told VMWare to mount the physical CD drive.  That took out VMWare, and the VM was shut down unfavorably, so I lost a bunch of state that’s a pain to put back together.  Like I said, that was probably dumb.  So I ended up using the most common fix for this, which is to power down my Mac, and power it back up with the mouse button held down (no, drutil didn’t work; nothing worked).  In my opinion, that’s a piss poor thing to make a user do to eject a CD that the drivers get their various bits in a bunch about.  Drill me a hole for that old fashioned paperclip, please.  FYI, once I got that CD out, I took it down to my Dell laptop, and stuck it in there, and it read the bits just fine, so the CD isn’t some piece of radioactive toxic waste, that’s for sure.

So, the second case, which goes a little further back in time to late last year, had to do with my Mac Mini.  This was a personal machine that my family used.  It died a hard death (logic board failure).  But there was a CD in it, and I wanted it back.  With the logic board gone, it wouldn’t boot to any point at all, even off an external device where it could be forced to eject the CD.  So how do you suppose you have to go about getting that CD back?  You have to dismantle the Mac mini, INCLUDING THE CD DRIVE.  You actually have to physically unscrew a ton of screws on the drive to get the cover off!  Getting to those screws is no picnic, either, because the CD drive is mounted on the top of the machine, and the cover comes off the bottom, so you have to dismantle the bulk of the machine before you get to the drive.  I asked the guys at the Genius Bar about this, and they said that was basically the only way, too.

I’m sure Apple had a really good reason for doing this, but in my opinion, IT WASN’T GOOD ENOUGH!

Posted by Eli Boling on February 19th, 2010 under OS X, Uncategorized

Dynamic Symbol Binding: Origins and Effects


Dynamic linking has been available on most operating systems for a long long time now.  It is interesting, however, to peer into the origins and resulting behavior of some aspects of symbol binding on various systems.  You might be surprised by the results.

To focus this discussion, what I’m talking about here is basic support for linking to symbols in shared libraries.  These are DLLs on Windows, shared objects (.so) on Linux, and dynamic libraries (.dylib) on OS X.  There are other platforms of course, but these are the ones that may affect our customers in the near term, so that’s what we’ll talk about today.

I’m not able to speak authoritatively on some of this topic.  On the raw technical details, I can, because I’ve been down among the bits, but on some of the points of motivation and history I will be relating things that have been told to me by others in the now distant past.  Some of these bits of information aren’t written down anyplace that I know of.  Feel free to add to the pile of lore if you have personal knowledge.

The Primordial Ooze

Not long after the surface of the datascape cooled, and bubbling pools of ones and zeroes congealed into a solid crust, the language C appeared and populated the world at a ferocious pace.  Linkers and librarians developed to support the ecosphere, and various traits and behaviors became standard.  Some of this stuff survives to modern times, much like the way the pelican still plies our skies.  [Off topic: once I was on the wharf in Santa Cruz, and a pelican walked up to me and yawned.  That was impressive.  You talk about birds having wing-span - those things have beak-span!]

The trait of interest in linkers for today’s article is order of symbol binding.  Let’s set aside dynamic linking for a moment.  When linking a static image, using classical C as an example, the user would specify a bunch of object files and some libraries, left to right:

a.o b.o c.o lib1.a lib2.a lib3.a

The linker would go through these, finding public symbols to satisfy external references and hooking things all together.  Now, for any symbol ‘foo’, any of those objects or libraries could provide the public definition, and it is OK for multiple libraries to provide public definitions.  The basic rule that the linker followed was that the first public symbol found in a left to right search is selected to satisfy external references.  So, if lib1 and lib3 both provide public definitions of ‘foo’, and a.o refers to ‘foo’, then the one the linker will select is the one from lib1, because it is leftmost.  Now, note that if lib1 and lib3 both define ‘foo’, and lib3 refers to ‘foo’, then the linker will still select the definition from lib1.  This has allowed developers to override definitions of symbols by inserting a library with different definitions to the left of the original library in the link line.  This behavior is very much driven as an aspect of the tools (specifically the static linker).  It’s not really language specific, but it can end up freighting the tools for any language, oddly enough.

Windows:  DLLs Are New!

Along comes Windows.  Dynamic Link Libraries are a new concept, and enable developers to share code better.  Actually the concept isn’t new - unix beat them to it by a long shot, but you can run rough-shod over history with good marketing.  Now, the first tools for Windows were C based, and the classic left to right rules for the static linker still held true.  However, DLLs don’t follow quite the same rules.  Once you build a DLL, the linkages to that DLL, and within that DLL are much more specific.  If the DLL wants to access ‘malloc’, it doesn’t just bind to any ‘malloc’, it binds to a local copy of ‘malloc’, if it binds the C RTL statically, or to a symbol called ‘malloc’ from a particular DLL.

Linux:  Fanatics Unite!

Along comes Linux.  Now these folks take things to extremes, I think.  When they did shared library support, their attitude was that linking a shared library should be identical in behavior to linking a static library.  So this means that if you link a shared library together, and that shared library binds to a symbol called ‘malloc’, you don’t know what actual public definition this will bind to until runtime.  You don’t even know which library that symbol might come from.  That’s just the way it works on most platforms with classic C style compile/link phases with static linking.  Now this is an interesting philosophy, and it’s got some interesting features that it supports, but it implies a lot for the tools that have to deal with the platform.  It means, among other things, that nth party languages and tools have to be aware of this artifact of left to right ordering of symbol binding that comes from way back in some fragment of the pelican genome.

OS X:  Umm, it’s like Windows

I have no idea what combination of history and philosophy went into the dynamic binding logic on the Mac, but for the specific attribute this article is interested in, the behavior is more like Windows.  Originally, the behavior was a mix of linux and Windows, because when you bound to some external like ‘malloc’, you didn’t know what shared library was going to provide it.  Apple added support for being more specific about it in a point release (the two-level namespace, introduced in Mac OS X 10.1), so that a shared library could say "I want ‘malloc’, but I want it from ‘libSystem.dylib’."

And To Illustrate…

So, now let’s take a look at a very simple C example that illustrates what I’m talking about.

Three files, three images.  One is the executable, and two and three are shared libraries.  Here we go:


one.c:

#include <stdio.h>
extern void two(void);
extern void three(void);
int main(void) {
  printf("hello world\n");
  two();
  three();
  return 0;
}

two.c:

#include <stdio.h>
void two(void) {
  printf("two.c: two\n");
}

three.c:

#include <stdio.h>
void two(void) {
  printf("three.c: two\n");
}

void three(void) {
  printf("three.c: three\n");
  two();
}
Now, let’s build these things on linux:

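gcc -shared -fPIC -o libtwo.so two.c

gcc -shared -fPIC -o libthree.so three.c

and we’ll build two different versions of the main executable:

gcc -o one_a one.c -L. -lthree

gcc -o one_b one.c -L. -ltwo -lthree

(To run these straight from the build directory, set LD_LIBRARY_PATH=. first so the loader can find the libraries.)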

And it’s the second one of those executables that’s really the interesting one.  It’s in that example that the differences between linux and Windows and OS X finally stand up and say hello.  In three.c, the function ‘three’ makes a reference to ‘two’, which happens to be publicly provided by three.c.  Thus a version of ‘two’ gets built into libthree.so, and is made publicly available.  Now, in the first executable, that’s the only version of ‘two’ around, and the result is predictable.  In the second case, however, there is a copy of ‘two’ that is provided by libtwo.so.  And the way the linux tooling works, because the executable binds libtwo.so before it binds libthree.so (left to right), the reference to ‘two’ inside libthree.so will be bound to the implementation in libtwo.so for the second test image (one_b).

So here’s the output for these two images on linux:


one_a:

hello world
three.c: two
three.c: three
three.c: two

one_b:

hello world
two.c: two
three.c: three
two.c: two

Notice that last line: that’s the call from three.c to ‘two’.  Now it goes out of the previously linked shared library and into libtwo.so.  Both the call from one.c and the call from three.c go to the same place in two.c, even though we had three separate static linker invocations.

Now, if you link the same set of programs using gcc on OS X (slightly different command lines), you get the following runtime results.


one_a:

hello world
three.c: two
three.c: three
three.c: two

one_b:

hello world
two.c: two
three.c: three
three.c: two

Now, look at that - the call from one.c to ‘two’ went to libtwo.dylib.  The call from three.c to ‘two’, however, went to the statically bound copy in libthree.dylib.  That’s what would happen on Windows, too.

So What To Do?

OK, so which of these models should our tools support?  The answer basically comes down to this:  When in Rome, do as the Romans do.  That means on Linux, our tools ought to support the linkage model provided by the default tools, except where it would completely hose our language(s) (and that’s what we did in Kylix).  On OS X, we’ll do what OS X defaults to.  Now, I’ll tell you, that cute little dynamic override feature that linux supports is supported through standards wrapped around the ELF image format.  OS X uses Mach-O, and there appears to be no standard to support the sort of override that linux does.  The linux support is a major PITA to implement, too, so OS X actually lets me breathe a sigh of relief, because the shared library model there, so far, appears to be much less complex than the model on linux.


This article was a bit long to get to a couple of basic points.  The first point is that basic dynamic library support on OS X looks more and more like it’s simpler from a tooling standpoint than on linux.  Mind, now, I’m not discussing frameworks and umbrellas here.  The second point is that it’s interesting how very old paradigms in the way a set of command line tools operates still bleed through to this day, and can impact language tooling that might want to operate in a completely different fashion.  The prime example of this for us is Delphi.  Our command line tooling for Object Pascal simply doesn’t have the concept of left to right ordering of static link dependencies for libraries, and yet we have this interesting aspect on linux that we have had to consider, an aspect that does not translate to OS X or Windows.

Posted by Eli Boling on February 16th, 2010 under C_Builder, Delphi

Mac OS X shared library initialization

I was part of the team that was passed through the Kylix threshing machine originally, so I decided to do a little research into shared library initialization and termination early on.  Some very difficult to debug things can happen to you if you get surprised by library load/unload sequencing on a platform.

When we did the original linux work, we started out with the assumption that the loader worked the same as Windows with respect to initialization order.  We assumed dependencies were initialized first, in a depth first ordering.  I don’t remember anymore what the default ordering was, but it wasn’t right.  We found out that the tools are responsible for supplying the shared object initialization order in a special section in the executable.  I didn’t want to have a similar experience late in the development cycle for the Mac.

The first thing to do was to find out how shared objects (e.g. .dylib files) specify initialization procedures.  If you look at the Mach-O spec, or in mach-o/loader.h, you’ll find the LC_ROUTINES load command, which seems just right for the job.  Comments in the header say this is used for C++ static constructors.  Excellent!  A little further reading shows the ld option -init allows you to specify the startup routine manually, and directs you to the man page for ld.  OK, off we go, and there we see only one sentence that includes a clause stating that this is used rarely.  Red flag.  And there the documentation trail dies.  Red flag.  OK, let’s go look at a C++ static constructor example built with g++.  Hmm, no LC_ROUTINES.  Red flag.  OK, let’s look for LC_ROUTINES in some of the .dylibs in the shipping OS.  Hmm, no LC_ROUTINES.  Red flag.  So, how do C++ static constructors get called?  Looking around a bit (see otool -lv), I see sections like __mod_init_func, with the S_MOD_INIT_FUNC_POINTERS flag set.  Good luck finding documentation on the S_MOD_INIT_FUNC_POINTERS flag (yeah, ok, red flag).

This is where the fact that the OS is open source saved us a ton of time.  I have the DYLD source handy, and a little grepping and reading gives a pretty clear picture of how things really work.  LC_ROUTINES is only suitable for calling an initialization routine; it has no support for termination notification.  So now LC_ROUTINES is a porcupine of red flags, and I wouldn’t touch it with a ten-foot pole.  S_MOD_INIT_FUNC_POINTERS and S_MOD_TERM_FUNC_POINTERS are the way to go.  These flags mark sections that hold arrays of function pointers, which the OS calls for both the executable and shared objects, if it finds them.  Each function in an S_MOD_INIT_FUNC_POINTERS section is called with a mess of parameters, which I’ll describe later.  Each function in an S_MOD_TERM_FUNC_POINTERS section is called with no parameters at all, and the term functions are called in reverse order.  So for init, it’s init[0], init[1], …, init[n], and for term it’s term[n], term[n-1], …, term[0].  Without the source, this would have been a long, slow learning experience.
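The calling convention described above can be modeled in a few lines of C.  This is a sketch of the observed behavior, not DYLD’s code: run_mod_init_funcs and run_mod_term_funcs are hypothetical names, and the example entries just record the order in which they are called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ProgramVars; /* layout listed at the end of this post; opaque here */

/* Entry types for S_MOD_INIT_FUNC_POINTERS / S_MOD_TERM_FUNC_POINTERS. */
typedef void (*ModInitFunc)(int argc, const char **argv, const char **envp,
                            const char **apple, struct ProgramVars *pvars);
typedef void (*ModTermFunc)(void);

/* Model: init[0] .. init[n-1], in array order, with the full parameter list. */
static void run_mod_init_funcs(const ModInitFunc *init, size_t n, int argc,
                               const char **argv, const char **envp,
                               const char **apple, struct ProgramVars *pvars) {
    for (size_t i = 0; i < n; i++)
        init[i](argc, argv, envp, apple, pvars);
}

/* Model: term[n-1] .. term[0], in reverse, with no parameters. */
static void run_mod_term_funcs(const ModTermFunc *term, size_t n) {
    for (size_t i = n; i-- > 0;)
        term[i]();
}

/* Example entries that append a letter to a trace string. */
static char trace[16];
static void note(char c) {
    size_t len = strlen(trace);
    assert(len + 1 < sizeof trace);
    trace[len] = c;
    trace[len + 1] = '\0';
}

#define DEFINE_INIT(name, letter)                                     \
    static void name(int argc, const char **argv, const char **envp,  \
                     const char **apple, struct ProgramVars *pvars) { \
        (void)argc; (void)argv; (void)envp; (void)apple; (void)pvars; \
        note(letter);                                                 \
    }

DEFINE_INIT(init0, 'a')
DEFINE_INIT(init1, 'b')
DEFINE_INIT(init2, 'c')
static void term0(void) { note('A'); }
static void term1(void) { note('B'); }
static void term2(void) { note('C'); }

static const ModInitFunc example_init[] = { init0, init1, init2 };
static const ModTermFunc example_term[] = { term0, term1, term2 };
```

Running the three init entries and then the three term entries leaves the trace "abcCBA": forward for init, reversed for term.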

The loader source code also indicates a recursive-descent initialization of the shared objects, but the devil is in the details, and rather than try to fully grok the source code and possibly miss something critical, I decided to switch back to empirical methods and write a bunch of tests.  Armed with the details of how the loader wants us to set up the init and term functions, I descended on our C++ linker and brought it to the point where it could build a Mac .dylib (mostly).  Many of the results are mundane, but a few are sort of interesting, and I’ll try to present them here compactly.

First some notation.  In all the examples below, X refers to the executable, and other letters refer to shared objects.  Solid directed lines indicate a static dependency between objects.  For example, X -> A means the executable depends on the shared library A, so we’d expect A to be initialized before X runs, and terminated after X runs.  Dotted lines mean we dynamically load a shared object with dlopen.  That’s where a lot of the really horrible things happened in Kylix, because of all the package load/unload operations, so I invested in some tests there.  To show the actual initialization results, I use a very simple notation: the letter of the object for initialization, and the same letter preceded by ~ for termination.  So for the example X -> A, I expect the following sequence: A X ~X ~A.
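Because termination is just the reverse of initialization, the notation maps mechanically from an initialization order to the full expected trace.  Here’s a small helper for comparing against test output; expected_trace is a hypothetical name, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the expected trace for an initialization order, e.g. "AX" becomes
   "A X ~X ~A".  'out' needs room for 5 * strlen(init_order) + 1 bytes. */
static void expected_trace(const char *init_order, char *out) {
    assert(init_order != NULL && out != NULL);
    size_t n = strlen(init_order), k = 0;
    for (size_t i = 0; i < n; i++) {   /* initialization, in order */
        out[k++] = init_order[i];
        out[k++] = ' ';
    }
    for (size_t i = n; i-- > 0;) {     /* termination, reversed */
        out[k++] = '~';
        out[k++] = init_order[i];
        out[k++] = ' ';
    }
    if (k > 0) k--;                    /* drop the trailing space */
    out[k] = '\0';
}
```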

So, simple tests first.

X -> A

result: A X ~X ~A

    A
  / ^
X   |
  \ |
    B

result: A B X ~X ~B ~A

And a diamond:

         B
       /   \
X -> D       A
       \   /
         C

result: A B C D X ~X ~D ~C ~B ~A

OK, moving on to a more complicated static linking example.  This one is a ladder, and it needs a little explanation.

    D -> C -> B -> A
  / |    ^    |    ^
X   |    |    |    |
  \ v    |    v    |
    H -> G -> F -> E

In the example above, we have two straight legs of dependencies, with cross dependencies running back and forth between the legs.  You can’t do a naive recursive depth-first initialization of this, because you could end up running some initializers before their dependencies are initialized.  You have to order the libraries topologically by their dependencies, and initialize in that order.  That can be done with a recursive operation, and that is how the Mac loader does it.  The easiest way to observe it is to follow all the dependency sequences in the ladder above.  Here they are:

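D -> C -> B -> A
D -> C -> B -> F -> E -> A
D -> H -> G -> F -> E -> A
D -> H -> G -> C -> B -> A
D -> H -> G -> C -> B -> F -> E -> A
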
The last one is the longest chain.  In all the chains, D is at the top.  Here the longest chain happens to pass through every library below D, so reversing it gives an order in which every library’s dependencies are initialized before it.  That is roughly what we expect out of the loader, barring ties in initialization order due to libraries at the same level in the dependency graph.
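Another way to see why the longest chain matters: give each library a level equal to the length of its longest dependency chain (0 for a library with no dependencies), and initialize in ascending level order.  Libraries sharing a level are exactly the ties mentioned above; in the diamond, B and C are both level 1.  The string-per-library encoding (first letter is the library, the rest are its dependencies, transcribed from the diagram) and the function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Ladder graph: "BAF" means B depends on A and F. */
static const char *const ladder[] = {
    "XDH", "DCH", "CB", "BAF", "A", "HG", "GCF", "FE", "EA",
};
enum { LADDER_N = sizeof ladder / sizeof ladder[0] };

static const char *lookup(const char *const *g, int n, char name) {
    for (int i = 0; i < n; i++)
        if (g[i][0] == name) return g[i];
    return NULL;
}

/* Longest dependency chain below 'name', counted in edges.
   Assumes the graph is acyclic. */
static int level(const char *const *g, int n, char name) {
    const char *img = lookup(g, n, name);
    assert(img != NULL);
    int best = 0;
    for (const char *d = img + 1; *d; d++) {
        int l = level(g, n, *d) + 1;
        if (l > best) best = l;
    }
    return best;
}

/* Emit libraries by ascending level; ties keep their order in 'g'. */
static void order_by_level(const char *const *g, int n, char *out) {
    int k = 0;
    for (int lv = 0; k < n; lv++)
        for (int i = 0; i < n; i++)
            if (level(g, n, g[i][0]) == lv) out[k++] = g[i][0];
    out[k] = '\0';
}
```

For the ladder every level is distinct (D is level 7, matching the eight-library chain), so the order is fully determined: A E F B C G H D X.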

Here’s what we got when we ran the example:  A E F B C G H D X ~X ~D ~H ~G ~C ~B ~F ~E ~A
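The loader’s recursive approach can be modeled as a depth-first, post-order walk from X: a library is initialized only after everything it depends on.  Assuming each library visits its dependencies in the left-to-right order I’ve transcribed from the diagram (an assumption; the real order would come from the load commands), the walk reproduces the observed sequence exactly.  This is a sketch, not DYLD’s code, and visit and init_order are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Same encoding: first letter is the library, the rest are its deps. */
static const char *const ladder[] = {
    "XDH", "DCH", "CB", "BAF", "A", "HG", "GCF", "FE", "EA",
};
enum { LADDER_N = sizeof ladder / sizeof ladder[0] };

static int index_of(const char *const *g, int n, char name) {
    for (int i = 0; i < n; i++)
        if (g[i][0] == name) return i;
    return -1;
}

/* Post-order DFS: a library is appended to 'out' only after all of its
   dependencies have been appended, which yields a topological order.
   Libraries are marked on entry, so a shared dependency runs once. */
static void visit(const char *const *g, int n, char name, int *seen,
                  char *out, int *k) {
    int i = index_of(g, n, name);
    assert(i >= 0);
    if (seen[i]) return;
    seen[i] = 1;
    for (const char *d = g[i] + 1; *d; d++)
        visit(g, n, *d, seen, out, k);
    out[(*k)++] = name;
}

static void init_order(const char *const *g, int n, char root, char *out) {
    int seen[26] = {0};
    int k = 0;
    assert(n <= 26);
    visit(g, n, root, seen, out, &k);
    out[k] = '\0';
}
```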

OK, so things are looking up.  I am getting what I expect out of the loader, and I’m not terribly surprised, because otherwise the world would have fallen apart long ago for the Mac, right?  That’s what we thought on Kylix, and it’s why I looked so closely here.  Anyway, now on to the dynamic loading cases.  This is where we mix static loading with dynamic loading, and make sure the semantics are reasonable, and don’t require us to jump through too many hoops in the linker and RTL.

X -> A
B -> C

In this example, X loads B with dlopen, and then unloads it with dlclose before exiting.  B depends on C, but not A.  Here’s what we got: A X [load B] C B [unload B] ~B ~C ~X ~A.

Now, just for fun, let’s try the same case, but without calling dlclose in X; that is, we orphan the B shared library.  Here it gets harder to guess what will happen.  Here’s what we got:  A X [load B] C B ~B ~C ~X ~A.  Yep, the same as the case where we called dlclose!  I certainly wouldn’t rely on this behavior, but it’s nice to know about.

Now, a more complicated case where we try out load and unload sequences.

X --> A
.     ^
.     |
. . . B
.     ^
.     |
C ----

In the case above, there’s a special runtime sequence: X loads B, then loads C, then unloads B, and then unloads C.  The interleaved order is deliberate, just to be mean.  There are no cyclical dependencies involved, but there are reference count issues.  C depends on B, so when X manually unloads B, we hope the OS doesn’t call the terminate routine in B, because C still needs B.  And here’s what we got:  A X [load B] B [load C] C [unload B] [unload C] ~C ~B ~X ~A.  That’s what we wanted.
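A reference-counting model accounts for all three dynamic results so far.  This is a toy, not DYLD’s implementation: each dlopen and each static dependency holds a reference, an image is terminated when its count drops to zero, and at process exit anything still initialized (including an orphaned dlopen) is terminated in reverse initialization order.  Loader, retain, release, dl_open, dl_close, and process_exit are all hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *deps[26];  /* deps[c - 'A']: letters image c depends on */
    int refs[26];          /* references from dlopen calls and dependents */
    int live[26];          /* 1 while initialized and not yet terminated */
    char order[64];        /* initialization order, used at process exit */
    char trace[256];       /* what happened, in the post's notation */
} Loader;

static void emit(Loader *L, const char *pre, char c, const char *post) {
    size_t n = strlen(L->trace);
    snprintf(L->trace + n, sizeof L->trace - n, "%s%c%s ", pre, c, post);
}

/* Take a reference; on the first one, init dependencies, then c itself. */
static void retain(Loader *L, char c) {
    assert(c >= 'A' && c <= 'Z');
    int i = c - 'A';
    if (L->refs[i]++ > 0) return;
    for (const char *d = L->deps[i] ? L->deps[i] : ""; *d; d++)
        retain(L, *d);
    size_t n = strlen(L->order);
    assert(n + 1 < sizeof L->order);
    L->order[n] = c;
    L->order[n + 1] = '\0';
    L->live[i] = 1;
    emit(L, "", c, "");                 /* c's init routine runs */
}

/* Drop a reference; on the last one, terminate c, then release its deps. */
static void release(Loader *L, char c) {
    int i = c - 'A';
    assert(L->refs[i] > 0);
    if (--L->refs[i] > 0) return;       /* still referenced elsewhere */
    L->live[i] = 0;
    emit(L, "~", c, "");                /* c's term routine runs */
    const char *d = L->deps[i] ? L->deps[i] : "";
    for (size_t j = strlen(d); j-- > 0;)
        release(L, d[j]);
}

static void dl_open(Loader *L, char c)  { emit(L, "[load ", c, "]"); retain(L, c); }
static void dl_close(Loader *L, char c) { emit(L, "[unload ", c, "]"); release(L, c); }

/* At exit, terminate whatever is still live, in reverse init order. */
static void process_exit(Loader *L) {
    for (size_t j = strlen(L->order); j-- > 0;) {
        int i = L->order[j] - 'A';
        if (L->live[i]) {
            L->live[i] = 0;
            emit(L, "~", L->order[j], "");
        }
    }
}
```

Driving it with X depending on A, B on A, and C on B, then performing the load/unload sequence above, produces the observed trace (plus a trailing space).  Dropping the dlclose calls from the simpler X/B/C case gives the orphan result as well.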

And for our last example, we’ll include a cyclical case.  This is a case where the user manually, via dlopen, causes a cyclical dependency to appear in the shared object dependencies.  This is bad, and there is no good way for the OS to resolve it.  Still, we’d like to know what happens.  For example, one outcome could have been for the OS to panic and kill the task.  The way we create the cycle is to have a shared object use dlopen in its initialization routine to load another shared object that has a static dependency on something further up the initialization chain that hasn’t been initialized yet.

X -> D -> C -> B -> A
     ^         .
     |         .
     |         .
      -------- E

And what we got was this:  A B E C D X ~X ~D ~C ~E ~B ~A

So what happened was that the OS stopped the initialization of the library loaded with dlopen at the point where it hit pending dependencies in an existing initialization chain.  That meant the loaded library E had its initializer run with some dependencies uninitialized (C and D).  This protected the main executable’s dependencies, sort of, but either way, the user is probably hosed.  As a basic rule of thumb, people should avoid calling dlopen in a library initializer anyway unless they really know what they are doing.
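The observed order is what you’d get from a recursive initializer that skips an image already being initialized instead of waiting for it.  Here’s a toy model that reproduces A B E C D X.  It is not DYLD’s code; load, run_initializer, and the hard-wired dlopen of E from B’s initializer are all hypothetical.  Terminating in the reverse of that order gives the observed ~X ~D ~C ~E ~B ~A tail.

```c
#include <assert.h>
#include <string.h>

/* Static dependencies from the diagram above. */
static const char *deps_of(char c) {
    switch (c) {
    case 'X': return "D";
    case 'D': return "C";
    case 'C': return "B";
    case 'B': return "A";
    case 'E': return "D";   /* E points back up the pending chain */
    default:  return "";
    }
}

static int state[26];       /* 0 = untouched, 1 = initializing, 2 = done */
static char order[27];      /* initializers, in the order they ran */

static void load(char c);

static void run_initializer(char c) {
    size_t n = strlen(order);
    order[n] = c;             /* the initializer records its letter... */
    order[n + 1] = '\0';
    if (c == 'B') load('E');  /* ...and B's then calls dlopen on E */
}

/* Recursive initialization.  An image already initializing (state 1) is
   skipped rather than waited on, so the cycle cannot recurse forever; the
   price is that E's initializer runs before C and D have run. */
static void load(char c) {
    assert(c >= 'A' && c <= 'Z');
    int i = c - 'A';
    if (state[i] != 0) return;
    state[i] = 1;
    for (const char *d = deps_of(c); *d; d++) load(*d);
    run_initializer(c);
    state[i] = 2;
}
```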

So the results can be summarized like this:

  • Each shared library has its init/term routine(s) called once and only once per process.
  • The list of shared libraries for a task is initialized in one order, and terminated in the reverse order.
  • All dependencies are initialized before their dependents.
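These rules are easy to turn into a mechanical check on a recorded trace, which is handy for regression-testing linker changes.  A sketch; follows_rules and its string-based inputs are hypothetical, with each edge written as "uv" for "u depends on v".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* 'init' and 'term' list image letters in the order their init and term
   routines ran; 'edges' holds dependency pairs. */
static bool follows_rules(const char *init, const char *term,
                          const char *const *edges, int n_edges) {
    assert(init != NULL && term != NULL);
    size_t n = strlen(init);
    if (strlen(term) != n) return false;
    for (size_t i = 0; i < n; i++) {
        /* Rule 1: each image is initialized exactly once. */
        if (strchr(init + i + 1, init[i]) != NULL) return false;
        /* Rule 2: termination is the exact reverse of initialization. */
        if (term[i] != init[n - 1 - i]) return false;
    }
    /* Rule 3: every dependency is initialized before its dependent. */
    for (int e = 0; e < n_edges; e++) {
        const char *u = strchr(init, edges[e][0]);
        const char *v = strchr(init, edges[e][1]);
        if (u == NULL || v == NULL || v > u) return false;
    }
    return true;
}
```

The ladder result passes; an alphabetical init order such as A B C D E F G H X fails rule 3, because B would run before F.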

This is a good result.  It’s similar to Windows, and it appears to meet our basic requirements for package initialization code in the RTL without us having to add more magic to the tools over and above what we’ve done in the past to preserve initialization order.

One last thing.  I promised to list the parameters to the initialization functions in the S_MOD_INIT_FUNC_POINTERS sections.

void func(int argc, const char **argv, const char **envp, const char **apple, struct ProgramVars *pvars);

And ProgramVars is this:

struct ProgramVars {
  const void*	mh;
  int*		NXArgcPtr;
  const char***	NXArgvPtr;
  const char***	environPtr;
  const char**	__prognamePtr;
};

The interesting item here, really, is the mh field.  That one is the Mach header for the image, which is really nice, because it means that on OS X a shared library has a reliable way of finding useful things about itself, like the precise start of its sections in memory.  I don’t know why the information is duplicated between the parameters passed to the init function and the fields of the struct passed as the last parameter.
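To make the signature concrete, here’s a sketch of an initializer that records argc and the mh pointer.  In a real image a pointer to it would sit in an S_MOD_INIT_FUNC_POINTERS section; here it’s called directly with a hand-built ProgramVars so it can run anywhere.  my_image_init, saved_mh, and saved_argc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as listed above. */
struct ProgramVars {
    const void    *mh;
    int           *NXArgcPtr;
    const char  ***NXArgvPtr;
    const char  ***environPtr;
    const char   **__prognamePtr;
};

/* Globals so the initializer's effect can be checked afterwards. */
static const void *saved_mh;
static int saved_argc;

/* An initializer with the S_MOD_INIT_FUNC_POINTERS signature. */
static void my_image_init(int argc, const char **argv, const char **envp,
                          const char **apple, struct ProgramVars *pvars) {
    (void)argv; (void)envp; (void)apple;
    assert(pvars != NULL);
    saved_argc = argc;
    saved_mh = pvars->mh;   /* header pointer, kept for later lookups */
}
```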

Ideally, none of you out there will ever have to worry about this at all, but if you do, there it is.

Posted by Eli Boling on January 29th, 2010 under C_Builder, Delphi, OS X | 4 Comments »

Mac OS X Exception Handling

Purists and pragmatists will argue over the feature of converting hardware level exceptions into RTL exceptions.  This is the feature where you catch, for example, a memory access violation and keep running your application. Purists will say you should never do that, because you could wind up doing serious damage down the line to users’ data, if you don’t know what caused the bad pointer in the first place.  Pragmatists will observe that some very large percentage of these come from things like null pointers resulting from some missing fence test, often triggered by some odd combination of user input, and why would you force the user to lose data by letting the app crash?  I’m with the pragmatists here, although there are times when I wish I weren’t.  Such as when I have to implement the support in the RTL on our various platforms.  I’m going to think about a new metric for measuring the effort involved, based on a unit of length of fingernails chewed over the implementation period, which I think better represents the overall effort than a pure time measure.

On Win32, OS/hardware exceptions and language exceptions are all dispatched through the same mechanism: stack based exception registration records.  Handling these and even resuming them is pretty straightforward.

On Linux, OS/hardware exceptions are quite a bit trickier to deal with.  Exception handling is typically done using a PC mapped scheme that is vendor dependent.  There is no standard for language based exceptions, and all the hardware exceptions are dispatched to signal handlers, which you can install with the POSIX signal handling APIs.  There are also some strict protocols that you need to follow with respect to handling Ctrl-C and friends if you want to play by the rules for shell scripts.  The POSIX signal handling APIs provide portable interfaces for getting the RTL in the game, but the devil is in the details.  For example, when your signal handlers are invoked, you get a signal context sent to you that represents the machine state (processor registers, etc).  That’s an opaque pointer, because POSIX really can only go so far as to define/restrict what you can and can’t do with the machine state for hardware exceptions.  Thus once you land in your handler, everything becomes platform specific with respect to that context, both in its format, and what you can do with it.

In porting to OS X, I’ve tried to use POSIX as the common point between the Linux and OS X support as much as possible.  So I did the same thing with exception handling, reviving the Kylix PC mapped exception handling support.  The first thing that I did was to try to discover what I could do with the machine context that was delivered to my signal handlers.

The key piece here is the POSIX API for setting up signal handlers.  Its formal documentation has been updated a bit over the past several years, but the fundamentals haven’t changed since I did the exception handling work in Kylix.
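Assuming the key API is sigaction with the SA_SIGINFO flag (which is how a handler receives the machine context as its third argument), a minimal version of the kind of test program described next looks like this.  raise(SIGSEGV) stands in for a real access violation, and siglongjmp is a much simpler stand-in for the context rewriting an RTL would do; on_fault and probe_fault_recovery are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t got_signal;

/* Three-argument handler selected by SA_SIGINFO.  'context' is the opaque
   machine state; its real type (ucontext_t) and layout are platform
   specific, so this sketch does not look inside it. */
static void on_fault(int sig, siginfo_t *info, void *context) {
    (void)info;
    (void)context;
    got_signal = sig;
    siglongjmp(recover_point, 1);   /* resume at the recovery point */
}

/* Install the handler, trigger SIGSEGV, report whether we recovered. */
static int probe_fault_recovery(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    int recovered = 0;
    if (sigsetjmp(recover_point, 1) == 0)
        raise(SIGSEGV);                 /* stand-in for a bad pointer */
    else
        recovered = (got_signal == SIGSEGV);

    sigaction(SIGSEGV, &old, NULL);     /* restore the previous handler */
    return recovered;
}
```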

I started on OS X writing a simple C program so that I could investigate the behavioral characteristics of the machine context that is delivered to a signal handler.  Specifically, I wanted to make sure that I could resume execution on the client thread/stack in the face of memory access violations (e.g. SIGSEGV), and keep the thread sane while delivering the critical data to the RTL for stack unwinding purposes.  My simple test case ran fine, so I moved to the next step, which was to debug the signal handler, and confirm the validity and utility of the machine context.  There was an immediate hull breach.  I don’t remember all the terrible things that happened, but the nut of it is this:  your signal handler will never be invoked if you run the app under GDB.  To understand the reasons for this, you have to recall the fundamental structure of OS X.

OS X is built on a BSD layer running on top of the Mach kernel originally developed at CMU.  The various POSIX APIs are layered on top of Mach, which is the real process control layer that matters to low-level implementors and tools such as GDB.  GDB uses the Mach exception handling APIs to inject exception handlers into the process to watch for faults in the app, and the support in GDB doesn’t allow for chaining the exceptions on to signal handlers.  In one discussion of the matter, if you scroll down to the bottom, you will see words like ‘forever’ and ‘long standing’ used to describe this behavior in GDB.

So that pretty much crosses off using signal handlers for hardware exceptions on OS X: I can’t very well write off GDB.  I had looked around a bit more before and after I ran across that post, and found that folks who did VMs (Java, LISP) ran into the same issues.  All the posts said pretty much the same thing:  you have to use the Mach exception handling APIs if you want reliable support.

The Mach exception APIs work very differently for processing events.  The basic model is that you allocate a port over which you will receive messages describing exceptions.  The port can be configured to receive various types of exceptions (e.g. memory access, invalid instruction, and math error).  Then you spin up a thread on which you wait for messages on that port.  If another thread in your application faults, then the OS will stop that thread and dispatch a message to your exception port.  The exception handling thread then processes the message.  In our case, that means decoding the exception information, pre-chewing it a bit, and then fiddling with the faulting thread’s context to set the thread to continue in an RTL handler with that pre-chewed information sitting on the stack.  Then we tell the OS that we’ve processed the exception message successfully, and we go back to looping looking for exception messages.  The OS then restarts the faulting thread, where we land in our language specific RTL code to create the RTL exception object and unwind the stack.
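The request/reply flow is easier to see in a toy model.  The sketch below is not the Mach API (no mach_msg, no ports); it uses a pthread mutex and condition variable to model one exception message and its reply, and every name in it (ExcPort, raise_exception, exception_thread) is hypothetical.  The handler’s "resume action" stands in for rewriting the faulting thread’s context.

```c
#include <assert.h>
#include <pthread.h>

/* One message slot standing in for an exception port. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int pending;        /* a faulting thread has posted a message */
    int replied;        /* the handler thread has answered it */
    int fault_code;     /* exception information from the faulting thread */
    int resume_action;  /* pre-chewed answer: how the thread should resume */
} ExcPort;

/* Faulting thread: post the exception, then block until the handler
   replies (in the real thing, the OS suspends the faulting thread). */
static int raise_exception(ExcPort *p, int code) {
    pthread_mutex_lock(&p->lock);
    p->fault_code = code;
    p->replied = 0;
    p->pending = 1;
    pthread_cond_broadcast(&p->cond);
    while (!p->replied)
        pthread_cond_wait(&p->cond, &p->lock);
    int action = p->resume_action;
    pthread_mutex_unlock(&p->lock);
    return action;      /* continue wherever the handler decided */
}

/* Handler thread: wait for one message, decide how the faulting thread
   should continue, and reply.  A real handler would loop forever. */
static void *exception_thread(void *arg) {
    ExcPort *p = arg;
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    while (!p->pending)
        pthread_cond_wait(&p->cond, &p->lock);
    p->resume_action = p->fault_code + 1000;  /* stand-in for context fixup */
    p->pending = 0;
    p->replied = 1;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}
```

Either thread may reach the lock first: the handler re-checks pending before waiting, and the faulting thread re-checks replied, so no wakeup is lost.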

Most of this is all under the covers, so to speak, for the application developer, but there are some points that are of interest:

  1. If you include SysUtils, you will always get a second thread in the application, because a thread is needed to watch for exceptions.
  2. We are not going to support recovery from stack faults.  I think we had that requirement in Kylix, too, but we’ve made the decision for real here, and I’m going to phase out any Linux support for it as well at this point.  That requirement is not really specific to Mach issues, but represents us giving up on the feature because of the various caveats that we’ve had to place on it from one platform to the next, making it too difficult to support in a uniform and useful fashion.
  3. Ctrl-C events are not dispatched via Mach exception messages; these are still caught as SIGINT, via POSIX signal handlers.  This is actually good, because it means we won’t have to go through too much agony ensuring proper shell script handling semantics.  So we’ll have both the Mach exception thread, and some signal handlers in play.
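As for the shell script rules, the common convention, as I understand it, is that a program that catches SIGINT should clean up and then die from SIGINT itself, so the calling shell can tell the child was interrupted and stop the script.  A sketch of that pattern, assuming a POSIX system; raise stands in for a real Ctrl-C, and child_main and child_dies_by_sigint are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted;

static void on_sigint(int sig) { (void)sig; interrupted = 1; }

/* Catch SIGINT, clean up, then terminate *by* SIGINT rather than with a
   plain exit status, so a parent shell sees the interrupt. */
static void child_main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    raise(SIGINT);                /* stand-in for the user's Ctrl-C */
    if (interrupted) {
        /* ...flush files, remove temporaries, etc... */
        signal(SIGINT, SIG_DFL);  /* restore the default action */
        raise(SIGINT);            /* and re-raise it to terminate */
    }
    _exit(0);
}

/* Returns 1 if the child was terminated by SIGINT, -1 on error. */
static int child_dies_by_sigint(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) child_main();
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}
```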

If I go into the gory details of the Mach implementation, this post will get too long, so I’m just going to include some links here to some references that I’ve found useful:

I strongly recommend that last article.

There are a couple of other miscellaneous things of interest to note about the Mach exception support. First, the books/articles/examples variously reference a function called exc_server and a header file called exc_server.h.  That header file doesn’t exist on the Mac, but the function is there, and needed.  Second, there is an oddity with exc_server:  when you call it, it calls back into your application, but you don’t tell exc_server the callback function.  That function has to have a particular name (catch_exception_raise, for the default exception behavior), and is looked up in your symbol table at runtime.  Get the name wrong, and the app goes down in flames on the first exception.

Posted by Eli Boling on November 10th, 2009 under C_Builder, Delphi, OS X | 4 Comments »
