Watch, Follow, &
Connect with Us

The Oracle at Delphi













Older Stuff



A Happy Accident and a Silly Accident

By now you’re all aware that we’re getting ready to move to a new building here in Scotts Valley. This process is giving us a chance to clean out our offices and during all these archeological expeditions, some lost artifacts are being (re)discovered. Note the following:

SNC00041 SNC00042

These are some bookends that my father made for me within the first year after moving my family to California to work on the Turbo Pascal team. He made these at least two years before Delphi was released, and at a few 6 months before we even began work on it in earnest. Certainly before the codename “Delphi” was ever thought of. I suppose they are my “happy” accident.

This next one is just sad. I received this award at the 2004 Borcon in San Jose from, then Borland President/CEO, Dale Fuller. My title at that time was “Principal Architect”… Of course I like to think that I have strong principles, and maybe that was what they were trying to say… Within a week or so after I got this plaque, another one arrived with the correct spelling of my title. I keep this one just for the sheer hilarity of it. Also, it is a big chunk of heavy marble, so maybe one day I can use to to create a small marble topped table…

SNC00027

Posted by Allen Bauer on February 23rd, 2010 under CodeGear, General, Random, Work | 3 Comments »


What. The. Heck.

Is. This? I simply cannot explain this. At. All.

SNC00009

This was on a bulletin/white-board in the break area. I’d never noticed it because it was covered with photos from various sign-off (final authorization to release the product) celebrations. Lots of photos of both past and present co-workers, many thinner and with more hair ;-). Since we’re in the process of cleaning up in the preparation for moving to our new digs, it is interesting what you find… I presume this image has been on this whiteboard since… I guess… Delphi 5 or is that Delphi S? Either someone has a very odd sense of humor… or, more likely, beer had been involved during one of those sign-off celebrations from the photos. Then again, maybe this whiteboard had been in the Borland board room and this was from a corporate strategy meeting… nah, gotta be the beer.

Ow, my head hurts now…

Posted by Allen Bauer on February 19th, 2010 under CodeGear, Delphi, General, Random, Work | 10 Comments »


A case when FreeAndNil is your enemy

It seems that my previous post about FreeAndNil sparked a little controversy. Some of you jumped right on board and flat agreed with my assertion. Others took a very defensive approach. Still others, kept an “arms-length” view. Actually, the whole discussion in the comments was very enjoyable to read. There were some very excellent cases on both sides. Whether or not you agreed with my assertion, it was very clear that an example of why I felt the need to make that post was in order.

I wanted to include an example in my first draft of the original post, but I felt that it would come across as too contrived. This time, instead of including some contrived hunk of code that only serves to cloud the issue at hand, I’m going to try a narrative approach and let the reader decide if this is something they need to consider. I may fall flat on my face with this, but I want to try and be as descriptive as I can without the code itself getting in the way. It’s an experiment. Since many of my readers are, presumably, Delphi or C++Builder developers and have some working knowledge of the VCL framework, I will try and present some of the problems and potential solutions in terms of the services that VCL provides.

To start off, the most common case I’ve seen where FreeAndNil can lead to strange behaviors or even memory leaks is when you have a component with a object reference field that is allocated “lazily.” What I mean is that you decide you don’t need burn the memory this object takes up all the time so you leave the field nil and don’t create the instance in the constructor. You rely on the fact that it is nil to know that you need to allocate it. This may seem like the perfect case where you should use FreeAndNil! That is in-fact the very problem. There are cases where you should FreeAndNil in this scenario. The scenario I’m about to describe is not such a case.

If you recall from the previous post, I was specifically referring to using FreeAndNil in the destructor. This is where a very careful dance has to happen. A common scenario in VCL code is to hold references to other component from a given component. Because you are holding a reference there is a built-in mechanism that allows you coordinate the interactions between the components by knowing when a given component is being destroyed. There is the Notification virtual method you can override to know if the component being destroyed is the one to which you have a reference. The general pattern here is to simply nil out your reference.

The problem comes in when you decide that you need to grab some more information out of the object while it is in the throes of destruction. This is where things get dangerous. Just the act of referencing the instance can have dire consequences. Where this can actually cause a memory leak was if the field, property, or method accessed caused the object to lazily allocate that instance I just talked about above. What if the code to destroy that instance was already run in the destructor by the time the Notification method was called? Now you’ve just allocated an instance which has no way to be freed. It’s a leak. It’s also a case where a nil field will never actually cause a crash because you were sooo careful to check for nil and allocate the field if needed. You’ve traded a crash for a memory leak. I’ll let you decide whether or not that is right for your case. My opinion is that leak or crash, it is simply not good design to access an instance that is in the process of being destroyed.

“Oh, I never do that!” That’s probably true, however what about the user’s of your component? Do they understand the internal workings of your component and know that accessing the instance while it is in the throes of destruction is bad? What if it “worked” in v1 of your component and v2 changed some of the internals? Do they even know that the the instance is being destroyed? Luckily, VCL has provided a solution to this by way of the ComponentState. Before the destructor is called that starts the whole destruction process, the virtual method, BeforeDestruction is called which sets the csDestroying flag. This can now be used as a cue for any given component instance whether or not it is being destroyed.

While my post indicting FreeAndNil as not being your friend may have come across as a blanket statement decrying its wanton use, I was clearly not articulating as well as I should that blindly using FreeAndNil without understanding the consequences of its effect on the system as a whole, is likely to bite you. My above example is but one case where you should be very careful about accessing an object in the process of destruction. My point was that using FreeAndNil can sometimes appear to solve the actual problem, when in fact if has merely traded it for another, more insidious, hard to find problem. A problem that doesn’t bite immediately.

Posted by Allen Bauer on February 16th, 2010 under CodeGear, Delphi, General, Work, ednfront | 32 Comments »


Oh, the things you find…

When you’re cleaning out your office.

SNC00002SNC00008

Posted by Allen Bauer on February 12th, 2010 under Delphi, General, Work | 4 Comments »


A case against FreeAndNil

I really like the whole idea behind Stackoverflow. I regularly read and contribute where I can. However, I’ve seen a somewhat disturbing trend among a lot of the answers for Delphi related questions. Many questions ask (to the effect) “why does this destructor crash when I call it?” Invariably, someone would post an answer with the seemingly magical incantation of “You should use FreeAndNil to destroy all your embedded objects.” Then the one asking the question chooses that answer as the accepted one and posts a comment thanking them for their incredible insight.

The problem with that is that many seem to use FreeAndNil as some magic bullet that will slay that mysterious crash dragon. If using FreeAndNil() in the destructor seems to solve a crash or other memory corruption problems, then you should be digging deeper into the real cause. When I see this, the first question I ask is, why is the instance field being accessed after that instance was destroyed? That typically points to a design problem.

FreeAndNil itself isn’t the culprit here. There are plenty of cases where the use of FreeAndNil is appropriate. Mainly for those cases where one object uses internal objects, ephemerally. One common scenario is where you have a TWinControl component that wraps some external Windows control. Many times some control features can only be enabled/disabled by setting style bits during the creation of the handle. To change a feature like this, you have to destroy and recreate the handle. There may be some information that is stored down on the Windows control side which needs to be preserved. So you grab that information out of the handle prior to destroying and park that data in an object instance field. When the handle is then created again, the object can look at that field and if it is non-nil, it knows there was some cached or pre-loaded data available. This data is then read and pushed back out to the handle. Finally the instance can then be freed by FreeAndNil(). This way, when the destructor for the control runs you can simply use the normal “FCachedData.Free;” pattern since Free implies a nil check.

Of course there is no hard-and-fast rule that says you should not use FreeAndNil() in a destructor, but that little “fix” could be pointing out that some redesigning and refactoring may be in order.

Posted by Allen Bauer on February 5th, 2010 under Delphi, General, Work | 32 Comments »


There may be a silver lining after all

After having to deal with all the stack alignment issues surrounding our move to target the Mac OS, I’d started to fear that I would get more and more jaded cynical about the idiosyncrasies of  this new (to many of us) OS. I was pleased to hear from Eli that he’d found something that, at least to a compiler/RTL/system level software type person, renews my faith that someone at Apple on the OS team seems to be “on the ball.”

Apparently, the Mac OS handles dynamic libraries in a very sane and reasonable manner. Even if it is poorly (and that is an understatement) documented, there is at the very least some open-source portions of the OS that allows the actual code to be examined for how it really works (which is totally different that what any of the documentation says). At least in this regard Linux is the one that is way behind Mac OS and Windows.

Posted by Allen Bauer on January 29th, 2010 under Delphi, General, Work | Comment now »


Requiem for the {$STRINGCHECKS xx} directive…

It’s time. It’s time to say goodbye to the extra behind-the-scenes codegen and overhead that was brought to us during the Ansi->Unicode transition. We’ve shipped two versions with this directive on by default. The Ansi world is now behind us. It’s only real purpose in life was to assist C++Builder customers to more easily transition to C++Builder 2009 and 2010. There are some rare cases where an event handler that was declared in a C++ form/datamodule with an AnsiString parameter *could* be called with the AnsiString parameter containing a UnicodeString payload. To guard against this case, since there was no way to detect this at runtime, was to be resilient to it. Agree or not, that was what happened.

As we’re working on Fulcrum, the next RAD Studio release with a focus on cross-compilation for Mac and Linux, we saw no need to burden these platforms with something that was designed for easing a transition that has not occurred on those platforms. I can say with certainty that the Mac and Linux targets will not be encumbered with this extra overhead. It is also highly likely that even the Windows target for the next release will also no longer support this directive. The source level directive will still be accepted without error, it will merely have no effect.

This will potentially affect C++Builder customers, but as long as the transition to Unicode has been made, it will have no affect. In the rare cases that this will affect some our customer still maintaining old projects that have not updated any event handlers that too AnsiString parameters to take UnicodeString params as appropriate. We hope to at least provide a warning when opening up the form or data module in the IDE to alert the user of this case. More details as we figure this stuff out.

So the bottom line is that we’re intentionally shedding this extra baggage and will not be foisting it on the Mac and Linux platforms, and most likely Windows will maintain parity. I’m already hearing a few cheers and shouts from the back…coupled with a few merely going “Huh? What is he talking about?” Just know that if you’re a pure Delphi guy, there is really nothing to see here… except that you may see a slight performance boost in your code… that that is, as I understand it, a good thing ;-).

So, {$STRINGCHECKS xx}, don’t let the door hit you on the way out…

  • Preemptive snarky comments (with equally snarky responses ;-):
    • I cannot understand why you did this in the first place!
      • We all knew going in that it was sub-optimal. A lot of the other suggestions and solutions we had were far worse. Ignoring the problem wasn’t an option.
    • I only use Delphi and don’t give a rip about the C++ folks.
      • We care about them a lot. They give us money.
    • Finally! I’ve been telling you this ever since D2009!
      • Thank you, Capt. Obvious.
    • I have D2010, are you going to issue a patch for this?
      • Uh, no sorry. D2009 and D2010 are still in the thick of the transition. Folks are still moving to these versions.
    • Get a spellchecker! You must be an incompetent dolt since you misspelled ‘xxxx’
      • Yeah, I know, you’re perfect and don’t make any mistakes. Yep, I suck.
Posted by Allen Bauer on January 26th, 2010 under CodeGear, Delphi, Work | 14 Comments »


Divided and Confused

Odd discovery of the day. Execute the following on a system running a 32-bit version of Windows (NOT a Win64 system!):

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  try
    MSecsToTimeStamp(-1);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

It should print out the following:

EIntOverflow: Integer overflow

Now run the exact same (32bit) binary on a Win64 system, this is what you’ll get:

EDivByZero: Division by zero

Give it a try. Weird, no? So I set out to figure out what is going on. MSecsToTimeStamp() is implemented down in SysUtils and is implemented in assembler code:

function MSecsToTimeStamp(MSecs: Comp): TTimeStamp;
asm
        PUSH    EBX
        XOR     EBX,EBX
        MOV     ECX,EAX
        MOV     EAX,MSecs.Integer[0]
        MOV     EDX,MSecs.Integer[4]
        DIV     [EBX].IMSecsPerDay
        MOV     [ECX].TTimeStamp.Time,EDX
        MOV     [ECX].TTimeStamp.Date,EAX
        POP     EBX
end;

Notice the DIV instruction. That’s where all this funny business takes place. In this particular test case, the divisor is most certainly not 0 and according to the Intel documentation, if the quotient is too large to fit in EAX, then a #DE (Divide Error) exception is raised. If the divisor is 0, then the same exception is raised. Clearly, Windows 32bit is some how figuring out the difference while Windows 64bit doesn’t bother. It is probably by disassembling the instruction and inspecting the machine state, which seems a little dangerous in the context of multicore CPUs. For instance, in the above case, the divisor is in a memory location. If another CPU were to modify that memory location before the exception was processed, the OS may report a different status from what actually happened. Yet, from what I’ve been able to find out, there is no sure way to know whether or not #DE happened from a real divide by zero or an overflow condition.

This took me a while to track down. It wasn’t until one engineer tried it and go the “expected” EIntOverflow and another tried it and got the EDivByZero. The only difference was that of the “bitness” of the OS (32bit vs. 64bit).

Posted by Allen Bauer on January 25th, 2010 under CodeGear, Delphi, General, Work | 5 Comments »


Mac OS Stack alignment – What you may need to know

While I let my little tirade continue to simmer, I figured many folks’ next question will be “Ok, so there may be something here that affects me. What do I need to do?” Let’s first qualify who this will affect. If you fall into the following categories, then read on:

  • I like to use the built-in assembler to produce some wicked-fast optimized functions (propeller beanie worn at all times…)
  • I have to maintain a library of functions that contain bits of assembler
  • I like to take apart my brand new gadgets just to see what makes them tick (Does my new 802.11n router use an ARM or MIPS CPU?)
  • My brain hasn’t been melted in a while and I thought this would be fun
  • I want to feel better about myself because I don’t really have to think about these kinds of things

Let’s start off with a simple example. Here’s an excerpt of code that lives down in the System.pas unit:

function _GetMem(Size: Integer): Pointer;
asm
        TEST    EAX,EAX
        JLE     @@negativeorzerosize
        CALL    MemoryManager.GetMem
        TEST    EAX,EAX
        JZ      @@getmemerror
        REP     RET // Optimization for branch prediction
@@getmemerror:
        MOV     AL,reOutOfMemory
        JMP     Error
@@negativeorzerosize:
        XOR     EAX, EAX
        DB      $F3
end;

Notice the CALL MemoryManager.GetMem instruction. Due to the nature of what that call does, we know that it is very likely that a system call could be made so we’re going to have to ensure the stack is aligned according to the Mac OS ABI. Here’s that function again with the requisite changes:

function _GetMem(Size: Integer): Pointer;
asm
        TEST    EAX,EAX
        JLE     @@negativeorzerosize
{$IFDEF ALIGN_STACK}
        SUB     ESP, 12
{$ENDIF ALIGN_STACK}
        CALL    MemoryManager.GetMem
{$IFDEF ALIGN_STACK}
        ADD     ESP, 12
{$ENDIF ALIGN_STACK}
        TEST    EAX,EAX
        JZ      @@getmemerror
        REP     RET // Optimization for branch prediction
@@getmemerror:
        MOV     AL,reOutOfMemory
        JMP     Error
@@negativeorzerosize:
        XOR     EAX, EAX
        DB      $F3
end;

When compiling for the Mac OS, the compiler will define ALIGN_STACK so you know that this compile target requires stack alignment. So how did we come up with the value ‘12′ in which to adjust the stack. If you remember from my previous article, we know that upon entry to the function, the value in ESP should be $xxxxxxxC. Couple that with the fact that up until the actual call, we’ve done nothing to change the value of ESP, we know where the stack is in the alignment. Since the stack always grows down in memory (toward lower value addresses), we need change ESP to $xxxxxxx0 by subtracting $C, which is 12 decimal. Now the call can be made and we’ll know that upon entry to MemoryManager.GetMem, ESP, once again, will be $xxxxxxxC.

That was a relatively trivial example since there was only one call out to an function that may make a system call. Consider a case where MemoryManager.GetMem was just a black-box call and you had no clue what it would actually do. You cannot ever be certain that any given call will not lead to a system call, so the stack needs to be aligned to a known point just in case the system is called.

Another point I need to make here is that if the call goes across a shared library boundary,  even if the shared library is also written in Delphi, you will be making a system call the first time it is invoked. This is because all function imports, like Linux, are late-bound. Upon the first call to the external reference, it will go through the dynamic linker/loader that will resolve the address of the function and back-patch the call site so that the next call goes directly to the imported function.

What happens if the stack is misaligned? This is the most insidious thing about all this. There are only certain points where the stack is actually checked for proper alignment. The just mentioned case where you’re making a cross-library call is probably the most likely place this will be encountered. One of the first things the dynamic linker/loader does is to verify stack alignment. If the stack is not aligned properly, then a mach EXC_BAD_ACCESS exception is thrown (this is different than how exceptions are done in Windows, see Eli’s post related to exception handling). The problem is that the stack alignment could have been thrown off by one function, hundreds of calls back along the call chain! That is really “fun” to track down where it first got misaligned.

Suppose the function above now had a stack frame? What would the align value be in that case? The typical stack frame, assuming no stack-based local variables, would look like this:


function MyAsmFunction: Integer;
asm
        PUSH    EBP
        MOV     EBP,ESP
        { Body of function }
        POP     EBP
end;

In this case the stack pointer (ESP) will contain the value, $xxxxxxx8 which is 4 bytes for the return address and 4 bytes for the saved value of EBP. If no other stack changes are made, surrounding any CALL instruction, assuming you’re not pushing arguments onto the stack which we’ll cover in a moment, there would be a SUB ESP,8 and ADD ESP,8 instead of the previous 12.

Now, this is where it gets complicated, which clearly demonstrates why compilers are pretty good at this sort of thing. What if you wanted to call a function from assembler that expected all the arguments too be passed on the stack? Remember that at the call site (ie. just prior to the CALL instruction), the stack must be aligned to a 16 byte boundary and contain $xxxxxxx0. In this case you cannot simply push the parameters on the stack and then do the alignment. You must now align the stack before pushing parameters onto it knowing how the stack will be aligned after all the parameters are pushed. So if I need to push 2 DWORD parameters onto the stack and the current ESP value is $xxxxxxxC, you need to adjust the stack by 4 bytes (SUB ESP,4). ESP will now contain $xxxxxxx8. Then push the two parameters onto the stack which adjusts ESP to $xxxxxxx0, and we’ve satisfied the alignment criterion.

If the previous example had required 3 DWORDS, then no adjustment of the stack would be needed since after pushing 3 DWORDS(that’s 12 bytes), the stack would have been $xxxxxxx0, and we’re aligned. Likewise, if the above example had required 4 DWORD to be pushed, then now we’re literally “wasting” 12 extra bytes of stack. because 4 DWORDS is 16 bytes, that block of data will naturally align, so we have to start pushing the parameters on a 16 byte boundary. That means we’re back to adjusting the stack by the full 12 bytes, pushing 16 bytes onto the stack and then making the call. For a function call taking 16 bytes, we’re actually using 28 bytes of stack space instead of only 16! Add in stack-based local variables and you can see how complicated this can quickly get.

Remember, this is also happening behind the scenes within all your Delphi code. The compiler is constantly keeping track of how the stack is being modified as the code is generated. It then uses this information to know how to generate the proper SUB ESP,ADD ESP instructions. This could mean that code that was deeply recursive that worked fine on Windows, would now possibly blow out the stack on the Mac OS! Yes, this is admittedly a rare case since stacks tend to be fairly large (1MB or more), but it is still something to consider. Consider changing your recursive algorithm to iterative instead in order to keep the stack shallower and cleaner.

You should really consider whether or not your hand-coded assembler function needs to be in assembler and if it would work just as well if it were in Pascal. We’re evaluating this very thing, especially for functions that are not used as often or have been assembly merely due to historical reasons. Like you, we also understand that there is a clear benefit to having a nice hand-optimized bit of assembler. For instance, the Move() function in the System unit was painstakingly optimized by members of the FastCode project. Everyone clearly benefits from the optimization that function provided since it is heavily used throughout the RTL itself, but also by many, many users. Note here that the Move() function required no stack alignment changes since it makes no calls outside its own block of code, so it is just as fast and optimized as before. It runs unchanged on all (x82-32bit) platforms.

Posted by Allen Bauer on January 15th, 2010 under Delphi, General, Work | 9 Comments »


It’s my stack frame, I don’t care about your stack frame!

I’m going to start off the new year with a rant, or to put it better, a tirade. When targeting a new platform, OS, or architecture, there will always be gotchas and unforeseen idiosyncrasies about that platform that you now have to account for. Sometimes they are minor little nits that don’t really matter. Other times they can be of the “Holy crap! You have got to be kidding me!” Then there are are the “Huh!? What were they thinking!?” kind. For the Mac OS running on 32bit x86 hardware, which is what we’ll be supporting initially while we’re still getting our x64 compiler online, we encountered just that sort of thing. I’m probably going to embarrass myself here. I’ve worked with the x86 CPU architecture for at least 25 years, so I would hope that I’ve got a pretty good handle on it. I also work with a bunch of really smart people that have the same, if not more experience, with the x86 and and other RISC/CISC architectures. These are not just folks that have worked with those architectures, but create compilers and tools for them, including back-ends that have to generate machine instructions. I don’t know, but I’d venture a guess that you have to have more than a cursory knowledge of the given CPU architecture.

So what did we find that would prompt me to go on a tirade? It seems that the MacOS ABI* requires that just prior to executing a CALL instruction, we must ensure that the stack is aligned to a 16byte (Quad DWORD) boundary. This means that when control is transferred to a given function, the value in ESP is always going to be $xxxxxxxC. Prior to the call, it must be $xxxxxxx0. Note that Windows or even Linux doesn’t have this requirement. OK? The next question is “Why!?” Let’s examine several potential scenarios or explanations. I, along with my colleagues here at Embarcadero (and even past Borland/CodeGear colleagues that now literally work at Apple) have yet to have this explained to our satisfaction. We even know one that even works at Apple on the Cocoa R&D team in the OS group! Our own Chris Bensen, has even visited these friends for lunch at Apple and posed the question.

By now you’re either thinking I’ve gone around the bend, or what is the big deal? Others are probably thinking, “Well that makes sense because modern CPUs work better if the stack is aligned like that.” Here are some various reasons we’ve both come up with ourselves and explanations we’ve been given. They all tend to be variations on a theme but none have truly been satisfactory.  Why burden every function in the system to adhere to this requirement for some (in the grand scheme) lesser used instructions.

“The Mac OS uses a lot of SSE instructions”

Yes, there are SSE instructions that do require that all memory data be aligned on 16 byte boundaries. I know that. I also know that many CPU caches are 16 bytes wide. However, unless you are actually using an SSE instruction (and face it, most functions will probably never actually use SSE instructions). What I do know about alignments is that for a given machine data size (1, 2, 4, 8, 16 bytes), they should always be aligned to their own natural boundary for maximum performance. This also ensures that a memory access doesn’t cross a cache line, which is certainly more expensive.

But why does my function have to make sure your stack is aligned? What I mean is that if a compiler (or even hand coded assembly) needs to have some local variable or parameter aligned on the stack, why doesn’t the target function ensure that? I refer you to the title of this post for my feeling on this one. If you need it aligned, then align it yourself.

“The Mac OS intermixes 64bit and 32bit code all over the place”

I’ve heard this one a lot. Yes, x86-64 does have stricter data alignment requirements, but intermixing of the code? Does it? Really? Not within a given process, AFAIK. When you call into the OS kernel, the CPU mode is switched. Due to the design of 64bit capable CPUs, you cannot really execute 64bit code and 32bit code within the same process. And even if the kernel call did cause a mode switch and used the same stack, I again, refer you to the title of this post. Admittedly, I don’t know all the gory details of how the Mac OS handles these 32bit<->64bit transitions. I would imagine that they would have to go through a call gate since the ring has to change along with the “bitness” mode. This will also typically cause a switch to a different “kernel stack” which would also copy a given number of items from the user’s stack. This is all part of the call descriptor.

“It simplifies the code gen”

I see. So having to inject extra CPU instructions at each call site to ensure that the stack is aligned properly is simpler!? You could argue that the compiler’s code generator has to keep track of the stack depth and offsets anyway, so this is minimal extra overhead. But that’s my point, why burden every function with this level of housekeeping when it is not necessary for the current function to operate correctly?

“Maybe it was because the Mac OS X initially targeted the PowerPC?”

When Mac OS X was first introduced, the whole Macintosh line of computers had already transitioned from the Motorola 680xx line of CPUs to the PowerPC RISC CPU. When Apple decided to completely switch all its hardware over to the Intel x86 and x86-64 architectures, it is possible (and unless I can find information to the contrary) and indeed probable, that the insular nature of the Apple culture directly lead to a vast misunderstanding of this aspect of the 32bit x86 architecture. Failure to actually look at other very successful Intel x86 operating systems and architectures, such as, oh.. I don’t know…  Windows and Linux?

I guess the frustrating thing about all this is that 32bit x86 code generated for the Mac will have extra overhead that is clearly not necessary or even desired for other platforms, such as Windows or Linux. This is like requiring all my neighbors to keep my house clean. Sure, if your compiler is going to do some kind of global or profile guided optimization, you may want to do more stack manipulations through out the application. But that is a highly specialized and rare case, and AFAIK, the tools on the platform don’t do anything like that (GCC, ObjectiveC).

When we first found out about this requirement among the dearth of documentation on these low-level details (I’m sorry, but Apple’s overall OS documentation pales in comparison to Windows’ or even Linux’s. Mac fans, flame on ;-), I posed a question to Stack Overflow figuring that since there is such a wide range of experts out there, that surely a clearly satisfactory explanation would be available in short order. I was wrong. That question has been up there for over 9 months and I still get up-votes periodically.

Even though there is one answer I selected, it doesn’t seem that the Mac ABI even adheres to the Intel doc! "It is important to ensure that the stack frame is aligned to a 16-byte boundary upon function entry to keep local __m128 data, parameters, and XMM register spill locations aligned throughout a function invocation." It says, “upon function entry” yet this isn’t the case. The Mac ABI requires the alignment to be aligned at the call site, and not on function entry! Remember that the CALL instruction will automatically push the return address onto the stack, which for x86-32 is a 32bit (DWORD) sized. Why isn’t the stack merely prepared to be aligned at the call site so that when the return address is pushed, the stack is now aligned. This would mean that ESP would be $xxxxxxx4 at the call site and $xxxxxxx0 upon entry. It is also possible interpret this statement that the function prologue code is what is responsible for doing this alignment, and not necessarily the caller. This would clearly jive with the title of this post.

So there you have, a long rambling diatribe. Why does this even matter if the compiler just handles it? Because we’re having to go through all our optimized, hand coded assembly code and make sure it keeps the stack properly aligned. It also means that for all our customers out there that also like to dabble in hand coding assembler will need to now take this into account. This coupled with the re-emergence of Position Independent Code (PIC), we’re having a jolly old time… Let the ensuing discussion begin… I’m really interested in knowing the actual reasons for this requirement… I mean could the really smart OS and Tools folks at Apple gotten this all wrong? I really have a hard time believing that because you’d think that someone would have caught it. Yet, seemingly obvious software bugs sneak out all the time (yeah, yeah, we’re just as guilty here, you got me on that one!).

Pre-emptive snarky comment: Well, if you had a compiler that did better optimizations you wouldn’t have these problems! Thank you, Captain Oblivious! Our code generator is a trade-off between decent optimizations and very fast codegen. We only do the biggest-bang-for-the-buck optimizations. We’ve even added some optimizations in recent history, like function inlining.

*ABI – Application Binary Interface. This is different from an API (Application Programming Interface), which only defines the programming model of a group of provided functions. An ABI defines the raw mechanics of how an API is implemented. An API may define a calling convention that is used, whereas the ABI defines exactly how that convention is to actually work. This includes things like how and where are the parameters actually passed to the function. It also defines which CPU registers must be preserved, which ones are considered “working” registers and can be overwritten, and so on.

Posted by Allen Bauer on January 14th, 2010 under Delphi, General, Work | 15 Comments »


Server Response from: BLOGS2