x64 assembler fun-facts
While implementing the x64 built-in assembler for Delphi 64bit, I got to “know” the AMD64/EM64T architecture a lot more. The good thing about the x64 architecture is that it really builds on the existing instruction format and design. However, unlike the move from 16bit to 32bit where most existing instruction encodings were automatically promoted to using 32bit arguments, the x64 design takes a different approach.
One myth about the x64 instructions is that “everything’s wider.” That’s not the case. In fact, many operands that 32bit code treated as absolute addresses (actually offsets within a segment, but the segments are 4G in 32bit) are now 32bit relative offsets. Very few addressing modes use a full 64bit absolute address. Most are 32bit offsets relative to one of the 64bit registers. One interesting addressing mode that is “implied” in many instruction encodings is RIP-relative addressing. RIP is the 64bit equivalent of the 32bit EIP or the 16bit IP (Instruction Pointer). It holds the address from which the CPU will fetch the next instruction. Most hard-coded addresses within instructions are now relative offsets from the current RIP register. This is probably the biggest thing you have to wrap your head around when moving from 32bit assembler.
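To make the arithmetic concrete, here is a minimal sketch in Object Pascal of how a RIP-relative address is formed. The function name and parameters are hypothetical, chosen only for illustration; it models the calculation and is not something the compiler or RTL provides.

```pascal
// Hypothetical helper (illustration only): models RIP-relative addressing.
// NextInstrAddr is the address of the instruction FOLLOWING the one being
// executed (the value RIP holds while the current instruction runs).
// Disp32 is the signed 32bit offset stored in the instruction encoding.
function RipRelativeTarget(NextInstrAddr: UInt64; Disp32: Integer): UInt64;
begin
  // The 32bit offset is sign-extended to 64 bits and added to RIP.
  Result := UInt64(Int64(NextInstrAddr) + Disp32);
end;
```

For example, with the next instruction at $140001007 and an encoded offset of $2FF9, the target is $140004000. A negative offset reaches backwards: an offset of -$1007 from the same point gives $140000000.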
Even though many instructions implicitly use RIP-relative addressing, some addressing modes continue to use a 32bit offset that is not RIP-relative. This can really bite you when doing simple mechanical translations from 32bit to 64bit. These are the SIB forms with a 32bit (or even 8bit) offset. What can happen is that you end up forming an address that can only address 32bits, and is thus limited to addressing items below the 4G boundary! And this is a perfectly legal instruction! To demonstrate this, consider the following 32bit assembler that we’ll translate to 64bits.
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  MOV AX,[EAX * 2 + TestArray]
end;
Let’s now translate this for use in 64bit using a simple mechanical translation.
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  MOVSX RAX,ECX
  MOV AX,[RAX * 2 + TestArray]
end;
Pretty straightforward, right? Not so fast there, partner. Let’s see: I know that I need to use a full 64bit register for the offset, but since Integer is still 32bits, I need to “sign-extend” it to 64bits. The venerable MOVSX (Move with sign extension) instruction “promotes” the signed 32bit offset to 64bits while preserving the sign (the AMD64 manuals call the 32-to-64bit form MOVSXD). Nope, that’s not a problem. The only thing I changed in the next instruction was EAX to RAX, so how could that be a problem? Well, when you compile this code you’ll get a rather strange error message:
[DCC Error] Project7.dpr(18): E2577 Assembler instruction requires a 32bit absolute address fixup which is invalid for 64bit
Huh? Remember the little note above about the SIB instruction form? Because the RAX (or EAX in 32bit) register is being scaled (the * 2), this instruction must use the SIB (Scale-Index-Base) instruction form. When using the SIB form, RIP isn’t considered when calculating the actual address. Additionally, the offset encoded in the instruction can still only be 8 or 32bits. No 64bit offsets.
In 32bit, the compiler would generate a “fixup” to ensure that the instruction’s offset field, which holds the address of the global “TestArray” variable, was properly “fixed up” at runtime should the image happen to be relocated to another address. This is a 32bit absolute address. The 64bit version of this instruction, while a truly valid instruction, would only have 32bits in which to place the address of “TestArray.” The “fixup” generated would have to remain 32bit. This could lead to an image that, were it ever relocated above the 4G boundary, would crash at best or read the wrong memory address at worst!
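As a rough illustration of why that fixup is unsafe, the sketch below checks whether a 64bit address can be represented by a 32bit offset field at all. The function name is hypothetical. It relies on one architectural fact: in 64bit mode the CPU sign-extends a 32bit offset to 64 bits, so only addresses in the lowest 2G (or the very top 2G) of the address space survive the round trip.

```pascal
// Hypothetical helper (illustration only): can Addr be encoded as a
// sign-extended 32bit offset, as the SIB form with no RIP base requires?
function FitsInDisp32(Addr: UInt64): Boolean;
begin
  // Non-negative offsets cover $0..$7FFFFFFF; negative offsets sign-extend
  // into the topmost 2G of the 64bit address space.
  Result := (Addr <= $7FFFFFFF) or (Addr >= $FFFFFFFF80000000);
end;
```

An address such as $00400000 passes the check, while $140004000 (above the 4G boundary) does not, which is exactly the case the compiler error guards against.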
Ok, so now what? There is a SIB form that we can use to work around this problem, but it requires burning another register. The good news is that we now have another 8 registers with which to work. So if you have a rather complicated chunk of 32bit assembler code that burns up all the existing usable 32bit registers, you now have another group of registers that can help solve this problem without having to rework the code even more. So here’s how to fix this for 64 bit:
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  MOVSX RAX,ECX
  LEA R10,[TestArray]
  MOV AX,[RAX * 2 + R10]
end;
Here, I used the volatile R10 register (R8 and R9 are used for parameter passing) to get the absolute address of TestArray using the LEA instruction. While the “address” portion of this instruction is still 32bits, it is taken as RIP-relative. In other words, this value is the “distance” from the next instruction to the variable TestArray in memory. After this instruction, R10 contains a true 64bit address of the TestArray variable. I must still use the SIB form in the next instruction, but instead of a hard-coded “offset” I use the value in R10. Any remaining offset is simply 0 (the encoding only forces an explicit 8bit zero offset when the base register is RBP or R13).
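The address formed by the final MOV can be modelled in a few lines of Object Pascal. This is only a sketch of the SIB calculation (base + index * scale + offset); the function and parameter names are made up for illustration.

```pascal
// Hypothetical helper (illustration only): models the SIB address
// calculation Base + Index * Scale + Disp.
// Base  - full 64bit base register value (R10 in the example above)
// Index - sign-extended index register value (RAX in the example above)
// Scale - 1, 2, 4 or 8
// Disp  - the 8bit or 32bit offset from the encoding, 0 if there is none
function SibAddress(Base: UInt64; Index: Int64; Scale, Disp: Integer): UInt64;
begin
  Result := UInt64(Int64(Base) + Index * Scale + Disp);
end;
```

If LEA left $140004000 in R10 and Index is 3, then SibAddress($140004000, 3, 2, 0) gives $140004006, the address of TestArray[3]. Because the base now carries the full 64bit address, nothing in the instruction encoding is limited to 32bits.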
You can see that mindless, mechanical translations of assembler code are likely to cause you some grief due to subtle changes in instruction behavior. For this very reason, we strongly recommend you use Object Pascal code instead of resorting to assembler whenever possible. Not only is your code then more likely to move unchanged to other processor architectures (think ARM here, folks), but you’ll also not have to worry about such assembler gotchas in the future. If you’re using assembler code because “it’s faster,” I would encourage you to look closely at the algorithm used. There are many cases where the proper algorithm written in Object Pascal will yield greater gains than a simple translation to assembler using the same algorithm. Yes, there are some things which you simply must do in assembler (strange, off-beat calling conventions, “LOCK” instructions for concurrency, etc.), but I would contend that many assembler functions can be moved back to Object Pascal with little impact on performance.
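For comparison, the GetValue example used throughout this post needs no assembler at all. A pure Object Pascal version might look like the sketch below, and it compiles unchanged for 32bit and 64bit targets.

```pascal
var
  TestArray: array[0..255] of Word;

// Pure Object Pascal equivalent of the assembler GetValue above.
// The compiler chooses the addressing mode appropriate for the target.
function GetValue(Index: Integer): Word;
begin
  Result := TestArray[Index];
end;
```

One difference worth keeping in mind: the assembler versions never check Index, whereas this version will be bounds-checked if range checking is enabled.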
Posted by Allen Bauer on October 5th, 2011 under 64bit, Delphi, General



October 7th, 2011 at 1:47 am
Interesting!
October 7th, 2011 at 11:58 am
think ARM??
ARM programmers have been trained since 1988 to think of addresses as relative to the PC (R15) or any other register. For x86 people this is a gotcha; for ARM people, AMD64 comes much closer to their paradigm.
October 7th, 2011 at 2:36 pm
Oma,
I’m taking it you didn’t actually read the paragraph in which that comment was made. I wasn’t referring to assembly code, but rather to the fact that we recommend you *not* use assembly code and use pure Pascal code in order to better be prepared for future platforms and CPU architectures.
October 7th, 2011 at 6:16 pm
Anybody who’s ever written any ROM-based code understands the value of IP-relative addressing (a.k.a. PIC, or Position Independent Coding). Most ARM CPUs are targeted at embedded applications, meaning the code resides in ROM.
But there are plenty of x86 applications that are ROM-based as well. There’s nothing about the x86 architecture that prevents you from employing PIC solutions.
October 10th, 2011 at 9:24 am
@David,
Of course. The compiler targeting x86 Mac OSX generates PIC. However, since the x86 architecture doesn’t allow for IP-relative addressing, another register, EBX, is burned for this purpose. x64, with IP-relative addressing, certainly makes this far easier for the code generator.