Delphi XE2 will be published this year. What are the key features of this new release? (Is this the release named "Pulsar"?)
Customers will now be able to target Windows 32bit, Windows 64bit, and Mac OSX 32bit. XE2 introduces a new cross-platform, GUI-centric, GPU-accelerated component framework called FireMonkey. The VCL also received an extensive upgrade with the introduction of Styles. Also new in XE2 is LiveBindings, which provides a powerful and flexible system for binding any kind of data source to any property or properties. The data source can be nearly anything, including other properties.
There will be a new framework called FireMonkey. Can you tell us how FireMonkey works and what its job is?
FireMonkey is designed from the ground up to be cross-platform. By design, it isolates all platform specifics into an independent platform layer. While FireMonkey extensively uses components, how it actually renders the GUI is significantly different from VCL. Where VCL uses independent, self-contained components that each render using their own techniques, or even wrap existing Windows controls, FireMonkey manages the display of content using compositing. This allows for significantly more flexibility in GUI design. Animation is built into the framework to enable very rich, advanced user interactions. Like animation, filters and transforms are also built in, which allow the whole UI, or portions thereof, to be manipulated. For instance, when a small modal popup is displayed, rather than merely disabling the main UI, you could apply a blurring effect to the UI behind the popup, giving it more depth of field. This blurring effect is applied while compositing the UI and is independent of any rendering of the components/controls.
Is FireMonkey a replacement for the VCL or an addition?
VCL was first and foremost designed to be a relatively thin wrapper to make Windows programming simpler and more accessible. VCL effectively embraced many Windows programming concepts and made them intrinsic to the framework. This certainly made Windows programming a far more productive and pleasant experience. It also inextricably tied VCL to the Windows platform and all its unique characteristics. We had several goals with FireMonkey. First of all we wanted a framework that allowed for the creation of very rich, interactive, modern UIs. We also wanted a framework that wasn’t hog-tied to a given platform. FireMonkey is not intended as a replacement for VCL; rather it is intended as a whole new way for customers to embrace the emerging market for richer, more interactive desktop applications along with the burgeoning mobile space.
If I want to run an existing Delphi application under Mac OS X, do I have to convert it to FireMonkey first? Will there be a converter?
VCL and FireMonkey share a common RTL and database components such as dbExpress and DataSnap. While you will not be able to simply recompile your VCL-based application for Mac OSX, you will be able to carry over all your code that exclusively uses the RTL and DB components. As for converters, I know that at the time of this writing there are several third parties offering VCL-to-FireMonkey converter products.
What are your future plans for FireMonkey?
More platforms and mobile. FireMonkey is how we’re staying relevant in the emerging heterogeneous mobile and desktop platform world. Throughout most of the ’90s and early ’00s, the mobile computing space was non-existent or very niche. Apple and the Mac OS were actually in decline, and many weren’t sure they’d be around to see 2000. What a different world we’re in now. The desktop Mac OSX is making significant inroads into the enterprise, and the mobile space is anything but niche. Tying Delphi strictly to the Windows platforms ignores huge opportunities for both Embarcadero and all our Delphi customers, new and old. With FireMonkey, XE2 is positioned to be the only *native* cross-platform framework that targets both major desktop operating systems and one of the dominant mobile operating systems, iOS. Expect to see FireMonkey become more powerful and even easier to use, and target even more mobile platforms, in future releases.
The applications cross-compiled for OS X are native. Is a new Delphi compiler at work here? And will it be used for "normal" Win32 applications in the future?
There are three new compilers introduced with XE2. Delphi Windows 64bit, Delphi Mac OSX 32bit, and C++ Mac OSX 32bit. All of these compilers are derived from the existing codebase. They all essentially share the same respective "front-ends", the part of the compiler that translates the source-code into an intermediate form in preparation for generating machine code. The existing 32bit Delphi and 32bit C++ compilers are still very much in business. We have some research projects in progress for targeting even more platforms and CPU architectures.
If there is a new compiler: is it fully backward compatible? Or are some features abandoned?
For XE2, the current compilers were employed in order to ensure maximum backward compatibility. Looking to the future, we’re currently researching new directions for a compiler architecture that allows quicker targeting of new architectures, and we’re looking at adding more advanced, more modern language features. This may mean eschewing some older features of the language.
Are there some new features in Delphi XE2 for people who will only develop VCL Win32 applications?
As evidenced by XE2, VCL is still very much a key part of the product. With the addition of Styles, programmers can take their existing VCL-based applications and update and modernize the look and feel using the new Style engine. The third-party component support remains one of the best, if not the best, of all independent development tools on the market. VCL is still the fastest and easiest way to develop *Windows* applications. Also, with XE2 now being able to target 64bit Windows, most VCL applications can simply be recompiled for 64bit, subject to the normal 32bit-to-64bit caveats.
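As a brief sketch of how a VCL Style might be applied at startup (the style name 'Carbon' is just an example, and this assumes the style has been linked into the project via the project options; the main form name is illustrative):

```pascal
program StyledApp;

uses
  Vcl.Forms,
  Vcl.Themes;   // XE2's VCL Style engine

var
  MainForm: TForm;

begin
  Application.Initialize;
  // Hypothetical style name; falls back to the default Windows look on failure.
  if not TStyleManager.TrySetStyle('Carbon') then
    { keep the platform default };
  Application.CreateForm(TForm, MainForm);
  Application.Run;
end.
```

Once a style is active, existing VCL controls pick up the new look without code changes, which is what makes Styles attractive for modernizing older applications.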
Will there be a new Starter edition again? And do you have any plans for a free Delphi (for getting more new blood in the Delphi community)?
Starter edition is very much a key part of our product line. When you compare the price point of the Starter edition, taking account of inflation, with the price of the original Turbo Pascal, coupled with the vastly superior capabilities of Starter compared to Turbo Pascal, I think you get far more value than the price. We also have very competitive offerings for the educational markets, where one can get nearly 80-90% off all the products. As for a free edition, we’re always looking at ways to grow the community base without potentially harming our existing, very strong and growing market. At this point we feel that the Starter edition provides a good balance of price, capabilities and value. Starter is positioned directly at the new customer by including the features that most new customers would need right away in order to both learn the environment and begin to develop commercial applications.
Posted by Allen Bauer on October 14th, 2011 under CodeGear | 22 Comments »
The Windows x64 ABI (Application Binary Interface) presents some new challenges for assembly programming that don’t exist for x86. A couple of the changes that must be taken into account can be seen as very positive. First of all, there is now one and only one OS-specified calling convention. We certainly could have devised our own calling convention as we did for x86, where there is a register-based convention; however, since the system calling convention was already register based, that would have been an unnecessary complication. The other significant change is that the stack must always remain aligned on 16 byte boundaries. This seems a little onerous at first, but I’ll explain how and why it’s necessary, along with how it can actually make calling other functions from assembly code more efficient and sometimes even faster than x86. For a detailed description of the calling convention, register usage and reservations, etc., please see Microsoft’s documentation of the x64 calling convention. Another thing that I’ll discuss is exceptions and why all of this is necessary.
For a given function there are three parts we’re going to talk about: the prolog, body, and epilog. The prolog and epilog contain all the setup and tear-down of the function’s “frame”. The prolog is where all the space on the stack is reserved for local variables and, differently from how the x86 compiler works, for the maximum amount of parameter space needed by all the function calls within the body. The epilog does the reverse and releases the reserved stack space just prior to returning to the caller. The body of a function is where the user’s code is placed, either in Pascal or, as we’ll see, the assembler code you write.
You may be wondering why the prolog is reserving parameter space in addition to the space needed for local variables. Why not just push the parameters on the stack right before calling a function? While there is technically nothing keeping the compiler from placing parameters for a function call on the stack immediately before a call, this will have the effect of making the exception tables larger. As I mentioned above, exceptions in x64 are not implemented the same as in x86, which was a stack-based linked list of records. In x64, exceptions are done using extra data generated by the compiler that describes the stack changes for a given function and where the handlers/finally blocks are located. By only modifying the stack within the prolog and epilog, “unwinding” the stack is easier and more accurate. Another side benefit is that when passing stack parameters to functions, the space is already available so the data merely needs to be “MOV”ed onto the stack without the need for a PUSH. The stack also remains properly aligned, so no extra finagling of the RSP register is necessary.
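As an illustrative sketch of the shape described above (the frame size, offsets, and callee are invented for the example, and the register parameters are omitted for brevity), a generated prolog/body/epilog might look roughly like this:

```pascal
procedure Example;
asm
        // prolog: the only place the stack pointer moves
        PUSH    RBP
        SUB     RSP, $40                 // locals + outgoing parameter area, keeps RSP 16-byte aligned
        MOV     RBP, RSP
        // body: the 5th parameter is MOVed into the pre-reserved slot.
        // No PUSH, so RSP stays put and the unwind data stays simple.
        MOV     DWORD PTR [RSP+$20], 5
        CALL    FiveParamProc            // hypothetical callee
        // epilog: release the frame in one step and return
        LEA     RSP, [RBP+$40]
        POP     RBP
end;
```

Because every stack adjustment is confined to the prolog and epilog, the unwinder only needs to describe those two small regions to walk past this function reliably.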
Delphi for Windows 64bit introduced several new assembler directives or “pseudo-instructions”: .NOFRAME, .PARAMS, .PUSHNV, and .SAVENV. These directives allow you to control how the compiler sets up the frame and ensure that the proper exception table information is generated.
.NOFRAME
Some functions never make calls to other functions. These are called “leaf” functions because they don’t do any further “branching” out to other functions; like a tree, they represent the “leaves.” For functions such as this, having a full stack frame may be extra overhead you want to eliminate. While the compiler does try to eliminate the stack frame when it can, there are times when it simply cannot figure this out automatically. If you are certain a frame is unnecessary, you can use the .NOFRAME directive as a hint to the compiler.
.PARAMS <max params>
This one may be a little confusing because it does not refer to the parameters passed into the current function. Rather, this directive should be placed near the top of the function (preferably before any actual CPU instructions) with a single ordinal argument telling the compiler the maximum number of parameters that will be needed by any function call within the body. This allows the compiler to reserve extra, properly aligned stack space for passing parameters to other functions. This number should reflect the maximum parameter count across all calls and should include even those parameters that are passed in registers. If you’re going to call a function that takes 6 parameters, then you should use “.PARAMS 6”.
When you use the .PARAMS directive, a pseudo-variable @Params becomes available to simplify passing parameters to other functions. It’s fairly easy to load up a few registers and make a call, but the x64 calling convention also requires that callers reserve space on the stack even for register parameters. The .PARAMS directive ensures this is the case, so you should still use the .PARAMS directive even if you’re going to call a function in which all parameters are passed in registers. You use the @Params pseudo-variable as an array, where the first parameter is at index 0. You generally don’t actually use the first 4 array elements since those must be passed in registers, so you’ll start at parameter index 4.
The default element size is the register size of 64bits, so if you want to pass a smaller value, you’ll need a cast or size override such as “DWORD PTR @Params”, or “@Params.Byte”. Using the @Params pseudo-variable saves the programmer from having to manually calculate the offsets based on alignments and local variables. UPDATE: I foobar’ed that one… The @Params array is an array of bytes, which allows you to address every byte of the parameters. Each parameter takes up 8 bytes (64bits), so you’ll need to scale accordingly to access each parameter. Casting or size overrides are still necessary. The above bad example should have been: “DWORD PTR @Params[4*8]” or “@Params[4*8].Byte”. Sorry about that.
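Putting the corrected syntax together, here is a sketch of calling a six-parameter routine. The callee and the parameter values are hypothetical; the point is how the first four parameters go in registers while the fifth and sixth are MOVed into the space .PARAMS reserved:

```pascal
procedure SixParamProc(A, B, C, D, E, F: NativeInt);
begin
  // body omitted; hypothetical callee for the example
end;

procedure CallSixParams;
asm
        .PARAMS 6                           // reserve aligned space for the largest call in this body
        MOV     RCX, 1                      // parameters 1..4 travel in registers
        MOV     RDX, 2
        MOV     R8, 3
        MOV     R9, 4
        MOV     QWORD PTR @Params[4*8], 5   // 5th parameter MOVed into its reserved slot
        MOV     QWORD PTR @Params[5*8], 6   // 6th parameter
        CALL    SixParamProc
end;
```

Note that no PUSH instructions appear in the body; RSP never moves outside the compiler-generated prolog and epilog, which is exactly what the unwind metadata requires.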
.PUSHNV <GPReg>, .SAVENV <XMMReg>
According to the x64 calling convention and register usage spec, some registers are considered non-volatile. This means that certain registers are guaranteed to have the same value after a function call as they had before it. This doesn’t mean such a register is not available for use; it just means the called function must ensure it is properly preserved and restored. The best place to preserve the value is on the stack, but that means space must be reserved for it. These directives both ensure the compiler includes space for the register in the generated prolog code and actually place the register’s value in that reserved location. They also ensure that the function epilog properly restores the register before cleaning up the local frame. .PUSHNV works with the 64bit general purpose registers RAX…R15 and .SAVENV works with the 128bit XMM0..XMM15 SSE2 registers. See the calling convention spec for a description of which registers are considered non-volatile. Even though you can specify any register, volatile or non-volatile, as a parameter to these directives, only those registers which are actually non-volatile will be preserved. For instance, .PUSHNV R11 will assemble just fine, but no changes to the frame will be made. Whereas .PUSHNV R12 will place a PUSH R12 instruction right after the PUSH RBP instruction in the prolog. The compiler will also continue to ensure that the stack remains aligned. Remember when I talked about why the stack must remain 16 byte aligned? One key reason is that many SSE2 instructions which operate on 128bit memory entities require that the memory access be aligned on a 16 byte boundary. Because the compiler ensures this is the case, the space reserved by the .SAVENV directive is guaranteed to be 16 byte aligned.
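A small sketch of .PUSHNV in use (the function is contrived purely to show the directive): R12 is non-volatile under the x64 convention, so the compiler must be told to preserve it before the body clobbers it.

```pascal
function AddViaR12(A, B: Integer): Integer;
asm
        .PUSHNV R12          // emits PUSH R12 in the prolog, restores it in the epilog
        MOV     R12D, ECX    // A arrives in (R)CX under the x64 convention
        ADD     R12D, EDX    // B arrives in (R)DX
        MOV     EAX, R12D    // result returned in EAX
end;
```

Without the directive, this function would return the correct value but silently corrupt the caller's R12, a bug that typically surfaces far from where it was caused.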
Writing assembler code in the new x64 world can be daunting and frustrating due to the very strict requirements on stack alignment and exception metadata. By using the above directives, you are signaling your intentions to the one thing that is pretty darn good at ensuring all those requirements are met: the compiler. You should always place the directives at the top of the assembler function body, before any actual CPU instructions. This ensures the compiler has all the information, and everything is already calculated, by the time it begins to see the actual CPU instructions and needs to know the offset from RBP at which a given local variable is located. Also, by ensuring that all stack manipulations happen within the prolog and epilog, the system will be able to properly “unwind” the stack past a properly written assembler function. Without this data, the OS unwind process could become lost and, at best, skip exception handlers or, at worst, call the wrong one and cause further corruption. If the unwind process gets lost badly enough, the OS may simply kill the process without any warning, similar to what stack overflows do in 32bit (and 64bit).
Posted by Allen Bauer on October 10th, 2011 under 64bit, CodeGear, Delphi, General, Work | 18 Comments »
While implementing the x64 built-in assembler for Delphi 64bit, I got to “know” the AMD64/EM64T architecture a lot more. The good thing about the x64 architecture is that it really builds on the existing instruction format and design. However, unlike the move from 16bit to 32bit where most existing instruction encodings were automatically promoted to using 32bit arguments, the x64 design takes a different approach.
One myth about the x64 instructions is that “everything’s wider.” That’s not the case. In fact, many addressing modes which were taken as absolute addresses (actually offsets within a segment, but the segments are 4G in 32bit) are now 32bit relative offsets. There are very few addressing modes which use a full 64bit absolute address. Most addressing modes are 32bit offsets relative to one of the 64bit registers. One interesting addressing mode that is “implied” in many instruction encodings is the notion of RIP-relative addressing. RIP is the 64bit equivalent of the 32bit EIP, or 16bit IP, the Instruction Pointer. This holds the address from which the CPU will fetch the next instruction for execution. Most hard-coded addresses within many instructions are now relative offsets from the current RIP register. This is probably the biggest thing you have to wrap your head around when moving from 32bit assembler.
Even though many instructions will implicitly use the RIP-relative addressing mode, there are some instruction addressing modes that continue to use a 32bit offset and are not RIP-relative. This can really bite you when doing simple mechanical translations from 32bit to 64bit. These are the SIB forms with a 32bit (or even 8bit) offset. What can happen is that you end up forming an address that can only address 32bits and is thus limited to addressing items below the 4G boundary! And this is a perfectly legal instruction! To demonstrate this, consider the following 32bit assembler that we’ll translate to 64bit.
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
        MOV     AX,[EAX * 2 + TestArray]
end;
Let’s now translate this for use in 64bit using a simple mechanical translation.
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
        MOVSX   RAX,ECX
        MOV     AX,[RAX * 2 + TestArray]
end;
Pretty straightforward, right? Not so fast there, partner. Let’s see; I know that I need to use a full 64bit register for the offset, but since Integer is still 32bits, I need to “sign-extend” it to 64bits. The venerable MOVSX (Move with sign extension) instruction “promotes” the signed 32bit offset to 64bits while preserving the sign. Nope, that’s not a problem. The only thing I changed in the next instruction was EAX to RAX, so how could that be a problem? Well, when you compile this code you’ll get a rather strange error message:
[DCC Error] Project7.dpr(18): E2577 Assembler instruction requires a 32bit absolute address fixup which is invalid for 64bit
Huh? Remember the little note above about the SIB instruction form? Because the RAX (or EAX in 32bit) register is being scaled (the * 2), this instruction must use the SIB (Scale-Index-Base) instruction form. When using the SIB form, RIP isn’t considered when calculating the actual address. Additionally, the offset encoded in the instruction can still only be 8 or 32bits. No 64bit offsets.
In 32bit, the compiler would generate a “fixup” to ensure that the offset field of the instruction referring to the global “TestArray” variable was properly “fixed up” at runtime should the image happen to be relocated to another address. This is a 32bit absolute address. The 64bit version of this instruction, while actually a valid instruction, would only have 32bits in which to place the address of “TestArray.” The “fixup” generated would have to remain 32bit. This could lead to creating an image that, were it ever relocated above the 4G boundary, would crash at best or read the wrong memory address at worst!
Ok, so now what? There is a SIB form that we can use to work around this problem, but it requires burning another register. The good news is that we now have another 8 registers to work with. So if you have a rather complicated chunk of 32bit assembler code that burns up all the existing usable 32bit registers, you now have another group of registers that can help solve this problem without having to rework the code even more. So here’s how to fix this for 64bit:
var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
        MOVSX   RAX,ECX
        LEA     R10,[TestArray]
        MOV     AX,[RAX * 2 + R10]
end;
Here, I used the volatile R10 register (R8 and R9 are used for parameter passing) to get the absolute address of TestArray using the LEA instruction. While the “address” portion of this instruction is still 32bits, it is taken as RIP-relative. In other words, this value is the “distance” from the next instruction to the TestArray variable in memory. After this instruction, R10 contains a true 64bit address of the TestArray variable. I must still use the SIB form in the next instruction, but instead of a hard-coded “offset” I use the value in R10. Yes, there is still an implicit offset of 0, which uses the 8bit offset form.
You can see that mindless, mechanical translations of assembler code are likely to cause you some grief due to some of the subtle changes in instruction behaviors. For this very reason, we strongly recommend you use Object Pascal code instead of resorting to assembler whenever possible. Not only will your code be more likely to move unchanged to other processor architectures (think ARM here, folks), but you won’t have to worry about such assembler gotchas in the future. If you’re using assembler code because “it’s faster,” I would encourage you to look closely at the algorithm used. There are many cases where the proper algorithm written in Object Pascal will yield greater gains than a simple translation to assembler using the same algorithm. Yes, there are some things which you simply must do in assembler (strange, off-beat calling conventions, “LOCK” instructions for concurrency, etc.), but I would contend that many assembler functions can be moved back to Object Pascal with little impact on performance.
Posted by Allen Bauer on October 5th, 2011 under 64bit, Delphi, General | 14 Comments »
So far we’ve had “Testing synchronization primitives” and “Writing a ‘self-monitoring’ thread-pool.” Let’s build on those topics, and discuss what to do with exceptions that occur within a scheduled work item within a thread pool.
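To make the question concrete, here is a minimal sketch of one catch-and-hold approach. All type and method names here are illustrative, not an actual RTL API; the only real calls are AcquireExceptionObject (which transfers ownership of the in-flight exception) and raise.

```pascal
uses
  System.SysUtils;

type
  TWorkItem = class
  private
    FProc: TProc;
    FError: Exception;  // captured on the pool thread, held for later inspection
  public
    constructor Create(const AProc: TProc);
    procedure Run;            // executed on a pool thread
    procedure CheckAndRaise;  // called later, at a synchronization point
  end;

constructor TWorkItem.Create(const AProc: TProc);
begin
  inherited Create;
  FProc := AProc;
end;

procedure TWorkItem.Run;
begin
  try
    FProc();
  except
    // Take ownership so the exception object survives the except block.
    FError := Exception(AcquireExceptionObject);
  end;
end;

procedure TWorkItem.CheckAndRaise;
var
  E: Exception;
begin
  if FError <> nil then
  begin
    E := FError;
    FError := nil;  // ownership passes back to the raise
    raise E;
  end;
end;
```

With this shape, the pool thread never dies from a work-item exception, and the code that scheduled the work decides when (and whether) the exception resurfaces.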
My view is that exceptions should be caught and held for later inspection, or re-raised at some synchronization point. What do you think should happen to the exceptions? Should they silently disappear, tear down the entire application, or should some mechanism be in place to allow the programmer to decide what to do with them?
Posted by Allen Bauer on April 7th, 2010 under Delphi, General, Parallel Programming, Work | 10 Comments »
Let’s start thinking about thread pools. How do you manage a general-purpose thread pool in the face of not-so-well-written code? For instance, a task dispatched into the thread pool never returns, effectively locking that thread from ever being recycled. How do you monitor this? How long do you wait before spooling out a new thread? Do you keep a “monitor thread” that periodically checks whether a thread has been running longer than some (tunable) value? What are the various techniques for addressing this problem?
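One possible shape for such a monitor thread, sketched under the assumption of a pool that records per-worker start times. Every name here (TPoolMonitor, FPool, Worker, TaskStartTick, MarkAsStuck, SpawnWorker, the timeout fields) is made up for illustration; only TThread.Execute, Terminated, GetTickCount, and Sleep are real RTL pieces.

```pascal
procedure TPoolMonitor.Execute;
var
  I: Integer;
begin
  while not Terminated do
  begin
    for I := 0 to FPool.WorkerCount - 1 do
      // TaskStartTick is 0 while the worker is idle
      if (FPool.Worker[I].TaskStartTick <> 0) and
         (GetTickCount - FPool.Worker[I].TaskStartTick > FTimeoutMS) then
      begin
        FPool.Worker[I].MarkAsStuck;  // stop recycling this thread...
        FPool.SpawnWorker;            // ...and spool out a replacement
      end;
    Sleep(FCheckIntervalMS);          // tunable polling period
  end;
end;
```

The open questions remain the interesting part: what timeout is “too long” for a legitimate task, and what do you do with the stuck thread once you’ve replaced it?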
So, there you go… Talk amongst yourselves.
Posted by Allen Bauer on March 26th, 2010 under CodeGear, Delphi, General, Parallel Programming, Work | 11 Comments »
In this office. I’ve been in the same physical office for nearly 15 years. After years of accumulation, it now looks positively barren. Beginning next Monday, March 29th, 2010, I’ll be in a new building, new location, and new office. The good thing is that the new place is a mere stone’s throw from the current one. It will be great to leave all the Borland ghosts behind.
Posted by Allen Bauer on March 26th, 2010 under CodeGear, Delphi, Generics, Personal, Work | 3 Comments »
I’m going to try a completely different approach to this post. I’ll post a question and simply let the discussion ensue. I would even encourage the discussion to spill over to the public newsgroups/forums. Question for today is:
How can you effectively unit-test synchronization primitives for correctness or more generally, how would you test a concurrency library?
Let’s see how far we can get down this rabbit hole ;-).
Posted by Allen Bauer on March 22nd, 2010 under CodeGear, Delphi, General, Parallel Programming | 30 Comments »
By now you’re all aware that we’re getting ready to move to a new building here in Scotts Valley. This process is giving us a chance to clean out our offices and during all these archeological expeditions, some lost artifacts are being (re)discovered. Note the following:
These are some bookends that my father made for me within the first year after I moved my family to California to work on the Turbo Pascal team. He made these at least two years before Delphi was released, and at least 6 months before we even began work on it in earnest. Certainly before the codename “Delphi” was ever thought of. I suppose they are my “happy” accident.
This next one is just sad. I received this award at the 2004 Borcon in San Jose from the then Borland President/CEO, Dale Fuller. My title at that time was “Principal Architect”… Of course, I like to think that I have strong principles, and maybe that was what they were trying to say… Within a week or so after I got this plaque, another one arrived with the correct spelling of my title. I keep this one just for the sheer hilarity of it. Also, it is a big chunk of heavy marble, so maybe one day I can use it to create a small marble-topped table…
CodeGear, General, Random, Work | 5 Comments »
What. Is. This? I simply cannot explain this. At. All.
This was on a bulletin/white-board in the break area. I’d never noticed it because it was covered with photos from various sign-off (final authorization to release the product) celebrations. Lots of photos of both past and present co-workers, many thinner and with more hair ;-). Since we’re in the process of cleaning up in preparation for moving to our new digs, it is interesting what you find… I presume this image has been on this whiteboard since… I guess… Delphi 5, or is that Delphi S? Either someone has a very odd sense of humor… or, more likely, beer had been involved during one of those sign-off celebrations from the photos. Then again, maybe this whiteboard had been in the Borland board room and this was from a corporate strategy meeting… nah, gotta be the beer.
Ow, my head hurts now…
Posted by Allen Bauer on February 19th, 2010 under CodeGear, Delphi, General, Random, Work | 12 Comments »
It seems that my previous post about FreeAndNil sparked a little controversy. Some of you jumped right on board and flat agreed with my assertion. Others took a very defensive approach. Still others, kept an “arms-length” view. Actually, the whole discussion in the comments was very enjoyable to read. There were some very excellent cases on both sides. Whether or not you agreed with my assertion, it was very clear that an example of why I felt the need to make that post was in order.
I wanted to include an example in my first draft of the original post, but I felt that it would come across as too contrived. This time, instead of including some contrived hunk of code that only serves to cloud the issue at hand, I’m going to try a narrative approach and let the reader decide if this is something they need to consider. I may fall flat on my face with this, but I want to try and be as descriptive as I can without the code itself getting in the way. It’s an experiment. Since many of my readers are, presumably, Delphi or C++Builder developers and have some working knowledge of the VCL framework, I will try and present some of the problems and potential solutions in terms of the services that VCL provides.
To start off, the most common case I’ve seen where FreeAndNil can lead to strange behaviors or even memory leaks is when you have a component with an object reference field that is allocated “lazily.” What I mean is that you decide you don’t need to burn the memory this object takes up all the time, so you leave the field nil and don’t create the instance in the constructor. You rely on the fact that it is nil to know that you need to allocate it. This may seem like the perfect case for FreeAndNil! That is, in fact, the very problem. There are cases where you should FreeAndNil in this scenario. The scenario I’m about to describe is not such a case.
If you recall from the previous post, I was specifically referring to using FreeAndNil in the destructor. This is where a very careful dance has to happen. A common scenario in VCL code is to hold references to other components from a given component. Because you are holding a reference, there is a built-in mechanism that allows you to coordinate the interactions between the components by knowing when a given component is being destroyed. There is the Notification virtual method you can override to know if the component being destroyed is the one to which you have a reference. The general pattern here is to simply nil out your reference.
The problem comes in when you decide that you need to grab some more information out of the object while it is in the throes of destruction. This is where things get dangerous. Just the act of referencing the instance can have dire consequences. Where this can actually cause a memory leak is if the field, property, or method accessed causes the object to lazily allocate that instance I just talked about above. What if the code to destroy that instance has already run in the destructor by the time the Notification method is called? Now you’ve just allocated an instance which has no way to be freed. It’s a leak. It’s also a case where a nil field will never actually cause a crash, because you were sooo careful to check for nil and allocate the field if needed. You’ve traded a crash for a memory leak. I’ll let you decide whether or not that is right for your case. My opinion is that, leak or crash, it is simply not good design to access an instance that is in the process of being destroyed.
“Oh, I never do that!” That’s probably true; however, what about the users of your component? Do they understand the internal workings of your component and know that accessing the instance while it is in the throes of destruction is bad? What if it “worked” in v1 of your component and v2 changed some of the internals? Do they even know that the instance is being destroyed? Luckily, VCL has provided a solution to this by way of the ComponentState. Before the destructor is called that starts the whole destruction process, the virtual method BeforeDestruction is called, which sets the csDestroying flag. This can now be used as a cue to any given component instance as to whether or not it is being destroyed.
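A sketch of how csDestroying can guard a lazily-allocated field (the class and field names are illustrative; ComponentState, csDestroying, and FreeAndNil are the real VCL/RTL pieces):

```pascal
uses
  System.Classes, System.SysUtils;

type
  TMyComponent = class(TComponent)
  private
    FDetails: TStringList;  // allocated lazily, freed in the destructor
    function GetDetails: TStringList;
  public
    destructor Destroy; override;
    property Details: TStringList read GetDetails;
  end;

function TMyComponent.GetDetails: TStringList;
begin
  // Never re-allocate while this component is being torn down; doing so
  // after the destructor has already freed FDetails would leak the new copy.
  if (FDetails = nil) and not (csDestroying in ComponentState) then
    FDetails := TStringList.Create;
  Result := FDetails;  // may legitimately be nil during destruction
end;

destructor TMyComponent.Destroy;
begin
  FreeAndNil(FDetails);
  inherited;
end;
```

Callers reacting to a Notification still need to tolerate a nil result, but at least the guard turns the silent leak back into something visible at the call site.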
While my post indicting FreeAndNil as not being your friend may have come across as a blanket statement decrying its wanton use, I was clearly not articulating as well as I should have that blindly using FreeAndNil without understanding the consequences of its effect on the system as a whole is likely to bite you. My above example is but one case where you should be very careful about accessing an object in the process of destruction. My point was that using FreeAndNil can sometimes appear to solve the actual problem, when in fact it has merely traded it for another, more insidious, hard-to-find problem. A problem that doesn’t bite immediately.
Posted by Allen Bauer on February 16th, 2010 under CodeGear, Delphi, General, Work, ednfront | 55 Comments »