Delphi Compiler Core

The Hidden Costs of Memory Mapped Files

Are memory mapped files (MMFs) always faster than normal file I/O techniques?

Not necessarily.

Memory mapped files are seductive because they offer the lure of reading and writing data on disk using only a memory pointer. Advance the pointer to a new address, and presto! the data magically appears there. The system takes care of reading the data from disk on demand, using the page protection mechanism of the x86 (386 and later) virtual memory hardware. If you refer to an address that has not yet been loaded into RAM, a page fault occurs behind the scenes and the system reads the data into RAM for you. Your program doesn’t notice this activity because your thread is suspended while the page fault is processed.
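To make the "just advance the pointer" idea concrete, here is a minimal sketch in Python (not the compiler's own code). The function name `read_at` and the demo file are assumptions made up for illustration; the only library used is Python's standard `mmap` module.

```python
import mmap
import os
import tempfile

def read_at(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only mapping."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file. Nothing is read here;
        # the OS loads pages only when the slice below touches them.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

# Hypothetical demo file standing in for a source file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"unit Demo;\ninterface\nimplementation\nend.\n")

header = read_at(demo_path, 0, 10)
os.remove(demo_path)
print(header)
```

There is no read call and no buffer management in `read_at`; the slice is the whole of the I/O code, which is exactly the appeal.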

MMFs give you simple access to data on disk without all the source code overhead of file I/O and buffering. It’s simple and it’s fast. So it must be better than the old way of doing things, right? Not necessarily.

Memory mapped files are not always faster than custom data loading
algorithms. You have no control over how much of the MMF is kept in memory
or for how long. This means that using an MMF may push other things out of
RAM, such as code or data pages that you will need back "soon".

Also, page faults are not free. A page fault can take a lot longer for the system to process than a simple file I/O call. The additional system overhead of using page faults is hidden by the fact that fault processing is performed inside the system kernel, not in your program’s own code.

A custom data management routine can be made to take into account data
access patterns specific to your data, locality of reference, and cache
longevity. A carefully crafted data manager for a specific data set can
provide comparable performance to raw MMF but use significantly less
physical RAM to do it, thus reducing page swapping and improving overall
system performance.

We’ve studied whether MMFs would provide any benefit to the Delphi compiler,
for example, above and beyond the simple file buffering scheme that has
served us well for reading source files for the past decade. What we found
was that reading source code from an MMF was very fast, yes, but the MMF
sucked up lots and lots of RAM and pushed more important stuff (like
compiler symbols) out to disk.

We used the sequential access hint in hopes
of convincing the MMF to retire old pages sooner. The sequential access
hint made the MMF load pages before our source pointer touched them, which
improved scanning speed (because the program didn’t have to wait for the page to be loaded on demand). However, that did not reduce the memory footprint of the
MMF overall, and the large memory footprint decreased overall compile performance.
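The post does not say which API supplied the sequential access hint. As a stand-in, the sketch below uses the POSIX `madvise` call with `MADV_SEQUENTIAL`, exposed by Python's `mmap` module (Python 3.8 and later, on platforms that provide it). That choice, the function name `count_lines_mapped`, and the demo file are assumptions, not the compiler's actual mechanism. The hint is advisory: the OS may read ahead, and nothing obliges it to release pages sooner.

```python
import mmap
import os
import tempfile

def count_lines_mapped(path):
    """Count newline bytes by scanning a read-only mapping front to back."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Advisory hint only. Guarded because madvise and
            # MADV_SEQUENTIAL are not available on every platform.
            if hasattr(view, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                view.madvise(mmap.MADV_SEQUENTIAL)
            count = 0
            pos = view.find(b"\n")
            while pos != -1:
                count += 1
                pos = view.find(b"\n", pos + 1)
            return count

# Hypothetical demo file: 1000 short lines.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"begin end;\n" * 1000)

line_total = count_lines_mapped(demo_path)
os.remove(demo_path)
print(line_total)
```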

The difference between using an MMF for source scanning and using a simple
file buffer technique boils down to this: The MMF consumed a minimum of 64 KB
of RAM, and kept approximately 25% of the file in memory after it had been
scanned.

The simple file buffer technique never used more than 4 KB of RAM
regardless of the source file size, and kept 0% of the file in memory after
it had been scanned. For a 600 KB source file like Windows.pas, the MMF would
chew up about 100 KB of RAM, whereas the simple file buffer would use no more
than 4 to 8 KB of RAM.
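The post does not show the compiler's buffering code, so the following is only a sketch of the general technique: scan a file of any size through one fixed, reused buffer. The name `count_lines_buffered`, the 4096-byte buffer size (chosen to echo the 4 KB figure above), and the generated 600 KB demo file are assumptions for illustration.

```python
import os
import tempfile

BUFFER_SIZE = 4096  # assumed size, echoing the 4 KB figure in the text

def count_lines_buffered(path, buffer_size=BUFFER_SIZE):
    """Count newline bytes using a single reusable buffer of fixed size."""
    buf = bytearray(buffer_size)
    count = 0
    # buffering=0 gives a raw file object, so `buf` is the only
    # user-space buffer involved.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)   # overwrites the same buffer each pass
            if not n:
                break
            count += buf.count(b"\n", 0, n)
    return count

# Hypothetical demo file: 9600 lines of 64 bytes each, 600 KB in total.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write((b"x" * 63 + b"\n") * 9600)

demo_size = os.path.getsize(demo_path)
buffered_total = count_lines_buffered(demo_path)
os.remove(demo_path)
print(demo_size, buffered_total)
```

The process's own working memory for the scan stays at `buffer_size` bytes however large the file is. What the operating system keeps in its file cache is a separate matter that this sketch does not control.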

When compiling a large project, the 96 KB difference between MMF and file
buffer meant that potentially 96 KB of compiler symbols would be pushed out to
disk by the MMF, and have to be loaded again from disk later.

I think part of the reason MMF doesn’t pay off in this example is that the
source file is not the most important data in memory for a compiler. The
symbol table is the most important data, and tends to be quite a bit larger
than the source. (To compile and link a simple VCL app, the compiler must sift through more than 10MB of symbol data.)

In cases where the data in the MMF is the most important
data to the application, MMF may provide greater advantages.

Memory Mapped File Advantages:

  • Simple and easy access to data as if it were already in memory
  • Excellent performance in most situations, compared to generic file I/O.
  • Implicitly asynchronous file I/O, without threading headaches.

Memory Mapped File Disadvantages:

  • Larger memory footprint than traditional file I/O.
  • No control over how much memory is used or how long it stays in RAM.
  • MMFs require a fixed file size. Expanding an MMF is a pain in the neck.
  • Byte for byte mapping from disk to memory makes compression of file data or file format versioning more difficult.
  • MMFs do not support file sharing.
  • Contiguous block of address space required. It’s possible to fragment your process address space to the point that you can’t map a 500 MB file into memory in one chunk.
  • Not readily available to type-safe managed .NET code.
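One way to soften the address-space item in the list above is to map a bounded window of the file rather than the whole thing. The sketch below shows that idea with Python's `mmap` module; the function name `read_window` and the demo file are assumptions for illustration. The mapping offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the code rounds down and skips the extra leading bytes.

```python
import mmap
import os
import tempfile

def read_window(path, offset, length):
    """Read `length` bytes at `offset` by mapping only the window that covers them."""
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran   # mapping must start on a granularity boundary
    delta = offset - base            # bytes to skip inside the mapped window
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("requested range lies outside the file")
        if length == 0:
            return b""
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as view:
            return view[delta:delta + length]

# Hypothetical demo file: 256 KB holding the byte values 0..255 repeated.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 1024)

window = read_window(demo_path, 200_000, 16)
os.remove(demo_path)
print(window)
```

Only `delta + length` bytes of address space are reserved per call, at the price of giving back some of the simplicity that made the MMF attractive in the first place.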

There are still opportunities to tweak the compiler’s I/O performance. I
hope to experiment with I/O completion ports and asynchronous file I/O to
preload the next source buffer independently of the scanner, so that the
compiler doesn’t have to wait for file reads to complete. But that’s not
critical to the product, so it will have to wait for the next rainy day
weekend (or trans-oceanic flight).
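As a rough illustration of preloading the next buffer while the current one is scanned, here is a sketch that swaps in a plain worker thread for I/O completion ports, which are a Windows facility this example does not use. The names `scan_with_prefetch` and the demo file are assumptions; this is not the compiler's design.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def scan_with_prefetch(path, buffer_size=4096):
    """Yield file chunks in order while a worker thread reads the next one."""
    with open(path, "rb", buffering=0) as f, \
         ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(f.read, buffer_size)
        while True:
            chunk = pending.result()   # wait only if the read is not done yet
            if not chunk:
                break
            # Start the next read before handing this chunk to the caller,
            # so the read overlaps with the caller's scanning work.
            pending = pool.submit(f.read, buffer_size)
            yield chunk

# Hypothetical demo file: 5000 short lines.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"procedure Foo;\n" * 5000)

prefetch_bytes = 0
prefetch_lines = 0
for chunk in scan_with_prefetch(demo_path):
    prefetch_bytes += len(chunk)
    prefetch_lines += chunk.count(b"\n")
os.remove(demo_path)
print(prefetch_bytes, prefetch_lines)
```

Only one read is ever outstanding, so the chunks arrive in file order and at most two buffers are alive at a time.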

Posted by Danny Thorpe on March 19th, 2004 under Uncategorized



One Response to “The Hidden Costs of Memory Mapped Files”

  1. Mandar Says:

    Hi,

    We have found memory mapped I/O to be very fast in our application, which decodes images from MPEG clips and renders them on screen.

    We map only the part of the file that we want to seek to, using MapViewOfFile (usually not more than 100 KB), so most of the page faults are avoided.

    mandarn@gmail.com


