Sunday, November 18, 2012

Direct Memory Access

We're currently profiling a Java application that features Lucene. There's some debate as to what is going on. Is our app performing badly because it is IO-bound? This would make sense as we are writing a lot of indexes to disk. So, why is the CPU usage so high?

Would a lot of IO trash the CPU? Kernel writer, Robert Love, suggests not:

"Given that the processors can be orders of magnitude faster than the hardware they talk to, it is not ideal for the kernel to issue a request and wait for a response from the significantly slower hardware. Instead, because the hardware is comparatively slow to respond, the kernel must be free to go and handle other work, dealing with the hardware only after that hardware has actually completed its work." [1]

In most (non-embedded) architectures, it appears that the CPU has very little to do with the heavy-lifting of data. What goes on is this (with a lot explanation stolen from Ulrich Drepper's What Every Programmer Should Know About Memory [2]):

The standardized chipset looks like this:


Let's define some terms:

  • "All CPUs (two in the previous example, but there can be more) are connected via a common bus (the Front Side Bus, FSB) to the Northbridge. 
  • "The Northbridge contains, among other things, the memory controller.
  • "The Southbridge often referred to as the I/O bridge, handles communication with devices through a variety of different buses".

The consequences are:

  • "All data communication from one CPU to another must travel over the same bus used to communicate with the Northbridge.
  • "All communication with RAM must pass through the Northbridge. 
  • "Communication between a CPU and a device attached to the Southbridge is routed through the Northbridge."


This is where direct memory access (DMA) comes in: 

"DMA allows devices, with the help of the Northbridge, to store and receive data in RAM directly without the intervention of the CPU (and its inherent performance cost)." [2]


So, all our IO seems unlikely to be the cause of our CPU problems (caveat: we need to do more testing).

Out of interest, I was reading up on DMA and stumbled on this from the classic Operating Systems [3]:

  1. "The CPU programs the DMA controller by setting its registers so it knows what to transfer where. It also issues a command to the disk controller telling it to read date from the disk into its internal buffer and verify the checksum. When valid data are in the disk controller's buffer, DMA can begin."
  2. "The DMA controller initiates the transfer by issuing a read request over the bus to the disk controller."
  3. "The disk controller fetches the next word from its internal buffer... The write to memory is another standard bus cycle."
  4. "When the write is complete, the disk controller sends an acknowledgement to the [DMA controller], also over the bus. The DMA controller then increments the memory address to use and decrements the byte count."

"If the byte count is still greater than 0, steps 2 through 4 are repeated until the count reaches 0. At this point the controller causes an interrupt. When the operating system starts up, it does not have to copy the block to memory; it is already there".

[1] Linux Kernel Development - Rob Love
[2] What Every Programmer Should Know About Memory - Ulrich Drepper.
[3] Operating Systems - Tanenbaum & Woodhull

No comments:

Post a Comment