Design Article
How to use DSP caches, part 1
Staff, Texas Instruments
5/21/2007 3:00 AM EDT

[Part 2 explains how to configure caches and how to use them correctly, all with a focus on maintaining cache coherence. It shows how DMA transfers affect cached memory, and how to manage DMA transfers with double buffering.]
A processor cache is an area of high-speed memory that holds information close to the processor. It makes accesses to frequently used instructions and data fast, and therefore speeds up computation. This article compares cache-based memory systems with flat memory systems, explains the importance of cache in high-speed processor architectures, and provides a brief introduction to the fundamental concepts and terminology of caching. Using Texas Instruments' TMS320C64x DSP architecture as an example, it explains in detail how a cache works, how to configure it, and how to use it correctly. Throughout the article the focus is on cache coherence.

Memory Organization

Assume a DSP whose CPU and on-chip memory both run at 300 MHz, so that every memory access completes without wait-states. If the CPU clock is increased to 600 MHz, wait-states will occur unless the memory speed is also increased to 600 MHz. Unfortunately, a same-size internal memory running at 600 MHz would be far too expensive for most applications. Leaving it at 300 MHz is also not an option, since this would effectively reduce the CPU clock. Suppose an algorithm accessed memory every cycle: each memory access would then suffer a one-cycle stall, effectively doubling the cycle count and canceling out the doubling in clock speed.
Figure 1. Flat vs. Hierarchical Memory Architecture.

The solution is a memory hierarchy: a fast but small memory close to the CPU that can be accessed without stalls, backed by memories that become larger but slower the further they are from the CPU. The memory levels closest to the CPU typically act as a cache for the lower-level memories.

Principles of Locality