A common rule of thumb holds that a typical program spends around 90% of its time accessing only 10% of its memory. This, combined with the fact that, at a given price, a smaller memory can be made faster than a larger one, highlights a possible optimisation.

A small, but very fast, cache sits between the processor and memory. Its purpose, as the name suggests, is to cache data from memory that is likely to be accessed in the near future. Two principles of locality are used to predict which memory locations are likely to be accessed soon (both are illustrated in the sketch after the list):

  • Spatial locality - it is assumed that if a given address is used, addresses close to it will be used as well.
  • Temporal locality - it is assumed that if a given address is used, the same address will be used again soon.
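
As a rough illustration, the hypothetical C function below (not taken from any particular system) exhibits both forms of locality: the accumulator is reused on every iteration, and the array elements are read from consecutive addresses.

    #include <stddef.h>

    /* Hypothetical example: summing an array exercises both forms of
     * locality that a cache is designed to exploit. */
    double sum(const double *data, size_t n)
    {
        double total = 0.0;             /* reused on every iteration:
                                           temporal locality           */
        for (size_t i = 0; i < n; i++)
            total += data[i];           /* consecutive addresses read
                                           in order: spatial locality  */
        return total;
    }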

In order to fetch data from a given memory location, the cache is checked first; if the data is present in the cache (a hit), the slower main-memory access can be avoided.
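
As a minimal sketch of how such a check might work, the code below models a direct-mapped cache lookup: the address is split into an index (selecting a cache line) and a tag (identifying which block of memory that line currently holds). The sizes, names, and structure here are illustrative assumptions, not a description of any particular hardware.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES  256    /* hypothetical: 256 cache lines     */
    #define LINE_BYTES 64     /* hypothetical: 64-byte cache lines */

    struct cache_line {
        bool     valid;       /* does this line hold real data?    */
        uint64_t tag;         /* high address bits identifying it  */
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a hit, meaning the memory access can be avoided. */
    bool cache_lookup(uint64_t address)
    {
        uint64_t block = address / LINE_BYTES;  /* strip the byte offset */
        uint64_t index = block % NUM_LINES;     /* which line to check   */
        uint64_t tag   = block / NUM_LINES;     /* remaining bits        */

        return cache[index].valid && cache[index].tag == tag;
    }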

Cache Hierarchy

This system can be extended using progressively slower, higher-capacity levels of cache. The fastest cache, known as L1, resides on the processor itself. A second-level (L2) cache is common, and is not necessarily integrated on the same silicon chip. The hierarchy need not stop there: a third level (L3) is increasingly common in modern high-performance systems.
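
The effect of the hierarchy can be estimated by working down the levels: every lookup pays the first level's access time, and the fraction that miss additionally pay for the level below. The sketch below uses invented timings and hit rates, chosen only to show the shape of the calculation.

    #include <stdio.h>

    struct level {
        const char *name;
        double hit_time_ns;   /* time to access this level         */
        double hit_rate;      /* fraction of lookups that hit here */
    };

    /* Average access time for a hierarchy of n levels; the last level
     * (main memory) is assumed to always hit. */
    double average_access_time(const struct level *levels, int n)
    {
        if (n == 1)
            return levels[0].hit_time_ns;
        return levels[0].hit_time_ns
             + (1.0 - levels[0].hit_rate)
               * average_access_time(levels + 1, n - 1);
    }

    int main(void)
    {
        /* Hypothetical figures, for illustration only. */
        struct level hierarchy[] = {
            { "L1",       1.0, 0.90 },
            { "L2",       5.0, 0.80 },
            { "L3",      20.0, 0.75 },
            { "memory", 100.0, 1.00 },
        };
        printf("average access time: %.2f ns\n",
               average_access_time(hierarchy, 4));
        return 0;
    }

With these made-up numbers the result is 2.40 ns, far closer to the L1 access time than to the 100 ns cost of going to memory on every access.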

Hit Rate

The proportion of memory accesses which are served from the cache is known as the hit rate. Increasing this proportion reduces the average memory access time, thus improving the overall performance of the system.
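
To make the relationship concrete, one simple model is: with a hit rate h, a cache access time t_cache, and a memory access time t_mem, the average access time is

    t_avg = h × t_cache + (1 − h) × t_mem

Using hypothetical figures of t_cache = 1 ns, t_mem = 100 ns, and h = 0.95, this gives 0.95 × 1 + 0.05 × 100 = 5.95 ns, as opposed to 100 ns if every access went to memory. Even a small increase in the hit rate cuts the average substantially, because the misses dominate the cost.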