How does CPU cache work?

Last update: 25th September 2020
Author: Isaac
cache, CPU

The cache hasn't always been inside the CPU. In fact, older processors didn't have one at all. Later, modules similar to today's RAM sticks appeared, allowing you to optionally add a cache to improve your CPU's performance, much as math coprocessors (FPUs) could be added.

In current microprocessors, the cache memory has been integrated within the chip itself, now an almost inseparable part of them, and with several levels to obtain greater benefit from it. In this tutorial, you'll learn more about this type of memory and its importance.

What is the cache?

Before I begin to explain what cache memory is, I would like to very briefly summarize how a CPU works, so you can better understand the cache's role. Simply put, the CPU is nothing more than a "calculator" that processes a series of operations on bits of data.

It is the software, the program, that indicates what calculations the CPU should perform. The program is made up of a series of data and instructions. All of this data and these instructions are stored in secondary memory (the hard drive) and are sent to main (primary) memory. From there, they are fetched by the CPU and loaded into its internal memory. Each instruction indicates what the CPU should do with the data; for example, it could be an addition instruction. This is how software is executed…

Briefly put, early computers used a single level of slow memory (some type of ROM or magnetic storage) from which to retrieve these instructions and data. But as CPUs evolved and became faster, major bottlenecks arose due to the slowness of this memory. That's why a fast buffer memory was introduced between the CPU and the secondary memory: RAM (SDRAM).

Despite this, the CPU continued to evolve faster than the speed of RAM itself, creating another bottleneck. So, another, much faster memory was developed, closer to the processor and located between the RAM and the CPU: the cache (SRAM).

Secondary memory is cheap, so you can get large capacities at a good price. In the case of RAM, it's faster, but also more expensive than secondary memory. That's why primary memory capacities aren't as large. If we continue down the ladder, we come across cache memory, which is even more expensive, and therefore has very low capacities. Then there are the registers, also extremely expensive and limited...
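The cost/capacity/speed ladder described above can be sketched with some rough, order-of-magnitude figures. These numbers are illustrative assumptions for a generic modern desktop CPU, not measurements of any specific chip; real latencies vary widely by microarchitecture.

```python
# Rough, order-of-magnitude figures for a generic modern desktop CPU.
# These are illustrative assumptions, not measurements of any real chip.
hierarchy = [
    # (level, typical capacity, approx. access latency in CPU cycles)
    ("Registers",       "~1 KB",          1),
    ("L1 cache (SRAM)", "32-64 KB",       4),
    ("L2 cache (SRAM)", "256 KB-1 MB",   12),
    ("L3 cache (SRAM)", "8-64 MB",       40),
    ("RAM (SDRAM)",     "8-64 GB",      200),
    ("SSD (secondary)", "0.5-4 TB",  100_000),
]

# Print the ladder: the faster a level is, the smaller (and pricier) it is.
for name, capacity, cycles in hierarchy:
    print(f"{name:17} {capacity:>12}  ~{cycles} cycles")
```

Notice the pattern the article describes: each step closer to the CPU is faster but smaller, because faster memory costs more per byte.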

With this cache memory, the CPU cores can be fed much faster, so that the latency and limited bandwidth of RAM don't affect CPU performance as much. It's a way of providing data and instructions much more locally and quickly… in other words, keeping them more "at hand."

Nowadays, secondary memory has taken a great leap forward; I'm referring to SSDs, or solid-state drives. However, they are still slower than RAM, so these other levels are still necessary.

Cache levels in a modern processor

Intel Pentium III (Tualatin) die shot, in which I have marked the unified L2 cache in green and the L1 data and instruction caches in purple.

In a modern processor there is not just one level of cache; it is divided into several levels. Typically, there are between 2 and 4 levels (L):

  • LLC (Last Level Cache): This is the last level of cache memory, that is, the one "closest" to RAM, the one with the highest number within the processor. It can be L4, L3, or L2, depending on the processor. For example, in current Intel and AMD processors, it's the L3. This memory can reach several megabytes and is unified, meaning it stores both data and instructions. Generally, this level is shared by all cores, if there are several. If, for example, there are both L3 and L2, the L2 could be dedicated to a single core or shared by two cores, while the L3 feeds all of them.
  • L1 cache: This cache is faster than the previous ones and sits even closer to the control unit, so information can be retrieved and sent to the execution units more quickly. Unlike the higher levels, the L1 has a smaller capacity, which is normal: the lower the level, the smaller the capacity. But the most notable difference is that in many processors it is not unified like the other levels. Instead, it is split into L1I and L1D, that is, instructions only and data only.
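To make this topology concrete, here is a minimal sketch of a hypothetical quad-core chip with a split per-core L1 (L1I/L1D), a private unified L2, and a shared L3 acting as the LLC. All names and sizes are illustrative assumptions, not the specs of any real part.

```python
from dataclasses import dataclass

@dataclass
class Cache:
    name: str
    size_kb: int
    unified: bool  # True = holds both data and instructions

# One shared last-level cache (LLC) serving all cores (16 MB, hypothetical).
l3 = Cache("L3 (LLC)", size_kb=16 * 1024, unified=True)

cores = []
for i in range(4):
    cores.append({
        "core": i,
        "L1I": Cache(f"core{i}.L1I", 32, unified=False),  # instructions only
        "L1D": Cache(f"core{i}.L1D", 32, unified=False),  # data only
        "L2":  Cache(f"core{i}.L2", 512, unified=True),   # private, unified
        "L3":  l3,                                        # shared by all cores
    })

# Every core points at the very same shared L3 object:
assert all(core["L3"] is l3 for core in cores)
print(f"{len(cores)} cores share {l3.name} ({l3.size_kb // 1024} MB)")
```

The key structural points from the list above show up directly: the L1 is split and private, while the single L3 object is referenced by every core.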

Why Is This Important?

Well, at this point you will already have an idea of why the cache improves performance. A practical example will help you understand it very well. Imagine that you (the CPU) have to go and get the tools (instructions and data) you need to do a job (a program).

Having the tools in the store (secondary memory) is not the same as having them in the garage (RAM), or having them right at your fingertips (cache). If you have to go to the store or the garage, it will take much longer than if you can reach out and grab them right next to you. Of course, this isn't always the case: the first time you need an instruction or a piece of data, you'll have to go to the store or the garage to get it. But once you've grabbed it and have it right next to you, you'll be much more efficient the next time you need it.

The CPU, through a lookup system based on hits and misses, will always search for instructions and data in L1 first. On a miss, it will search L2, and if it isn't there either, it will go to L3 (if there is one). And if that misses too, it will have no choice but to go to RAM, costing many more clock cycles. But on a hit, access is much faster. Remember that accessing L1 takes fewer cycles than L2, and L2 in turn takes fewer cycles than L3, and so on.
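This L1 → L2 → L3 → RAM search order can be sketched as a toy simulation. The cycle counts below are illustrative assumptions, and real caches use sets, ways, and address tags rather than Python dictionaries, but the hit/miss fall-through logic is the same idea.

```python
# Toy multi-level lookup: try each cache level in order; on a miss,
# fall through to the next level and fill the faster levels on the way back.
# Cycle counts are illustrative assumptions, not real hardware figures.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
RAM_CYCLES = 200

caches = {name: {} for name, _ in LEVELS}
ram = {addr: f"data@{addr}" for addr in range(1024)}  # backing store

def load(addr):
    """Return (value, cycles_spent) for one memory access."""
    cycles = 0
    for name, cost in LEVELS:
        cycles += cost
        if addr in caches[name]:          # cache hit: stop here
            return caches[name][addr], cycles
    cycles += RAM_CYCLES                  # missed everywhere: go to RAM
    value = ram[addr]
    for name, _ in LEVELS:                # fill the caches for next time
        caches[name][addr] = value
    return value, cycles

_, first = load(100)   # cold access: misses L1, L2 and L3, pays for RAM
_, second = load(100)  # now resident in L1: hits immediately
print(first, second)   # prints: 256 4
```

The first access pays 4 + 12 + 40 + 200 = 256 cycles; the second finds the data in L1 and pays only 4. That gap is exactly the latency reduction the cache exists to provide.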

That is the goal of the cache: to reduce memory access latency. Again, I'm greatly simplifying how it all works, but basically this is how it makes your apps and games run much faster.
