- Core relationship: Time = (NI × CPI) / f and Performance = f / (NI × CPI), with NI and CPI as the key levers.
- Real scalability: Amdahl's and Gustafson's laws bound the achievable speed-up; efficiency falls as overhead grows.
- Memory rules: hit/miss rates, DDR speed and bandwidth affect the CPI as much as frequency does.
- WPA method: the critical path, thread states (Ready/Running/Waiting), DPC/ISR activity and priorities explain bottlenecks.
If you are looking for all the formulas needed to calculate the performance of a CPU, it helps to have them well explained, with context and practical cases, not just as a loose list of equations. This guide collects and rewrites in a clear, comprehensive way the metrics, formulas, nuances and professional analysis techniques (including the use of Windows Performance Analyzer) that usually appear scattered across many sources.
Here you will find everything from the classic units (IPS, IPC, CPI and FLOPS) to the exact relationship between execution time and performance, Amdahl's and Gustafson's laws, memory and bandwidth, and even how to study thread interference and DPC/ISR with WPA. It also covers CPU power consumption (C·V²·F), tools for measuring it, and practical recommendations for improving efficiency and performance.
Basic units and metrics: IPS, IPC, CPI, FLOPS, and frequency

The first thing is to be clear about the most essential units we are going to handle:
- IPS (instructions per second) measures how many instructions the processor executes in one second (usually expressed as MIPS, millions of instructions per second). It is a useful metric for a global idea of throughput, although it does not capture instruction complexity or microarchitectural differences well. Historical and modern examples show the gap between designs and eras, and overclocking shifts the figure further.
- IPC (instructions per cycle) indicates how many instructions the CPU executes on average per clock cycle. It is key to understanding efficiency per cycle regardless of frequency. Comparing IPC requires running the same program or benchmark on the different machines, because the count and type of instructions depend on the software.
- CPI (cycles per instruction) is the conceptual inverse of IPC: how many cycles each instruction takes on average. CPI varies with instruction type and microarchitecture (for example, a load may require more cycles than a jump), so it is usually calculated as a weighted average by instruction class; the sketch after this list shows how these units convert into one another.
- FLOPS (floating-point operations per second) quantifies floating-point computation, critical in HPC, AI and science. A distinction is made between single precision (SP) and double precision (DP), and energy efficiency is expressed as FLOPS/W. It is also important to differentiate between native FLOPS and normalized FLOPS when comparing heterogeneous platforms.
- Frequency (Hz) sets the pace of the clock, but it is not directly synonymous with performance. Beware the MHz myth: today a lower-frequency CPU can outperform a faster-clocked one through parallelism, better IPC and more efficient microarchitectures. In addition, pipeline depth and the critical logic path determine the achievable frequency.
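These relationships are easy to get backwards, so here is a minimal Python sketch of the conversions. The clock frequency and IPC figures are invented for illustration, not measurements of any real CPU:

```python
# Minimal sketch: converting between frequency, IPC, CPI and MIPS.
# The numbers below are hypothetical, not measurements of a real CPU.

def cpi_from_ipc(ipc: float) -> float:
    """CPI is the reciprocal of IPC."""
    return 1.0 / ipc

def mips(frequency_hz: float, ipc: float) -> float:
    """Instructions per second ~ frequency * IPC, expressed in millions."""
    return frequency_hz * ipc / 1e6

# Hypothetical CPU: 3.5 GHz clock, average IPC of 2.0 on some workload.
freq = 3.5e9
ipc = 2.0
print(f"CPI = {cpi_from_ipc(ipc):.2f} cycles/instruction")  # 0.50
print(f"IPS = {mips(freq, ipc):,.0f} MIPS")                 # 7,000 MIPS
```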
Essential formulas: execution time, throughput, IPC, CPI, IPS, and FLOPS
Some essential formulas for calculating and measuring processor performance that you should know:
- Execution time: a standard way of expressing it is Time = NI × CPI × T, where NI is the number of instructions in the program, CPI the average number of cycles per instruction and T the clock period (T = 1/frequency). Equivalently: Time = (NI × CPI) / Frequency. Hardware and compiler typically attack CPI and frequency; NI depends on the software.
- Performance is the inverse of time: Performance = 1 / Time. Rewriting, Performance = Frequency / (NI × CPI). This makes the triangle of trade-offs clear: raising frequency, lowering CPI and/or lowering NI (better algorithm, better compilation) increases performance.
- CPU time on multiprocessor systems is expressed by summing thread times or using aggregations that account for P processors. In parallel execution, the actually parallelizable portion and the coordination overhead limit the benefit (see Amdahl's and Gustafson's laws below).
- Effective CPI for a specific program is obtained from the actual average number of cycles per instruction observed during its execution (total cycles / total instructions retired); for comparisons, use the same benchmark on both machines so that NI and the instruction mix are comparable.
- Weighted average CPI is usually calculated as Σ (CPI_i × weight_i), where each CPI_i corresponds to an instruction class and weight_i is the fraction of that class in the program. This per-class view shows where to optimize (e.g., slow loads or expensive divisions); the sketch after this list puts these formulas into code.
- IPS (instructions per second) is often approximated as IPS ≈ Frequency × IPC (equivalently, Frequency / CPI). Be careful with pipelines, dependencies, branch prediction and pipeline flushes: in practice, bursts and penalties can pull you away from the theoretical figure.
- FLOPS on a simple system is estimated as Frequency × floating-point operations per cycle (depending on vector width and FPU units), and in parallel as Total FLOPS ≈ Σ FLOPS of each processor. Distinguish whether you work in SP or DP and remember the difference between native and normalized FLOPS.
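As a sanity check, here is a minimal Python sketch of the time, performance and weighted-CPI formulas above; the instruction mix, CPI values and clock are hypothetical examples, not measured data:

```python
# Minimal sketch of the formulas above; all inputs are made-up examples.

def execution_time(ni: float, cpi: float, freq_hz: float) -> float:
    """Time = (NI * CPI) / Frequency."""
    return ni * cpi / freq_hz

def weighted_cpi(classes: dict) -> float:
    """Sum of CPI_i * weight_i over instruction classes (weights sum to 1)."""
    return sum(cpi * weight for cpi, weight in classes.values())

# Hypothetical instruction mix: class -> (CPI_i, weight_i).
mix = {
    "ALU":    (1.0, 0.50),
    "load":   (4.0, 0.30),
    "branch": (2.0, 0.20),
}
cpi = weighted_cpi(mix)               # 0.5 + 1.2 + 0.4 = 2.1
t = execution_time(1e9, cpi, 3.0e9)   # 1 billion instructions at 3 GHz
print(f"Weighted CPI = {cpi:.2f}")
print(f"Time = {t:.3f} s, Performance = {1/t:.3f} 1/s")  # 0.700 s, 1.429 1/s
```

Note how the per-class breakdown makes the optimization target visible: in this made-up mix, loads contribute 1.2 of the 2.1 total CPI, so improving cache behavior would pay more than tuning the ALU path.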
Scalability: Amdahl's Law, Gustafson's Law, speed-up, efficiency and isoefficiency
Other important formulas for calculating computer performance and efficiency:
- Amdahl's Law models the gain from speeding up part of a system. If a fraction f of the time does not benefit from the improvement, the maximum speed-up is bounded by 1/f. For parallel execution, with parallelizable fraction p, the typical limit is expressed as S(N) = 1 / ((1 − p) + p/N). Improving the bottleneck (reducing the effective sequential part) is what pays off most.
- Application to the pipeline: pipelining reduces per-instruction latency in steady state, but bubbles, data hazards and branch mispredictions add penalties that limit the ideal speed-up. Deepening the pipeline raises frequency but also raises the flush penalties.
- Gustafson's Law takes a different view: as the problem grows with the number of processors, S(N) ≈ N − α(N − 1), where α approximates the sequential fraction under the scaled load. It emphasizes that load distribution and overhead determine real efficiency.
- Efficiency is defined as E = S(N) / N. As N increases, E tends to fall because of coordination, shared memory and load imbalance. Isoefficiency analysis asks how fast the problem size n must grow to keep E constant as the number of processors p increases, absorbing the overhead. The sketch below evaluates both laws numerically.
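The two laws are easy to evaluate numerically. Here is a minimal Python sketch, assuming an illustrative 95% parallelizable fraction (α = 0.05); note how Amdahl's speed-up saturates near 1/0.05 = 20 while efficiency collapses:

```python
# Minimal sketch of Amdahl's and Gustafson's laws and parallel efficiency.
# p is the parallelizable fraction; alpha the serial fraction of the scaled load.

def amdahl(p: float, n: int) -> float:
    """S(N) = 1 / ((1 - p) + p/N)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(alpha: float, n: int) -> float:
    """S(N) ~ N - alpha * (N - 1)."""
    return n - alpha * (n - 1)

def efficiency(speedup: float, n: int) -> float:
    """E = S(N) / N."""
    return speedup / n

for n in (2, 8, 64, 1024):
    s = amdahl(0.95, n)  # 95% parallelizable: the 5% serial part dominates
    print(f"N={n:5d}  Amdahl S={s:6.2f}  E={efficiency(s, n):.3f}  "
          f"Gustafson S={gustafson(0.05, n):8.2f}")
```

At N = 1024 the Amdahl speed-up is only about 19.6 (E ≈ 0.019), while Gustafson's scaled-load view still reports S ≈ 973, which illustrates why the two laws answer different questions.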
Memory, caches, bandwidth and storage: the other 50% of performance

Besides the compute-side calculations, memory performance also matters; its most important formulas are:
- The memory hierarchy determines the CPI: a cache access may cost 1 cycle, while a RAM access costs hundreds of cycles. Hit/miss rates matter as much as, or more than, raw bandwidth and latency. A better hit rate means fewer penalties and less energy spent going to memory.
- Key definitions: Miss rate = number of misses / total number of accesses and Hit rate = number of hits / total number of accesses. Increasing the size of the instruction or data cache and improving the locality of your code raise the hit rate and reduce the CPI.
- DDR and effective frequency: DDR memories perform 2 transfers per controller cycle, which is why DDR4-3200 is equivalent to a memclk of 1600 MHz. Theoretical bandwidth per module is approximated as memclk × 2 × bus_width (bits) × number of channels, and is expressed in bytes/s (divide by 8). Classic example for DDR4-3200, 64-bit bus, dual channel: 1,600,000,000 × 2 × 64 × 2 = 409,600,000,000 bits/s ≈ 51.2 GB/s.
- Rotational latency on an HDD (once the head is already on the track) is estimated as 0.5 rotation / (RPM/60). For 7200 RPM: 0.5 / (7200/60) ≈ 4.17 ms. Disk buffers and caches can cushion part of the access time, but they do not eliminate the mechanical nature of the delay. Both this and the bandwidth example are reproduced in the sketch after this list.
- Memory and compute demand: in HPC workloads the analysis is made in terms of operational intensity (FLOP/byte), relating floating-point instructions to data movement. Low intensity points to a memory-bound workload; high intensity, to a compute-bound one. Optimizing data layouts and sequential access can completely change the performance profile.
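Both worked examples above fit in a few lines of Python; the helper names are mine, and the inputs are exactly the DDR4-3200 dual-channel and 7200 RPM figures from the list:

```python
# Minimal sketch reproducing the two worked examples above.

def ddr_bandwidth_gbs(memclk_hz: float, bus_bits: int, channels: int) -> float:
    """memclk * 2 (DDR) * bus width * channels, converted from bits/s to GB/s."""
    return memclk_hz * 2 * bus_bits * channels / 8 / 1e9

def rotational_latency_ms(rpm: float) -> float:
    """Average of half a rotation: 0.5 / (RPM / 60), in milliseconds."""
    return 0.5 / (rpm / 60.0) * 1e3

print(f"DDR4-3200 dual channel: {ddr_bandwidth_gbs(1.6e9, 64, 2):.1f} GB/s")  # 51.2
print(f"7200 RPM HDD: {rotational_latency_ms(7200):.2f} ms")                  # 4.17
```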
Consumption and efficiency: TDP, dynamic power and tools
Finally, there are also the questions of power consumption and efficiency:
- TDP is not actual consumption: it is a thermal/design target. Consumption varies with effective load, voltage and frequency. Under light loads, actual average consumption is usually much lower than the TDP.
- Approximate dynamic power: P = C · V² · F, where C is the switched capacitance, V the voltage and F the frequency. Raising voltage has a quadratic penalty; hence overclocking with overvolting causes large jumps in consumption and heat. On top of the dynamic part there are leakage currents, which grow with temperature and process variation. The sketch below compares two operating points.
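Since C is usually unknown, the practical use of P = C · V² · F is comparing two operating points of the same chip, where C cancels out. A minimal Python sketch, with a hypothetical overclock as input:

```python
# Minimal sketch: relative dynamic power under P = C * V^2 * F.
# C cancels out when comparing two operating points of the same chip.

def relative_dynamic_power(v0: float, f0: float, v1: float, f1: float) -> float:
    """P1 / P0 = (V1/V0)^2 * (F1/F0)."""
    return (v1 / v0) ** 2 * (f1 / f0)

# Hypothetical overclock: 4.0 GHz @ 1.10 V -> 4.6 GHz @ 1.30 V.
ratio = relative_dynamic_power(1.10, 4.0e9, 1.30, 4.6e9)
print(f"Dynamic power increases x{ratio:.2f}")  # ~x1.61 for +15% frequency
```

The asymmetry is the point: in this invented example, a 15% frequency gain bought with a voltage bump costs roughly 61% more dynamic power, which is why undervolting is so effective for efficiency.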