Big-O gives us a tool to compare algorithms’ efficiency and execution times without having to write the code and run the experiments. However, I feel that many people misuse the analysis unknowingly. Don’t get me wrong: I am a big fan of analytic models and understand that they provide insights that cannot be found empirically. In the case of Big-O, however, I feel that it rests on underlying assumptions that were true when the analysis was conceived but have since become false. Thus, we need to revise how Big-O is taught. This post explains why…
At its heart, Big-O assumes that “more work” means a longer execution time. This assumption was valid up until the early 1990s, when computers did most of their work sequentially: they executed instructions one at a time, and if a particular operation took longer (because it had to access DRAM or disk), the machine stalled and waited. Thus, the contributions to execution time from compute, memory, and I/O were additive:
Total execution time = Time in compute + Time in memory + Time in I/O
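To make the additive model concrete, here is a toy sketch in Python. All the numbers are hypothetical, invented purely for illustration; the point is only that on a strictly sequential machine each component’s cost simply adds up.

```python
# Toy illustration of the additive model (all numbers hypothetical).
# On a strictly sequential machine, every stall is paid in full,
# so the component times simply add up.
compute_time = 2.0   # seconds spent executing instructions
memory_time = 1.5    # seconds stalled on DRAM accesses
io_time = 0.5        # seconds stalled on disk/network I/O

total_sequential = compute_time + memory_time + io_time
print(total_sequential)  # 4.0
```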
However, in the last two decades, Moore’s law has allowed us to build machines that are far more complex. A central focus of computer architects has been “latency hiding”: designing machines that minimize stalls on long-latency operations. For example, out-of-order cores overlap computation with DRAM accesses, reducing the impact of DRAM misses. Similarly, OSes overlap the processing of other threads with I/O, reducing the effective cost of I/O. In this context, quantifying the impact of a particular factor on overall performance has become next to impossible, and combining contributions by addition is incorrect. Unfortunately, I still see professors teaching Big-O and additive performance equations to CS graduates without explaining the dangers of relying on them.
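A quick way to see that Big-O hides these machine effects: the two loops below do the identical amount of work and have the identical asymptotic complexity, O(n), yet on real hardware the memory access pattern (cache- and prefetcher-friendly vs. random) can change the constant dramatically. This is a hypothetical micro-benchmark sketch, not a rigorous measurement; exact timings will vary by machine.

```python
import random
import time

# Two O(n) sums over the same data; only the access order differs.
# Big-O treats them as equal, but sequential access is typically
# much faster on real hardware due to caches and prefetching.
n = 1_000_000
data = list(range(n))
indices = list(range(n))
random.shuffle(indices)

t0 = time.perf_counter()
sequential_sum = sum(data[i] for i in range(n))
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
shuffled_sum = sum(data[i] for i in indices)
t_rand = time.perf_counter() - t0

# Identical work, identical result -- only the wall time differs.
print(f"sequential: {t_seq:.3f}s, random-order: {t_rand:.3f}s")
```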
The reason I started this article: a recent analytic model I built for a multi-core chip looked something like this (details removed):
Total execution time = MAX(execution time of factor 1, execution time of factor 2, …)
It actually correlated with my simulations surprisingly well, while the Big-O analysis of the same algorithm was way off.
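To contrast the two models side by side, here is a hedged sketch with hypothetical per-factor times. On a machine with perfect latency hiding, overlapped components cost only as much as the longest one, so the bottleneck factor dominates; the additive model overestimates badly.

```python
# Hedged sketch comparing the additive and MAX models
# (factor names and numbers are hypothetical).
factor_times = {"compute": 2.0, "memory": 3.5, "io": 1.0}

# Old assumption: a strictly sequential machine pays every cost in full.
additive_model = sum(factor_times.values())  # 6.5

# Latency-hiding machine: overlapped factors hide behind the bottleneck.
overlap_model = max(factor_times.values())   # 3.5

print(additive_model, overlap_model)
```

Reality sits somewhere between the two, but for modern hardware the MAX model is often the closer approximation, which matches what I saw against my simulations.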