8  Complexity & Performance


Goals


Computational complexity is the formal study of how much time and memory a program needs as input size increases.

Most algorithms require more time and memory to process more data, so an algorithm that is instantaneous for a dozen items may become impossibly slow with a million inputs.

When we discuss complexity we focus on the relative growth. This means we’ll speak in abstract terms, not about how many milliseconds or megabytes a program took to run.

Those details are the domain of real-world performance.

We’ll take a look at both, and the relationship between the two.

Measuring Complexity

When measuring complexity, we are typically concerned with how many times we perform an action, relative to how many inputs were processed.

We will use a notation known as “Big O notation” to describe the growth of the function in terms of the number of inputs.

A function that is O(N) grows in proportion to the number of inputs it takes, while O(N²) means that its complexity grows as the square of its inputs.

The definition of action and inputs is deliberately vague, and can accommodate any algorithm.

Example 1: O(N)

If Python did not have a len function, we’d write one like this:

def my_len(s: str) -> int:
    count = 0
    for ch in s:
        count += 1
    return count

The longer the string is, the longer this function would take. At a typical string length it would not be noticeable, but if you passed in a string with a billion characters, you’d certainly notice the wait.

For this function, the action can be thought of as “visiting a character & incrementing the count by one”.

The size of the inputs then would be the total number of characters in the string. That value is the number that determines the run time of the program.

For a string of length N we would need to perform this action N times.

We can call this O(N), where N is the size of the inputs.

Example 2: O(Nm)

Let’s consider a second case.

def count_chars(s: str, to_count: tuple[str, ...]) -> int:
    """Count occurrences of characters in to_count within s."""
    count = 0
    for ch in s:
        if ch in to_count:
            count += 1
    return count

This is similar to our above example, but does more work.

If we count how many times the comparison ch in to_count is evaluated, we’d see that it now runs len(s) * len(to_count) times.

So a call like count_chars(test_string, ("a", "b", "c")) would take approximately three times as long as my_len(test_string).

We could think of this as O(Nm), where m is now the length of to_count.

But if, in practice, N is likely to be much, much larger than m, we would still consider this algorithm O(N).

That is because we are focused on how the complexity grows relative to the inputs, and that growth is governed by the much larger variable.

Example 3: O(CN)

The example above where we dropped m demonstrated that we’re focused on growth over absolute measures.

This means we never consider constants:

def f(data):
    for x in data:
        g(x)
        g(x)
        g(x)
        g(x)

In this abstract form, we perform the action g four times for every input to f.

You might be tempted to say it is O(4N), but the constant 4 is not a factor in the growth; this function is O(N).

Complexity Classes

We can group functions into several complexity classes based on the nature of their growth in relation to inputs.

Linear Time

It is very common for an algorithm to scale linearly with its inputs. One more input means one more action, or at least a proportional number more actions. (Like 4 in the example above.)

This complexity class is O(N), or linear time.

Constant Time

Some algorithms are unaffected by the length of their inputs, always performing the same amount of work.

Capitalizing a string requires looking only at the first character, so (ignoring the cost of copying the rest) capitalizing a five-character string takes the same amount of time as capitalizing a string containing billions of characters.

Another, more practical example is the lookup of an item in a dict. As an implementation of a hash table, lookups are O(1) on average, regardless of how many items are in the dict.
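A quick sketch of this in practice: the lookup below does roughly the same work whether the dict holds a hundred entries or a million. (The sizes and keys here are arbitrary illustrations.)

```python
small = {i: i * 2 for i in range(100)}
large = {i: i * 2 for i in range(1_000_000)}

# Each lookup hashes the key once and jumps straight to its slot;
# neither scans through the other entries.
print(small[50])       # 100
print(large[999_999])  # 1999998
```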

Quadratic Time

The next most common complexity class is O(N²). Let’s consider an algorithm to find duplicates in a data set:

def find_duplicates(data):
    for idxA, itemA in enumerate(data):
        for idxB, itemB in enumerate(data):
            if idxA != idxB and itemA == itemB:
                handle_duplicate(itemA)

This code needs to compare every item in the data set to every other item, so the innermost code runs N² times.

As a variant on this, we could have two data sets of similar sizes N and M:

def find_duplicates(set1, set2):
    for itemA in set1:
        for itemB in set2:
            if itemA == itemB:
                handle_duplicate(itemA)

This code would be O(NM), but if the data sets are similar enough in size it is often thought of as O(N²).
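As an aside, this is a place where the memory-for-speed trade discussed later applies: tracking already-seen items in a set costs O(N) extra memory but finds duplicates in a single pass. A sketch (collecting duplicates into a list rather than calling handle_duplicate):

```python
def find_duplicates_fast(data):
    # a set gives average O(1) membership tests, so one pass suffices:
    # O(N) average time, at the cost of O(N) extra memory
    seen = set()
    duplicates = []
    for item in data:
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates

print(find_duplicates_fast([1, 2, 3, 2, 1]))  # [2, 1]
```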

Other Common Complexity Classes

| Complexity Class | Name | Examples |
| --- | --- | --- |
| O(1) | Constant | array access, hash table lookup (average) |
| O(log N) | Logarithmic | binary search |
| O(N) | Linear | unsorted array search/scan |
| O(N²) | Quadratic | nested loops |
| O(N³) | Cubic | matrix multiplication |
| O(2^N) | Exponential | recursive Fibonacci |
| O(N!) | Factorial | generating all permutations |

This notation is known as Big O notation; it classifies a function’s growth as the number of inputs N grows larger.

via https://articles.wesionary.team/the-big-oh-o-a-beginners-guide-a4c48af39bfa
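As an illustration of the logarithmic class, binary search halves the remaining range on every step, so a sorted list of N items needs only about log₂ N comparisons. (A standard sketch; in real code you would typically reach for the built-in bisect module.)

```python
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1

print(binary_search(list(range(1000)), 42))  # 42
```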

Memory vs. Speed

It is often possible to make code faster by using extra memory, or to use less memory at the cost of speed.

A straightforward example of this is a simple caching algorithm:

CACHE = {}

def cached_expensive(x, y):
    # this will call expensive once and only once per (x, y)
    if (x, y) not in CACHE:
        CACHE[(x, y)] = expensive(x, y)

    return CACHE[(x, y)]

This general approach means any duplicate calls will be instantaneous, but at the cost of storing potentially a large amount of extra data.
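The standard library packages this exact pattern as functools.lru_cache, which also lets you cap the cache size to bound the memory cost. (The expensive function here is a hypothetical stand-in for a genuinely slow computation.)

```python
from functools import lru_cache

@lru_cache(maxsize=1024)  # evicts least-recently-used entries past 1024
def expensive(x, y):
    # stand-in for a genuinely slow computation
    return sum(i * x + y for i in range(100_000))

expensive(2, 3)                # computed and cached
expensive(2, 3)                # served from the cache, near-instant
print(expensive.cache_info())  # hits=1, misses=1 at this point
```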

Other, more nuanced approaches can strike a balance between performance and memory usage. The “right” approach depends upon the needs of the application.

What makes a program slow?

A program that performs more actions takes more time, but not all actions are created equal.

The numbers below come from the popular post Latency Numbers Every Programmer Should Know:

The important thing to note is that operations on data already in memory are the fastest, around 10 ns, and the “further away” data gets from the CPU, the longer things take.

| Location | Operation | Latency | In Seconds |
| --- | --- | --- | --- |
| CPU L1 Cache | Read/Write | ~10 ns | 0.00000001 s |
| RAM | Read/Write | ~100 ns | 0.0000001 s |
| SSD | Read/Write 1MB | ~100 μs | 0.0001 s |
| HDD | Write 1MB | ~1 ms | 0.001 s |
| Internet (Same City) | Read/Write 1MB | ~10 ms | 0.01 s |
| Internet (Long Distance) | Read/Write 1MB | ~200 ms | 0.2 s |

This is why we typically exclude I/O operations from performance testing, unless that is our focus.

The numbers related to disk access and network I/O are thousands of times higher than things happening in the processor, and will almost always be the bottleneck in data-intensive applications.

Measuring Performance

While it is generally important to focus on reducing unnecessary algorithmic complexity, other bottlenecks may exist. In practice, we often want to measure the real-world performance of a piece of code.

If you have some code that is running slower than expected, you might decide you need to measure how long a particular piece of code takes. This would help you confirm which portion of the code is slow & provide a baseline to measure improvements against.

Let’s define two functions that do the same thing, to compare how they perform:

def str_add():
    lots_of_strings = ["abc"] * 1000
    combined = ""
    for s in lots_of_strings:
        combined += s
    return combined

def str_join():
    lots_of_strings = ["abc"] * 1000
    return "".join(lots_of_strings)
import time

start = time.time()
str_add()
elapsed = time.time() - start

print(f"str_add took {elapsed:-.6f} seconds")


start = time.time()
str_join()
elapsed = time.time() - start

print(f"str_join took {elapsed:-.6f} seconds")
str_add took 0.000065 seconds
str_join took 0.000022 seconds

But if we time it again, we might get different results:

for n in range(10):
    start = time.time()
    str_add()
    elapsed = time.time() - start
  
    print(f"str_add run {n} took {elapsed:-.6f} seconds")
  
    # short pause between runs (shows process variation)
    time.sleep(0.1)
str_add run 0 took 0.000041 seconds
str_add run 1 took 0.000131 seconds
str_add run 2 took 0.000158 seconds
str_add run 3 took 0.000324 seconds
str_add run 4 took 0.000256 seconds
str_add run 5 took 0.000324 seconds
str_add run 6 took 0.000269 seconds
str_add run 7 took 0.000185 seconds
str_add run 8 took 0.000259 seconds
str_add run 9 took 0.000314 seconds
for n in range(10):
    start = time.time()
    str_join()
    elapsed = time.time() - start
  
    print(f"str_join run {n} took {elapsed:-.6f} seconds")
  
    # short pause between runs (shows process variation)
    time.sleep(0.1)
str_join run 0 took 0.000030 seconds
str_join run 1 took 0.000075 seconds
str_join run 2 took 0.000054 seconds
str_join run 3 took 0.000053 seconds
str_join run 4 took 0.000042 seconds
str_join run 5 took 0.000060 seconds
str_join run 6 took 0.000047 seconds
str_join run 7 took 0.000128 seconds
str_join run 8 took 0.000048 seconds
str_join run 9 took 0.000090 seconds

While your code is running, your computer may be doing lots of other things in the background. This produces significant noise in our attempt to measure very short functions.

The noise here is significant: some runs of the same function take 10x longer than others.
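Incidentally, for this kind of measurement time.perf_counter is a better choice than time.time: it is a monotonic, high-resolution clock intended for interval timing (and it is the default timer timeit uses). A minimal sketch, timing an arbitrary computation:

```python
import time

start = time.perf_counter()  # monotonic, high-resolution timer
total = sum(range(100_000))  # arbitrary work to measure
elapsed = time.perf_counter() - start

print(f"took {elapsed:.6f} seconds")
```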

To get an accurate sample, we can run this function thousands of times. The built-in timeit module is specifically designed for this:

import timeit

timeit.timeit("str_add()", number=100000, globals={"str_add": str_add})
3.979762999981176
timeit.timeit("str_join()", number=100000, globals={"str_join": str_join})
0.7548895420040935

The signature of the timeit function is:

timeit.timeit(
  stmt='pass',
  setup='pass',
  timer=<default timer>,
  number=1000000,
  globals=None
)
  • stmt is a string containing the Python code to be tested; the parameter must be a string so Python can delay execution of your code until it is time to test it.
  • setup is also a string of code, but it will only be run once, and prior to the iteration of stmt.
  • number controls the number of iterations, higher numbers will lead to more accurate measurement.
  • globals allows you to pass in outside values that will be used in the statement. The form is a dictionary mapping names to values.
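For instance, the setup string can build the test data once so that its construction isn’t counted in the measurement. (The sizes and iteration count here are arbitrary.)

```python
import timeit

elapsed = timeit.timeit(
    stmt='"".join(strings)',           # the code being timed
    setup='strings = ["abc"] * 1000',  # runs once, before timing starts
    number=10_000,
)
print(f"10,000 joins took {elapsed:.6f} seconds")
```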

timeit is primarily for testing code that runs quickly, where thousands or millions of runs are feasible.

Remember: if you have slow functions that are network- or I/O-heavy, test them in smaller pieces, since the networking & I/O portions will overwhelm the portions within your control.

Further Exploration