8 Complexity & Performance
Goals
- Introduce computational complexity.
- Understand how to measure performance & identify bottlenecks.
Computational complexity is the formal study of how much time and memory a program needs as input size increases.
Most algorithms require more time and memory to process more data, so an algorithm that is instantaneous for a dozen items may become impossibly slow with a million inputs.
When we discuss complexity we focus on the relative growth. This means we’ll speak in abstract terms, not about how many milliseconds or megabytes a program took to run.
Those are the domain of real-world performance.
We’ll take a look at both, and the relationship between the two.
Measuring Complexity
When measuring complexity, we are typically concerned with how many times we perform an action, relative to how many inputs were processed.
We will use a notation known as “Big O notation” to describe the growth of the function in terms of the number of inputs.
A function that is O(N), for example, performs a number of actions proportional to the number of inputs.
The definition of action and inputs is deliberately vague, and can accommodate any algorithm.
Example 1:
If Python did not have a len
function, we’d write one like this:
def my_len(s: str) -> int:
    count = 0
    for ch in s:
        count += 1
    return count
The longer the string is, the longer this function would take. At a typical string length it would not be noticeable, but if you passed in a string with a billion characters, you’d certainly notice the wait.
For this function, the action can be thought of as “visiting a character & incrementing the count by one”.
The size of the inputs then would be the total number of characters in the string. That value is the number that determines the run time of the program.
For a string of length N we would need to perform this action N times.
We can call this O(N).
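To see this growth in practice, here is a rough sketch (not part of the original example) that times my_len on strings of increasing length; the exact numbers will vary by machine, but the elapsed time should grow roughly in proportion to the length.

import time

for size in (1_000_000, 2_000_000, 4_000_000):
    s = "a" * size
    start = time.time()
    my_len(s)
    elapsed = time.time() - start
    print(f"{size:>9} characters took {elapsed:.4f} seconds")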
Example 2:
Let’s consider a second case.
def count_chars(s: str, to_count: tuple[str, ...]) -> int:
    """count occurrences of characters in to_count within s"""
    count = 0
    for ch in s:
        if ch in to_count:
            count += 1
    return count
This is similar to our above example, but does more work.
If we were counting how many times ch in to_count is evaluated, we'd see that it is now the product of len(s) * len(to_count).
So a call like count_chars(test_string, ("a", "b", "c")) would take approximately three times as long as my_len(test_string).
We could think of this as O(N * M), where N is the length of s and M is the length of to_count.
But in practice, if to_count stays small while s grows, we would still describe this function as O(N).
That is because we are focused on the relative growth of the complexity as the inputs grow, which is governed by the much larger variable.
Example 3:
The example above, where we dropped the smaller to_count factor, points to a general rule: only the dominant term matters.
This means we never consider constants:
def f(data):
    for x in data:
        g(x)
        g(x)
        g(x)
        g(x)
In this abstract form, we perform the action g
four times for every input to f
.
You might be tempted to say it is O(4N), but constant factors are dropped, so we simply call it O(N).
Complexity Classes
We can group functions into several complexity classes based on the nature of their growth in relation to inputs.
Linear Time
It is very common for an algorithm to scale linearly with its inputs. One more input means one more action, or at least a proportional number more actions. (Like 4 in the example above.)
This complexity class is O(N), known as linear time.
Constant Time
Some algorithms are unaffected by the length of their inputs, always performing the same amount of work.
Capitalizing a string does not require looking at any characters beyond the first, so capitalizing a five-character string would take the same amount of time as capitalizing a string containing billions of characters.
Another, more practical example is the lookup of an item in a dict. As an implementation of a hash table, dict lookups will always be O(1), no matter how many items the dict contains.
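As a rough illustration (not from the original text), the sketch below times membership checks against a small dict and a much larger one; despite the size difference, each lookup takes roughly the same amount of time.

import time

def time_lookups(d, key, n=1_000_000):
    # perform n membership checks against d and return the total elapsed time
    start = time.time()
    for _ in range(n):
        key in d
    return time.time() - start

small = {i: i for i in range(100)}
large = {i: i for i in range(1_000_000)}

print(f"small dict: {time_lookups(small, 50):.4f} seconds")
print(f"large dict: {time_lookups(large, 500_000):.4f} seconds")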
Quadratic Time
The next most common complexity class is O(N²), known as quadratic time.
def find_duplicates(data):
    for idxA, itemA in enumerate(data):
        for idxB, itemB in enumerate(data):
            if idxA != idxB and itemA == itemB:
                handle_duplicate(itemA)
This code needs to look at every item in the data set and compare it to every other item in the data set, so the innermost code runs len(data) * len(data) = N² times, making it O(N²).
As a variant on this, we could have two data sets of similar sizes:
def find_duplicates(set1, set2):
    for itemA in set1:
        for itemB in set2:
            if itemA == itemB:
                handle_duplicate(itemA)
This code would be O(N * M), where N and M are the sizes of the two data sets.
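As a contrast, here is a sketch (not from the original text) showing how the constant-time lookups described in the previous section can replace the inner loop entirely: tracking items already seen in a set brings duplicate detection down to roughly O(N). It uses the same hypothetical handle_duplicate handler as the example above, and it reports each repeated occurrence once rather than every matching pair, but it illustrates the difference in complexity class.

def find_duplicates_linear(data):
    seen = set()
    for item in data:
        if item in seen:
            # set membership is an average constant-time check
            handle_duplicate(item)
        else:
            seen.add(item)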
Other Common Complexity Classes
Complexity Class | Name | Examples
---|---|---
O(1) | Constant | array access, hash table lookup (average)
O(log n) | Logarithmic | binary search
O(n) | Linear | unsorted array search/scan
O(n²) | Quadratic | nested loops
O(n³) | Cubic | matrix multiplication
O(2ⁿ) | Exponential | recursive Fibonacci
O(n!) | Factorial | generating all permutations
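As one illustration of a class from the table, here is a sketch (not from the original text) of binary search, the logarithmic example listed above. Each comparison discards half of the remaining sorted data.

def binary_search(sorted_data, target):
    lo, hi = 0, len(sorted_data) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_data[mid] == target:
            return mid                  # found the target
        elif sorted_data[mid] < target:
            lo = mid + 1                # discard the lower half
        else:
            hi = mid - 1                # discard the upper half
    return -1                           # not present

Because the search space halves on every step, a million sorted items need at most about 20 comparisons.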
This notation is known as Big O notation, classifying a function's growth as the number of inputs grows arbitrarily large.
Memory vs. Speed
It is often possible to make code faster by using more memory, or to use less memory at the cost of speed.
A straightforward example of this is a simple caching algorithm:
CACHE = {}

def cached_expensive(x, y):
    # this will call expensive once and only once per (x, y)
    if (x, y) not in CACHE:
        CACHE[(x, y)] = expensive(x, y)
    return CACHE[(x, y)]
This general approach means any duplicate calls will be instantaneous, but at the cost of potentially storing a large amount of extra data.
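The standard library provides a decorator that implements this kind of cache for you. The sketch below assumes the same hypothetical expensive(x, y) function as above and uses functools.lru_cache.

import functools

@functools.lru_cache(maxsize=None)   # maxsize=None lets the cache grow without bound
def cached_expensive(x, y):
    # the body only runs on the first call for each (x, y); later calls hit the cache
    return expensive(x, y)

Setting maxsize to a number instead bounds how much memory the cache can consume, which is one way to strike the balance discussed below.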
Other, more nuanced approaches can strike a balance between performance and memory usage. The “right” approach depends upon the needs of the application.
What makes a program slow?
A program performing more actions will take more time, but not all actions are created equal.
The table below is based on a popular post, Latency Numbers Every Programmer Should Know:
The important thing to note is that operations on data already in memory are the fastest, around 10 ns, and the “further away” data gets from the CPU, the longer things take.
Location | Operation | Latency | In Seconds
---|---|---|---
CPU L1 Cache | Read/Write | ~10 ns | 0.00000001 s
RAM | Read/Write | ~100 ns | 0.0000001 s
SSD | Read/Write 1MB | ~100 μs | 0.0001 s
HDD | Write 1MB | ~1 ms | 0.001 s
Internet - Same City | Read/Write 1MB | ~10 ms | 0.01 s
Internet - Long Distance | Read/Write 1MB | ~200 ms | 0.2 s
This is why we typically exclude I/O operations from performance testing, unless that is our focus.
The numbers related to disk access and network I/O are thousands of times higher than things happening in the processor, and will almost always be the bottleneck in data-intensive applications.
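To get a feel for the gap, here is a rough sketch (not from the original text) comparing a pure in-memory operation with writing and reading the same data on disk. The file name is arbitrary and the exact numbers vary by machine, but the I/O step is typically orders of magnitude slower.

import time

data = "abc" * 1_000_000

start = time.time()
data.upper()                                # pure in-memory operation
print(f"in-memory: {time.time() - start:.6f} seconds")

start = time.time()
with open("latency_demo.txt", "w") as f:    # arbitrary scratch file name
    f.write(data)
with open("latency_demo.txt") as f:
    f.read()
print(f"disk I/O:  {time.time() - start:.6f} seconds")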
Measuring Performance
While it is generally important to focus on reducing unnecessary algorithmic complexity, other bottlenecks may exist. In practice, we often want to measure the real-world performance of a piece of code.
If you have some code that is running slower than expected, you might decide you need to measure how long a particular piece of code takes. This would help you confirm which portion of the code is slow & provide a baseline to measure improvements against.
Let’s define two functions that do the same thing, to compare how they perform:
def str_add():
    lots_of_strings = ["abc"] * 1000
    combined = ""
    for s in lots_of_strings:
        combined += s
    return combined

def str_join():
    lots_of_strings = ["abc"] * 1000
    return "".join(lots_of_strings)

import time

start = time.time()
str_add()
elapsed = time.time() - start
print(f"str_add took {elapsed:.6f} seconds")

start = time.time()
str_join()
elapsed = time.time() - start
print(f"str_join took {elapsed:.6f} seconds")
str_add took 0.000065 seconds
str_join took 0.000022 seconds
But if we time it again, we might get different results:
for n in range(10):
    start = time.time()
    str_add()
    elapsed = time.time() - start
    print(f"str_add run {n} took {elapsed:.6f} seconds")
    # short pause between runs (shows process variation)
    time.sleep(0.1)
str_add run 0 took 0.000041 seconds
str_add run 1 took 0.000131 seconds
str_add run 2 took 0.000158 seconds
str_add run 3 took 0.000324 seconds
str_add run 4 took 0.000256 seconds
str_add run 5 took 0.000324 seconds
str_add run 6 took 0.000269 seconds
str_add run 7 took 0.000185 seconds
str_add run 8 took 0.000259 seconds
str_add run 9 took 0.000314 seconds
for n in range(10):
    start = time.time()
    str_join()
    elapsed = time.time() - start
    print(f"str_join run {n} took {elapsed:.6f} seconds")
    # short pause between runs (shows process variation)
    time.sleep(0.1)
str_join run 0 took 0.000030 seconds
str_join run 1 took 0.000075 seconds
str_join run 2 took 0.000054 seconds
str_join run 3 took 0.000053 seconds
str_join run 4 took 0.000042 seconds
str_join run 5 took 0.000060 seconds
str_join run 6 took 0.000047 seconds
str_join run 7 took 0.000128 seconds
str_join run 8 took 0.000048 seconds
str_join run 9 took 0.000090 seconds
While your code is running, your computer may be doing lots of other things in the background. This produces significant noise in our attempt to measure very short functions.
The noise here is significant: some runs of the same function take 10x longer than others.
To get an accurate sample, we can run this function thousands of times. The built-in timeit
module is specifically designed for this:
import timeit

timeit.timeit("str_add()", number=100000, globals={"str_add": str_add})
3.979762999981176
timeit.timeit("str_join()", number=100000, globals={"str_join": str_join})
0.7548895420040935
The signature of the timeit
function is:
timeit.timeit(
    stmt='pass',
    setup='pass',
    timer=<default timer>,
    number=1000000,
    globals=None
)
- stmt is a string representing the Python code to be tested. The parameter must be a string so Python can delay execution of your function until it is time to test it.
- setup is also a string of code, but it is only run once, prior to the iterations of stmt.
- number controls the number of iterations; higher numbers will lead to more accurate measurement.
- globals allows you to pass in outside values that will be used in the statement, in the form of a dictionary mapping names to values.
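As a small sketch (not spelled out in the original text), the setup parameter can be used instead of globals, and timeit.repeat() collects several samples so you can take the minimum, which is usually the least noisy estimate:

import timeit

# setup runs once before the timed iterations; this assumes str_add and str_join
# are defined in the __main__ module, as they are in an interactive session
timeit.timeit("str_add()", setup="from __main__ import str_add", number=100_000)

# repeat() returns a list of timings; the minimum is usually the most stable number
min(timeit.repeat("str_join()", repeat=5, number=100_000, globals={"str_join": str_join}))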
timeit is primarily for testing code that runs quickly, where thousands or millions of runs are feasible.
Remember: If you have slow functions that are network or I/O heavy, those should typically be tested in smaller pieces since the networking & I/O portions will overwhelm the portions within your control.
Further Exploration
- timeit documentation - The timeit module can also be used as a command-line program.
- profile documentation - This module is beyond what we have time to cover, but it allows pinpointing the portions of your code that are called repeatedly or are otherwise slow (a minimal example appears after this list).
- BigO Visualizer
- Big O Notation @ Khan Academy
- Big O Cheat Sheet
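For the profile module linked above, a minimal sketch (not from the original text) looks like this; cProfile.run() executes the given statement and prints how much time was spent in each function call.

import cProfile

# profiles a single call to str_add and prints per-function call counts and timings
cProfile.run("str_add()")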