4 Testing

Goals

Learn to write testable code and useful unit tests.
Explore some helpful features of pytest.

As you’ve discovered through working on assignments, tests are useful for ensuring that code works as expected.

You’ve been using pytest to run tests that have been provided for you, approximating something known as test-driven development.

Test-driven development (TDD) is a software development process that involves writing tests before writing the code that will be tested. By recognizing the need for a function & reasoning about what the expected output is, you can write tests that will fail if the code is not working correctly.

This process helps you think through problems, and you will often find that reaching a solution is much easier if you write tests first.

As you venture into writing larger programs, whether you adopt a TDD approach or write your tests as you go, you’ll find it valuable to write your own tests.

Writing Testable Code

We break our code into functions and classes to encapsulate functionality that we intend to reuse. These boundaries also provide a natural place to test our code.

If you only have one big function that does everything, it can be difficult to test:


def advance_all_students_with_passing_grades():
    conn = sqlite3.connect('students.db')
    c = conn.cursor()
    c.execute('''
    SELECT student.id, avg(class.grade) as average 
    FROM students JOIN classes ON students.id = classes.student_id
    GROUP BY student.id HAVING average >= 70
    ''')
    students = c.fetchall()
    for student in students:
        c.execute('UPDATE student_enrollment SET grade = grade + 1 WHERE student_id = ?', (student[0],))
    conn.commit()
    conn.close()

How would you test this function? You’d need to have a database with a specific set of data in it and then run the function and check that the data was updated as expected.

If you break the function up into smaller functions, you can test each function in isolation:


def get_students(conn, grade_threshold=70):
    ...

def advance_student(conn, student):
    ...

Now you can test each function in isolation.

By having the function take parameters, you can also test the function with different inputs.

It is also possible to test the function with a mock database connection that doesn’t actually connect to a database but provides the same interface.

This is called “mocking” and is a useful technique for testing code that interacts with external systems.

Writing Tests

There are many types of tests, but we’ll primarily focus on tests that verify a single behavior.

So if a function should sum a list of positive integers, we might write three distinct tests:

test that a list of positive integers is correctly summed
test that negative numbers raise an error (or whatever the intended behavior is)
test that an empty list sums to zero

Each of these could be considered a “unit” of the overall behavior. This kind of testing is known as unit testing.

pytest

pytest is a third-party library that makes writing tests in Python easier & less verbose than the built-in unittest module.

When you run pytest it will look for files named test_*.py in the current directory and its sub-directories. It will then run any functions in those files that start with test_.

assert statements

In Python, the assert statement is used to ensure that a condition is true.

If the condition is True, nothing happens. If the condition is False, an AssertionError is raised.

You can also provide a message to be printed if the assertion fails:

assert 1 == 2, "1 is not equal to 2"

assert is not a function!

A common mistake is to treat assert like a function, and use parentheses.

This leads to tests passing unexpectedly when they should fail:

assert(1 == 2, "1 is not equal to 2")

# is equivalent to:
assert (1 == 2, "1 is not equal to 2")

# which becomes:
assert (False, "1 is not equal to 2")

And a tuple with two elements is always True!

Truthiness

In Python, every type has an implicit conversion to a boolean value. This is called “truthiness”.

The following values are considered “falsey”:

False
None
0 # int
0.0 # float
0j # complex
"" # empty string
[] # empty list
() # empty tuple
{} # empty dict
set() # empty set

All other values are considered True.

Truthiness Demo:

values = [False, None, 0, 0.0, 0j, "", [], (), {}, set()]
values += [True, 42, 3.14, "hello", [1, 2, 3], {"a": 1}]
for value in values:
    # notice we're using the value as a boolean expression here
    if value:
        print(f"{value} is True")
    else:
        print(f"{value} is False")

False is False
None is False
0 is False
0.0 is False
0j is False
 is False
[] is False
() is False
{} is False
set() is False
True is True
42 is True
3.14 is True
hello is True
[1, 2, 3] is True
{'a': 1} is True

Writing good tests

Next, let’s take a look at some simple code and some appropriate unit tests:

# my_module.py
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

def circle_contains(radius: float, center: Point, point: Point):
    return (point.x - center.x) ** 2 + (point.y - center.y) ** 2 <= radius ** 2

def points_within(radius: float, center: Point, points: list[Point]):
    """ Find all points within a circle. """
    return [point for point in points if circle_contains(radius, center, point)]

# test_my_module.py

from my_module import circle_contains, points_within

origin = Point(0, 0)

def test_circle_contains():
    # centered at origin, radius 1
    assert circle_contains(1, origin, origin)
    assert circle_contains(1, origin, Point(.5, .5))

def test_circle_contains_edge():
    assert circle_contains(1, origin, Point(1, 0))  # on the circle

def test_circle_contains_outside():
    assert not circle_contains(1, origin, Point(1.1, 0))

One test per “behavior”

You may wonder, why did we write three tests instead of one?

def test_circle_contains():
    # centered at origin, radius 1
    assert circle_contains(1, origin, origin)
    assert circle_contains(1, origin, Point(.5, .5))
    assert circle_contains(1, origin, Point(1, 0))  # on the circle
    assert not circle_contains(1, origin, Point(1.1, 0))

If the first assertion fails, the second assertion will not be run.

This makes it harder to debug the problem:

If a test named test_circle_contains_edge fails that only tests one thing, you have an idea of where to look.

Granular tests make debugging easier. Consider assignments you’ve worked on, and whether or not you’d have appreciated a single test that tested a dozen different expected behaviors.

Test Readability

Tests should be easy to understand. This means that the test should be written in a way that makes it clear what is being tested and what the expected result is.

Make liberal use of comments and descriptive test names to make it clear what is being tested so that when a modification to the code in the future breaks a test, it is easy to understand why.

Test Independence

Tests should be independent of each other. This means that if one test fails, it should not affect the outcome of any other test.

This can be a challenge when testing functions that modify data or global state.

For example:

def test_create_user():
    db = Database("test.db")
    db.create_user(username="alice")
    assert db.get_user(username="alice").id == 1

def test_delete_user():
    db = Database("test.db")
    db.delete_user(username="alice")
    assert db.get_user(username="alice") is None

These tests are not independent. If the first test fails, the second test will fail because the database will be empty.

You’d instead likely need to do something like this:

def create_test_database():
    remove_file_if_exists("test.db")
    db = Database("test.db")
    db.init_schema()
    return db

def test_create_user():
    db = create_test_database()
    db.create_user(username="alice")
    assert db.get_user(username="alice").id == 1

def test_delete_user():
    db = create_test_database()
    db.create_user(username="alice")
    db.delete_user(username="alice")
    assert db.get_user(username="alice") is None

You may note that test_delete_user will fail if create_user doesn’t work. There’s still a dependency in terms of behavior in this case, but you can see that the tests can now be run independently of one another since each starts with a blank database.

Test Repeatability

Tests should be repeatable. This means that if a test fails, it should be possible to run it again and get the same result.

This means that tests should not depend on external factors such as:

The current time or date
Random numbers
The state of the network
The state of the database

To reduce the chance of a test failing due to an external factor, you can use a library like freezegun to freeze the current time that a test sees. The mock module can be used to mock out external functions so they return consistent data for the purpose of the test.

What tests to write?

When considering what to test, usually there are a 1-2 clear behaviors that need to be verified.

For a string comparison function, you would need to test that strmatch("abc", "abc") and strmatch("abc", "xyz") return the expected values.

It is then worth considering edge cases: what about empty strings? Does the function need to handle non-string input?

A helpful checklist to run through is: Zero, One, Many, Errors

If a function takes a collection of some kind, ensure that it performs as expected with an empty collection, a collection with one element, a collection with many elements, and an expected error condition, such as a dictionary passed instead of a list.

You do not need to test multiple iterations of the same behavior. If sum works with 4 elements, it probably works with 3 and 5 as well.

# example tests for the sum() function
def test_sum_empty():
    assert sum([]) == 0

def test_sum_one():
    assert sum([1]) == 1

def test_sum_many():
    assert sum([1, 2, 3]) == 6

def test_sum_type_error():
    with pytest.raises(TypeError):
        sum([1, "hello"])

`pytest` Features

We use pytest because it is easy to use and provides a lot of useful features, especially when contrasted with Python’s built-in unittest.

pytest provides both a command line tool pytest, which you’ve been using, and a library that you can use to help you write tests.

CLI Flags

pytest has some helpful command line options, among them:

Passing a filename like tests/test_markov.py will run only the tests in that file.
-v will print out more information about each test and give more detailed error output.
-vv will print out even more information about each test, particularly useful when comparing expected output vs. actual.
-s will include any output from print statements (normally suppressed by pytest).
-k <pattern> will only run tests whose names match the pattern. (So pytest -k frontend would match test_frontend_template and test_frontend_call but not test_database_setup)
-x will stop running tests after the first failure.

Fixtures

A fixture is a function that is run before each test. It can be used to set up the test environment or provide consistent test data.

import pytest

@pytest.fixture
def user_list()
    return [
        {"name": "alice", "id": 1, "email": "alice@domain"},
        {"name": "carol", "id": 3, "email": "carol@domain"},
        {"name": "bob", "id": 2, "email": "bob@domain"},
        {"name": "diego", "id": 4, "email": "diego@otherdomain"},
    ]

def test_sort_users(user_list):
    sorted_list = sort_users(user_list)
    assert sorted_list == [
        {"name": "alice", "id": 1, "email": "alice@domain"},
        {"name": "bob", "id": 2, "email": "bob@domain"},
        {"name": "carol", "id": 3, "email": "carol@domain"},
        {"name": "diego", "id": 4, "email": "diego@otherdomain"},
    ]

def test_filter_users(user_list):
    filtered_list = filter_users(user_list, domain="domain")
    assert filtered_list == [
        {"name": "alice", "id": 1, "email": "alice@domain"},
        {"name": "bob", "id": 2, "email": "bob@domain"},
        {"name": "carol", "id": 3, "email": "carol@domain"},
    ]

This is a powerful feature that can be used to set up complex test environments and ensure test independence as described above.

import pytest

@pytest.fixture
def db():
    remove_file_if_exists("test.db")
    db = Database("test.db")
    db.init_schema()
    return db

# parameter names must match fixture names
def test_create_user(db):
    db.create_user(username="alice")
    assert db.get_user(username="alice").id == 1

def test_delete_user(db):
    db.create_user(username="alice")
    db.delete_user(username="alice")
    assert db.get_user(username="alice") is None

Testing Exceptions

Ensuring that your functions handle errors as intended is an important behavior to include in your tests.

pytest.raises can be used to ensure that a function raises an exception.

def test_reject_invalid_domain():
    with pytest.raises(ValueError):
        validate_email("alice@invalid$")

If the exception is not raised within the with block, pytest will mark the test as failed.

Parameterized Tests

Sometimes the same test needs to be run with different inputs. pytest provides a way to do this with parameterized tests.

@pytest.mark.parametrize("str1,str2,expected", [
    ("abc", "abd", 1),
    ("abc", "abc", 0),
    ("abc", "xyz", 3),
])
def test_hamming_distance(str1, str2, expected):
    assert hamming_distance(str1, str2) == expected

This runs as three distinct tests in pytest, converting each input to a distinct test by calling the test function with the parameters.

Beyond Unit Testing

While unit testing is the most common type of testing, it can be important to test that the various components work together as expected as well. This is known as integration testing.

You will also encounter functional testing and end-to-end testing which are similar concepts, ensuring that the system works as intended, not just the individual parts.

These tests are typically much more complex and take a longer amount of time to run. In practice, you may run the unit tests every time you make a commit, but only run larger tests before a release.

Performance testing focuses on the speed at which a piece of code runs, often set up to watch for regressions where a change makes the code notably slower.