2  Python Tools


Goals


The Python Ecosystem

Much of Python’s early allure was due to its “batteries included” philosophy. When you install Python on your system, it comes with libraries you’ve been using already: datetime, math, csv, json, itertools, and collections to name a few. Collectively these make up the standard library. These can be imported on any Python installation without any additional setup. As long as we have the same version of Python, code that runs on your machine should run on mine if it sticks to these packages. Not every language has a robust standard library; languages like C and Rust take different approaches than Python.
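To see the batteries in action, here is a short sketch using a few standard-library modules. Nothing here needs to be installed; every import below ships with Python itself:

```python
# A few "batteries included" modules -- no installation needed.
import json
import math
from collections import Counter

# json: parse text into Python data structures
data = json.loads('{"language": "Python", "year": 1991}')
print(data["language"])  # Python

# math and collections work out of the box too
print(math.sqrt(16))  # 4.0
print(Counter("mississippi").most_common(1))  # [('i', 4)]
```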

This approach has its limits though:

  • The standard library is the “official” way to do something, but typically there needs to be room for experimentation with new ideas to get there.
  • Some things may never have a “right” way; people will always have different opinions about something as large as a web framework or game library.
  • Some packages need to evolve faster than the language’s year-long release cycle.
  • And of course the language can’t possibly include everything, so the standards for inclusion in the core language are high, which should come as a relief!

PyPI

For this reason, we have PyPI, the Python Package Index.

This is where any developer can upload their Python code for others to use.

So if you upload a package named jellyfish, then people will be able to install your code, type import jellyfish, and start using it.

Security Warning!

Did that worry you?

Take a minute and think about the things that code can do when you run it on your system.

A poorly-written or malicious library could crash your system, steal your data, or do virtually anything a user with their hands on your keyboard could do.

A few years ago a malicious user uploaded a package named jel1yfish, with the intention of confusing people who meant to install jellyfish. They targeted hundreds of similarly named packages, swapping a letter like l or O for a 1 or 0. These typosquatted packages were intended to hijack developer machines to mine cryptocurrency.

This attack was thwarted fortunately, but other similar attacks are surely happening regularly.

PyPI is a wonderful thing, but be fully aware that its packages are not vetted. Stick to well-known packages where possible (and check for typos!), and when installing a niche package, do at least a basic check that the package is authentic.

If in doubt:

  • look for signs of activity on GitHub/etc.: a popular library with dozens of tutorials is one thing; an obscure library only one person uses may also be fine, but is worth a bit of vetting
  • look at the source code!
  • ask someone!

PyPI today has nearly 600,000 packages. If you are looking for code to help you with something, there’s a good chance you’ll be able to find it.

If Python’s early success was driven by its built-in packages, its success today is primarily because of the ecosystem around it.

Python is used in data science in large part because of packages like pandas and SciPy, in web development because of Django and Flask. These are among the most popular Python packages, despite not being part of the standard library.

That means if we want to use pygame to work on our new game, we need to download it, but what do we do with a bunch of .py files?

Packaging: Python’s biggest flaw?

If you ask people what Python’s biggest flaw is, there’s a decent chance they say something about packaging.

The truth is, it has been a mess.

In recent years things have improved a lot; tools like poetry and pdm fixed a whole suite of problems. In this course, we are going to use a cutting-edge tool that has quickly become the favorite of most long-time Python developers: uv.

This section will explain why you’ll see references to pip, virtualenv, and conda as you explore the Python ecosystem, and why you should probably not use those tools.

pip is history

While I won’t bog you down in 20+ years of Python history, it is worth discussing for a second how the packaging ecosystem has changed in the last decade. If you look at the installation instructions for a randomly chosen Python library, most likely it will say to use the pip install command. Don’t!

pip install jellyfish downloads the jellyfish package and installs it to one of the directories on your sys.path, typically the one that contains site-packages. This is what you wanted; now you can import jellyfish.
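You can inspect this search path yourself. A quick sketch (the exact directories printed will differ from machine to machine):

```python
# Python satisfies imports by searching the directories on sys.path, in order.
import sys

for entry in sys.path:
    print(entry)  # on most installs, one of these ends in "site-packages"

# Modules remember where they were loaded from:
import json

print(json.__file__)  # path to the stdlib json package on this machine
```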

But there is a problem waiting with this approach. Let’s say you work on three projects, and they have the following dependencies:

Project 1 (CAPP 30122)

  • polars==1.14
  • seaborn==2.0
    • matplotlib==3.0

Project 2 (MPCS 50999)

  • polars==0.44
  • matplotlib==2.0

Project 3 (Work)

  • custom_library
    • matplotlib==2.4
  • altair

Your system would need two versions of polars and three versions of matplotlib to be able to reliably run the code in question. But we can only have one polars or matplotlib directory on our sys.path!

If we pip install the dependencies for Project 1, it will uninstall & overwrite the dependencies we installed at work, and vice-versa.

virtualenv

Recognizing this issue, virtualenv (aka venv) creates a directory unique to each project that can be added to and removed from sys.path to indicate which project’s dependencies you want to be importable.

In practice this looks something like:

.
├── proj1-venv
│   ├── matplotlib             # v3.0
│   ├── polars
│   └── seaborn
├── proj2-venv
│   ├── matplotlib             # v2.0
│   └── polars
└── proj3-venv
    ├── altair
    ├── custom_library
    └── matplotlib             # v2.4

You can then use special activate scripts that each venv contains to modify your sys.path temporarily.
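Under the hood, this is what the standard library’s venv module builds. A minimal sketch that creates (and immediately discards) an environment, purely to peek at what is inside; uv will manage all of this for us:

```python
# Peek under the hood: create a throwaway environment with stdlib `venv`.
# (uv manages this for you; this is purely illustrative.)
import sys
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-venv"
    venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=False: faster

    # The env gets its own interpreter, activate scripts, and site-packages.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    assert (env_dir / bin_dir).exists()  # activate scripts live here
    assert (env_dir / "pyvenv.cfg").exists()  # marks the directory as a venv
    print(sorted(p.name for p in env_dir.iterdir()))
```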

We won’t go into further detail, because we’re about to introduce a tool that will manage this for us.

More Details: Python Standard Library: venv

conda

While the pip and virtualenv tools were adopted by mainstream Python, the data science world developed Conda for its unique needs.

It is typically not needed today, and importantly for our purposes, does not follow standard Python conventions that have emerged in the last decade or so.

uv

uv is a new tool, first released in 2024, that has rapidly gained mindshare among Python developers. It is significantly easier to use than the amalgam of tools it replaces.

pyproject.toml

Modern Python applications typically use a file called pyproject.toml for metadata and to list the packages they depend upon.

An example pyproject.toml:

[project]
name = "legisplore"
version = "0.1.0"
description = "legislative data explorer"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
  "httpx>=0.27.2",
  "ipdb>=0.13.13",
  "ipython>=8.27.0",
  "polars>=1.18.0",
  "pytest>=8.3.4",
  "ruff>=0.8.4",
  "textual>=1.0.0",
]

uv sync

If you are working on a new project that already has a pyproject.toml, the first command you’ll want to run is uv sync.

This creates a virtual environment (in hidden directory .venv) and installs the appropriate dependencies.

You will also see this create or modify a uv.lock file, which ensures that the same code on two machines uses exactly the same dependency versions.

uv run <command>

Prefixing any command that you want to run with uv run means that it will be run with access to the uv-managed virtualenv.

For example:

$ uv sync
Resolved 24 packages in 0.97ms
Audited 14 packages in 0.11ms
$ uv run pytest                 # will run tests
$ uv run python file.py         # runs a file with the required dependencies
$ uv run ipython                # run ipython REPL with dependencies available

uv add / uv remove

These commands add and remove dependencies; both will update your pyproject.toml, uv.lock, and local environment.

If you see a library that gives an instruction like: pip install Django

You would instead write: uv add Django

In assignments/projects where you are given the option to use different libraries: Be sure to commit the changes to pyproject.toml and uv.lock!

uv remove will do the opposite: the equivalent of pip uninstall or poetry remove, etc.

Auto-formatting & Linting with ruff

With uv in our tool belt, we can start depending on other packages.

The first one we’ll install is going to make your life a lot easier, and raise your grade too.

uv add ruff will install the ruff package, which also happens to be a command line application.

ruff is both a linter and an autoformatter.

Autoformatting with ruff format

If you read the official Python style guide, PEP 8, you will find a lot of pedantic rules. The section on whitespace, for example:

Avoid extraneous whitespace in the following situations:

  • Immediately inside parentheses, brackets or braces:
  • Between a trailing comma and a following close parenthesis:
  • Immediately before a comma, semicolon, or colon:
  • However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted:
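A few of those rules in miniature; each line shows the PEP 8 spelling, with the discouraged form in a comment:

```python
# PEP 8 whitespace rules, in miniature.
spam = (1,)  # not: spam = (1, )   -- no space before a closing paren
ham = {"x": 1}  # not: ham = { "x" : 1 }  -- no space inside braces, none before a colon

seq = [0, 1, 2, 3, 4, 5]
middle = seq[2:4]  # slice colons get equal spacing on both sides (here, none)
upper = seq[3:]  # when a slice parameter is omitted, the space is omitted too
print(middle, upper)  # [2, 3] [3, 4, 5]
```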

These rules definitely lead to more readable code, but can be frustrating to remember & follow.

Wait, aren’t computers supposed to make our lives easier?

Autoformatters can automatically repair these kinds of errors in your code.

As long as you have valid Python code, ruff format myfile.py will reformat the file in question ensuring it adheres to accepted style guidelines.

For example:

# bad1.py
import json,os,sys
import random    # unused import
from datetime   import datetime

def process_data(x,y,     z):
    a=x+y
    if(z=="test"):
         return a*2
    else:
        return    a

def calculate_stuff( items ):
    total=0
    for i in range(len(items)):   # should use enumerate
        total+=items[i]
    

    return total

result = process_data(1,2,"test")
print(result)

After ruff format:

import json, os, sys
import random  # unused import
from datetime import datetime


def process_data(x, y, z):
    a = x + y
    if z == "test":
        return a * 2
    else:
        return a


def calculate_stuff(items):
    total = 0
    for i in range(len(items)):  # should use enumerate
        total += items[i]

    return total


result = process_data(1, 2, "test")
print(result)

This tool is minimally configurable, since the point is for us to all have the same style.

That means in some esoteric cases it will format your code in a strange, but predictable way.

Some people find themselves fighting it at first, but once you realize how much time it saves, the occasional strangeness is a small price for cleaner code at no additional effort.

This is not AI!

To head off any fears, this is not AI.

This does not in any way violate the academic honesty policy in this course (or any others I am aware of, but always check against an instructor’s syllabus or ask if in doubt).

While many teams are restricting or banning generative AI tools, linters and autoformatters are mandated on many teams and the majority of open source projects.

These tools are not rewriting your code, merely reformatting it. They are akin to a spell checker: the thought in the file is still 100% your own.

Linting via ruff check

Not every error can be fixed by autoformatting; a linter can warn about things that you may need to fix yourself, like unused imports and variables.

# bad2.py
import json


def func(a, b, c):
    return a + b
$ uv run ruff check --select ARG --select F bad2.py
bad2.py:1:8: F401 [*] `json` imported but unused
  |
1 | import json
  |        ^^^^ F401
  |
  = help: Remove unused import: `json`

bad2.py:4:16: ARG001 Unused function argument: `c`
  |
4 | def func(a, b, c):
  |                ^ ARG001
5 |     return a + b
  |

Found 2 errors.
[*] 1 fixable with the `--fix` option.

Alternatives

Just as uv supplanted poetry, pdm, pip, and others, there were many linters and autoformatters before ruff.

ruff’s main advantage is speed, and it incorporates lint rules from dozens of other tools, the most popular being flake8 and pylint.

Further Exploration


  1. If not that, they’ll mention the GIL, but that’s for another time.↩︎