# 2 Python Tools
## Goals
- Learn how to work with third-party packages.
- Learn a few tools you’ll use regularly as a Python programmer.
## The Python Ecosystem
Much of Python’s early allure was due to its “batteries included” philosophy. When you install Python on your system, it comes with libraries you’ve been using already: `datetime`, `math`, `csv`, `json`, `itertools`, and `collections`, to name a few. Collectively these make up the standard library. These can be imported on any Python installation without any additional setup. As long as we have the same version of Python, code that runs on your machine should run on mine if it sticks to these packages. Not every language has a robust standard library; languages like C and Rust take different approaches than Python.
This approach has its limits though:
- The standard library is the “official” way to do something, but typically there needs to be room for experimentation with new ideas before something gets there.
- Some things may never have a “right” way; people will always have different opinions about something as large as a web framework or game library.
- Some packages need to evolve faster than the language’s year-long release cycle.
- And of course the language can’t possibly include everything, so the standards for inclusion in the core language are high, which should come as a relief!
### PyPI

For this reason, we have PyPI, the Python Package Index. This is where any developer can upload their Python code for others to use.
So if you upload a package named `jellyfish`, then people will be able to install your code, type `import jellyfish`, and start using it.
Did that worry you?
Take a minute and think about the things that code can do when you run it on your system.
A poorly-written or malicious library could crash your system, steal your data, or do virtually anything a user with their hands on your keyboard could do.
A few years ago a malicious user uploaded a package named `jel1yfish`, with the intention of confusing people who meant to install `jellyfish`. They attacked hundreds of similarly named packages where a letter like an `l` or `O` could be replaced with a `1` or `0`. These typo-squatted packages were intended to hijack developer machines to mine cryptocurrency.
Fortunately this attack was thwarted, but similar attacks are surely happening regularly.
PyPI is a wonderful thing, but be fully aware that its packages are not vetted. Stick to well-known packages (and check for typos!) where possible, and when installing a niche package, do at least a basic check that the package is authentic.
If in doubt:
- look for signs of activity on GitHub/etc.: a popular library with dozens of tutorials is one thing; an obscure library used by only one person may also be fine, but is worth a bit of vetting
- look at the source code!
- ask someone!
PyPI today has nearly 600,000 packages. If you are looking for code to help you with something, there’s a good chance you’ll be able to find it.
If Python’s early success was driven by the built-in packages, its success today is primarily due to the ecosystem around it.
Python is used in data science in large part because of packages like `pandas` and `SciPy`, and in web development because of `Django` and `Flask`. These are among the most popular Python packages, despite not being part of the standard library.
That means if we want to use `pygame` to work on our new game, we need to download it. But what do we do with a bunch of `.py` files?
## Packaging: Python’s biggest flaw?

If you ask people what Python’s biggest flaw is, there’s a decent chance they say something about packaging.¹
The truth is, it has been a mess.
In recent years things have improved a lot; tools like `poetry` and `pdm` fixed a whole suite of problems. In this course, we are going to use a cutting-edge tool that has quickly become the favorite of many long-time Python developers: `uv`.
This section will explain why you’ll see references to `pip`, `virtualenv`, and `conda` as you explore the Python ecosystem, and why you should probably not use those tools.
### `pip` is history

While I won’t bog you down in 20+ years of Python history, it is worth discussing for a moment how the packaging ecosystem has changed in the last decade. If you look at the installation instructions for a randomly chosen Python library, most likely it will say to use the `pip install` command. Don’t!

`pip install jellyfish` downloads the `jellyfish` package and installs it into one of the directories on your `sys.path`, typically the one that contains `site-packages`. This is what you wanted; now you can `import jellyfish`.
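To see where imports come from on your own machine, you can inspect `sys.path` directly (the exact entries will differ from system to system):

```python
import sys

# sys.path is the list of directories Python searches when you import.
for entry in sys.path:
    print(entry)

# Third-party packages installed with pip typically land in a
# "site-packages" directory somewhere on this list.
```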
But there is a problem waiting with this approach. Let’s say you work on three projects, and they have the following dependencies:
Project 1 (CAPP 30122)
- polars==1.14
- seaborn==2.0
- matplotlib==3.0
Project 2 (MPCS 50999)
- polars==0.44
- matplotlib==2.0
Project 3 (Work)
- custom_library
- matplotlib==2.4
Your system would need two versions of `polars` and three versions of `matplotlib` to be able to reliably run the code in question. But we can only have one `polars` or `matplotlib` directory on our `sys.path`!
If we `pip install` the dependencies for Project 1, it will uninstall and overwrite the dependencies we installed for work, and vice versa.
### `virtualenv`

Recognizing this issue, `virtualenv` (aka `venv`) creates a directory unique to each project that can be added to and removed from `sys.path` to indicate which project’s dependencies you want to be importable.
In practice this looks something like:
```
.
├── proj1-venv
│   ├── matplotlib   # v3.0
│   ├── polars
│   └── seaborn
├── proj2-venv
│   ├── matplotlib   # v2.0
│   └── polars
└── proj3-venv
    ├── altair
    ├── custom_library
    └── matplotlib   # v2.4
```
You can then use the special `activate` scripts that each `venv` contains to modify your `sys.path` temporarily.
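Under the hood there is no magic; a virtual environment is just a directory on disk. As a sketch, the standard library’s `venv` module can build one programmatically (the temporary path here is arbitrary):

```python
import tempfile
import venv
from pathlib import Path

# Build a throwaway virtual environment (with_pip=False keeps it fast).
target = Path(tempfile.mkdtemp()) / "demo-venv"
venv.EnvBuilder(with_pip=False).create(target)

# The environment is just files: a pyvenv.cfg config plus a bin/
# directory (Scripts\ on Windows) holding the activate scripts.
print((target / "pyvenv.cfg").exists())  # True
```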
We won’t go into further detail, because we’re about to introduce a tool that will manage this for us.
More Details: Python Standard Library: venv
### `conda`

While the `pip` and `virtualenv` tools were adopted by mainstream Python, the data science world developed Conda for its unique needs.
It is typically not needed today, and importantly for our purposes, it does not follow the standard Python conventions that have emerged in the last decade or so.
### `uv`

`uv` is a new tool, first released in 2024, that has been rapidly gaining mindshare among Python developers. It is significantly easier to use than the amalgam of tools it replaces.
#### `pyproject.toml`

Modern Python applications typically use a file called `pyproject.toml` for metadata and to list the packages they depend upon.

An example `pyproject.toml`:
```toml
[project]
name = "legisplore"
version = "0.1.0"
description = "legislative data explorer"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "httpx>=0.27.2",
    "ipdb>=0.13.13",
    "ipython>=8.27.0",
    "polars>=1.18.0",
    "pytest>=8.3.4",
    "ruff>=0.8.4",
    "textual>=1.0.0",
]
```
#### `uv sync`

If you are working on a project that already has a `pyproject.toml`, the first command you’ll want to run is `uv sync`.
This creates a virtual environment (in the hidden directory `.venv`) and installs the appropriate dependencies.
You will also see this create/modify a `uv.lock` file. This file ensures that the same code on two machines has the exact same dependencies.
#### `uv run <command>`

Prefixing any command that you want to run with `uv run` means that it will be run with access to the `uv`-managed virtualenv.
For example:
```
$ uv sync
Resolved 24 packages in 0.97ms
Audited 14 packages in 0.11ms
$ uv run pytest          # will run tests
$ uv run python file.py  # runs a file with the required dependencies
$ uv run ipython         # run ipython REPL with dependencies available
```
#### `uv add` / `uv remove`

These commands add and remove dependencies; both will update your `pyproject.toml`, `uv.lock`, and local environment.
If you see a library that gives an instruction like `pip install Django`, you would instead write `uv add Django`.

In assignments/projects where you are given the option to use different libraries: be sure to commit the changes to `pyproject.toml` and `uv.lock`!
`uv remove` will do the opposite: the equivalent of `pip uninstall`, `poetry remove`, etc.
## Auto-formatting & Linting with `ruff`

With `uv` in our tool belt, we can start depending on other packages.
The first one we’ll install is going to make your life a lot easier, and raise your grade too.

`uv add ruff` will install the `ruff` package, which also happens to be a command-line application.
`ruff` is both a linter and an autoformatter.
### Autoformatting with `ruff format`

If you read the official Python style guide, PEP 8, you’ll find that its section on whitespace contains a lot of pedantic rules:
Avoid extraneous whitespace in the following situations:
- Immediately inside parentheses, brackets or braces:
- Between a trailing comma and a following close parenthesis:
- Immediately before a comma, semicolon, or colon:
- However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted:
- …
These rules definitely lead to more readable code, but can be frustrating to remember & follow.
Wait, aren’t computers supposed to make our lives easier?
Autoformatters can automatically repair these kinds of errors in your code.
As long as you have valid Python code, `ruff format myfile.py` will reformat the file in question, ensuring it adheres to accepted style guidelines.
For example:
```python
# bad1.py
import json,os,sys
import random # unused import
from datetime import datetime

def process_data(x,y, z):
    a=x+y
    if(z=="test"):
        return a*2
    else:
        return a

def calculate_stuff( items ):
    total=0
    for i in range(len(items)): # should use enumerate
        total+=items[i]
    return total

result = process_data(1,2,"test")
print(result)
```
After `ruff format`:

```python
import json, os, sys
import random  # unused import
from datetime import datetime


def process_data(x, y, z):
    a = x + y
    if z == "test":
        return a * 2
    else:
        return a


def calculate_stuff(items):
    total = 0
    for i in range(len(items)):  # should use enumerate
        total += items[i]
    return total


result = process_data(1, 2, "test")
print(result)
```
This tool is minimally configurable, since the point is for us all to have the same style.
That means in some esoteric cases it will format your code in a strange, but predictable, way.
Some people find themselves fighting it at first, but once you realize how much time it can save you, the occasional strangeness is worth cleaner code at no additional effort.

To head off any fears: this is not AI.
It does not in any way violate the academic honesty policy in this course (or any others I am aware of, but always check an instructor’s syllabus or ask if in doubt).
While many teams are restricting or banning generative AI tools, linters and autoformatters are mandated on many teams and in the majority of open source projects.
These tools are not rewriting your code, merely reformatting it. They are akin to a spell checker: the thought in the file is still 100% your own.
### Linting via `ruff check`

Not every error is fixed by autoformatting; a linter can warn about things that you may need to fix yourself, like unused imports and variables.

```python
import json


def func(a, b, c):
    return a + b
```
```
$ uv run ruff check --select ARG --select F bad2.py
bad2.py:1:8: F401 [*] `json` imported but unused
  |
1 | import json
  |        ^^^^ F401
  |
  = help: Remove unused import: `json`

bad2.py:4:16: ARG001 Unused function argument: `c`
  |
4 | def func(a, b, c):
  |                ^ ARG001
5 |     return a + b
  |
Found 2 errors.
[*] 1 fixable with the `--fix` option.
```
### Alternatives

Just like `uv` supplanted `poetry`, `pdm`, `pip`, and others, there were once many competing linters and autoformatters.
`ruff`’s main advantage is speed, and it incorporates the linter rules from dozens of other libraries, the most popular being `flake8` and `pylint`.
## Further Exploration
¹ If not that, they’ll mention the GIL, but that’s for another time.