Traditionally, more time is spent discussing how to read from & write to a file than how to build the right path in the first place.
In practice, since we now typically use conventional formats like json, csv, toml, yaml, etc., your actual code will probably dedicate more effort to manipulating file paths than to calling .readlines and .write on open files.
The canonical way to read from or write to a file in Python is to use with open:
# read from a text file that already exists
with open("filename.txt") as f:
    text = f.read()

# open a new file for writing (erases existing contents)
with open("newfile.txt", "w") as f:
    f.write("hello filesystem!\n")
While it is possible to have multiple read/write statements within the block, when using a library like json it is more common to see:
import json

# data is assumed to be some JSON-serializable object, e.g. a dict
with open("newfile.txt", "w") as f:
    # json.dumps returns a string, which we write to the file
    f.write(json.dumps(data))

# or more succinctly:
with open("newfile.txt", "w") as f:
    # json.dump takes a file handle as its second argument
    json.dump(data, f)
The primary purpose of the with statement is to ensure that the file is properly closed even if an error occurs within the indented with-block.
Whatever kind of text is inside your file, it is now far more common to use a library like json than to manipulate the contents yourself. This is equally true for binary files: when using the popular Pillow library to manipulate images, for example, the library takes care of the file I/O itself.
Therefore, for most practical purposes, the complexities of file I/O are handled for you once you’ve managed to locate & open the correct file.
That, however, is often easier said than done; many new programmers struggle with paths.
pathlib is a relatively new addition to Python, which accounts for the fact that you'll still see examples using less effective methods, particularly those from the os and os.path modules.
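To get a feel for the difference, here is a small comparison of building the same (made-up) path with os.path and with pathlib; the pathlib version reads much more like the path it produces:

import os.path
from pathlib import Path

# the older os.path style: a nested function call
old_style = os.path.join("projects", "proj-1", "data", "target.json")

# the pathlib style: the "/" operator joins path components
new_style = Path("projects") / "proj-1" / "data" / "target.json"

print(old_style)  # projects/proj-1/data/target.json (on Unix)
print(new_style)  # projects/proj-1/data/target.json (on Unix)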
pathlib makes working with file paths much easier. It will typically be used in tandem with an open call or equivalent.
The primary thing the module contains is a type called Path.[1] The Path class represents a single file path; the path to the file I'm writing these words in, for instance, might be Path("/home/james/sites/map-python-data/pathlib/index.qmd").
While paths may resemble strings, and can be instantiated from them, the Path class offers additional behaviors that are specific to file paths.
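For example, a Path gives you direct access to the pieces of the path it represents (the file name here is made up for illustration):

from pathlib import Path

p = Path("/home/user/projects/proj-1/notes.txt")

print(p.name)    # notes.txt  - the final component
print(p.suffix)  # .txt       - the file extension
print(p.stem)    # notes      - the final component without its extension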
.parent
Path objects have a .parent property that is equivalent to going up a directory:
from pathlib import Path

path = Path("/home/user/projects/proj-1")
print(path.parent)
print(path.parent.parent)

/home/user/projects
/home/user
Path objects use / to concatenate parts of a path. We can use this to build paths out of components:
BASE_DIR = Path("/home/james/sites/map-python-data")

for name in ["pathlib", "web-scraping", "debugging"]:
    # Path overrides the "/" operator to work as concatenation
    # this works with strings and Paths
    file_path = BASE_DIR / name / "index.qmd"
    print(file_path)

/home/james/sites/map-python-data/pathlib/index.qmd
/home/james/sites/map-python-data/web-scraping/index.qmd
/home/james/sites/map-python-data/debugging/index.qmd
This works on Windows as well as Unix-based systems. The path separators will be converted by the library, so you can use / and Windows will see \ where appropriate.
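If you want to see the Windows behavior without a Windows machine, pathlib's PureWindowsPath class can illustrate it (shown purely for demonstration; in normal code you just use Path and let it pick the right flavor for your system):

from pathlib import PureWindowsPath

p = PureWindowsPath("C:/Users/james") / "projects" / "notes.txt"
print(p)  # C:\Users\james\projects\notes.txt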
If you've written a Python file that works with paths, you may have run into issues where it doesn't always read the correct file.
Perhaps you had code like:
with open("filename.txt") as f:
    f.read()
And found that sometimes it couldn't find the file in question. Or, if writing files, perhaps it sometimes wrote the file to a different directory than the one you expected.
The reason for this is that if a file path does not start with the root / (or C:/ on Windows) it is relative.
These paths will be interpreted as if they begin with the current working directory.
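In pathlib terms, a relative path is treated as though it were joined onto Path.cwd() (a small sketch; the file name is arbitrary):

from pathlib import Path

# a relative path like this...
relative = Path("data/target.json")

# ...is looked up as though you had written this
effective = Path.cwd() / relative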
This is an opaque concept, and a perfect example of why we tell you to avoid global variables.
Every running program has a global variable representing the “current working directory”, often the directory it was run from. When you are in your terminal you can see your terminal's current working directory by typing pwd. Similarly, Python has functions to let you examine (os.getcwd) and change (os.chdir) the current working directory.
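A quick demonstration (the directories printed are just examples; yours will differ):

import os

# wherever the program was launched from, e.g. /home/user/projects/proj-1
print(os.getcwd())

# os.chdir changes the process-wide working directory
os.chdir("/tmp")
print(os.getcwd())   # /tmp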
As you may recall, global variables can make it hard to reason about programs, since any function might modify them in unexpected ways.
# global variables create hard-to-follow code
some_variable = 100

f()
g()
h()
print(some_variable)
What will print? That depends on what f, g, and h do to the global state!
As we'll see, the key to robust file handling that works as well on your peers' systems as it does on your own is to avoid relying on this global state altogether.
One solution to this problem is to use absolute paths: you may find that instead of open("data/target.json") you can get your code to work with open("/home/user/projects/proj-2/data/target.json"). But this path is unique to your computer. On my machine I might need "/home/james/dev/proj2/data/target.json".
How can we do this without constantly dueling edits in our Git repository?
__file__
If we’re concerned about portability we want to have a way to say “the directory next to this one” or “the directory that is a parent of this one”.
Often we’re trying to create a layout like this:
proj-dir/
├── data
│ └── target.json
└── src
└── script.py
script.py would like to be able to write to data/target.json in a reliable way, regardless of what the current working directory is.
We'd also like to do this without knowing exactly where proj-dir is, since it may be in /Users/james/projects on one machine and /home/stephen/my-homework on another.
To do this, we can define our paths using the relationship between the two files.
The algorithm for doing this is:

1. Find the absolute path of the Python file that is currently running (script.py).
2. Take that path's parent to get the directory the script lives in.
3. Join that directory with the relative path from it to the file we want (e.g. ../data/target.json).
Python has a special variable __file__ that'll help with step 1, and the rest of the steps we can do with standard path operators:
# assume we're in /home/james/projects/proj-dir/src/script.py
from pathlib import Path

# this creates a Path object that is the full path to script.py
# and then uses .parent to go up one level, to
# "/home/james/projects/proj-dir/src/"
BASE_DIR = Path(__file__).parent

# combine that path with a relative path from 'src'
# to the file in question:
# - up one directory, then into the data directory
data_path = BASE_DIR / "../data/target.json"
Forming paths using __file__ makes them consistent as long as the .py files do not move relative to the data.
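The embedded ".." is perfectly valid to pass to open, but if you'd like a cleaned-up absolute path you can call .resolve(), which collapses it (the printed paths below assume the layout above):

print(data_path)
# /home/james/projects/proj-dir/src/../data/target.json

print(data_path.resolve())
# /home/james/projects/proj-dir/data/target.json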
Path objects

Path objects can typically be passed in anywhere a filename is expected, so open("filename.txt", "w") can become open(path_obj, "w"). You can also write this as path_obj.open("w"). (See pathlib.Path.open.)
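For example, both of these write to the same file (the file name here is arbitrary):

from pathlib import Path

notes = Path("notes.txt")

# passing the Path object to the built-in open()
with open(notes, "w") as f:
    f.write("first line\n")

# or calling .open() on the Path itself
with notes.open("a") as f:
    f.write("second line\n")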
Path objects also have quite a few helper methods that can make your life easier:
Path.exists
If you want to check if a given file exists, you can construct a path to it and then call .exists:
path = BASE_DIR / "data.csv"
if path.exists():
    read_and_process(path)
else:
    create_initial_data(path)
Path.mkdir
A common pattern is to want to create a directory if it doesn’t exist:
log_directory = BASE_DIR / "logs"
log_directory.mkdir(exist_ok=True, parents=True)
This also demonstrates two useful parameters:
- exist_ok=True makes it so that the function will not raise an error if the directory already exists.
- parents=True will also create parent directories if needed.

If you are reading/writing the entire file in one go, instead of using the IO object returned by open, you can call read_text and write_text directly on the Path as a shortcut.
p = Path("file.txt")
p.write_text('Text file contents')
p.read_text()

'Text file contents'
See the official pathlib documentation for more methods and examples.
[1] If you look at the documentation, you'll see a few related classes like PurePath and PosixPath. You can ignore those differences for the most part and use Path.