Introduce the idea of protocols, specifically HTTP.
Learn how to work with HTTP requests & APIs in Python.
In the last section, we saw that a key condition for being able to exchange data between programs is agreeing on a format.
To get two computers to communicate over a network, we need to agree on a protocol.
A protocol describes messages exchanged between multiple systems. This includes the format of the messages themselves, as well as how they are transmitted.
Example Protocol
Protocols can be thought of as a dialog between two systems:
Client: MENU
Server:
    pizza 3.50
    cookie 1.50
    chips 1.00
    juice 1.50
    water 1.00
Client: BUY pizza water
Server: PAY 4.50
Client: CARD 4.50 5555-2222-1111-3333
Server: CONFIRMED
Like formats, the protocols themselves are arbitrary; what matters is that we have agreed upon the meaning of the messages in advance.
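To make the idea concrete, here is a minimal sketch of the server side of the dialog above. The menu and the message names (MENU, BUY, CARD) come from the example; the function name handle_message and the ERROR replies are assumptions made for illustration, not part of any real protocol.

```python
# Toy server for the snack-shop dialog shown above.
# handle_message and the ERROR replies are hypothetical.
MENU = {"pizza": 3.50, "cookie": 1.50, "chips": 1.00, "juice": 1.50, "water": 1.00}


def handle_message(message: str) -> str:
    """Return the server's reply to a single client message."""
    parts = message.split()
    if not parts:
        return "ERROR empty message"
    command, args = parts[0], parts[1:]
    if command == "MENU":
        # One "item price" pair per line.
        return "\n".join(f"{item} {price:.2f}" for item, price in MENU.items())
    if command == "BUY":
        if any(item not in MENU for item in args):
            return "ERROR unknown item"
        total = sum(MENU[item] for item in args)
        return f"PAY {total:.2f}"
    if command == "CARD":
        # A real server would validate the amount and card number here.
        return "CONFIRMED"
    # Anything else was never agreed upon, so the server cannot interpret it.
    return "ERROR unknown command"


print(handle_message("BUY pizza water"))  # PAY 4.50
```

Both sides only work together because they share the same vocabulary: a client that sent ORDER instead of BUY would get an error back, even though the intent is the same.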
The most important and ubiquitous protocol today is HTTP.
Requests & Responses
HTTP was created for the web, originally to exchange HTML pages, but today powers much more.
When you visit a webpage, either by typing in the URL or clicking a link, your browser makes a request to a web server. The server then sends a response message.
Client sends HTTP request:
GET /index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Accept: */*
Server sends HTTP response:
HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 12 Jan 2025 10:00:00 GMT
Etag: "3147526947+gzip"
Expires: Sun, 19 Jan 2025 10:00:00 GMT
Last-Modified: Thu, 17 Oct 2024 07:18:26 GMT
Server: ECS (sec/96A4)
Content-Length: 648
<!doctype html>
<html>
<head>
<title>Example Domain</title>
</head>
<body>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents...</p>
</body>
</html>
Every time your browser loads a typical web page it makes many such requests. There will be at least one for the structure of the page, which comes back as HTML, as well as additional requests for data and outside resources such as images.
HTTP Requests
An HTTP request is a block of formatted text that consists of several parts:
Verb: GET, POST, HEAD, PUT, DELETE, PATCH
URI: Everything after the domain name (e.g. /api/v3/hearings/?api_key=12345)
Protocol Version e.g. HTTP/1.1
Request Headers: String Key-value pairs, one per line
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Firefox/102.0
Accept-Language: de
Cookie: name=value; name2=value2; name3=value3
Body: Used for parameters in POST and other methods.
HTTP Verbs
Within the browser almost all requests will be either GET or POST.
A GET request is intended to fetch content from the server.
GET requests can typically be thought of as “read only”: a contract between the client and server that the server should not need to change anything, merely reply with content.
A POST request on the other hand is meant to be used to make a change on the server. It may also be used to send sensitive data to the server, such as when a user submits their username and password to log in.
To put these in a familiar context: when you browse a webpage by clicking links, you are making GET requests. When you fill out a form, sending data to the server, you are often making a POST request.
URIs/Query Strings
After the verb a request will list a URI, a path to the resource being requested.
These take the form of slashed paths like:
/index.html
/users/Paul/
/api/v3/hearings/house/118/
They can also contain a section known as a query string, which begins with a ? character. These query strings consist of key-value pairs separated by & characters.
But you’ll notice that the domain name is not included in the URI.
It is instead part of the next section, the HTTP headers.
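As a small illustration, Python's standard urllib.parse module can split a full URL into the pieces described above. The URL below is made up for the example (the chamber parameter in particular is hypothetical); only the splitting behavior is the point.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, used only to show how the pieces separate.
url = "https://example.com/api/v3/hearings/?api_key=12345&chamber=house"
parts = urlsplit(url)

print(parts.netloc)  # example.com  (sent in the Host header, not the URI)
print(parts.path)    # /api/v3/hearings/
print(parts.query)   # api_key=12345&chamber=house

# parse_qs returns a list for every key because a key may appear more than once.
print(parse_qs(parts.query))  # {'api_key': ['12345'], 'chamber': ['house']}
```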
HTTP Headers
Taking a look at our initial request:
GET /index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Accept: */*
The first line contains the verb, URI, and protocol.
The lines immediately following all take the form:
Key: Value
These are called headers, and provide a way for the browser (or any other HTTP client) to send supplemental information to the server that is not part of the URI.
HTTP defines a set of headers such as Host, User-Agent, and Accept.
Host is required and contains the domain name of the request.
User-Agent is not strictly required by the protocol, but nearly every client sends it, and it is the client’s way of identifying itself to the server. We’ll touch on this a bit more in the next chapter. When dealing with APIs you can typically rely on your tools’ default unless the API in question says otherwise.
You’ll often see other headers as well, such as the Accept we include here. These allow the client to place additional restrictions on its request, such as asking the server to send pages back in a particular format or language.
When it comes to user-defined headers, they may be used for anything the server chooses. This means while sometimes you’ll see parameters in the query string, some services will instead ask you to send them in the headers. This is a matter of preference, with no hard rules that everyone agrees upon.
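The sketch below assembles the text of a request by hand to show where each piece ends up. It is only an illustration under simplifying assumptions (no body, no validation); build_request is a hypothetical helper, and in practice an HTTP library does this work for you.

```python
from urllib.parse import urlsplit


def build_request(verb: str, url: str, headers: dict[str, str]) -> str:
    """Assemble the text of an HTTP/1.1 request without a body."""
    parts = urlsplit(url)
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    # The domain name is not part of the first line; it goes in the Host header.
    lines = [f"{verb} {uri} HTTP/1.1", f"Host: {parts.netloc}"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    # Lines end with CRLF, and a blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_request(
    "GET",
    "https://example.com/index.html",
    {"User-Agent": "Mozilla/5.0", "Accept": "*/*"},
)
print(request_text)
```

The printed text matches the initial request shown earlier: the first line carries the verb, URI, and protocol, and everything after it is a header.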
HTTP Responses
Once a request is sent, the client will wait for a response. If the server is not available, or too slow to respond, you may see a timeout. But assuming all is working as intended, the server will respond in a format similar to the initial request.
Taking a look at our response:
HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 12 Jan 2025 10:00:00 GMT
Expires: Sun, 19 Jan 2025 10:00:00 GMT
Last-Modified: Thu, 17 Oct 2024 07:18:26 GMT
Content-Length: 648
<!doctype html>
<html>
... (truncated)
We can see the fields:
Protocol Version e.g. HTTP/1.1
Status: 200, 404, 500
Response Headers: key-value pairs similar to request headers
Content-Type: text/html
Content-Language: de-DE
Body: Actual content (e.g. HTML), but can be anything (images, audio, JSON, etc.)
HTTP Statuses
The server includes a status code in the first line of the response to help the client know the general category of response.
While additional detail may or may not be present in the body, the status of the response will indicate if the request succeeded, failed, or perhaps if it needs to be tried again.
200 - Typically 200 indicates a successful response. You will often see this written as 200 OK.
3xx - Redirects. The server suggests trying a slightly different request. These are sent when a page moves, or if there is some other slight issue in the request that the server can make a suggestion to fix.
4xx - Request errors. Common codes include 401 Unauthorized, which suggests a password or API key is required, and the common 404 Not Found, indicating that the URI refers to an unknown resource.
5xx - Server errors. These may be intermittent, or may indicate that the server is down. Unless you are the one writing code on the server, there’s rarely anything you can do to mitigate a 5xx error.
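Because the first digit carries the category, code that reacts to statuses usually checks ranges rather than individual codes. A minimal sketch, where status_category is a hypothetical helper and the category names are simply the ones used in the list above:

```python
def status_category(code: int) -> str:
    """Map a numeric HTTP status code to the broad categories described above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "request error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 301, 404, 500):
    print(code, status_category(code))
```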
Response headers indicate information about the response, such as the format, length, and language. They can also be used by the browser to set cookies, information that persists between requests.
Content-Type: Will usually be present, and is an indication of what is contained in the response body. text/html would indicate the rest of the response is HTML meant to be rendered in a browser, while application/json would mean that the response contains JSON that could be parsed by your code.
Warning
HTTP statuses and headers are “suggestions”, and it is completely possible for them to be wrong. It is not uncommon to see a misconfigured API sending back Content-Type: application/json but sending HTML or vice-versa. Use them as a hint, but be prepared to handle edge cases.
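One way to treat the header as a hint rather than a guarantee is sketched below. The function decode_body and its fallback rule (try JSON if either the header or the first character suggests it, otherwise return the text) are assumptions chosen for illustration; a real program may want stricter handling.

```python
import json


def decode_body(content_type: str, body: str) -> object:
    """Parse the body as JSON when that seems plausible, else return the text."""
    looks_like_json = body.lstrip().startswith(("{", "["))
    if "json" in content_type.lower() or looks_like_json:
        try:
            return json.loads(body)
        except json.JSONDecodeError:
            # The header (or first character) suggested JSON, but it was not.
            pass
    return body


# Header says JSON, body is actually HTML: fall back to the raw text.
print(decode_body("application/json", "<html>error page</html>"))
# Header says HTML, body is actually JSON: still parsed.
print(decode_body("text/html", '{"ok": true}'))
```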
Body
The body of the response begins with a blank line after the headers.
The rest of the response will be in a format hopefully negotiated by the URL and request headers and noted in the response Content-Type header as described above.
In practice, one would split the rest of this text off and parse it with an appropriate (HTML, JSON, etc.) library.
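As a rough sketch of that splitting step, the code below takes the text of a small response apart by hand. The response text is a shortened, made-up version of the one shown earlier, and parse_response is a hypothetical helper; libraries such as the one introduced next do this for you and handle many cases this sketch ignores.

```python
# Shortened, hand-written response used only for illustration.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 15\r\n"
    "\r\n"
    "<!doctype html>"
)


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split raw response text into (status code, headers, body)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[key.strip().lower()] = value.strip()
    return status_code, headers, body


status, headers, body = parse_response(raw_response)
print(status)                   # 200
print(headers["content-type"])  # text/html; charset=UTF-8
print(body)                     # <!doctype html>
```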
Let’s take a look at using HTTP from Python to demonstrate.
HTTP in Python
There are lots of libraries that can send & receive HTTP requests, enough that Python has had multiple built-in versions in its history. It still provides a low-level interface to HTTP requests in urllib.request.
Like we do with choosing pytest over the built-in unittest, we are going to choose a more modern library with more convenience features and a simpler interface.
In this case, we will use httpx for HTTP requests.
As a third party library, remember you will need to run uv add httpx in your own projects if you want to use it.
httpx quick start
You can make GET and POST requests via the methods httpx.get and httpx.post. These return a response object that exposes properties like .status_code, .headers, and .text.
import httpx

response = httpx.get("https://example.com")
print("Status Code:", response.status_code)
print("Headers:")
for key, val in response.headers.items():
    print(f" {key}: {val}")
print("Body (start):", repr(response.text[:50]))
Status Code: 200
Headers:
accept-ranges: bytes
content-type: text/html
etag: "84238dfc8092e5d9c0dac8ef93371a07:1736799080.121134"
last-modified: Mon, 13 Jan 2025 20:11:20 GMT
vary: Accept-Encoding
content-encoding: gzip
content-length: 648
cache-control: max-age=391
date: Mon, 31 Mar 2025 15:49:12 GMT
alt-svc: h3=":443"; ma=93600,h3-29=":443"; ma=93600,quic=":443"; ma=93600; v="43"
connection: keep-alive
Body (start): '<!doctype html>\n<html>\n<head>\n <title>Example D'
The request methods httpx.get, httpx.post, etc. take:
url: URL to fetch (required)
params: Optional dictionary of URL parameters. These will be converted to a query string. {"user": "test", "api_key": "val"} would become user=test&api_key=val
headers: Optional dictionary of request headers. (Defaults for required headers will be set if omitted.)
data: Optional dictionary representing the body of a POST made from a form.
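To see what turning a params dictionary into a query string looks like, the standard library's urlencode performs a comparable conversion. This is only an approximation of what happens inside httpx, whose own encoding may differ in details such as which characters it escapes.

```python
from urllib.parse import urlencode

params = {"user": "test", "api_key": "val"}
query_string = urlencode(params)
print(query_string)  # user=test&api_key=val

# Values are converted to strings, and characters with a special
# meaning in URLs (here the colon) are percent-escaped.
escaped = urlencode({"search": "distribution_pattern:nationwide", "limit": 2})
print(escaped)  # search=distribution_pattern%3Anationwide&limit=2
```

Building the string this way, rather than with manual concatenation, avoids mistakes when a value contains characters such as & or spaces.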
These methods return a Response object with the following attributes:
r.status_code - numeric status code (200, 404, 500, etc.)
r.headers - response headers in a dict-like object
r.content - raw bytes of response (for binary formats)
r.text - text of response (for HTML, etc.)
There are also quite a few convenience methods, among them:
r.json() - helper method to return parsed JSON if response was JSON
r.raise_for_status() - It is often the case that you want to use exceptions to handle 4xx and 5xx errors. If that’s the case, you can call r.raise_for_status() immediately after receiving the response, and it will raise an exception if the status is a 4xx or 5xx error code.
Typical Status Code Handling
Often, you want your code to:
Follow redirects automatically, so if https://example.com/old points you to https://example.com/new, the second request is made for you.
If a 4xx/5xx error is received, raise an exception so that the error can be logged or reported, since the program cannot proceed. (e.g. login failed, or the website is down)
If a 2xx is received, proceed as planned.
httpx has a parameter & a method that will help you with this:
import httpx


def do_normal_response_handling(resp):
    # Placeholder for whatever your program does with a successful response.
    print(resp.status_code)


# Adding follow_redirects=True will tell HTTPX
# to handle 3xx redirects automatically.
resp = httpx.get("https://example.com/will-404", follow_redirects=True)

# Calling raise_for_status will raise an exception
# if the status code is a 4xx or 5xx. This means you
# can be sure that lines after only execute if the request
# was a success.
try:
    resp.raise_for_status()
    do_normal_response_handling(resp)
except httpx.HTTPStatusError as e:
    # just for demonstration that the error was raised
    print(e)
Client error '404 Not Found' for url 'https://example.com/will-404'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
Let’s see how the pieces of a URL are used in an actual API:
URLs can include authentication information but this is very uncommon in modern practice as it can be insecure. That is why you’ll typically see it put in headers.
As we’ve seen, httpx and other libraries will typically handle headers & parameters for us.
JSON API Examples
# API example 1: simple usage
import httpx
import json
from pprint import pprint

# entire URL as a string
url = (
    "https://api.fda.gov/food/enforcement.json"
    "?search=distribution_pattern:nationwide&limit=2"
)
response = httpx.get(url)

# load JSON manually
data = json.loads(response.text)
pprint(data)
{'meta': {'disclaimer': 'Do not rely on openFDA to make decisions regarding '
'medical care. While we make every effort to ensure '
'that data is accurate, you should assume all results '
'are unvalidated. We may limit or otherwise restrict '
'your access to the API in line with our Terms of '
'Service.',
'last_updated': '2025-03-19',
'license': 'https://open.fda.gov/license/',
'results': {'limit': 2, 'skip': 0, 'total': 5419},
'terms': 'https://open.fda.gov/terms/'},
'results': [{'address_1': '2610 Homestead Pl',
'address_2': 'N/A',
'center_classification_date': '20200413',
'city': 'Rancho Dominguez',
'classification': 'Class III',
'code_info': 'Lot codes: 72746',
'country': 'United States',
'distribution_pattern': 'nationwide, Canada and Netherlands',
'event_id': '85253',
'initial_firm_notification': 'Two or more of the following: '
'Email, Fax, Letter, Press Release, '
'Telephone, Visit',
'more_code_info': '',
'openfda': {},
'postal_code': '90220-5610',
'product_description': 'Pure Planet Organic Parasite Cleanse; '
'Net Wt. 174g Glass Jar; Finished '
'Product Item # 52700 Manufactured and '
'Distributed by: Pure Planet, Rancho '
'Dominguez, CA',
'product_quantity': 'xx',
'product_type': 'Food',
'reason_for_recall': 'Firm was notified by supplier that Organic '
'Ground Flaxseed powder was under recall by '
'manufacturer due to unapproved herbicide - '
'Haloxyfop',
'recall_initiation_date': '20200224',
'recall_number': 'F-0904-2020',
'recalling_firm': 'Organic By Nature, Inc.',
'report_date': '20200401',
'state': 'CA',
'status': 'Terminated',
'termination_date': '20210202',
'voluntary_mandated': 'Voluntary: Firm initiated'},
{'address_1': '262 E Main St',
'address_2': 'N/A',
'center_classification_date': '20220719',
'city': 'Lovell',
'classification': 'Class II',
'code_info': 'None',
'country': 'United States',
'distribution_pattern': 'Ten retail locations owned by Queen Bee '
'in CO, NM, WY and nationwide via '
'internet sales.',
'event_id': '90158',
'initial_firm_notification': 'Press Release',
'openfda': {},
'postal_code': '82431-2102',
'product_description': 'Honey Caramels Blue Raspberry. Product '
'available in 5.87 oz. bag, 1 lb. bag, 3 '
'lb. bag, 6 lb. bag. PLU Code for 5.87 '
'oz. bag: 788394 12675 8.',
'product_quantity': '171 pieces',
'product_type': 'Food',
'reason_for_recall': 'Products may potentially contain one or '
'more of the following undeclared tree '
'nuts: Pecans, Almonds, Coconut, Macadamia '
'Nuts, & Walnuts.',
'recall_initiation_date': '20220428',
'recall_number': 'F-1472-2022',
'recalling_firm': 'Queen Bee Gardens, LLC',
'report_date': '20220727',
'state': 'WY',
'status': 'Terminated',
'termination_date': '20230117',
'voluntary_mandated': 'Voluntary: Firm initiated'}]}
# API example 2: use more httpx features
import httpx
from pprint import pprint

url = "https://api.fda.gov/food/enforcement.json"

# let library generate query string
# gives us an easier format to work with
params = {"search": "distribution_pattern:nationwide", "limit": 2}
response = httpx.get(url, params=params)

# response objects have built in .json() method for decoding
pprint(response.json())
(Output is identical to that of example 1 above.)
# API example 3: Pagination (DO NOT RUN)
import httpx

url = "https://api.fda.gov/food/enforcement.json"
limit = 10
skip = 0
results = []

while skip < 100:
    params = {
        "search": "distribution_pattern:nationwide",
        "limit": limit,
        "skip": skip,
    }
    print(f"Fetching {url}{params}")
    response = httpx.get(url, params=params)
    results += response.json()["results"]
    skip += limit

print(f"obtained {len(results)} results")
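The same skip/limit loop can be tried safely without touching a real server. In the sketch below, fetch_page is a hypothetical stand-in that serves 25 fake records and mimics the meta and results keys seen in the output earlier; instead of stopping at a fixed number, the loop stops once it has requested past the reported total.

```python
# Hypothetical stand-in for a paginated API: 25 fake records.
FAKE_RECORDS = [{"id": n} for n in range(25)]


def fetch_page(skip: int, limit: int) -> dict:
    """Mimic a paginated JSON response with 'meta' and 'results' keys."""
    return {
        "meta": {"results": {"skip": skip, "limit": limit, "total": len(FAKE_RECORDS)}},
        "results": FAKE_RECORDS[skip : skip + limit],
    }


def fetch_all(limit: int = 10) -> list[dict]:
    """Request page after page until every record has been collected."""
    results = []
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        results += page["results"]
        skip += limit
        # Stop once the next page would start beyond the reported total.
        if skip >= page["meta"]["results"]["total"]:
            break
    return results


all_records = fetch_all()
print(f"obtained {len(all_records)} results")  # obtained 25 results
```

Against a real API, each loop iteration is a network request, so it is worth checking the service's rate limits before running a loop like this.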