6 HTTP & APIs

Goals

Introduce the idea of protocols, specifically HTTP.
Learn how to work with HTTP requests & APIs in Python.

In the last section, we saw that a key condition for being able to exchange data between programs is agreeing on a format.

To get two computers to communicate over a network, we need to agree on a protocol.

A protocol describes messages exchanged between multiple systems. This includes the format of the messages themselves, as well as how they are transmitted.

Example Protocol

Protocols can be thought of as a dialog between two systems:

Client: MENU

Server:

               pizza  3.50
               cookie 1.50
               chips  1.00
               juice  1.50
               water  1.00

Client: BUY pizza water

Server: PAY 4.50

Client: CARD 4.50 5555-2222-1111-3333

Server: CONFIRMED

Like formats, the protocols themselves are arbitrary. What matters is that we have agreed upon the meaning of the messages in advance.
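
To make the dialog concrete, here is a minimal sketch of the server side of this toy protocol as a plain Python function. The function name handle_message and the ERROR replies are assumptions added for illustration; the dialog above only defines the MENU, BUY, and CARD messages.

```python
# Prices taken from the example dialog above.
MENU = {"pizza": 3.50, "cookie": 1.50, "chips": 1.00, "juice": 1.50, "water": 1.00}


def handle_message(message: str) -> str:
    """Return the server's reply to a single client message."""
    parts = message.split()
    if not parts:
        return "ERROR empty message"  # hypothetical reply, not part of the dialog
    command, *args = parts
    if command == "MENU":
        return "\n".join(f"{item} {price:.2f}" for item, price in MENU.items())
    if command == "BUY":
        unknown = [item for item in args if item not in MENU]
        if unknown:
            return "ERROR unknown item " + " ".join(unknown)  # hypothetical reply
        total = sum(MENU[item] for item in args)
        return f"PAY {total:.2f}"
    if command == "CARD":
        # A real server would check the amount and charge the card here.
        return "CONFIRMED"
    return "ERROR unknown command"  # hypothetical reply


print(handle_message("BUY pizza water"))  # PAY 4.50
```

Both sides only work because they agree in advance on what MENU, BUY, and CARD mean, which is the whole point of a protocol.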

The most important and ubiquitous protocol today is HTTP.

Requests & Responses

HTTP was created for the web, originally to exchange HTML pages, but today powers much more.

When you visit a webpage, either by typing in the URL or clicking a link, your browser makes a request to a web server. The server then sends back a response message.

Client sends HTTP request:

GET /index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Accept: */*

Server sends HTTP response:

HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 12 Jan 2025 10:00:00 GMT
Etag: "3147526947+gzip"
Expires: Sun, 19 Jan 2025 10:00:00 GMT
Last-Modified: Thu, 17 Oct 2024 07:18:26 GMT
Server: ECS (sec/96A4)
Content-Length: 648

<!doctype html>
<html>
<head>
    <title>Example Domain</title>
</head>
<body>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents...</p>
</body>
</html>

Every time your browser loads a typical web page it makes many such requests. There will be at least one for the structure of the page, which comes back as HTML, as well as additional requests for other data and outside resources such as images.

HTTP Requests

An HTTP request is a block of formatted text that consists of several parts:

Verb: GET, POST, HEAD, PUT, DELETE, PATCH
URI: Everything after the domain name (e.g. /api/v3/hearings/?api_key=12345)
Protocol Version e.g. HTTP/1.1
Request Headers: String Key-value pairs, one per line
- User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Firefox/102.0
- Accept-Language: de
- Cookie: name=value; name2=value2; name3=value3
Body: Used for parameters in POST and other methods.
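
Because a request is just formatted text, its parts can be pulled apart with ordinary string operations. The sketch below is a simplified illustration, not a complete HTTP parser; parse_request is a hypothetical helper name, and the sample text is the request shown earlier with the line endings (\r\n) written out explicitly.

```python
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Accept: */*\r\n"
    "\r\n"
)


def parse_request(raw: str) -> dict:
    """Split raw HTTP/1.1 request text into verb, URI, version, headers, and body."""
    # A blank line separates the request line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    verb, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return {"verb": verb, "uri": uri, "version": version,
            "headers": headers, "body": body}


request = parse_request(RAW_REQUEST)
print(request["verb"], request["uri"], request["version"])
print(request["headers"])
```

In practice an HTTP library does this work for you, but seeing it spelled out shows there is nothing hidden in the format.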

HTTP Verbs

Within the browser almost all requests will be either GET or POST.

A GET request is intended to fetch content from the server.

GET requests can typically be thought of as “read only”: a contract between the client and server that the server should not need to change anything, merely reply with content.

A POST request on the other hand is meant to be used to make a change on the server. It may also be used to send sensitive data to the server, such as when a user submits their username and password to log in.

To put these in a familiar context: when you browse a webpage and click links, you are making GET requests. When you fill out a form, sending data to the server, you are often making a POST request.

URIs/Query Strings

After the verb a request will list a URI, a path to the resource being requested.

These take the form of slashed paths like:

/index.html
/users/Paul/
/api/v3/hearings/house/118/

They can also contain a section known as a query string, which begins with a ? character. These query strings consist of key-value pairs separated by & characters.

/search?q=python
/page/?page=2&lang=en
/api/v3/hearings/house/118/?format=json&api_key=12345&offset=40

Taken together with the domain name, these form the full URL that you would see in your browser.

https://kagi.com/search?q=python
https://example.com/page/?page=2&lang=en
https://api.congress.gov/api/v3/hearings/house/118/?format=json&api_key=12345&offset=40

But you’ll notice that the domain name is not included in the URI.

It is instead part of the next section, the HTTP headers.
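
To see how a full URL divides into the domain name and the URI with its query string, Python's standard urllib.parse module can split one apart. This is a small sketch using one of the example URLs above; the variable name request_target is only illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://example.com/page/?page=2&lang=en"
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # example.com  (sent in the Host header)
print(parts.path)    # /page/
print(parts.query)   # page=2&lang=en

# parse_qs maps each key to a list, because a key may repeat in a query string.
print(parse_qs(parts.query))

# The path plus the query string is what appears on the request's first line.
request_target = f"{parts.path}?{parts.query}"
print(request_target)  # /page/?page=2&lang=en
```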

HTTP Headers

Taking a look at our initial request:

GET /index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Accept: */*

The first line contains the verb, URI, and protocol.

The lines immediately following all take the form:

Key: Value

These are called headers, and provide a way for the browser (or any other HTTP client) to send supplemental information to the server that is not part of the URI.

HTTP defines a set of headers such as Host, User-Agent, and Accept.

Host is required and contains the domain name of the request.
User-Agent is the client’s way of identifying itself to the server. The protocol does not strictly require it, but virtually every client sends one, and some servers reject requests that lack it. We’ll touch on this a bit more in the next chapter. When dealing with APIs you can typically rely on your tools’ default unless the API in question says otherwise.

You’ll often see other headers as well, such as the Accept we include here. These allow the client to place additional restrictions on its request, such as asking the server to send pages back in a particular format or language.

See MDN: Request Headers for more details.

When it comes to user-defined headers, they may be used for anything the server chooses. This means while sometimes you’ll see parameters in the query string, some services will instead ask you to send them in the headers. This is a matter of preference, with no hard rules that everyone agrees upon.

HTTP Responses

Once a request is sent, the client will wait for a response. If the server is not available, or too slow to respond, you may see a timeout. But assuming all is working as intended, the server will respond in a format similar to the initial request.

Taking a look at our response:

HTTP/1.1 200 OK
Content-Encoding: gzip
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sun, 12 Jan 2025 10:00:00 GMT
Expires: Sun, 19 Jan 2025 10:00:00 GMT
Last-Modified: Thu, 17 Oct 2024 07:18:26 GMT
Content-Length: 648

<!doctype html>
<html>
... (truncated)

We can see the fields:

Protocol Version e.g. HTTP/1.1
Status: 200, 404, 500
Response Headers: key-value pairs similar to request headers
- Content-Type: text/html
- Content-Language: de-DE
Body: Actual content (e.g. HTML), but can be anything (images, audio, JSON, etc.)
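
A response can be taken apart the same way as a request. The following sketch assumes a shortened version of the response above (only two headers and a one-line body) and a hypothetical helper named parse_response; it is meant to show the structure, not to replace a real HTTP library.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 15\r\n"
    "\r\n"
    "<!doctype html>"
)


def parse_response(raw: str) -> dict:
    """Split raw HTTP/1.1 response text into version, status, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Limit the split so reason phrases with spaces ("Not Found") stay intact.
    version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[key.strip().lower()] = value.strip()
    return {"version": version, "status": int(status_code), "reason": reason,
            "headers": headers, "body": body}


response = parse_response(RAW_RESPONSE)
print(response["status"], response["reason"])
print(response["headers"]["content-type"])
print(response["body"])
```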

HTTP Statuses

The server includes a status code in the first line of the response to help the client know the general category of response.

While additional detail may or may not be present in the body, the status of the response will indicate if the request succeeded, failed, or perhaps if it needs to be tried again.

200 - Typically 200 indicates a successful response. You will often see this written as 200 OK.
3xx - Redirects. The server suggests trying a slightly different request. These are sent when a page moves, or if there is some other slight issue in the request that the server can make a suggestion to fix.
4xx - Request errors. Common codes include 401 Unauthorized, which suggests a password or API key is required, and the common 404 Not Found, indicating that the URI refers to an unknown resource.
5xx - Server errors. These may be intermittent, or may indicate that the server is down. Unless you are the one writing code on the server, there’s rarely anything you can do to mitigate a 500 error.

See MDN HTTP response status codes for details on all standard status codes.
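
The categories above depend only on the first digit of the code, which makes them easy to check in code. Here is a minimal sketch; describe_status is a hypothetical helper, while HTTPStatus comes from Python's standard http module and supplies the usual phrase for each standard code.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return the general category of an HTTP status code."""
    if 200 <= status_code < 300:
        return "success"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "request error"
    if 500 <= status_code < 600:
        return "server error"
    return "other"


for code in (200, 301, 404, 500):
    # HTTPStatus(code).phrase is the standard text, e.g. "Not Found" for 404.
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```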

Common Response Headers

Response headers indicate information about the response, such as the format, length, and language. They can also be used by the server to set cookies in the browser, information that persists between requests.

Content-Type: Will usually be present, and is an indication of what is contained in the response body. text/html would indicate the rest of the response is HTML meant to be rendered in a browser, application/json would mean that the response contains JSON that could be parsed by your code.

Warning

HTTP statuses and headers are “suggestions”, and it is completely possible for them to be wrong. It is not uncommon to see a misconfigured API sending back Content-Type: application/json but sending HTML or vice-versa. Use them as a hint, but be prepared to handle edge cases.
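
One way to treat the header as a hint rather than a guarantee is to attempt the parse and fall back to the raw text if it fails. The sketch below shows one possible approach; parse_body is a hypothetical helper, and the "starts with { or [" check is a rough heuristic chosen for illustration.

```python
import json


def parse_body(content_type: str, body: str):
    """Return parsed JSON when the body really is JSON, otherwise the raw text."""
    looks_like_json = body.lstrip().startswith(("{", "["))
    if "json" in content_type.lower() or looks_like_json:
        try:
            return json.loads(body)
        except json.JSONDecodeError:
            # The header (or the first character) was misleading.
            pass
    return body


# Header and body agree.
print(parse_body("application/json", '{"ok": true}'))
# Header claims JSON, but the server actually sent an HTML error page.
print(parse_body("application/json", "<html>Error</html>"))
# Header claims HTML, but the body is JSON.
print(parse_body("text/html", '{"ok": true}'))
```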

Body

The body of the response begins with a blank line after the headers.

The rest of the response will be in a format hopefully negotiated by the URL and request headers and noted in the response Content-Type header as described above.

In practice, one would split the rest of this text off and parse it with an appropriate (HTML, JSON, etc.) library.

Let’s take a look at using HTTP from Python to demonstrate.

HTTP in Python

There are lots of libraries that can send & receive HTTP requests, enough that Python has had multiple built-in versions in its history. It still provides a low-level interface to HTTP requests in urllib.request.

Like we do with choosing pytest over the built-in unittest, we are going to choose a more modern library with more convenience features and a simpler interface.

In this case, we will use httpx for HTTP requests.

Note

For full documentation of httpx visit https://www.python-httpx.org

Because httpx is a third-party library, remember that you will need to run uv add httpx in your own projects if you want to use it.

httpx quick start

You can make GET and POST requests via the methods httpx.get and httpx.post. These return a response object that exposes properties like .status_code, .headers, and .text.

import httpx

response = httpx.get("https://example.com")
print("Status Code:", response.status_code)
print("Headers:")
for key, val in response.headers.items():
  print(f"   {key}: {val}")
print("Body (start):", repr(response.text[:50]))

Status Code: 200
Headers:
   accept-ranges: bytes
   content-type: text/html
   etag: "84238dfc8092e5d9c0dac8ef93371a07:1736799080.121134"
   last-modified: Mon, 13 Jan 2025 20:11:20 GMT
   vary: Accept-Encoding
   content-encoding: gzip
   content-length: 648
   cache-control: max-age=391
   date: Mon, 31 Mar 2025 15:49:12 GMT
   alt-svc: h3=":443"; ma=93600,h3-29=":443"; ma=93600,quic=":443"; ma=93600; v="43"
   connection: keep-alive
Body (start): '<!doctype html>\n<html>\n<head>\n    <title>Example D'

The request methods httpx.get, httpx.post, etc. take:

url: URL to fetch (required)
params: Optional dictionary of URL parameters. These will be converted to a query string. {"user": "test", "api_key": "val"} would become user=test&api_key=val
headers: Optional dictionary of request headers. (Defaults for required headers will be set if omitted.)
data: Optional dictionary representing the body of a POST made from a form.
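
The conversion from a params dictionary to a query string can be seen in isolation with the standard library's urlencode. This is only an illustration of the idea; httpx performs its own encoding, and the exact escaping of some characters may differ from what urlencode produces.

```python
from urllib.parse import urlencode

params = {"user": "test", "api_key": "val"}
query_string = urlencode(params)
print(query_string)  # user=test&api_key=val

# Characters with special meaning in URLs are percent-encoded,
# and non-string values such as integers are converted to text.
print(urlencode({"search": "distribution_pattern:nationwide", "limit": 2}))

# The query string is appended to the URL after a "?".
print("https://example.com/page/?" + urlencode({"page": 2, "lang": "en"}))
```

Letting the library build the query string avoids mistakes with characters such as spaces, & and =, which would otherwise change the meaning of the URL.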

These methods return a Response object with the following attributes:

r.status_code - numeric status code (200, 404, 500, etc.)
r.headers - response headers in a dict-like object
r.content - raw bytes of response (for binary formats)
r.text - text of response (for HTML, etc.)

There are also quite a few convenience methods, among them:

r.json() - helper method to return parsed JSON if response was JSON
r.raise_for_status() - It is often the case that you want to use exceptions to handle 4xx and 5xx errors. If that’s the case, you can call r.raise_for_status() immediately after receiving the response, and it will raise an exception if the status code is a 4xx or 5xx error.

Typical Status Code Handling

Often, you want your code to:

Follow redirects automatically: if https://example.com/old points you to https://example.com/new, make that second request without further intervention.
If a 4xx/5xx error is received, raise an exception so that the error can be logged or reported and the program does not proceed as if the request had succeeded (e.g. login failed, or the website is down).
If a 2xx is received, proceed as planned.

httpx has a parameter & a method that will help you with this:

# Adding follow_redirects=True will tell HTTPX
# to handle 3xx redirects automatically.
resp = httpx.get("https://example.com/will-404",
                 follow_redirects=True)

# Calling raise_for_status will raise an exception 
# if the status code is a 4xx or 5xx. This means you
# can be sure that lines after only execute if the request
# was a success.
try:
    resp.raise_for_status()
    # placeholder for whatever your program does with a good response
    do_normal_response_handling(resp)
except httpx.HTTPStatusError as e:
    # just for demonstration that the error was raised
    print(e)

Client error '404 Not Found' for url 'https://example.com/will-404'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

For additional parameters and details on these methods see HTTPX’s API Documentation.

HTTPX CLI

httpx also has a helpful command line interface: it is a program as well as a library.

To run this program you need to run uv add httpx[cli] in a project.

You can also use the uvx feature of uv to run this program on its own:

uvx --with httpx[cli] httpx https://example.com

This is equivalent to installing httpx[cli] and running uv run python -m httpx, all at once.

HTTP APIs

An API (application programming interface) is a consistent series of data structures and methods that allow two systems to interact.

You will often hear the term API used as shorthand for HTTP APIs, a subset of APIs that use HTTP as their communication layer.

If this sounds like a protocol, that’s more than fair. An HTTP API is in essence another custom protocol layered on top of HTTP.

HTTP APIs typically consist of endpoints which are specific URIs that return data and/or perform specific actions.

A social media service may define endpoints like:

Example API

GET /images/ - Return a list of image metadata as JSON.

Parameters:

num: Maximum number of images to return (20-100).
start_time: Optional start time for query.
end_time: Optional end time for query.
from_user: Optional user for query.

POST /images/ - Add a new image.

Required Header:

X-APIKEY: required API Key of logged in user.

Body:

caption: Caption for image (max 200 chars).
tagged_users: Optional list of usernames to tag.

These URLs in effect act like functions, with headers, URL parameters, and the body as three different ways to pass arguments.
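
The function analogy can be made literal. The sketch below wraps the two example endpoints in Python functions that only describe the request they would send. Everything specific here is an assumption for illustration: the host social.example, the function names, and the choice to encode the POST body as JSON, which the example API above does not specify.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://social.example"  # hypothetical host for the example API


def build_list_images_request(num=20, from_user=None):
    """GET /images/ : arguments travel in the query string."""
    params = {"num": num}
    if from_user is not None:
        params["from_user"] = from_user
    return {"method": "GET",
            "url": f"{BASE_URL}/images/?{urlencode(params)}",
            "headers": {},
            "body": None}


def build_add_image_request(api_key, caption, tagged_users=None):
    """POST /images/ : the key travels in a header, the rest in the body."""
    if len(caption) > 200:
        raise ValueError("caption must be at most 200 characters")
    body = {"caption": caption}
    if tagged_users:
        body["tagged_users"] = list(tagged_users)
    return {"method": "POST",
            "url": f"{BASE_URL}/images/",
            "headers": {"X-APIKEY": api_key},
            "body": json.dumps(body)}  # JSON body is an assumption


print(build_list_images_request(num=50, from_user="paul"))
print(build_add_image_request("secret", "Sunset", ["paul"]))
```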

For some real API documentation take a look at

Let’s see how the pieces of a URL are used in an actual API:

URLs can include authentication information but this is very uncommon in modern practice as it can be insecure. That is why you’ll typically see it put in headers.

As we’ve seen, httpx and other libraries will typically handle headers & parameters for us.

JSON API Examples

# API example 1: simple usage
import httpx
import json
from pprint import pprint

# entire URL as a string
url = ("https://api.fda.gov/food/enforcement.json"
       "?search=distribution_pattern:nationwide&limit=2")
response = httpx.get(url)

# load JSON manually
data = json.loads(response.text)

pprint(data)

{'meta': {'disclaimer': 'Do not rely on openFDA to make decisions regarding '
                        'medical care. While we make every effort to ensure '
                        'that data is accurate, you should assume all results '
                        'are unvalidated. We may limit or otherwise restrict '
                        'your access to the API in line with our Terms of '
                        'Service.',
          'last_updated': '2025-03-19',
          'license': 'https://open.fda.gov/license/',
          'results': {'limit': 2, 'skip': 0, 'total': 5419},
          'terms': 'https://open.fda.gov/terms/'},
 'results': [{'address_1': '2610 Homestead Pl',
              'address_2': 'N/A',
              'center_classification_date': '20200413',
              'city': 'Rancho Dominguez',
              'classification': 'Class III',
              'code_info': 'Lot codes: 72746',
              'country': 'United States',
              'distribution_pattern': 'nationwide, Canada and Netherlands',
              'event_id': '85253',
              'initial_firm_notification': 'Two or more of the following: '
                                           'Email, Fax, Letter, Press Release, '
                                           'Telephone, Visit',
              'more_code_info': '',
              'openfda': {},
              'postal_code': '90220-5610',
              'product_description': 'Pure Planet Organic Parasite Cleanse;  '
                                     'Net Wt. 174g Glass Jar;  Finished '
                                     'Product Item # 52700    Manufactured and '
                                     'Distributed by:  Pure Planet,  Rancho '
                                     'Dominguez, CA',
              'product_quantity': 'xx',
              'product_type': 'Food',
              'reason_for_recall': 'Firm was notified by supplier that Organic '
                                   'Ground Flaxseed powder was under recall by '
                                   'manufacturer due to unapproved herbicide - '
                                   'Haloxyfop',
              'recall_initiation_date': '20200224',
              'recall_number': 'F-0904-2020',
              'recalling_firm': 'Organic By Nature, Inc.',
              'report_date': '20200401',
              'state': 'CA',
              'status': 'Terminated',
              'termination_date': '20210202',
              'voluntary_mandated': 'Voluntary: Firm initiated'},
             {'address_1': '262 E Main St',
              'address_2': 'N/A',
              'center_classification_date': '20220719',
              'city': 'Lovell',
              'classification': 'Class II',
              'code_info': 'None',
              'country': 'United States',
              'distribution_pattern': 'Ten retail locations owned by Queen Bee '
                                      'in CO, NM, WY and nationwide via '
                                      'internet sales.',
              'event_id': '90158',
              'initial_firm_notification': 'Press Release',
              'openfda': {},
              'postal_code': '82431-2102',
              'product_description': 'Honey Caramels Blue Raspberry. Product '
                                     'available in 5.87 oz. bag, 1 lb. bag, 3 '
                                     'lb. bag, 6 lb. bag. PLU Code for 5.87 '
                                     'oz. bag: 788394 12675 8.',
              'product_quantity': '171 pieces',
              'product_type': 'Food',
              'reason_for_recall': 'Products may potentially contain one or '
                                   'more of the following undeclared tree '
                                   'nuts: Pecans, Almonds, Coconut, Macadamia '
                                   'Nuts, & Walnuts.',
              'recall_initiation_date': '20220428',
              'recall_number': 'F-1472-2022',
              'recalling_firm': 'Queen Bee Gardens, LLC',
              'report_date': '20220727',
              'state': 'WY',
              'status': 'Terminated',
              'termination_date': '20230117',
              'voluntary_mandated': 'Voluntary: Firm initiated'}]}

# API example 2: use more httpx features
import httpx
from pprint import pprint

url = "https://api.fda.gov/food/enforcement.json"

# let library generate query string
# gives us an easier format to work with
params = {"search": "distribution_pattern:nationwide",
          "limit": 2}

response = httpx.get(url, params=params)
# response objects have built in .json() method for decoding
pprint(response.json())

{'meta': {'disclaimer': 'Do not rely on openFDA to make decisions regarding '
                        'medical care. While we make every effort to ensure '
                        'that data is accurate, you should assume all results '
                        'are unvalidated. We may limit or otherwise restrict '
                        'your access to the API in line with our Terms of '
                        'Service.',
          'last_updated': '2025-03-19',
          'license': 'https://open.fda.gov/license/',
          'results': {'limit': 2, 'skip': 0, 'total': 5419},
          'terms': 'https://open.fda.gov/terms/'},
 'results': [{'address_1': '2610 Homestead Pl',
              'address_2': 'N/A',
              'center_classification_date': '20200413',
              'city': 'Rancho Dominguez',
              'classification': 'Class III',
              'code_info': 'Lot codes: 72746',
              'country': 'United States',
              'distribution_pattern': 'nationwide, Canada and Netherlands',
              'event_id': '85253',
              'initial_firm_notification': 'Two or more of the following: '
                                           'Email, Fax, Letter, Press Release, '
                                           'Telephone, Visit',
              'more_code_info': '',
              'openfda': {},
              'postal_code': '90220-5610',
              'product_description': 'Pure Planet Organic Parasite Cleanse;  '
                                     'Net Wt. 174g Glass Jar;  Finished '
                                     'Product Item # 52700    Manufactured and '
                                     'Distributed by:  Pure Planet,  Rancho '
                                     'Dominguez, CA',
              'product_quantity': 'xx',
              'product_type': 'Food',
              'reason_for_recall': 'Firm was notified by supplier that Organic '
                                   'Ground Flaxseed powder was under recall by '
                                   'manufacturer due to unapproved herbicide - '
                                   'Haloxyfop',
              'recall_initiation_date': '20200224',
              'recall_number': 'F-0904-2020',
              'recalling_firm': 'Organic By Nature, Inc.',
              'report_date': '20200401',
              'state': 'CA',
              'status': 'Terminated',
              'termination_date': '20210202',
              'voluntary_mandated': 'Voluntary: Firm initiated'},
             {'address_1': '262 E Main St',
              'address_2': 'N/A',
              'center_classification_date': '20220719',
              'city': 'Lovell',
              'classification': 'Class II',
              'code_info': 'None',
              'country': 'United States',
              'distribution_pattern': 'Ten retail locations owned by Queen Bee '
                                      'in CO, NM, WY and nationwide via '
                                      'internet sales.',
              'event_id': '90158',
              'initial_firm_notification': 'Press Release',
              'openfda': {},
              'postal_code': '82431-2102',
              'product_description': 'Honey Caramels Blue Raspberry. Product '
                                     'available in 5.87 oz. bag, 1 lb. bag, 3 '
                                     'lb. bag, 6 lb. bag. PLU Code for 5.87 '
                                     'oz. bag: 788394 12675 8.',
              'product_quantity': '171 pieces',
              'product_type': 'Food',
              'reason_for_recall': 'Products may potentially contain one or '
                                   'more of the following undeclared tree '
                                   'nuts: Pecans, Almonds, Coconut, Macadamia '
                                   'Nuts, & Walnuts.',
              'recall_initiation_date': '20220428',
              'recall_number': 'F-1472-2022',
              'recalling_firm': 'Queen Bee Gardens, LLC',
              'report_date': '20220727',
              'state': 'WY',
              'status': 'Terminated',
              'termination_date': '20230117',
              'voluntary_mandated': 'Voluntary: Firm initiated'}]}
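
Once the JSON is decoded it is an ordinary nested structure of dictionaries and lists, so pulling out specific fields is plain Python. The sketch below uses a trimmed copy of the response above, with only a few fields kept, so that it runs without contacting the API.

```python
# A trimmed copy of the response shown above, keeping only a few fields.
data = {
    "meta": {"results": {"limit": 2, "skip": 0, "total": 5419}},
    "results": [
        {"recall_number": "F-0904-2020",
         "recalling_firm": "Organic By Nature, Inc.",
         "state": "CA",
         "more_code_info": ""},
        {"recall_number": "F-1472-2022",
         "recalling_firm": "Queen Bee Gardens, LLC",
         "state": "WY"},
    ],
}

total = data["meta"]["results"]["total"]
print(f"showing {len(data['results'])} of {total} matching recalls")

for recall in data["results"]:
    # .get() avoids a KeyError for fields that some records omit,
    # such as more_code_info, which only the first record includes.
    extra = recall.get("more_code_info", "")
    print(recall["recall_number"], recall["recalling_firm"], recall["state"], repr(extra))
```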

# API example 3: Pagination (DO NOT RUN)
import httpx

url = "https://api.fda.gov/food/enforcement.json"
limit = 10
skip = 0
results = []

while skip < 100:
    params = {"search": "distribution_pattern:nationwide",
              "limit": limit, "skip": skip}
    print(f"Fetching {url} {params}")
    response = httpx.get(url, params=params)
    results += response.json()["results"]
    skip += limit
    
print(f"obtained {len(results)} results")

# API example 4: Pagination w/ Delay
import httpx
import time

url = "https://api.fda.gov/food/enforcement.json"
limit = 10
skip = 0
results = []

while skip < 100:
    time.sleep(1)
    params = {"search": "distribution_pattern:nationwide",
              "limit": limit, "skip": skip}
    print(f"Fetching {url} {params}")
    response = httpx.get(url, params=params)
    results += response.json()["results"]
    skip += limit

print(f"obtained {len(results)} results")
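
The loops above stop at a fixed 100 results. A common variation is to keep going until the total reported in the response's meta section is reached. The sketch below shows that pattern against a stand-in for the API so it can run offline: fetch_page and FAKE_RECORDS are hypothetical, and in real use fetch_page would be replaced by an httpx.get(url, params=params) call followed by .json().

```python
import time

# Stand-in for the API: 25 fake records (an assumption for this sketch).
FAKE_RECORDS = [{"id": n} for n in range(25)]


def fetch_page(limit: int, skip: int) -> dict:
    """Hypothetical replacement for the HTTP request; mimics the response shape."""
    return {
        "meta": {"results": {"limit": limit, "skip": skip,
                             "total": len(FAKE_RECORDS)}},
        "results": FAKE_RECORDS[skip:skip + limit],
    }


def fetch_all(limit: int = 10, delay: float = 0.0) -> list:
    """Fetch every page, pausing between requests."""
    results = []
    skip = 0
    total = None  # unknown until the first response arrives
    while total is None or skip < total:
        page = fetch_page(limit, skip)
        total = page["meta"]["results"]["total"]
        results += page["results"]
        skip += limit
        if skip < total:
            time.sleep(delay)  # be polite: wait before the next request
    return results


records = fetch_all(limit=10)
print(f"obtained {len(records)} results")  # obtained 25 results
```

With a real API, a delay of a second or so between requests, as in example 4, keeps the loop from overwhelming the server.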

Further Exploration

Internet protocols are typically defined in documents known as RFCs. RFC 2616 was the long-standing description of HTTP/1.1; it has since been superseded, most recently by RFC 9110 and RFC 9112.
The best resource for anything about HTTP or how the web works is MDN. MDN HTTP has details on the entire protocol, far beyond what we have here.
If you’d like to see a lower level HTTP library, you can explore urllib.request.
See the httpx documentation for specifics on HTTPX.