Creating a city bike data set through API requests in Python

Oslo has a really nice city bike service, with a developer API that you can query to find out how many bikes are currently available at each station. In this post, we will create a Python program (or script) which queries the API and stores the relevant data in a fitting format. We will then automate it so that the program runs every 5 minutes, giving us a log of the availabilities over time. Later, we can look more closely at how bikes move around. In particular, we might be interested in seeing how often the City Bike workers fill up a station with bikes. This is a never-ending problem: the city center lies at a low altitude, so the bikes end up there, and no one rides them back up. But the analysis can wait for a later time!

Making a request to the API

First, you need to sign up to get an identifier, which will be our key to access the API. Let’s say our key is 1234abcd. We store this in a Python file called secret.py, like so:

identifier = '1234abcd'

To make a request to the service, we can use curl, doing

curl -H "Client-Identifier: 1234abcd" \
  https://oslobysykkel.no/api/v1/stations/availability

and in return we get a (long) JSON string. So far so good. Let’s try doing this in Python, since that makes parsing the JSON string easy.

In Python

A good library for doing HTTP requests (or API calls) is requests. What we’re interested in for now is the availability of a station. We write the following small program,

import requests

from secret import identifier

url = 'https://oslobysykkel.no/api/v1/stations/availability'
headers = {'Client-Identifier': identifier}
r = requests.get(url, headers=headers)

result = r.json()
stations = result['stations']
first_station = stations[0]

for key, value in first_station.items():
    print(key, value)

which will print something like this:

id 430
availability {'bikes': 0, 'locks': 24}

In other words, for each station we get a station id and the number of bikes and locks currently available. The JSON also contains a timestamp telling us when the availability data was last updated. We get that by accessing result['updated_at']. It looks like 2016-09-09T09:50:33+00:00. If we are only interested in the date and the time in hours and minutes, we can do

timestamp = result['updated_at']
date, time = timestamp[:16].split('T')
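Slicing works because the timestamp has a fixed layout. As an alternative, here is a minimal sketch that parses the string with datetime.fromisoformat from the standard library instead. It assumes Python 3.7 or later and a timestamp with a +00:00 style offset, as in the example above. The helper name split_timestamp is made up for illustration.

```python
from datetime import datetime


def split_timestamp(timestamp):
    """Return (date, time) strings with minute resolution.

    Hypothetical helper; assumes an ISO 8601 timestamp such as
    '2016-09-09T09:50:33+00:00'.
    """
    parsed = datetime.fromisoformat(timestamp)
    return parsed.strftime('%Y-%m-%d'), parsed.strftime('%H:%M')


date, time = split_timestamp('2016-09-09T09:50:33+00:00')
print(date, time)  # 2016-09-09 09:50
```

Unlike slicing, this raises a ValueError if the timestamp is not in the expected format, which may be preferable in a script that runs unattended.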

We now have the data we need to make a data set of the availabilities. We add some lines extracting the relevant data,

for station in stations:
    id_ = station['id']
    bikes = station['availability']['bikes']
    locks = station['availability']['locks']

and if we also do print(f"{date},{time},{id_},{bikes},{locks}") (assuming we have at least version 3.6 of Python), we get nice comma separated data about each station with time.

2018-05-03,11:59,432,1,25
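The extraction steps so far can be gathered into one small function, which makes them easy to try out without calling the API. This is only a sketch: the function name station_rows is made up, and the sample dictionary is hand-written to match the shape of the response shown earlier.

```python
def station_rows(result):
    """Yield one comma-separated line per station in the decoded JSON."""
    # Keep only 'YYYY-MM-DDTHH:MM' and split it into date and time.
    date, time = result['updated_at'][:16].split('T')
    for station in result['stations']:
        id_ = station['id']
        bikes = station['availability']['bikes']
        locks = station['availability']['locks']
        yield f"{date},{time},{id_},{bikes},{locks}"


# Hand-written sample, not real API output.
sample = {
    'updated_at': '2016-09-09T09:50:33+00:00',
    'stations': [{'id': 430, 'availability': {'bikes': 0, 'locks': 24}}],
}

rows = list(station_rows(sample))
print(rows)  # ['2016-09-09,09:50,430,0,24']
```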

We can now make a complete program which writes this data to file. To do this we need to open a file and append the data to it. Passing the mode 'a' to the open function makes every write go to the end of the file. If the file named by filename doesn’t already exist, it’s created.

with open(filename, 'a') as datafile:
    for station in stations:
        ...
        datafile.write(f"{date},{time},{id_},{bikes},{locks}\n")
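An alternative to formatting each line by hand is the csv module from the standard library. The sketch below is one way to do the same append; the function name append_rows and its parameters are made up for illustration, and stations is assumed to have the same structure as in the API response.

```python
import csv


def append_rows(filename, date, time, stations):
    """Append one CSV row per station to filename (hypothetical helper)."""
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'a', newline='') as datafile:
        # lineterminator='\n' matches the hand-written format used above.
        writer = csv.writer(datafile, lineterminator='\n')
        for station in stations:
            availability = station['availability']
            writer.writerow([
                date,
                time,
                station['id'],
                availability['bikes'],
                availability['locks'],
            ])
```

For plain numbers the result is identical to the f-string version. The csv module only starts to matter if a field could ever contain a comma or a quote, in which case it handles the quoting for us.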

We now have a program which makes a request, extracts the relevant data, and writes it to a file. See here for the complete program. What’s missing is to automate the process!

Automation

On Unix-like systems (such as macOS) we have access to cron, which is a time-based job scheduler utility. We should also have access to curl, which we used previously. To add a job to cron, we can run env EDITOR=nano crontab -e in the terminal. This opens the cron table in the nano editor. Each line in the table defines when and what commands or scripts to run. I ran into some problems when calling the Python script directly, so the solution was to create a bash script called call_script.sh, which only contains

#!/bin/bash

my/python/path/python3 availabilities2file.py

where my/python/path is the directory containing the Python interpreter we want to use, say /usr/bin (so that the full command starts with /usr/bin/python3). To make sure the script is executable, we do chmod +x call_script.sh. We are now ready to add a line to the cron table. A cron job is specified in the following format (source):

* * * * *  command to execute
│ │ │ │ │
│ │ │ │ └─── day of week (0 - 6)
│ │ │ │      (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
│ │ │ └──────── month (1 - 12)
│ │ └───────────── day of month (1 - 31)
│ └────────────────── hour (0 - 23)
└─────────────────────── min (0 - 59)

We want to run the script every 5 minutes, between 6AM and midnight, and so we add the following line:

*/5 6-23 * * * cd ~/Documents/dev/OsloCityBike/ && ./call_script.sh
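To get a feel for how much data this schedule produces, we can list the times it matches. The sketch below does not parse cron syntax; the two ranges are simply written out by hand to mirror the */5 and 6-23 fields.

```python
# */5 in the minute field: every fifth minute, 0, 5, ..., 55.
minutes = list(range(0, 60, 5))
# 6-23 in the hour field: 6, 7, ..., 23 (both ends included).
hours = list(range(6, 24))

run_times = [f"{hour:02d}:{minute:02d}" for hour in hours for minute in minutes]

print(len(run_times))                # 216
print(run_times[0], run_times[-1])   # 06:00 23:55
```

So the script runs 216 times a day, with the first request at 06:00 and the last at 23:55, and each run appends one line per station to the file.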

I set this up on the server which hosts this website. We have now started collecting a city bike data set, which should be ready for analysis at a later time!