Update: Some insights from 250M Boris Bike journeys using Tableau

Friday, July 31, 2015 Anonymous

Nowadays the Boris Bikes, also known as Santander or Barclays cycles, are very popular in London. Using the history of 250M bike hire journeys over 3 years (2012-2014), we can gain some insights using Tableau and Google BigQuery.

The illustration above describes the frequency of 2014 Boris Bike rentals. Each row represents an hour, while each column represents a day of the week. The darkness of each cell describes the frequency of rentals for that hour on that day. The darker the cell, the more rentals during that part of that day! In the story below, you will find another type of visualization for this question.
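The grid behind such a heatmap is just a count of journeys per (hour, weekday) cell. As a rough sketch only (the original analysis was done in Tableau and Google BigQuery, not in Python), the counting could look like this. The function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def rentals_by_hour_and_weekday(start_dates):
    """Count journeys per (hour, weekday) cell; weekday 0 is Monday."""
    counts = Counter()
    for ts in start_dates:
        counts[(ts.hour, ts.weekday())] += 1
    return counts


# Made-up sample timestamps, NOT taken from the real dataset.
sample = [
    datetime(2014, 7, 7, 8, 15),   # a Monday, hour 8
    datetime(2014, 7, 7, 8, 40),   # the same Monday, hour 8
    datetime(2014, 7, 12, 14, 5),  # a Saturday, hour 14
]

grid = rentals_by_hour_and_weekday(sample)
for (hour, weekday), n in sorted(grid.items()):
    print(f"hour={hour:02d} weekday={weekday} rentals={n}")
```

Each cell count would then be mapped to a color intensity, with darker cells for higher counts.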

1. Mobility of the users

We start by investigating how users move during the day. Is it really true that people from the suburbs travel to work in the center of London? Click on 'Average distance covered for a start location' in the story above.

Playing with the filter reveals some striking patterns. During the morning peak hours, the locations in the suburbs have the largest average distances. During the evening peak hours, the largest average distances are found in the center of London. This matches our expectations!
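The post does not say how the distance of a journey is measured. One plausible reading, and it is only an assumption, is the straight-line (great-circle) distance between the start and end locations, computed from the latitude and longitude fields of the dataset. A minimal sketch under that assumption, with hypothetical function names:

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_distance_by_start(journeys):
    """Average straight-line distance per start_id.

    `journeys` is an iterable of dicts keyed by the dataset's field names.
    """
    totals = defaultdict(lambda: [0.0, 0])  # start_id -> [distance sum, count]
    for j in journeys:
        d = haversine_km(j["start_lat"], j["start_long"],
                         j["end_lat"], j["end_long"])
        totals[j["start_id"]][0] += d
        totals[j["start_id"]][1] += 1
    return {sid: total / n for sid, (total, n) in totals.items()}
```

Note that under this assumption a round trip that ends at its start location counts as zero distance, so the real distance cycled is underestimated.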

2. Balance between size and usage of locations

Every location has a certain number of docks. We investigate whether these numbers were well chosen given how heavily each location is used. Does it make sense for unpopular locations to have many docks? Wouldn't it be better to move those docks to popular locations that have only a few?

We used only the most recent data (2014) here, because it best reflects the current situation and is therefore the most useful basis for deciding on changes or relocations in 2015. Click on 'Popularity versus the number of docks for each location (2014)' in the story above to see the visualization.
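One simple way to put popularity and size on the same scale is the number of journeys started per dock. This ratio is an illustrative choice, not necessarily the measure used in the story, and the function name is hypothetical. A sketch using the dataset's "start_id" and "start_nDocks" fields:

```python
from collections import Counter


def rentals_per_dock(journeys):
    """Journeys started per dock for each start location.

    `journeys` is an iterable of dicts keyed by the dataset's field names.
    Locations reporting zero docks are skipped to avoid dividing by zero.
    """
    starts = Counter()
    docks = {}
    for j in journeys:
        starts[j["start_id"]] += 1
        docks[j["start_id"]] = j["start_nDocks"]
    return {sid: starts[sid] / docks[sid] for sid in starts if docks[sid] > 0}


# Made-up sample records, NOT taken from the real dataset.
sample = [
    {"start_id": 1, "start_nDocks": 10},
    {"start_id": 1, "start_nDocks": 10},
    {"start_id": 1, "start_nDocks": 10},
    {"start_id": 2, "start_nDocks": 20},
]
ratios = rentals_per_dock(sample)
for sid, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(sid, ratio)
```

Locations at the low end of this ranking have many docks for their usage, while those at the high end are candidates for extra docks.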

3. Efficiency of the new locations

During 2013 and 2014, many new locations were set up. We can now check which locations were useful to introduce as start locations and which were not. In the last part of the story, there are different options to make your analysis more detailed. For example, you can investigate whether the new locations with the largest number of docks were a good idea. Another possibility is to examine the new locations in a certain region. Let's try it out!

Note that 2000 usages or fewer per year for a location corresponds to an average of about 5 usages a day (2000 / 365 ≈ 5.5)!
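The arithmetic behind this note, together with a filter for locations at or below the threshold, can be sketched as follows. The location names and yearly counts are invented for illustration, and the helper names are hypothetical.

```python
DAYS_PER_YEAR = 365


def average_daily_usage(yearly_usages):
    """Convert a yearly usage count into an average per day."""
    return yearly_usages / DAYS_PER_YEAR


def low_usage_locations(yearly_counts, threshold=2000):
    """Names of locations with at most `threshold` usages per year, sorted."""
    return sorted(loc for loc, n in yearly_counts.items() if n <= threshold)


# Made-up yearly counts, NOT taken from the real dataset.
yearly_counts = {"A": 1500, "B": 2000, "C": 25000}

print(round(average_daily_usage(2000), 2))
print(low_usage_locations(yearly_counts))
```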

The dataset

The following features were used for every journey (record):
- "rental_id": identification number of the journey,
- "bike_id": identification number of the bike,
- "start_date": date and time at the beginning of the journey,
- "end_date": date and time at the end of the journey,
- "end_id": identification number of the location where the journey was ended,
- "end_location": name of the location where the journey was ended,
- "start_id": identification number of the location where the journey was started,
- "start_location": name of the location where the journey was started,
- "start_lat": latitude of the start location,
- "start_long": longitude of the start location,
- "end_lat": latitude of the end location,
- "end_long": longitude of the end location,
- "start_nDocks": number of docks at the start location,
- "end_nDocks": number of docks at the end location.
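
For readers who want to work with these records in code, the features above could be modelled as a typed record. The field names come from the list above, but the types are assumptions (the post does not state them), and the example values are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Journey:
    """One bike hire journey; field types are assumed, not documented."""
    rental_id: int
    bike_id: int
    start_date: datetime
    end_date: datetime
    end_id: int
    end_location: str
    start_id: int
    start_location: str
    start_lat: float
    start_long: float
    end_lat: float
    end_long: float
    start_nDocks: int
    end_nDocks: int

    @property
    def duration(self) -> timedelta:
        """Time between the start and the end of the journey."""
        return self.end_date - self.start_date


# Made-up example record, NOT taken from the real dataset.
example = Journey(
    rental_id=1, bike_id=42,
    start_date=datetime(2014, 7, 7, 8, 15),
    end_date=datetime(2014, 7, 7, 8, 35),
    end_id=2, end_location="Example Street B",
    start_id=1, start_location="Example Street A",
    start_lat=51.50, start_long=-0.12,
    end_lat=51.51, end_long=-0.10,
    start_nDocks=18, end_nDocks=24,
)
print(example.duration)
```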