Processing Milan's telecom, GPS and census data with Dataflow & Tableau

Wednesday, September 30, 2015 Florian Goossens

How do you turn Milan's telecom, GPS and census data into an interactive dashboard that lets investors discover the best places to start a new business?

We're using the Google Cloud Platform to store and process data, and connect with Tableau for visualisation.

Everything happens in the cloud. Raw datasets are uploaded to Google Cloud Storage, and Google Cloud Dataflow handles the processing. We only have to write the algorithms; there is no infrastructure to worry about, because Dataflow scales automatically. The output data is added to BigQuery and connected to Tableau. New data, for example the latest telecom data, can be uploaded, processed and visualised automatically.


Datasets

This year, the theme for the TIM Big Data Challenge is: how can data help a country grow and become more competitive? TIM and its partners have made a large heterogeneous dataset available to all participants.

  • Call Detail Records from TIM


Processing with Cloud Dataflow


1. Extract


First, we examine each dataset and extract every bit of information that could be useful to investors. This step is unique for each dataset. We give an example insight for each dataset.
  • CDR: Where do we see most activity from French tourists on a Saturday? 
  • GPS: Where do people who work in the financial district go for lunch?
  • Company: Which are the shopping zones? 
  • Census: Where do highly educated people live?
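
To make the CDR example above more concrete, here is a minimal sketch of such an extraction: summing activity per location for one country on Saturdays. The record layout (square id, timestamp, country code, activity) and the sample values are assumptions made for illustration, not the actual TIM schema, and plain Python stands in for what would be a Dataflow transform.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CDR rows: (square_id, ISO timestamp, country code, activity).
SAMPLE_CDR = [
    (101, "2015-03-07T12:00:00", "FR", 4.0),  # Saturday
    (101, "2015-03-07T13:00:00", "FR", 2.5),  # Saturday
    (102, "2015-03-07T12:00:00", "FR", 1.0),  # Saturday
    (101, "2015-03-09T12:00:00", "FR", 9.0),  # Monday, ignored
    (102, "2015-03-07T12:00:00", "DE", 7.0),  # other country, ignored
]


def saturday_activity(rows, country_code):
    """Sum activity per square for one country, Saturdays only."""
    totals = defaultdict(float)
    for square_id, timestamp, code, activity in rows:
        # datetime.weekday() returns 5 for Saturday.
        is_saturday = datetime.fromisoformat(timestamp).weekday() == 5
        if code == country_code and is_saturday:
            totals[square_id] += activity
    return dict(totals)


def busiest_squares(totals, top_n=3):
    """Return the top_n (square_id, activity) pairs, busiest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling `busiest_squares(saturday_activity(SAMPLE_CDR, "FR"))` ranks the squares by French activity on Saturdays. In a real pipeline the filter and the sum would be separate steps, so each can scale independently.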

2. Combine 


Information from each dataset has its own geographical form. An algorithm converts any location information, be it coordinates, differently sized squares, or many-sided polygons, to a grid system of equal-sized squares.
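
The conversion to a common grid could look roughly like the sketch below. It is not the algorithm we used in the project, only one simple way to do it under stated assumptions: coordinates are already projected to a planar system, the cell size `CELL_SIZE` is a made-up value, and a polygon (or a square of any size) is assigned to every cell whose centre falls inside it.

```python
import math

CELL_SIZE = 100.0  # hypothetical cell edge length, in projected units


def point_to_cell(x, y):
    """Map a planar coordinate to the (col, row) of its grid cell."""
    return (math.floor(x / CELL_SIZE), math.floor(y / CELL_SIZE))


def point_in_polygon(x, y, vertices):
    """Ray-casting test: is (x, y) inside the polygon given by vertices?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_cells(vertices):
    """List the grid cells whose centre lies inside the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    min_col, min_row = point_to_cell(min(xs), min(ys))
    max_col, max_row = point_to_cell(max(xs), max(ys))
    cells = []
    # Scan only the cells covered by the polygon's bounding box.
    for col in range(min_col, max_col + 1):
        for row in range(min_row, max_row + 1):
            centre_x = (col + 0.5) * CELL_SIZE
            centre_y = (row + 0.5) * CELL_SIZE
            if point_in_polygon(centre_x, centre_y, vertices):
                cells.append((col, row))
    return cells
```

The cell-centre rule is a deliberate simplification. Splitting a polygon's value across cells in proportion to the overlapping area would be more accurate, at the cost of more geometry code.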


Visualisation with Tableau

Once each square is connected to each dataset, we can build an application to let users discover and compare different locations. 
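
As an illustration of what "each square connected to each dataset" can mean in practice, the sketch below merges per-dataset values keyed by grid cell into one flat row per cell, a shape that a tool like Tableau can read directly. The feature names (`cdr_activity`, `residents`) and the CSV output are assumptions for the example; in our setup the output goes to BigQuery.

```python
import csv
import io


def join_by_cell(**features):
    """Merge {cell: value} maps (one per dataset) into one row per cell."""
    # Collect every cell that appears in at least one dataset.
    cells = sorted(set().union(*(values.keys() for values in features.values())))
    rows = []
    for col, row in cells:
        record = {"cell": f"{col}_{row}"}
        for name, values in features.items():
            # Cells missing from a dataset get None (an empty CSV field).
            record[name] = values.get((col, row))
        rows.append(record)
    return rows


def to_csv(rows):
    """Serialise the joined rows to CSV text; expects at least one row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `join_by_cell(cdr_activity={(0, 0): 6.5}, residents={(0, 0): 1200, (1, 0): 800})` yields two rows, with an empty `cdr_activity` for the cell that has no telecom data.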



Results

Besides economic efficiency (the right place for the right business), the dashboard can also be used for urban planning (how many people stay in a park, and for how long), mobility planning (which home-work routes need a new bus line) and advertising (who will see a billboard).

With this application, Datatonic has been selected as one of the 10 finalists of the TIM Big Data Challenge, out of more than 700 teams from all over the world, competing with research teams from universities and large companies such as IBM. It shows that a data scientist equipped with Google Cloud Platform and Tableau can easily stay ahead of the competition!

