A Social media stack - Google / Datasift / Tableau

Thursday, August 28, 2014 Michael Dellagana



Over the past month or so we were presented with a challenge: create a stack that could obtain specific social media datasets and derive meaningful insights from them. We wanted to collect data from all the major platforms (Facebook, Twitter, Tumblr, etc.), aggregate it and present it in an intuitive dashboard.

All of this was to be presented as a use case at the Google Atmosphere event in London’s West End on the 23rd September 2014.

Technologies Used

  • Datasift platform
  • Google BigQuery
  • Tableau

Here’s a brief tutorial on how this was accomplished:

Datasift

Familiarising yourself with the main components of the Datasift platform is the first step. They all require some configuration, and it’s probably advisable to work through them in this order:

  • Data sources
  • Data Destinations
  • Streams
  • Tasks
'Data sources'
This is where our social data comes from. You’ll need to “activate” the platforms you’d like to use, as by default only a handful of the larger ones are selected.

'Data Destinations'
Here we enable and set up the storage technologies or protocols used to transfer and subsequently store our data. There's a wide variety, including Google BigQuery, MySQL and a 'Pull connector', which is designed for use with their REST API so you can retrieve the data on your own terms.

Here’s a brief Google BigQuery data destination example…
  1. Click the orange ‘+’ to add the Google BigQuery connector
  2. Fill out the necessary parameters below…

Label – Simply whatever you’d like to call this data destination (e.g. ‘my_new_bigquery_destination’)
Dataset ID – The name of the BigQuery 'dataset' where you’d like to place your data. This can be found on the left-hand side of your BigQuery web console.
Project ID – This can be found at the top of your Google Developers Console page. Be sure to use the numeric 'Project Number'.
Table ID – Whatever table name you’d like your data to sit in. The table doesn't have to exist already, as Datasift will create a new table for you, so you have free rein.
Client ID – As with the Project ID, this can be found on the Google Developers Console, in the 'Credentials' section under 'APIs & Auth'. Be sure to use the 'Client ID' of the 'Service Account'.
Service account email address – This can be found directly below the 'Client ID'.
P12 Key – If you don’t have this saved you can generate a new one by clicking 'Generate new p12 key' in the same place you found the Client ID and service account email address.
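
Before pasting these values into the form, it can help to gather them in one place and sanity-check them. The sketch below is only an illustration: the `BigQueryDestination` class, its field names and the `validate` helper are hypothetical and are not part of Datasift or Google tooling. The checks simply mirror the notes above (numeric Project Number, an email-shaped service account address, a .p12 key file).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BigQueryDestination:
    """Hypothetical container mirroring the form fields described above."""
    label: str
    dataset_id: str
    project_id: str             # the numeric 'Project Number', kept as a string
    table_id: str
    client_id: str
    service_account_email: str
    p12_key_path: str           # local path to the downloaded .p12 key


def validate(dest):
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    # Every field in the form is required.
    for f in fields(dest):
        if not getattr(dest, f.name).strip():
            problems.append(f"{f.name} is empty")
    if dest.project_id and not dest.project_id.isdigit():
        problems.append("project_id should be the numeric Project Number")
    if dest.service_account_email and "@" not in dest.service_account_email:
        problems.append("service_account_email does not look like an email address")
    if dest.p12_key_path and not dest.p12_key_path.lower().endswith(".p12"):
        problems.append("p12_key_path should point to a .p12 file")
    return problems


if __name__ == "__main__":
    # All values below are placeholders, not real credentials.
    dest = BigQueryDestination(
        label="my_new_bigquery_destination",
        dataset_id="social",
        project_id="123456789012",
        table_id="interactions",
        client_id="example-client-id",
        service_account_email="example@example.com",
        p12_key_path="key.p12",
    )
    print(validate(dest) or "looks plausible")
```

This only catches obvious slips such as using the project name instead of the Project Number; it cannot confirm that the credentials are actually valid.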

'Streams'

Datasift streams are where you query the Datasift platform and define which data to extract from the networks. You can do this in one of two ways: use their GUI, which lets you create streams by targeting certain keywords, Twitter handles, mentions, Facebook pages and so on, or use their Curated Stream Definition Language (CSDL). We opted for CSDL as it gave us far more control and flexibility to filter out the noise you would normally associate with extracting social media data. A full guide to CSDL can be found in the Datasift documentation.
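
To give a feel for what a CSDL filter looks like, here is a minimal sketch that assembles one as a string. The `build_csdl` helper is hypothetical, and the targets and operators it emits (`interaction.content`, `interaction.type`, `language.tag`, `contains_any`, `in`) are written from memory of CSDL rather than taken from our production stream, so check them against the CSDL guide before relying on them.

```python
def csdl_quote(value):
    """Escape backslashes and double quotes for use inside a CSDL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_csdl(keywords, sources=(), language=None):
    """Build a simple CSDL filter: any keyword, optionally limited by source and language.

    Keywords are joined into one comma-separated list, so a keyword that itself
    contains a comma is not handled by this sketch.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    clauses = [
        'interaction.content contains_any "{}"'.format(csdl_quote(", ".join(keywords)))
    ]
    if sources:
        clauses.append('interaction.type in "{}"'.format(csdl_quote(",".join(sources))))
    if language:
        clauses.append('language.tag == "{}"'.format(csdl_quote(language)))
    # One clause per line, combined with AND so every condition must hold.
    return "\nAND ".join(clauses)


if __name__ == "__main__":
    print(build_csdl(["google atmosphere", "bigquery"],
                     sources=["twitter", "facebook"],
                     language="en"))
```

Each extra clause narrows the stream, which is where the noise reduction comes from: the keyword clause casts the net, and the source and language clauses trim it.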




'Tasks'

Tasks show what you currently have running, the status of each task and the number of interactions it has delivered so far.

Google BigQuery


Now that we have our data streaming into BigQuery we’re able to start analysing it straight away. You’ll notice data being imported in large chunks every minute or so; a simple refresh of the BigQuery table, or of whichever tool you have connected to BigQuery, should make this apparent.
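
As a first look at the incoming data, a simple grouped count is often enough. The sketch below builds such a query as a string. The helper name and the default column `interaction_type` are placeholders of my own: Datasift creates the table schema for you, so check the actual column names in the BigQuery console first. The query is written in BigQuery standard SQL.

```python
import re

# Plain identifiers only, so nothing unexpected can be spliced into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def interactions_by_column_sql(dataset_id, table_id, column="interaction_type", limit=10):
    """Build a query that counts rows per distinct value of `column`.

    `interaction_type` is a placeholder default, not a confirmed Datasift column name.
    """
    for name in (dataset_id, table_id, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT {column}, COUNT(*) AS interactions\n"
        f"FROM `{dataset_id}.{table_id}`\n"
        f"GROUP BY {column}\n"
        f"ORDER BY interactions DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(interactions_by_column_sql("social", "interactions"))
```

The resulting text can be pasted into the BigQuery web console; running it again a minute or two later should show the counts growing as new chunks arrive.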

Tableau


Connecting Tableau to your BigQuery table has been made even easier in their latest release.
Simply click “Connect to data” and you’re presented with a long list of data connectors. Click the Google BigQuery option and all you’ll need is your standard Google account credentials. Once you’ve entered these it’s simply a case of selecting the correct project, dataset and finally the relevant table, and you’re free to start creating your visualisation.

Hopefully this tutorial goes some way to demonstrating how you can set up a robust and reliable social media stack with minimal technical knowledge.