Learning ElasticSearch with Kibana

I recently hit 1000 Tweets on my Twitter account.

I decided to do some analytics on those 1000 Tweets, whilst at the same time getting some practice using ElasticSearch and Kibana.

If you are looking for a proper "how-to" guide, you have come to the wrong place I am afraid. This is just a personal account of my experience.

Step 1 - Load Tweets Into ElasticSearch

The first thing I did was retrieve an export of my Tweets. This is done on Twitter itself, using the exporter tool that can be found in the account settings.

This gives a CSV file of every single Tweet (I believe there is a limit of 3000 Tweets).

The next step is to write a simple loop to open the file and convert every Tweet into an ElasticSearch record.

Elastic have a rather handy PHP SDK that can be used to interact with your ElasticSearch instance. It can be installed using Composer by adding the following to your composer.json file:

{
    "require": {
        "elasticsearch/elasticsearch": "~5.0"
    }
}

Now, I write some code:

$elasticRecords = [];
if (($handle = fopen("tweets.csv", "r")) !== FALSE) {
    // Discard the header row so that it is not indexed as a Tweet
    fgetcsv($handle);
    while (($tweetData = fgetcsv($handle, 0, ",")) !== FALSE) {
        // fgetcsv strips the enclosing quotes for us
        $elasticRecords[] = [
            'id' => $tweetData[0],
            'timestamp' => $tweetData[3],
            'text' => $tweetData[5]
        ];
    }
    fclose($handle);
}

We have to use fgetcsv here because simply exploding each line on commas does not handle new lines within Tweets correctly, and you end up with the wrong results.

I am simply looping through each line, grabbing what I need and adding it to an array.

As I said, I have more than 1000 Tweets now. The CSV puts the most recent at the top of the file, so to get the first 1000, I reverse the array and slice it:

$elasticRecords = array_reverse($elasticRecords);
$elasticRecords = array_slice($elasticRecords, 0, 1000);

I now have an array of records ready for ElasticSearch.

As I used Composer to install the library, I can simply include its autoloader:

require 'vendor/autoload.php';

Then I create the client:

$client = Elasticsearch\ClientBuilder::create()->build();

The client builder lets you override settings (such as the hosts) before calling build. Since I am using the default settings for literally everything, I do not need to set any of my own (see last week's post on how to start up a local ElasticSearch instance with the default settings).
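For example, if the instance were not on the default localhost:9200, the host could be overridden before building. As a sketch (the hostname here is purely hypothetical):

require 'vendor/autoload.php';

// Hypothetical host - only needed when not using the defaults
$client = Elasticsearch\ClientBuilder::create()
    ->setHosts(['elastic.example.com:9200'])
    ->build();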

Just to show that I have nothing in my index before running this script, I query it first. I am calling the index "tweets" and each object a "tweet". The query returns no results; in fact, the index does not even exist yet.
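That check can be sketched with the same PHP client (Missing404Exception is the exception the library throws when the index does not exist):

require 'vendor/autoload.php';

use Elasticsearch\Common\Exceptions\Missing404Exception;

$client = Elasticsearch\ClientBuilder::create()->build();

try {
    // Match everything in the "tweets" index
    $response = $client->search(['index' => 'tweets', 'type' => 'tweet']);
    echo $response['hits']['total'] . " results\n";
} catch (Missing404Exception $e) {
    // Before anything is indexed, the index itself does not exist
    echo "Index not found\n";
}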

Now we simply loop through the records that I made and pass them into the index. The key of each record becomes the field name:

foreach ($elasticRecords as $record) {
    $params = [
        'index' => 'tweets',
        'type' => 'tweet',
        'id' => $record['id'],
        'body' => $record
    ];
    $response = $client->index($params);
}

Now, if I run the same query again, it finds the index and shows that there are 1000 results.

I should note that this is probably not the most optimised way of doing this, since it makes one request per Tweet. But as a quick and dirty way of loading a relatively small amount of data, it is good enough.
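For a larger data set, the SDK's bulk API would be the better fit, sending every record in a single request. A sketch, reusing the $elasticRecords array built earlier:

require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// The bulk body alternates an action line with its document
$params = ['body' => []];
foreach ($elasticRecords as $record) {
    $params['body'][] = [
        'index' => [
            '_index' => 'tweets',
            '_type'  => 'tweet',
            '_id'    => $record['id'],
        ],
    ];
    $params['body'][] = $record;
}
$response = $client->bulk($params);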

Step 2 - Download and Setup Kibana

You can download the appropriate package for your operating system here.

There are some configuration options for linking it to ElasticSearch, but if you have just used the default settings, it should work out of the box.

Once you have started ElasticSearch and the Kibana service, you can access it in a web browser at localhost:5601.

Step 3 - Basic Pie Chart

Now for a very basic visualisation. The "Visualize" item on the menu looks like a good place to start. I click the "Create" button:

And I select "Pie Chart" from the large list:

Next, I select the index that I wish to use. As you can see, it has correctly seen that I have one called "tweets".

The initial output is not very interesting right now.

I will be honest: I found this a lot harder to play with than I thought it was going to be.

To start, I managed to split the chart using two filters: Tweets that mention @GNRailUK and ones that do not. Why? Any followers of mine will know that I use Twitter to complain to them about the trains A LOT.

So first I make a new "bucket" which has the aggregation type "filters". Then I can add a pattern to each filter, e.g. NOT @GNRailUK for Tweets that do not contain that term.

One thing that I couldn't figure out was how to specify the field. The search seems to be global across the whole record; it is a lucky case that I only have one field with text content.
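If I understand it correctly, the filter input accepts Lucene query string syntax, so it should be possible to target a single field with a field:value pattern, something like:

text:@GNRailUK
NOT text:@GNRailUK

I have not verified this against my data, so treat it as a pointer rather than a confirmed approach.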

The output is like this:

As you can see, just over a quarter of all my tweets were directed to them!

You can also change the colour of each filter and add labels.

Conclusion

So ultimately, I find Kibana to be a little trickier than I expected. Perhaps there is a way to do the level of analysis that I have in mind but it just takes time to figure out the right way of doing it.

What I was able to do, I could have done in Excel.

I think that I am going to move away from the analysis side of things and more towards search. Specifically, I am interested in using the SDK from earlier to see what I can do with it. Stay tuned!

