Going Live With ElasticSearch

A couple of weeks back, I wrote a post about getting started with ElasticSearch, including how to test it on a local machine. This past week, I have finally been able to add search to my website, UniverseSite, something that I have wanted to do since I launched it in 2011.

Developing the actual indexing and searching on my local instance was straightforward. The difficulty came when deploying it live. This post is not going to be a step-by-step tutorial; instead, I will talk about different aspects of the project and how I coded it.

Debugging ElasticSearch on Local and an Evil Gotcha

One of the earliest errors that I was getting on my local machine (and, annoyingly, the one that took the longest to fix) was this:

"No alive nodes found in your cluster"

This is thrown as an Exception and was tricky to debug.

It seems pretty obvious what the problem is: the ElasticSearch server is down. That is what a lot of people online were saying when I searched for this error.

But I could see that it was up. I could also interact with it via the Sense Chrome plugin. This error was only occurring with the Elasticsearch-PHP API. It took hours of trying to debug why the problem was occurring, including trying every combination of port and location name and checking and double-checking for typos.

Then I found a massive helper. In the request that you send to ElasticSearch, you can add the following:

$params['client']['verbose'] = true;

When you run the request, you will get a whole load of extra information when you print the response. After doing this, the problem became very obvious.

This is what I had for the parameters for the host:

 $hosts = [
     [
         'host' => 'localhost',
         'port' => 9200,
         'scheme' => '', // the empty placeholder that caused the problem
         'user' => '',
         'pass' => ''
     ]
  ];

Scheme, user and pass are optional parameters. If they weren't included here, there would be no problem (for scheme, "http" would be used). I included them with empty placeholders as I thought I might need them later, so I assumed there was no harm in it.

But there was harm. Because of the empty string being given to scheme, it was completely screwing up the resolved location of the server, which was visible in the verbose response (it was making it something like "http://://localhost:9200", although I didn't record the exact message as I was keen to fix it).

Getting rid of scheme, or setting it to "http", fixed the above exception. I would argue that this could be a bug in the API. Why is an empty string being treated any differently from null or from not being set?

So this error cost me way too much time compared to what the actual fix was: not having an unnecessary placeholder until it is actually needed.
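
If you do want to keep placeholders around, one option is to strip the empty ones before the hosts ever reach the client. This is only a minimal sketch, assuming the behaviour described above (an empty scheme breaks the URL, a missing one falls back to the default); removeEmptyHostOptions is a hypothetical helper name and not part of Elasticsearch-PHP:

```php
<?php
// Hypothetical helper: drops options that are null or an empty string so the
// client never sees an empty 'scheme', 'user' or 'pass'.
function removeEmptyHostOptions(array $hosts): array
{
    return array_map(function (array $host): array {
        return array_filter($host, function ($value): bool {
            return $value !== null && $value !== '';
        });
    }, $hosts);
}

$hosts = removeEmptyHostOptions([
    [
        'host'   => 'localhost',
        'port'   => 9200,
        'scheme' => '', // placeholder, removed before use
        'user'   => '',
        'pass'   => '',
    ],
]);
// $hosts is now [['host' => 'localhost', 'port' => 9200]]
```

The strict comparisons are deliberate: a plain array_filter with no callback would also throw away values such as 0 or "0", which is not what you want for a config array.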

Setting up AWS ElasticSearch and Connecting with Elastic-PHP

The next part I found very difficult was connecting to the AWS service.

So setting up things on the AWS side was actually straightforward. I won't go into detail here, but first you make a user.

This is done by going to the account menu in the top right, selecting "My Security Credentials" and then "Users" on the left. Add a new user and give them the type "Programmatic access" and the permission "AmazonESFullAccess". This will give you an Access Key and a Secret Key that you should make a copy of at this point.

Once you have made a user, go to the ElasticSearch service via the menu; it is quite simple to follow the steps to make a new instance. You will be asked to link the created user so that access is allowed for just them. I created a micro instance to take advantage of the free tier, as I don't expect much traffic.

Easy bit done. Now came the point where you have to tell the Elasticsearch-PHP API to point to this new service and give it the correct credentials.

This was not so straightforward to figure out, as AWS expect you to sign each request with the keys of the user, but Elasticsearch-PHP abstracts the request away from you. There is a way, though.

First, the host setup:

 $hosts = [
     [
         'host' => 'The endpoint from AWS',
         'port' => 443,
         'scheme' => 'https'
     ]
  ];

So the host location can be found on the AWS console after you set up the instance. The port is "443" and scheme is "https" which is different from the defaults on the local machine.

If you create your client with this, you would do something like:

$client = \Elasticsearch\ClientBuilder::create()->setHosts($hosts)->build();

Now for the authentication part. As far as I can tell, AWS haven't produced their own client API for ElasticSearch because there are already so many out there that can support this request signing. As far as I could see, though, Elasticsearch-PHP does not provide an easy way of doing this.

It took me a while to find a solution but I did find one in a comment deep on someone's thread (which I can not find now). I will be honest, I didn't fully understand how or why it worked.

After some further research for this post, I found a plugin that works with Elasticsearch-PHP and basically abstracts away the code that I had found. It can be found here. The README has all of the details of how to use it. I suspect that you will need the AWS SDK installed in your project first.

I would recommend using this rather than me copying and pasting code here that I can not fully explain. But the plugin I have linked does appear to work in pretty much the exact same way. So take a look at that (the README has code examples that are easy to follow) and let me know how it goes. It certainly looks a lot easier than my solution.

My Search Queries

Once I was connected to the live ElasticSearch service, all I had to do was write my queries and run them.

My query is very straightforward. If you read my previous post about the basics of ElasticSearch, you will see that it does not go much further than that:

$params = [
    'index' => 'universesite',
    'body' => '{
        "query" : {
            "bool" : {
                "should" : [
                    {"match": {"article_name": "'.$searchTerm.'"}},
                    {"match": {"article_content": "'.$searchTerm.'"}},
                    {"match": {"latinName": "'.$searchTerm.'"}},
                    {"match": {"englishName": "'.$searchTerm.'"}},
                    {"match": {"landmarks": "'.$searchTerm.'"}},
                    {"match": {"facts": "'.$searchTerm.'"}}
                ]
            }
        },
        "highlight": {
            "fields": {
                "article_content": {},
                "landmarks": {},
                "facts": {}
            }
        }
    }'
];
$results = $client->search($params);

So there are a few points of interest.

First, you can see that I am setting the body as a JSON string. Elasticsearch-PHP allows you to do this or to provide an array of arrays that will translate into the correct JSON. I preferred doing it directly for the more complex queries, as I was testing them in the Sense plugin prior to copying them over to my code.
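
For comparison, this is roughly how the same query could look in the array form. It is a sketch rather than the code used on the site, and buildSearchParams is a hypothetical helper name. A side effect of the array form is that the search term is escaped when the body gets encoded to JSON, so a quote in the user's input can not break the query:

```php
<?php
// Hypothetical helper that builds the search request as a PHP array
// instead of a hand-written JSON string.
function buildSearchParams(string $searchTerm): array
{
    $matchFields = ['article_name', 'article_content', 'latinName',
                    'englishName', 'landmarks', 'facts'];

    // One "match" clause per field, all placed under bool/should.
    $should = [];
    foreach ($matchFields as $field) {
        $should[] = ['match' => [$field => $searchTerm]];
    }

    return [
        'index' => 'universesite',
        'body'  => [
            'query' => ['bool' => ['should' => $should]],
            'highlight' => [
                'fields' => [
                    // stdClass encodes to {} where an empty PHP array would give []
                    'article_content' => new \stdClass(),
                    'landmarks'       => new \stdClass(),
                    'facts'           => new \stdClass(),
                ],
            ],
        ],
    ];
}

$params = buildSearchParams('jupiter "the giant"');
// $results = $client->search($params);
```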

The next thing of interest is that, whilst I have set the index that I want to search, I have not set the type. This is because I have 2 types in my index and I want to search them both - a perfectly acceptable thing to do.

The fields in each type are different. But you can see that I did not have to specify anything special when listing the fields that should be "matched". ElasticSearch is clever enough to just ignore a field if it is not in a particular type.

The query itself is a bool query and uses should for the matches. The way that I interpret this is that should is kind of like or and a document can match one or more of those conditions.

ElasticSearch gives scores to results, so documents that match more of the conditions will rank higher. If, for example, I search for "jupiter", an article that has "jupiter" in the title as well as in the content will rank higher than one that only has it in the content.

As I explained last time, "highlight" returns lines of the document that matched the particular search term. I display these in the results.
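
Pulling the highlights out of the response is mostly array access. A rough sketch, assuming the usual response shape where each hit under hits.hits carries a _score and an optional highlight map of field name to fragments; the sample response and the collectHighlights helper are made up for illustration:

```php
<?php
// Hypothetical helper: returns one entry per hit with its score and the
// highlighted fragments from every field, flattened into a single list.
function collectHighlights(array $results): array
{
    $rows = [];
    foreach ($results['hits']['hits'] ?? [] as $hit) {
        $fragments = [];
        // 'highlight' is absent when the hit matched on a field that
        // was not listed under highlight.fields.
        foreach ($hit['highlight'] ?? [] as $fieldFragments) {
            foreach ($fieldFragments as $fragment) {
                $fragments[] = $fragment;
            }
        }
        $rows[] = [
            'id'        => $hit['_id'],
            'score'     => $hit['_score'],
            'fragments' => $fragments,
        ];
    }
    return $rows;
}

// Made-up response in the shape described above, not real data from the index.
$sample = [
    'hits' => [
        'hits' => [
            [
                '_id'       => '1',
                '_score'    => 2.5,
                'highlight' => [
                    'article_content' => ['<em>Jupiter</em> is the largest planet'],
                    'facts'           => ['<em>Jupiter</em> has many moons'],
                ],
            ],
            ['_id' => '2', '_score' => 0.7], // matched, but nothing to highlight
        ],
    ],
];

$rows = collectHighlights($sample);
```

The fragments come back with the matched term already wrapped in markup, so they can be shown in the results list more or less as they are.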

Summary

You can see my final implementation of this here on the website. Like I said, I haven't gone into loads of detail about implementation specifics, just some things that I found tricky and how I solved them. I hope you found this useful if you encountered similar issues. Thanks for reading!


© 2012-2017