Learning ElasticSearch
One of the things that has been at the top of my list to learn for a long time is ElasticSearch.
I have finally found the time to get started, and this little blog post is about how to begin.
I will start off by saying that this article was a massive help in getting me started and a lot of what is here I learnt from that post, so check it out in conjunction with this one.
What is ElasticSearch?
According to Wikipedia, ElasticSearch "is a search engine based on Lucene". What is Lucene? Again, Wikipedia says that Lucene "is a free and open-source information retrieval software library".
So, ElasticSearch does searching (I guess you got that much from the name).
One of the things it is very good at is full-text search. This means you can search documents for a keyword or phrase incredibly quickly with very little effort.
How to try it out on your machine?
It is actually incredibly easy to start testing the power of ElasticSearch on your machine. All you have to do is go to the ElasticSearch download page and select the compressed folder of your choice. Whichever OS you are on, the download contains all of the necessary packages.
One thing you will need to do first, however, is install Java.
Windows
Once you have unzipped the packaged files, go to the bin folder and run elasticsearch.bat. I found that I can just double click this from Windows Explorer and it opens up the command window automatically. However, you can navigate to the folder in the command window and run it manually if you like.
Mac/Ubuntu
The instructions for OSX and Ubuntu are the same. Open up a terminal, go to the bin folder and run the ./elasticsearch shell script. Give it the -d option to run it in the background.
That is it! As you can see, it is very easy to get started quickly and begin testing and learning. Obviously, as you learn new things and find your needs changing, you can alter this process. But for just starting out, it's great!
Client
Once you have your ElasticSearch up and running, you will need a way to send stuff to it.
You can use cURL from the terminal but I think a GUI is easier when you have potentially large JSON files to send.
There are dozens of REST clients out there, especially in the Chrome and Firefox app stores, so I will leave you to find your favourite.
The blog article I linked to above suggested Sense for Google Chrome. This client is specifically designed for ElasticSearch, so it does basic syntax checking when you are creating your payloads. I personally found it very nice to work with.
I should mention that whilst your ElasticSearch server is running, it is accessible via localhost:9200. This is the base address that all the requests will go to. If you simply go to it in your web browser, you should see a short JSON response returned (basic information about the server, since you have not asked it for anything yet).
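The same check can be done from a script instead of a browser. This is only a minimal sketch using Python's standard library; the `ping` helper name and the two-second timeout are my own choices for illustration, and it simply hands back whatever JSON the base address returns (or `None` if nothing usable comes back).

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # base address mentioned in the post


def ping(base_url=BASE_URL, timeout=2.0):
    """Return the parsed JSON from the base address, or None if unreachable.

    `ping` is a hypothetical helper name, not part of any ElasticSearch client.
    """
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, connection refused, HTTP error status,
        # or the reply was not valid JSON.
        return None


if __name__ == "__main__":
    info = ping()
    if info is None:
        print("Could not reach ElasticSearch at", BASE_URL)
    else:
        print("ElasticSearch replied:", info)
```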
Make Your First Document
With ElasticSearch, you basically store all of the data in "documents". Each document has an index, a type and an ID.
The index is the actual place where the document is contained. Indexes are a whole topic of their own that I am going to skip. For all intents and purposes, an index holds data.
The type is the class that the object represents. For example, if I was storing student information, then the type might be "student".
The id is pretty obviously a unique number or string that identifies a particular document. Think of this as the primary key of tables in relational databases.
So if we were to map these concepts onto relational database concepts, the index is like a whole database, the type is like a table and the id is like the primary key.
To make a document, we use a PUT request:
PUT {index}/{type}/{id}
It's that simple. If we want the ID to be auto-generated, we simply omit the id from the call and use POST instead of PUT:
POST {index}/{type}
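To make the PUT versus POST choice concrete, here is a rough sketch that builds (but does not send) the request with Python's standard library. The `build_index_request` name is hypothetical, the document body is placeholder data, and the `{index}/{type}/{id}` layout is the one described above, so it assumes an ElasticSearch version that still has types.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # base address mentioned in the post


def build_index_request(index, doc_type, document, doc_id=None, base_url=BASE_URL):
    """Build the HTTP request that would store `document`.

    With an id:    PUT  {index}/{type}/{id}
    Without an id: POST {index}/{type}   (the server generates the id)
    """
    if doc_id is None:
        method = "POST"
        url = f"{base_url}/{index}/{doc_type}"
    else:
        method = "PUT"
        url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Placeholder document; the real Mercury article has more fields than this.
doc = {"content": "Mercury is the closest planet to the Sun."}

with_id = build_index_request("articles", "planet", doc, doc_id="mercury")
auto_id = build_index_request("articles", "planet", doc)
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it; that step is left out so the sketch can be read without a running server.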
As an example, I am going to use data from my website, UniverseSite. Let's start with the planet articles.
Each document is written in JSON format. For example, here is a screenshot of the JSON document for Mercury:
Each item is basically like a "field" in relational databases. They are all searchable too.
And when I run it:
Since I am not going into a lot of detail in this post, much of this output will not make sense. But the first three items should.
They tell you the index and type (which is what we gave, so that is good) and also the id that it created.
Searching for a Document
OK, now that we have created the document, let's try and find it again!
Obviously, this is not as interesting as it would be if we had many more documents, but it is just to get an idea.
First, we can retrieve the document by its id using a GET. In this case, it would be:
GET articles/planet/AVsRQofSHK6bK8WfrtqO
Which returns the whole document:
But how about a search? To search, we POST a query to the search endpoint:
POST _search
This would search every single index. But if I point to /articles/_search, it would search only the "articles" index. I could also point to /articles/planet/_search, which would only search for documents with the type "planet" in the "articles" index.
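The three scopes differ only in how the path is put together, which a few lines of Python can show. This is just a sketch; `search_url` is a name I made up for illustration, and it only builds the address without contacting the server.

```python
def search_url(index=None, doc_type=None, base_url="http://localhost:9200"):
    """Build the search address for everything, one index, or one type.

    `search_url` is a hypothetical helper, not part of any client library.
    """
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    # Keep only the path parts that were actually supplied.
    parts = [p for p in (index, doc_type) if p is not None]
    return "/".join([base_url, *parts, "_search"])


print(search_url())                       # every index
print(search_url("articles"))             # only the "articles" index
print(search_url("articles", "planet"))   # only type "planet" in "articles"
```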
Queries are written in ElasticSearch's own Domain Specific Language (DSL). Again, multiple articles can be written on this, so I will start with just the simple stuff.
Here is an example search:
POST _search
{
  "query" : {
    "match_phrase" : {
      "content" : "mercury"
    }
  },
  "highlight" : {
    "fields" : {
      "content" : {}
    }
  }
}
So again, I am not going to go into a huge amount of detail here. But we are doing a POST request against the whole ElasticSearch database.
There are 2 parts here. The first part is the query. I am doing a "match_phrase" which means it will return all of the items that match the phrase that I give (the phrase could be more than 1 word).
I am telling it to match the phrase "mercury" on the "content" field only.
The second part is one of my favourite features. I am telling the search to highlight the "content" field where the search term has been found. So if I wanted to display the results, it would automatically give me an output with the terms highlighted! How cool is that!
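If you are building this payload from code rather than typing it into a client, the two parts map directly onto a nested dictionary. A small sketch, where `phrase_query` is a hypothetical helper name and the field and phrase are the ones from the example search:

```python
import json


def phrase_query(field, phrase):
    """Build a search body with a match_phrase query and highlighting.

    Both parts target the same field, as in the example search.
    """
    return {
        # Part 1: match the phrase on the given field only.
        "query": {"match_phrase": {field: phrase}},
        # Part 2: ask for highlighted extracts from that same field.
        "highlight": {"fields": {field: {}}},
    }


body = phrase_query("content", "mercury")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into a REST client as the body of the POST.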
So the output of the above search is below:
You can see all of the results are inside hits. There is a total number of results and each hit is a result. In this case, there is only 1.
We could have specified which fields to have selected so that not all of them are returned.
You can see highlight, which contains all of the fields we specified. ElasticSearch doesn't give us the entire content, but rather extracts containing the search term, with the term surrounded by <em> tags. I believe it is possible to modify how long each extract is.
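Pulling the highlighted extracts out of a response is a matter of walking hits and then highlight. The sketch below assumes the response has already been parsed into a Python dictionary; `extract_highlights` is a hypothetical helper, and `sample_response` is a hand-written, trimmed stand-in for the real output, not what my server actually returned.

```python
def extract_highlights(response, field):
    """Return (document id, highlighted extract) pairs for one field."""
    snippets = []
    # All results live in the inner "hits" list of the outer "hits" object.
    for hit in response.get("hits", {}).get("hits", []):
        # "highlight" maps each requested field to a list of extracts.
        for fragment in hit.get("highlight", {}).get(field, []):
            snippets.append((hit.get("_id"), fragment))
    return snippets


# Hand-written placeholder shaped like a search response (illustrative only).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "articles",
                "_type": "planet",
                "_id": "AVsRQofSHK6bK8WfrtqO",
                "highlight": {
                    "content": ["<em>Mercury</em> is the closest planet to the Sun"]
                },
            }
        ]
    }
}

for doc_id, fragment in extract_highlights(sample_response, "content"):
    print(doc_id, "->", fragment)
```

On the length of each extract: the highlighter accepts a fragment_size setting per field (for example "content" : { "fragment_size" : 100 } in place of the empty braces), though I have not tried it against this data.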
Summary
This post has obviously missed out a lot of stuff. This is hardly surprising given how much I could potentially cover (the book that I am reading has over 660 pages on it!). To say we have barely scratched the surface is an understatement; I don't even think we have marked the surface at all. But it is a great starting point and I hope to continue playing with it for some time.