Building a Skill for Amazon Echo

One of the talks I saw at this year's PHP conference was about what you can do with the Amazon Echo. I just couldn't resist having a little go myself at creating a custom Skill.

Warning: this is a long post!

What is a Skill?

A "Skill" is to the Echo what an "App" is to a smartphone.

Skill Aim

I decided to start at the entry level: a simple request and a simple response. At the same time, I wanted it to be a little more useful than "Hello, World". So, how about telling me where the International Space Station is right now? Like I said, starting simple, how hard can it be to tell me where a 400,000 kg object orbiting the Earth is?

Tools

The easiest way to set up a Skill is by using a combination of AWS Lambda and the Amazon developer console.

Lambda is a service for hosting and running code without having to worry about the underlying server or other things like SSL certificates (a requirement for Amazon Echo).

Amazon have made it almost seamless to create a Skill and attach it to a Lambda function.

I am also going to use this open API for getting the location of the I.S.S. at a given time. Here is an example of its output:

 {"message": "success", "iss_position": {"latitude": "-4.3892", "longitude": "173.5036"}, "timestamp": 1488893384}

So it is very straightforward. We just take the latitude and longitude and have Alexa read them back to us.
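
In Node.js terms (the language we will use for the Lambda function later), pulling those two values out is trivial. Here is a quick sketch using the sample response above:

var payload = '{"message": "success", "iss_position": {"latitude": "-4.3892", "longitude": "173.5036"}, "timestamp": 1488893384}';
var data = JSON.parse(payload);
console.log(data.iss_position.latitude);   // "-4.3892"
console.log(data.iss_position.longitude);  // "173.5036"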

Let's begin.

Making the Skill

Anyone who has made any kind of app on any platform will expect that some kind of developer account is required. Sure enough, you have to go to https://developer.amazon.com/login.html to sign up for Amazon's.

If you already have an Amazon account, you can just log in with that.

Select Alexa from the menu at the top and then select Alexa Skills Kit (which has the acronym ASK: how clever is that!).

Here, you will see a list of Skills that you are currently developing. You can save a Skill at any step and come back to it later.

In the top right is the shiny Add New Skill button. Press this and you will be taken to a new screen.

Skill Information

The first step is very easy:

First we have the Skill Type. In this case, it's a simple Custom Interaction Model.

Language is obviously English (UK) for me.

Name is simply the name of the app. I decided to call it "I.S.S. Tracker".

Invocation Name is a little more interesting. This is the name you will use when you say "Alexa, ask app, 'x'". I want it to be "I.S.S. Tracker", pronounced "eye ess ess tracker". Abbreviations marked out properly with full stops will be treated like this. If I simply put "ISS Tracker", then I would have to pronounce it "iss tracker".

There is then a box asking about adding an Audio Player. In this case, the answer is very simply "No".

As you can see, I can "Save" the form at any time, which allows me to close it and come back later if I wish. "Next" goes to Step 2, where things start getting interesting.

Interaction Model

Here we specify how the user will interact with our Skill.

The first box is for the Intent Schema.

What is an Intent?

An Intent is simply a piece of functionality that a Skill can do. One Skill can have one or more Intents. An Intent Schema is a JSON document that states what our Intents are.

There is a lot of stuff that an Intent Schema can have (for example Slots), but ours is very simple for this post:

{
    "intents": [
        {
            "intent": "GetLocation"
        }
    ]
}

So as you can see, we have one Intent and it is called GetLocation.
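
Just to illustrate the Slots mentioned above, a schema for a richer Skill might look something like the following. Note that GetLocationAtTime and its Date slot are hypothetical, made up purely for illustration; AMAZON.DATE is one of Amazon's built-in slot types.

{
    "intents": [
        {
            "intent": "GetLocation"
        },
        {
            "intent": "GetLocationAtTime",
            "slots": [
                {
                    "name": "Date",
                    "type": "AMAZON.DATE"
                }
            ]
        }
    ]
}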

Once you have decided on your Intents, you then have to decide what the user has to say to invoke each one. Amazon calls these "Utterances". You should try to think of as many Utterances as possible, covering every likely phrasing. Amazon is a little bit clever and will match variations beyond the ones you specify (you only provide a sample), but the more you provide, the more Amazon can match later.

Here are the utterances that I have specified for this Skill:

GetLocation where is the I.S.S. right now
GetLocation where is the I.S.S.
GetLocation where is the International Space Station right now
GetLocation where is the International Space Station
GetLocation tell me where the International Space Station is
GetLocation tell me where the I.S.S. is
GetLocation tell me where it is
GetLocation where is it

Each Utterance goes on its own line. Each one starts with the name of the Intent it belongs to, followed by what would be said. Things to note:

  • The user won't actually say "GetLocation"; it is there just to map the phrase to the Intent. They would say: "Alexa, ask I.S.S. Tracker where is the I.S.S. right now?". So imagine each utterance being preceded with that.
  • Choose your utterances to make them as natural as possible for the user to speak. Again, they could start each of these with "Alexa, tell I.S.S. Tracker" or "Alexa, ask I.S.S. Tracker", so pick phrasings that work for both. [Amazon have a pretty extensive list of all the ways that phrases can be said, so check that out.](https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/supported-phrases-to-begin-a-conversation)
  • Like the name earlier, abbreviations do not have to be phonetically spelled out as long as you have the dots in the right place.
  • Notice there is no punctuation at this stage.

And that is it. It is really simple. Think about how incredible it is that you have now basically programmed something to respond to certain things that a person says in their normal voice!

Before going to the next step, it is time to build the Lambda function. Save the Skill at this point and head over to the AWS console to sign up or sign in.

Building the Lambda Function

Before you get concerned about pricing, let me tell you that the first 1 million requests per month are free. So it is safe to say that for development purposes you should be fine, and it is basically free to use (after that it is $0.20 per million requests; if you are serving that many requests, you are probably not worried about 20 cents).

Once you are signed into the console, go to "Services" in the top left and select "Lambda" under "Compute".

Here you will see your functions. Press the "Create a Lambda function" button. Now, you will see some "Blueprints". Since you can basically copy the code from this blog, you can just go right ahead and select "Blank Function".

Next you will see "Configure Triggers". This is where you specify what will make the function run. Click the empty square and a drop down will appear:

Select the "Alexa Skills Kit" option.

The next page is where you add your code:

You give it a name and select what language your code is written in. In our case, we are using Node.js.

Next is a big box for the code. Just like with Skills, you can come back and edit this code any time, so don't feel pressured to have the code ready at this moment.

But I do have the code, so here it is. I have broken it up to explain bits, but the parts can be copied and pasted in order to form the complete file.

'use strict';
var http = require('http');

const Alexa = require('alexa-sdk');
var alexa;

It starts off easy. We include Node's http module for making our request to the I.S.S. location API, and we include the Alexa SDK. I also declare a global alexa variable here; you will see why later.

const handlers = {
    'LaunchRequest': function () {
        this.emit('GetLocation');
    },
    'GetLocationIntent': function () {
        this.emit('GetLocation');
    },

Here we are specifying our handlers. The handlers object is basically a collection of functions, and you can have more than one such collection (they are registered later). Each function within the collection can call the others (in this case, GetLocationIntent calls GetLocation). Documentation for all this is available here.

I adapted this code from this example (especially the HTTP call).

As you can see, this is where the code for each Intent is specified. We only have GetLocation; LaunchRequest and GetLocationIntent both simply forward to it.

this.emit is sort of like a return call, in that it gives back a result (or, as here, hands control to another handler).
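
To make that concrete, here are the two uses side by side (illustrative lines only, not additional Skill code):

// Forward to another handler in the same collection:
this.emit('GetLocation');

// Or speak a response directly; the colon-prefixed names are the SDK's built-in actions:
this.emit(':tell', 'Hello from the I.S.S. Tracker!');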

    'GetLocation': function () {
        var speechOutput = '';
        var cardOutput = '';

So we have our actual function, the one called from within GetLocationIntent above. It starts off by declaring some variables that we need: speechOutput will store what Alexa actually says out loud back to the user, and cardOutput will hold what is printed on the card that appears in the Alexa App, if the user has that.

        httpGet('api.open-notify.org', '/iss-now.json', function (response) {

Here we are calling a function that I have declared at the bottom of the file: httpGet. The response is passed into a callback function that is then executed. Again, this was heavily adapted from this example.

            var responseData = null;
            try { responseData = JSON.parse(response); } catch (e) { /* leave null */ }

Standard parsing of the JSON response into a data structure that we can deal with. Note that JSON.parse throws on malformed JSON rather than returning null, hence the try/catch, which turns a bad response into the null that the next check looks for.

            if (responseData === null) {
                speechOutput = 'I have no idea where it is.';
                cardOutput = 'There was an error.';

Very simple error handling. I am being lazy, so Alexa will feign confusion in the event that something goes wrong with the request.

            } else {
                speechOutput = 'I have found it. It is at latitude ' + responseData.iss_position.latitude + ', and longitude ' + responseData.iss_position.longitude;
                cardOutput = responseData.iss_position.latitude + ', ' + responseData.iss_position.longitude;

Now I am putting together the speech output and the card output, and this is where it gets interesting again. Punctuation is very important at this point, as Alexa changes her intonation as she reaches pauses or the ends of sentences. Be careful when concatenating strings, and make sure you include the spaces.

I have also generated a basic output for the card that appears in the Alexa App. This is, of course, optional.

            }
            alexa.emit(':tellWithCard', speechOutput,
                'I.S.S. Current Co-ordinates - ' + (responseData === null ? 'unknown time' : new Date(responseData.timestamp * 1e3).toISOString()),
                cardOutput);
        });
    },

Finally, I end the function with the response. As it is emitted from within the callback function, I could not use this.emit, because this no longer refers to the handler context inside the callback. Earlier, you saw var alexa; later, you will see that I assign the Alexa handler object to it, making it accessible globally, which is why alexa.emit works here. I hope that sort of makes sense.

:tellWithCard obviously states that I want to add a card and so I have given the cardOutput as one of the parameters. If I simply had :tell, there would be no card and only speech.

I am passing the timestamp through some very basic conversion to something human-readable for the card title, with a fallback for the error case where responseData is null.

    'AMAZON.HelpIntent': function () {
        const speechOutput = this.t('HELP_MESSAGE');
        const reprompt = this.t('HELP_MESSAGE');
        this.emit(':ask', speechOutput, reprompt);
    },
    'AMAZON.CancelIntent': function () {
        this.emit(':tell', this.t('STOP_MESSAGE'));
    },
    'AMAZON.StopIntent': function () {
        this.emit(':tell', this.t('STOP_MESSAGE'));
    },
    'SessionEndedRequest': function () {
        this.emit(':tell', this.t('STOP_MESSAGE'));
    },
};

Finally, there is some boilerplate. I was not entirely sure at first what these were doing or whether they are necessary, since we are not doing anything special, but they use Amazon's magic names.
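
For what it is worth, my understanding is that the AMAZON.* names are built-in Intents that fire when the user says things like "help", "stop" or "cancel" (to be usable, they should also be listed in the Intent Schema from earlier), and this.t looks up strings from a language resources object registered with the SDK. A minimal sketch of such an object, with message text I have made up for this Skill:

const languageStrings = {
    'en-GB': {
        translations: {
            HELP_MESSAGE: 'Ask me where the International Space Station is right now.',
            STOP_MESSAGE: 'Goodbye!'
        }
    }
};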

So at this point, we have specified the handler. Now we specify what is essentially the main function.

exports.handler = (event, context) => {
    alexa = Alexa.handler(event, context);
    alexa.registerHandlers(handlers);
    alexa.execute();
};

So we are creating a new Alexa object, specifying the above handlers, and then executing it and letting Alexa work her magic.
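
One note on that: the this.t calls in the boilerplate handlers only work if a resources object has been registered on the handler before execute runs. Assuming the languageStrings sketch from the previous section, the main function would become:

exports.handler = (event, context) => {
    alexa = Alexa.handler(event, context);
    // Register the strings looked up by this.t in the handlers above
    // (languageStrings is the hypothetical object sketched earlier).
    alexa.resources = languageStrings;
    alexa.registerHandlers(handlers);
    alexa.execute();
};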

function httpGet(location, file, callback) {
    var options = {
        host: location,
        path: file,
        method: 'GET'
    };
    var req = http.request(options, (res) => {
        var body = '';
        res.on('data', (d) => {
            body += d;
        });
        res.on('end', function () {
            callback(body);
        });
    });
    req.end();
}

Finally we have the httpGet function that we used earlier. Again, this can be found online. It is pretty standard usage of the http node module and should get you started.
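
One caveat worth flagging: as written, if the request itself fails (DNS trouble, a network outage and so on), the callback never fires and the Lambda function will simply time out. A small addition of my own, placed just before req.end(), would route such failures through the same error branch as a bad response:

req.on('error', function (e) {
    // Hand the literal string 'null' to the callback so that
    // JSON.parse yields null and GetLocation's error branch runs.
    callback('null');
});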

Back to the Skill

After you have put this code in and saved your function, you should see the function's ARN in the top right corner. Copy this and go back to the Amazon Skill setup.

Linking to the Lambda Function

The next step in the Amazon Skill setup is "Configuration". Here, you specify the endpoint that your Skill calls to run its Intents.

Select the AWS Lambda ARN radio button and then paste the ARN into the box below.

Say No to the account linking section and save.

That's it! Now comes the fun part: testing.

Testing with Simulator

So for testing, I have two tabs: one for modifying my Lambda function and one for testing the Skill.

Amazon have helpfully provided a simulator so you do not have to keep asking Alexa the same question for 3 hours whilst you try and tweak it.

You will see an interface like this:

Simply type what you would say into the "Enter Utterance" box, click Ask, and you will see a response or an error. If you like, you can press the "Listen" button to hear exactly how she would say it.

If you hit an error or want to tweak something, you can have the Lambda function open in another tab; changes you make there are reflected instantly.

To see logs, click the Test button on the Lambda function. The screen will split into two. On the bottom half, scroll down to "Log output", and find where it says "Click here to view the CloudWatch log group.":

Click that, and you will be taken to where the logs are collated in CloudWatch. Each time you run the Skill from the other tab, logs will appear here:

You will find that the logs are grouped together in batches where each batch is just 5 minutes of logs.
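
A tip while you are here: anything you console.log in the Lambda function ends up in these CloudWatch logs, which is handy for inspecting the raw API response. For example, a line like this inside the httpGet callback:

// Logs the raw response body to CloudWatch for debugging.
console.log('Raw I.S.S. API response: ' + response);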

Testing with Echo

You will want to test from your device eventually. Luckily, this is very easy. If you used the same account for setting up Alexa as you used for the developer console, the Skill will already be available for you to use.

Open the Alexa App and go to your Skills list, and you will see it there. No further setup is needed.

So here is a video of me testing my tracker:

As you heard, I said a phrase that I didn't directly specify in my samples earlier: "find the I.S.S. right now". As I said, the Echo is clever enough to work out variations based on your samples, but the more samples you provide, the better it will do.

I should also note that, in its current form, reading out the co-ordinates like this is probably useful to nobody. But for demo purposes, it will do.

Opening the app, you can see the card that it created as we programmed it to earlier:

We have only done basic text for the cards, but there is huge potential as to what you can display here including images and styling to fit any scenario. You can find out more about them here.

Summary

I hope this guide has been at least good enough to get you started. It is quite hard to explain everything in a single post and what I have done here is incredibly basic. I have missed a lot of things out.

The documentation is very detailed, and I found the examples on their GitHub very useful. I have also made my skill available here.

