Creating a simple Alexa skill

About a year ago I started to look at LEX, you know, the thing that powers ALEXa. I started to build a simple chatbot that could respond to simple queries. One thing led to another and I also created a very basic Alexa Skill. After my initial development I put it on hold; there were other things to look at that felt more attractive.
After re:Invent 2018 I watched a couple of Alexa talks on YouTube, and suddenly my interest in building voice-enabled applications was reborn.

Alexa Skill Developer Tools: Build Better Skills Faster

Connect any device to Alexa

There was a new SDK and development kit, and everything felt refreshed. I started to build a simple Guess The Number skill to test the development kit and pick up where I left off. In this post I will walk you through the creation of this skill and point out things to think about along the way.

Configure and create the Skill

Before we can write any code we need to create the skill itself. It's possible to use a configuration JSON file and set everything up using CloudFormation. In this case, however, I will use the Alexa console for creating and testing the skill.
Start by signing in to the Alexa developer console; if you don't already have an account, just sign up for one. It's good to remember that if you already own a physical Alexa product, like the Echo Dot, and sign in to the developer console with the same account you use for your Alexa devices, your skills will automatically be available on those devices, and that is great for testing.

When in the console, click the blue "Create Skill" button and give your skill a name and a default language. I will use the name Guess The Number and the default language English (US). Select to create a Custom skill. Click create and then select "Start from scratch".

Configure Invocation Name
Now we need to give the Skill an invocation name. Click on "Invocation" in the menu and enter a name. This is the name we will use to tell Alexa to open our application:
"Alexa, open guess the number"

Create slot types
So what is a slot then? Slots are the different kinds of input to your Skill. For example, if you say "Alexa, what's the weather like in New York", then "New York" would be a slot, probably with the type AMAZON.US_CITY.
There are several built-in slot types available for us to use, or we can create our own custom slot types. In this example we just use the built-in type AMAZON.NUMBER.
So add that by selecting slot types in the menu and clicking the Add Slot Type button. Make sure to add an AMAZON.NUMBER slot type.

Create intents
Time to start creating our intents. Intents are the different kinds of actions that our skill supports, and each intent is matched to voice utterances. So when the user speaks to Alexa, the service will match the spoken utterance to an intent in our skill.
There are several built-in intents, like AMAZON.YesIntent, which maps to a positive response such as "yes", "ok" or "sounds good". Besides the built-in intents we can create our own custom intents. In that case we must supply the service with example utterances that can be used during matching. We don't need to think of every possible utterance, but the more we supply, the better the model will be. Let us start by creating an intent that can be matched to the user guessing a number.

Start with the NumberGuessIntent. We will be using the slot that we defined earlier. We start by creating some sample utterances, like "could it be {number}", where {number} indicates the slot.

Next we add some built-in intents to handle scenarios like help, cancel and more. So let us add AMAZON.CancelIntent, AMAZON.HelpIntent, AMAZON.StopIntent, AMAZON.YesIntent, AMAZON.NoIntent, and AMAZON.FallbackIntent.

Most of them are self-explanatory, but AMAZON.FallbackIntent can be a bit unclear. It helps us handle unexpected utterances, that is, when a user says something that doesn't map to any intent in our skill. That way we can ask the user to repeat themselves or respond with some other error message.
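
To make the pieces so far a bit more concrete, here is a rough sketch of the interaction model that the console builds from the invocation name, the slot and the intents, written as a Python dictionary and dumped to JSON. Only the utterance "could it be {number}" comes from this post; the other sample utterances are made up for illustration, and the model the console generates for you may contain more fields than this.

```python
import json

# Built-in intents used by the skill; they need no sample utterances of their own.
BUILT_IN_INTENTS = [
    "AMAZON.CancelIntent", "AMAZON.HelpIntent", "AMAZON.StopIntent",
    "AMAZON.YesIntent", "AMAZON.NoIntent", "AMAZON.FallbackIntent",
]

number_guess_intent = {
    "name": "NumberGuessIntent",
    # The slot name "number" is what the {number} placeholder refers to.
    "slots": [{"name": "number", "type": "AMAZON.NUMBER"}],
    # Hypothetical utterances, apart from the first one.
    "samples": ["could it be {number}", "is it {number}", "{number}"],
}

interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "guess the number",
            "intents": [number_guess_intent]
            + [{"name": name, "samples": []} for name in BUILT_IN_INTENTS],
            "types": [],  # no custom slot types in this skill
        }
    }
}

print(json.dumps(interaction_model, indent=2))
```

Keeping the model in a file like this is handy if you later want to put the skill configuration under version control instead of clicking through the console.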

Setup the endpoint
The time has come for us to set up the endpoint that the Alexa service will call when our skill is triggered. I will use a Lambda function as the endpoint, so before we do the actual setup we need to create the function.

Create the Lambda Function
Creating the Lambda function is done the normal way, with CloudFormation. Note that the function below is declared as an AWS::Serverless::Function, so the template needs the SAM transform (AWS::Serverless-2016-10-31). Let us start by creating the function and then add the code we need.
Since we are going to persist attributes, the Lambda function needs permission to access DynamoDB, so we start by creating the IAM Role we need.

AlexaSkillFunctionRole:
  Type: "AWS::IAM::Role"
  Properties: 
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement: 
        - 
          Effect: Allow
          Principal: 
            Service: 
              - lambda.amazonaws.com
          Action: 
            - sts:AssumeRole
CloudWatchLogsPolicy:
  Type: "AWS::IAM::Policy"
  Properties:
    PolicyName: AlexaGuessTheNumberCloudWatchPolicy
    PolicyDocument: 
      Version: 2012-10-17
      Statement: 
        - 
          Effect: Allow
          Action: 
            - logs:*
          Resource: 'arn:aws:logs:*:*:*'
    Roles: 
      - !Ref AlexaSkillFunctionRole
DynamoDbPolicy:
  Type: "AWS::IAM::Policy"
  Properties:
    PolicyName: AlexaGuessTheNumberDynamoDBPolicy
    PolicyDocument: 
      Version: 2012-10-17
      Statement: 
        - 
          Effect: Allow
          Action: 
            - dynamodb:CreateTable
            - dynamodb:DeleteItem
            - dynamodb:GetItem
            - dynamodb:PutItem
            - dynamodb:UpdateItem
          Resource: !Sub 'arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/guess-the-number'
    Roles: 
      - !Ref AlexaSkillFunctionRole

With the IAM Role in place we can create the actual Lambda function as well.

AlexaSkillFunction:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: alexa-skill-guess-the-number
    Runtime: python2.7
    MemorySize: 128
    Timeout: 10
    CodeUri: ./src
    Handler: handler.lambda_handler
    Role: !GetAtt AlexaSkillFunctionRole.Arn
    Layers:
      - !Ref AlexaSdkLayerVersionArn

The Alexa SDK (ask-sdk) is not available by default in Lambda, so we need to add it. I'm using layers to include the ASK SDK. Check out my previous post to see how to create a Lambda Layer.

The only thing left now is to make sure the Alexa service can call our function. When setting that up we also want to make sure that only this Skill can call us, and not every Alexa skill. To do that we need our Skill ID, which is found in the endpoint section in the left menu.

When we have the Skill ID we can set up the permission needed for the Alexa service to call our Lambda function.

AlexaAskPermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !GetAtt AlexaSkillFunction.Arn
    Action: lambda:InvokeFunction
    Principal: 'alexa-appkit.amazon.com'
    EventSourceToken: !Ref AlexaSkillId
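
The permission above stops other skills at the Lambda level. If you want a second check inside the code as well, the request that Alexa sends carries the Skill ID too. The function below is a minimal sketch of such a check on the raw event dictionary. The name is_from_skill and the ID in the usage example are made up for illustration, and I'm assuming the ID sits under context.System.application (and under session.application for requests that belong to a session).

```python
def is_from_skill(event, expected_skill_id):
    """Return True when the request envelope carries the expected Skill ID."""
    # Look in context.System.application first, then fall back to
    # session.application, which is present for in-session requests.
    system_app = event.get("context", {}).get("System", {}).get("application", {})
    session_app = event.get("session", {}).get("application", {})
    actual = system_app.get("applicationId") or session_app.get("applicationId")
    return actual == expected_skill_id


# Usage with a hypothetical Skill ID.
event = {"context": {"System": {"application": {"applicationId": "amzn1.ask.skill.example"}}}}
print(is_from_skill(event, "amzn1.ask.skill.example"))
```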

Time to start writing some code and implement our intents. I will be using Python, and with Python we can implement the intents either by using classes or decorators. Here I will be using classes all the way. We need to implement two functions for each intent: can_handle and handle.
The can_handle function determines whether this class can handle the request being triggered. The handle function is then called if can_handle returns True.
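
If the can_handle/handle split feels abstract, this toy model may help. It is not the SDK's actual dispatcher, just a sketch of the idea with plain dictionaries: each handler is asked in registration order whether it can handle the request, and the first one that says yes gets to handle it. All class and function names here are made up for the example.

```python
class LaunchHandler:
    def can_handle(self, request):
        return request["type"] == "LaunchRequest"

    def handle(self, request):
        return "Welcome to guess the number."


class IntentHandler:
    def __init__(self, intent_name, speech):
        self.intent_name = intent_name
        self.speech = speech

    def can_handle(self, request):
        return (request["type"] == "IntentRequest"
                and request["intent"]["name"] == self.intent_name)

    def handle(self, request):
        return self.speech


def dispatch(handlers, request):
    # Ask each handler in registration order; the first match wins.
    for handler in handlers:
        if handler.can_handle(request):
            return handler.handle(request)
    raise LookupError("no handler for request type {}".format(request["type"]))


handlers = [LaunchHandler(), IntentHandler("AMAZON.HelpIntent", "Guess a number.")]
print(dispatch(handlers, {"type": "LaunchRequest"}))
print(dispatch(handlers, {"type": "IntentRequest",
                          "intent": {"name": "AMAZON.HelpIntent"}}))
```

The LookupError at the end hints at why it matters to register a handler for every intent in the model: a request that nobody claims has nowhere to go.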

We start by implementing the LaunchRequestHandler, which handles us saying "Alexa, open guess the number".

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name, is_request_type


class LaunchRequestHandler(AbstractRequestHandler):

  def can_handle(self, handler_input):
      return is_request_type("LaunchRequest")(handler_input)

  def handle(self, handler_input):
      speech = "Welcome to guess the number. Would you like to play?"
      reprompt = "Do you want to play?"

      # ask() adds the reprompt and keeps the session open.
      handler_input.response_builder.speak(speech).ask(reprompt)
      return handler_input.response_builder.response

In the code above the can_handle function checks whether the incoming request is a LaunchRequest and returns True if it is.
Our handle function is then triggered. By adding a reprompt message the session is kept open, and if we don't answer, Alexa will use the reprompt message to ask for an answer again.
It would also be possible to keep the session open using the set_should_end_session function when building the response.

Now to the handler for the Yes intent. In this handler we will create a random number and store it in the session attributes.

import random


class YesIntentHandler(AbstractRequestHandler):

  def can_handle(self, handler_input):
      return is_intent_name("AMAZON.YesIntent")(handler_input)

  def handle(self, handler_input):
      session_attr = handler_input.attributes_manager.session_attributes

      # randint includes both endpoints, so this yields 0 through 10.
      session_attr["correct_number"] = random.randint(0, 10)
      reprompt = "Try saying a number."
      speech = 'Guess a number between 0 and 10.'
      handler_input.response_builder.speak(speech).ask(reprompt)
      return handler_input.response_builder.response

Here we check whether it's the built-in YesIntent that was triggered, and if it is, we handle it. The handle function generates a random number and stores it in the session attributes for us to use later. Once again we use a reprompt phrase to keep the session open.
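
One detail that is easy to get wrong when picking the number: random.randint includes both endpoints, so the upper bound has to be the highest number we tell the user about. The sketch below draws a thousand numbers and collects the distinct values. The helper name pick_number and the seed are only there for the demonstration.

```python
import random


def pick_number(low=0, high=10, rng=random):
    """Pick the secret number; both low and high are possible results."""
    return rng.randint(low, high)


rng = random.Random(42)  # seeded only to make the check repeatable
samples = {pick_number(rng=rng) for _ in range(1000)}
print(sorted(samples))
```

Every value stays within 0 to 10, which matches the prompt "Guess a number between 0 and 10".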

OK, so now we can open the skill, and we generate a random number for the user to guess when they say they would like to play. Now let us implement our custom NumberGuessIntent.

import logging

logger = logging.getLogger(__name__)


class NumberGuessIntentHandler(AbstractRequestHandler):

  def can_handle(self, handler_input):
      return is_intent_name("NumberGuessIntent")(handler_input)

  def handle(self, handler_input):
      # The number was stored in the session by the YesIntentHandler.
      correct_number = handler_input.attributes_manager.session_attributes["correct_number"]

      filled_slots = handler_input.request_envelope.request.intent.slots
      number_slot_value = get_slot_value(filled_slots, 'number')
      # Slot values arrive as strings, so convert before comparing.
      guessed_number = int(number_slot_value)

      if guessed_number == correct_number:
          speech = 'Correct! Congratulations! Would you like to play again?'
          reprompt = "Do you want to play again?"
      elif guessed_number > correct_number:
          speech = '{} is too high, try again.'.format(guessed_number)
          reprompt = "Try saying a number"
      else:
          speech = '{} is too low, try again.'.format(guessed_number)
          reprompt = "Try saying a number"

      handler_input.response_builder.speak(speech).ask(reprompt)
      return handler_input.response_builder.response


# Module-level helper (not a method), so handle() can call it directly.
def get_slot_value(filled_slots, slot_key):
    for key in filled_slots:
        logger.debug(key)
        if key == slot_key:
            return filled_slots[key].to_dict()['value']
    return None

Let's check the code for this handler. In the can_handle function we check that the intent is the NumberGuessIntent.

Let's go through the handle function briefly to see what it does. We start by reading the correct number, the generated random number, from the session attributes; remember that we generated it and stored it in the YesIntentHandler.
After that we get the filled_slots from the handler_input. We created a slot named number when we created the intent in the Alexa Skill console. I have created a helper function, get_slot_value, that loops through the slots and fetches the value.
When we have the correct and the guessed number we can create the phrase we would like Alexa to speak. If the user guesses the correct number, we congratulate them and ask if they would like to play again.
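
The comparison logic is easy to test if it is pulled out of the handler into plain functions. The following is a minimal sketch of that idea and not the code the skill runs. The names parse_guess and respond_to_guess are made up, and the extra branch for a missing or non-numeric slot value is my own addition to show how a guess Alexa could not understand might be handled.

```python
def parse_guess(slot_value):
    """Convert the raw slot value to an int, or None if it is not a number."""
    try:
        return int(slot_value)
    except (TypeError, ValueError):
        # None (slot not filled) or a non-numeric string.
        return None


def respond_to_guess(guess, correct_number):
    """Return (speech, reprompt) for a guess."""
    if guess is None:
        return ("Sorry, I did not catch a number. Try again.",
                "Try saying a number.")
    if guess == correct_number:
        return ("Correct! Congratulations! Would you like to play again?",
                "Do you want to play again?")
    direction = "high" if guess > correct_number else "low"
    return ("{} is too {}, try again.".format(guess, direction),
            "Try saying a number.")


print(respond_to_guess(parse_guess("7"), 4)[0])  # 7 is too high, try again.
```

A handler built this way would only have to read the slot, call these two functions and pass the result to speak() and ask().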

We then have to repeat all of this for the rest of the intents that we specified when creating the Skill. The logic follows the same pattern, so I will not show it here. What we must do, however, is add the intent handlers to a SkillBuilder and expose the Lambda handler. Otherwise the call to the Lambda function will fail. So let's go ahead and do just that.

from ask_sdk.standard import StandardSkillBuilder

# Skill Builder object
sb = StandardSkillBuilder(
    table_name="guess-the-number", auto_create_table=True)

# Add all request handlers to the skill.
sb.add_request_handler(LaunchRequestHandler())
sb.add_request_handler(YesIntentHandler())
sb.add_request_handler(NumberGuessIntentHandler())

# Expose the lambda handler to register in AWS Lambda.
lambda_handler = sb.lambda_handler()

So now we can deploy the Lambda function and then head back to the Alexa console and finish creating our endpoint setup.

Finish setup of the endpoint
With the Lambda function ready we can now finish the setup of the endpoint.
We navigate back to the Alexa console and select endpoint in the menu to the left. We are going to use the AWS Lambda endpoint type, which is also the recommended type. We are only going to use one default Lambda function and not create functions in multiple regions. If this were a worldwide Skill intended for release, I would deploy the function in multiple regions so that users get low-latency access and as good an experience as possible. Copy the ARN of the function we just deployed and paste it into the default Lambda input.

Basically everything we need to start testing our skill is now in place; we just have to make sure we have saved and built the model. So we do that now by pressing the save and build buttons at the top.
We could also have set up a UI for Alexa devices with screens, but for a very simple first Skill this is sufficient.

Testing time
To start testing our skill, switch to the test tab. There are three sections in the testing area: "Alexa Simulator", "Manual JSON", and "Voice & Tone". In the Voice & Tone section we can try out SSML and hear how it will sound. This is a fast way to try out changes to voice, speed, tone, breaks, and other SSML tags.
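
As a small example of something to paste into Voice & Tone, the sketch below builds an SSML string with a pause and a slower second sentence. The helper name and the default values are my own choices. The break and prosody tags are standard SSML, but listen to the result in the console before relying on it.

```python
from xml.sax.saxutils import escape


def slow_with_pause(first, second, pause_ms=500, rate="slow"):
    """Build SSML: the first sentence, a pause, then slower speech."""
    # escape() protects characters like & and < that would break the markup.
    return (
        '<speak>{first}<break time="{pause}ms"/>'
        '<prosody rate="{rate}">{second}</prosody></speak>'
    ).format(first=escape(first), pause=pause_ms, rate=rate, second=escape(second))


print(slow_with_pause("Correct!", "Would you like to play again?"))
```

The same string can be passed to speak() in a handler once you are happy with how it sounds.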
What we will focus on is the Alexa Simulator, where we can type or speak phrases and see how our Skill reacts. To open our skill, just type or say "Alexa open 'invocation name'"; in my case that is "Alexa open guess the number". Alexa will then respond to our input with both text and speech. In this mode we can test back and forth and see what happens. In the JSON output section we can also see our sessionAttributes and how they change and update.
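
The JSON shown next to the simulator is also a good starting point for local tests. Below is a trimmed-down sketch of a NumberGuessIntent request with the secret number already placed in the session attributes. All IDs and the timestamp are placeholders, a real request contains more fields than this, and I have not verified that this exact payload is accepted by the Manual JSON tab.

```python
import json


def build_guess_request(number, correct_number, skill_id="amzn1.ask.skill.example"):
    """Build a minimal IntentRequest envelope for NumberGuessIntent."""
    application = {"applicationId": skill_id}
    user = {"userId": "amzn1.ask.account.example"}
    return {
        "version": "1.0",
        "session": {
            "new": False,
            "sessionId": "amzn1.echo-api.session.example",
            "application": application,
            # What the YesIntentHandler would have stored earlier.
            "attributes": {"correct_number": correct_number},
            "user": user,
        },
        "context": {"System": {"application": application, "user": user}},
        "request": {
            "type": "IntentRequest",
            "requestId": "amzn1.echo-api.request.example",
            "timestamp": "2019-01-01T00:00:00Z",
            "locale": "en-US",
            "intent": {
                "name": "NumberGuessIntent",
                "confirmationStatus": "NONE",
                # Slot values are sent as strings, even for AMAZON.NUMBER.
                "slots": {"number": {"name": "number", "value": str(number),
                                     "confirmationStatus": "NONE"}},
            },
        },
    }


print(json.dumps(build_guess_request(7, 4), indent=2))
```

With the SDK installed locally, an event like this could be passed to lambda_handler(event, None) to exercise a handler without going through the console.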

Conclusion
We have now built our very first, very simple Alexa skill. For me there will definitely be more of them; I might even publish one or two.

As Werner Vogels would say, "Now go build."