Dissecting Skills

Skills are composed of two essential elements: the fulfillment implementation, which is code that provides the skill’s behavior, and the interaction model, which describes how a human may interact with the skill.

The interaction model’s main purpose is to help Alexa’s natural language processing (NLP) understand what is said and where requests should be sent. It defines three things, which come together in the JSON sketch following this list:

  • An invocation name: The name by which a skill may be launched

  • Utterances: Sample questions, commands, or phrases that a user might say to the skill

  • Intents: Unique and precise commands that the user intends for Alexa to perform, mapped to one or more utterances
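
For example, in the JSON format used to define an ASK interaction model, those three elements fit together something like this (a minimal sketch; the intent name and sample utterances are hypothetical):

  {
    "interactionModel": {
      "languageModel": {
        "invocationName": "star port seventy five",
        "intents": [
          {
            "name": "ScheduleTripIntent",
            "samples": ["plan a trip", "schedule a tour"]
          }
        ]
      }
    }
  }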

The invocation name acts as the trigger that launches a skill so that a user may talk to it. It’s roughly the voice equivalent of clicking an icon to launch a desktop application. Although Alexa has many built-in capabilities that do not require an invocation name, custom skills usually require a unique invocation name so that Alexa knows which skill should receive the request. For example, if a space travel planning skill has an invocation name of “star port seventy five,” the user might say, “open star port seventy five,” or “ask star port seventy five to plan a trip.” If another skill also provided travel planning, the unique invocation name would ensure that the request is sent to the skill the user had in mind.

Once a conversation with a skill has started, the user may ask a question or state a command. The funny thing about humans, though, is that we often find many diverse ways to say the same thing. In a space travel planning skill, for example, one user might say, “let’s plan a trip,” while another might say, “I want to plan a vacation.” Still another might say, “schedule a tour.” These are all examples of utterances, each capturing a different way of saying the same thing.

Regardless of how the user states a question or command, intents narrow a skill’s utterances down to one or more discrete operations. Although simple skills may do only one thing or answer one type of question, many skills are capable of performing several related tasks. With the space travel planning skill, you might plan a trip, ask for information about a destination, or ask when your trip starts. There may be several utterances for each of these, but ultimately they are three distinct tasks. For each task, there will be a unique intent that captures what the skill is expected to do in response to an utterance.

To understand the relationship between utterances and intents, consider the following diagram, which illustrates the interaction model for the space travel skill.

[Figure: utterances mapped to intents, routed to fulfillment (images/hello/utterance_intent_fulfillment.png)]

As shown here, the space travel skill can do three things: plan a trip, answer when a trip starts, and provide information about a destination. Those three individual actions are captured as the three intents in the middle column, each mapped to a couple of utterances in the leftmost column.
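
Expressed in the interaction model’s JSON, that mapping might look something like the following sketch (the intent names are hypothetical, and a real skill would likely use slots for values such as the destination):

  "intents": [
    {
      "name": "ScheduleTripIntent",
      "samples": [
        "let's plan a trip",
        "i want to plan a vacation",
        "schedule a tour"
      ]
    },
    {
      "name": "TripStartIntent",
      "samples": ["when does my trip start", "when do i leave"]
    },
    {
      "name": "DestinationInfoIntent",
      "samples": ["tell me about the destination", "describe the destination"]
    }
  ]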

Regardless of what utterance a user says or which intent is chosen, all requests are ultimately sent to the fulfillment implementation for handling. As mentioned earlier, skill fulfillment is commonly implemented as a function deployed to AWS Lambda. Functions, by definition, take input, perform a single operation, and then return the result of that operation. But if the function only performs a single operation, how can a skill’s fulfillment function handle requests for several intents? The answer comes from intent routing.

Once the Lambda function accepts a request, it internally routes the request to different code branches based on the request’s content, including the request’s intent. This is the same as the front controller pattern[3] employed in many web frameworks to route requests through a single front-end handler to distinct command objects. The only difference is that instead of routing web requests to command objects, the Lambda function routes intent requests to distinct intent handlers.
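
With the ASK SDK v2 for Node.js, that routing might be sketched like this. Each handler’s canHandle method declares which requests it claims, and the skill builder acts as the front controller, trying each registered handler in order. The intent name and response text are hypothetical:

  const Alexa = require('ask-sdk-core');

  const ScheduleTripIntentHandler = {
    canHandle(handlerInput) {
      // Claim only IntentRequests whose intent is ScheduleTripIntent
      return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
        && Alexa.getIntentName(handlerInput.requestEnvelope) === 'ScheduleTripIntent';
    },
    handle(handlerInput) {
      return handlerInput.responseBuilder
        .speak("OK, let's plan a trip!")
        .getResponse();
    }
  };

  // The skill builder plays the front controller role, dispatching each
  // request to the first handler whose canHandle() returns true.
  exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
      ScheduleTripIntentHandler
      // ...handlers for the other intents would be registered here
    )
    .lambda();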

Put simply, the Lambda function that implements a skill’s functionality appears to do one thing from an outside point of view: process intent requests from Alexa. Internally, however, it may do many things, responding to many different (but related) questions or commands spoken by the user.

In a nutshell, Alexa skills can be described as follows:

  • Skills are applications that expand Alexa’s native capability to provide custom behavior.

  • A skill has one fulfillment implementation, typically deployed as an AWS Lambda function.

  • A skill may have one or many related operations, each discretely defined by an intent.

  • Users may express what they intend for a skill to do in very different ways. Utterances capture those variations and are mapped to more precise intents.

Developing Alexa skills involves defining the interaction model and fulfillment implementation to create custom capabilities for Alexa. To make that possible, Amazon provides the Alexa Skills Kit (ASK), a set of APIs, SDKs, and tools for developing Alexa skills. We’ll soon see how to use ASK’s JavaScript SDK to create our first Alexa skill. But before we go heads-down developing a skill, let’s install the ASK CLI, one of the most useful tools you’ll put in your Alexa toolbox.
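
If you’d like a head start, the ASK CLI is distributed as an npm package, so assuming Node.js and npm are already installed, installation is typically a single command:

  $ npm install -g ask-cli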
