Writing Responses with Markdown

Speech Markdown[35] is a community-led project that makes it possible to write your speech responses using a variation of the popular Markdown[36] text-processing tool.

As a simple example of Speech Markdown’s capabilities, consider the following SSML that has an interjection:

 <speak>
  <say-as interpret-as="interjection">Great Scott!</say-as>
 </speak>

Using Speech Markdown, this same response could be written like this:

 (Great Scott!)[interjection]

While the SSML may be more explicit about its purpose, the Speech Markdown snippet is the clear winner when it comes to brevity.

Aside from interjections, Speech Markdown offers full parity with all of SSML, including Alexa-specific tags such as <amazon:effect> and <amazon:emotion>. The Speech Markdown reference documentation[37] describes Speech Markdown in full detail. Even so, let’s take a moment to explore some of the essentials of working with Speech Markdown.

Speech Markdown Essentials

Modifying speech with Speech Markdown involves using elements that are either in standard format or short format. For example, suppose that you want to apply emphasis to a word in a sentence. Rather than use the SSML’s <emphasis> tag, you can apply the Speech Markdown modifier like this:

 This is (very)[emphasis:"strong"] important.

This is an example of a Speech Markdown standard format element. Standard format elements take the form of some text in parentheses, followed by one or more modifiers in square brackets.

Some (but not all) modifiers have a corresponding short format. The emphasis modifier has a handful of short formats, depending on the strength of the emphasis desired. Strong emphasis, for example, uses two plus-signs on each side of the emphasized text like this:

 This is ++very++ important.

Moderate emphasis uses only a single plus-sign on each side of the text:

 This is +somewhat+ important.

On the other end of the spectrum, a minus-sign on each side of the text indicates reduced emphasis:

 This -isn't very- important.
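
For reference, the three short formats map onto the levels of SSML's <emphasis> tag. The following sketch shows the approximate equivalents (treat the exact generated markup as an approximation rather than the library's verbatim output):

```xml
<speak>
  This is <emphasis level="strong">very</emphasis> important.
  This is <emphasis level="moderate">somewhat</emphasis> important.
  This <emphasis level="reduced">isn't very</emphasis> important.
</speak>
```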

Another example of the short format is when applying breaks in the text. In SSML, you’d use the <break> tag to add a pause in the spoken text. Using Speech Markdown, you can use the break modifier to achieve the same thing:

 I'll give you a moment to decide. [break:"10s"] What is your decision?

But the break element also offers a short format in which you can specify the pause time without the break keyword:

 I'll give you a moment to decide. [10s] What is your decision?
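
To make the translation concrete, here’s an illustrative sketch (not the speechmarkdown-js library itself, which handles far more than this) of how a short-format break could be expanded into SSML:

```javascript
// Illustrative sketch only: the real work is done by speechmarkdown-js.
// This shows the idea behind the short-format break: a bracketed
// duration such as [10s] or [500ms] becomes an SSML <break> tag.
function expandShortBreaks(speechMarkdownText) {
  return speechMarkdownText.replace(
    /\[(\d+(?:ms|s))\]/g,
    '<break time="$1"/>'
  );
}

console.log(expandShortBreaks(
  "I'll give you a moment to decide. [10s] What is your decision?"
));
// -> I'll give you a moment to decide. <break time="10s"/> What is your decision?
```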

But again, not all Speech Markdown elements have a short format. Nevertheless, all of Speech Markdown’s elements are more concise than their SSML counterparts. And many of the elements can be combined in a single modifier block. For example, the following Speech Markdown shows how to combine the pitch, rate, and volume elements:

 Welcome to (Star Port 75)[pitch:"x-low";rate:"x-slow";volume:"x-loud"] Travel.
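
Since pitch, rate, and volume are all attributes of SSML's <prosody> tag, the combined modifier block translates to a single tag, roughly like this:

```xml
<speak>
  Welcome to <prosody pitch="x-low" rate="x-slow" volume="x-loud">Star Port 75</prosody> Travel.
</speak>
```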

It’s important to understand, however, that although most of Speech Markdown’s elements can be combined like this, the SSML they produce may not be supported by Alexa. The volume and emphasis elements, for example, can be combined in Speech Markdown like this:

 Welcome to (Star Port 75)[volume:"x-low";emphasis:"strong"] Travel.

But the SSML that this Speech Markdown translates to is not supported by Alexa.
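
To see why, consider what that Speech Markdown plausibly translates to (the nesting shown here is my approximation of the generated output, not taken from the library's documentation):

```xml
<speak>
  Welcome to <prosody volume="x-low"><emphasis level="strong">Star Port 75</emphasis></prosody> Travel.
</speak>
```

Since Alexa doesn't accept this combination, it's worth testing any combined modifiers in the Alexa simulator before shipping them.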

One other thing you can do with Speech Markdown modifiers is to apply them to a longer section of text, rather than a short excerpt. This is done by applying the modifier to a Markdown section marker (#) like this:

 #[voice:"Joanna";newscaster]
 This just in: A local resident reported that he was frightened by a
 mysterious bright light shining through the trees behind his home.
 Officers responded, relying on their extensive training, and
 determined that the offending light source was not an alien spacecraft
 as originally suspected, but was, in fact, the earth's moon.
 
 #[voice:"Matthew"]
 That was an interesting bit of news

Here, the voice and newscaster elements are used to enable Joanna’s news voice. That setting remains in effect until the next section changes it to Matthew’s regular voice. It translates to the following SSML:

 <speak>
  <amazon:domain name="news">
  <voice name="Joanna">
  This just in: A local resident reported that he was frightened by a
  mysterious bright light shining through the trees behind his home.
  Officers responded, relying on their extensive training, and
  determined that the offending light source was not an alien spacecraft
  as originally suspected, but was, in fact, the earth's moon.
  </voice>
  </amazon:domain>
  <voice name="Matthew">
  That was an interesting bit of news
  </voice>
 </speak>

With the basics of Speech Markdown out of the way, let’s see how to apply it in the Star Port 75 skill.

Adding Speech Markdown to a Skill Project

Speech Markdown comes in the form of a JavaScript library. Much like any other library, adding Speech Markdown to your skill project starts by adding the library to the skill project:

 $ npm install --prefix lambda speechmarkdown-js

This adds the library dependency to the skill’s package.json file in the lambda directory. With the library in place, you can now use require() to bring it into the JavaScript code like this:

 const smd = require('speechmarkdown-js');

Here, the Speech Markdown module is assigned to the smd constant, from which it can be used anywhere you need to convert Speech Markdown to SSML. For instance, the following snippet shows how the library is used:

 const speechMarkdown = new smd.SpeechMarkdown({platform: 'amazon-alexa'});
 const ssml = speechMarkdown.toSSML("(Hello)[volume:'x-soft'] ++World++!");

This creates a new instance of the Speech Markdown library targeting Alexa. From there, it calls toSSML(), passing in some Speech Markdown text, to generate the equivalent SSML.
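
Assuming the library's default options (which wrap the output in a <speak> tag), the resulting SSML looks roughly like this:

```xml
<speak><prosody volume="x-soft">Hello</prosody> <emphasis level="strong">World</emphasis>!</speak>
```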

You might be wondering about the platform property passed into the SpeechMarkdown constructor. Speech Markdown can be used to create SSML for other voice assistants. Although this book is focused on Alexa skill development, other voice assistants like Google Assistant and Samsung Bixby also support some variant of SSML and can take advantage of Speech Markdown.

When creating the SpeechMarkdown instance, you can set the platform property to specify the platform that SSML should be generated for. The valid options are “amazon-alexa”, “google-assistant”, “samsung-bixby”, and “microsoft-azure”. Speech Markdown will do its best to generate SSML fitting for the target platform, approximating tags that may not be supported on all platforms, or simply not applying any tags in many cases.

For example, given the following Speech Markdown text…

 (Be very very quiet)[whisper]

…then the SSML emitted will be as follows when the target platform is “amazon-alexa”:

 <speak>
 <amazon:effect name="whispered">Be very very quiet</amazon:effect>
 </speak>

On the other hand, if platform is set to “google-assistant” or “samsung-bixby”, then the resulting SSML looks like this:

 <speak>
 <prosody volume="x-soft" rate="slow">Be very very quiet</prosody>
 </speak>

That’s because neither Google Assistant nor Samsung Bixby support Alexa’s <amazon:effect> tag. Speech Markdown instead produces SSML using the <prosody> tag to approximate the whisper effect.

You can also call the toText() function on the Speech Markdown instance to strip away all Speech Markdown and produce a plain-text response:

 // results in "Hello World!"
 const text = speechMarkdown.toText("(Hello)[volume:'x-soft'] ++World++!");

This can be useful when you are using Speech Markdown to produce SSML responses for Alexa to speak, but also need a non-SSML response when populating text on visual elements such as cards and Alexa Presentation Language documents (which we’ll cover in Chapter 9, Complementing Responses with Cards and Chapter 10, Creating Visual Responses).

While you could remember to call speechMarkdown.toSSML() in every one of your skill’s request handlers, it’s a lot easier to set up an interceptor that will do that for you. We already have a LocalisationRequestInterceptor that we added in Chapter 1, Alexa, Hello to enable us to work with response text from the languageStrings module using a t() function. The following modified version of that interceptor processes the text using Speech Markdown:

 const LocalisationRequestInterceptor = {
   process(handlerInput) {
     i18n.init({
       lng: Alexa.getLocale(handlerInput.requestEnvelope),
       resources: languageStrings
     }).then((t) => {
       handlerInput.t = (...args) => {
         const speech =
           new smd.SpeechMarkdown({platform: 'amazon-alexa'});
         return speech.toSSML(t(...args));
       };
     });
   }
 };
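
To see the effect of this arrangement in isolation, here’s a minimal, self-contained sketch. The translate and toSSML functions below are stand-ins for i18next’s t() and speechmarkdown-js’s toSSML(); only the wiring pattern (composing the two behind a single t() function) reflects the interceptor above:

```javascript
// Sketch of the interceptor's wiring: compose a translation function
// with a Speech Markdown-to-SSML conversion behind one t() function.
function makeT(translate, toSSML) {
  return (...args) => toSSML(translate(...args));
}

// Stand-ins for i18next's t() and speechmarkdown-js's toSSML():
const translate = (key) =>
  ({ WELCOME_MSG: '++Welcome++ to Star Port 75 Travel.' }[key]);
const toSSML = (text) =>
  `<speak>${text.replace(/\+\+(.+?)\+\+/g, '<emphasis level="strong">$1</emphasis>')}</speak>`;

const t = makeT(translate, toSSML);
console.log(t('WELCOME_MSG'));
// -> <speak><emphasis level="strong">Welcome</emphasis> to Star Port 75 Travel.</speak>
```

With this in place, request handlers can keep calling handlerInput.t() exactly as before; the Speech Markdown conversion happens transparently.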

Now, all you need to do is modify the languageStrings module to use Speech Markdown instead of SSML. For example, the following excerpt from the languageStrings module shows how you might use Speech Markdown in the response for a launch request:

 WELCOME_MSG: "!['https://starport75.dev/audio/SP75.mp3'] " +
   'Welcome to Star Port 75 Travel, your source for ' +
   "!['soundbank://soundlibrary/scifi/amzn_sfx_scifi_small_zoom_flyby_01'] " +
   'out-of-this-world adventures. ' +
   'Where do you want to blast off to?',

Compared to the more verbose SSML version of this response, which used SSML’s <audio> tag to add sound effects, the response given here uses Speech Markdown’s more succinct ![] format to specify sound effects.
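
For comparison, the ![] audio references above translate back into SSML <audio> tags, roughly like this:

```xml
<speak>
  <audio src="https://starport75.dev/audio/SP75.mp3"/>
  Welcome to Star Port 75 Travel, your source for
  <audio src="soundbank://soundlibrary/scifi/amzn_sfx_scifi_small_zoom_flyby_01"/>
  out-of-this-world adventures.
  Where do you want to blast off to?
</speak>
```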
