Adjusting Pronunciation

Among the most peculiar quirks of human language, especially English, is how the same arrangement of letters can form a word with a different meaning or even pronunciation depending on context, locale, or merely the quirks of the person speaking. George and Ira Gershwin wrote a whole song that played on this concept, identifying several pairs of conflicting pronunciations of words like tomato, potato, neither, and pajamas.

Most of the time, Alexa is pretty good about figuring out the correct way of saying a word based on its context. For example, if she were to say, “The storm is getting close,” the “s” in “close” would be a soft “s” sound. However, if she were to say, “You should close the windows,” the “s” would have a harder sound, much like the sound of a “z.”

On the other hand, if she were reciting the Gershwin tune, and said “You like tomato and I like tomato,” then the whole charm of the song would be missed because she would say “tomato” exactly the same both times. In cases like this, a little SSML can help her say words differently than she would normally say them.

Let’s start by looking at how to adjust Alexa’s pronunciation of words based on locale-specific variation.

Using Locale-Specific Pronunciation

It’s common for a word to be pronounced differently due to locale variations. For example, the word “route” is often pronounced differently in the United States than it is in other English-speaking countries. In the U.S., it is usually pronounced like “rout,” rhyming with “out.” But in Great Britain, it is pronounced with a long “u” sound and sounds like “root.”

Normally, Alexa pronounces words using the locale that the user’s device is configured with. Using the <lang> element in SSML, however, will direct Alexa to speak a word in a specific locale. For example, consider the following SSML example:

 <speak>
  The most direct <lang xml:lang="en-US">route</lang> is to the left.
  Turn right if you want to take the scenic
  <lang xml:lang="en-GB">route</lang>.
 </speak>

When Alexa reads this SSML, she will use the U.S. pronunciation in the first use of the word “route” and the British pronunciation when “route” is said the second time.

The <lang> tag isn’t limited to single words and may, in fact, be used to wrap an entire response to apply a locale-specific pronunciation to all words in the response. Also, <lang> may be nested within other <lang> tags to override the outer tag’s pronunciation. For example, the following SSML applies U.S. pronunciation to the entire text, overriding the second use of “tomato” with a British pronunciation:

 <speak>
  <lang xml:lang="en-US">
  You like tomato. I like <lang xml:lang="en-GB">tomato</lang>.
  </lang>
 </speak>

In the U.S. pronunciation, “tomato” has a long “a” sound. But in the British pronunciation, the “a” sounds like the “a” in “father.”

Applying Alternate Pronunciation

Even within a given locale, a word may have several meanings, each with its own pronunciation. For example, consider the following SSML:

 <speak>
  When they complimented her bow she took a bow.
 </speak>

If you were to play this SSML in the text-to-speech simulator, Alexa would pronounce both instances of the word “bow” the same, with a long “o” sound. However, the second “bow” actually rhymes with “cow” so it should be pronounced differently.

To fix this, we can use the <w> tag like this:

 <speak>
  When they complimented her bow she took a
  <w role="amazon:SENSE_1">bow</w>.
 </speak>

The <w> tag’s role attribute indicates the role of the word in the sentence, as either a noun (amazon:NN), a simple verb (amazon:VB), a past participle (amazon:VBD), or as shown in this example, the non-default sense of the word (amazon:SENSE_1).
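To see the part-of-speech roles in action, consider the word “read,” whose present and past tenses are spelled identically. The following snippet is an illustration of my own, not one of the chapter’s examples, but it shows how the verb roles could be applied:

 <speak>
  I <w role="amazon:VB">read</w> a chapter every night,
  and last night I <w role="amazon:VBD">read</w> two.
 </speak>

Here, the first “read” should rhyme with “reed” and the second, marked as a past participle, should rhyme with “red.”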

Many times, Alexa is able to figure out the correct pronunciation without the <w> tag, based on the context of its use. For example, Alexa will pronounce each use of the word “lead” differently in the following SSML:

 <speak>
  They got the lead out and took the lead.
 </speak>

Therefore, before using the <w> tag, you should test your text in the text-to-speech simulator without the tag to see if Alexa will figure it out on her own.

Interpreting Numeric Phrases

Numbers can be especially tricky when it comes to how they should be pronounced. A 10-digit phone number, for example, could be misspoken as a cardinal number:

 <speak>
  For technical support, give us a call at 5555933033.
 </speak>

In this case, Alexa will pronounce the phone number as “five billion, five hundred fifty-five million, nine hundred thirty-three thousand, thirty-three.” While that is technically correct, it’s not how phone numbers are typically said out loud. To have her read the phone number as a phone number, you can either introduce dashes between the area code, exchange, and line number, or you can use the <say-as> tag to indicate that this is a phone number:

 <speak>
  For technical support, give us a call at
  <say-as interpret-as="telephone">5555933033</say-as>.
 </speak>

By setting the interpret-as attribute to “telephone”, we direct Alexa to say, “five five five, five nine three, three zero three three.”

In addition to phone numbers, the <say-as> tag offers several other ways to interpret numbers, including the following valid values for the interpret-as attribute:

  • ordinal—Interpret as an ordinal number (for example, “fifty-third”).

  • cardinal—Interpret as a cardinal number (for example, “fifty-three”).

  • digits—Say each digit individually, with no pauses.

  • fraction—Speak the number as a fraction (for example, “one twentieth”).

  • unit—Speak the number and a unit of measure (for example, “12 mg” will be said as “12 milligrams”).

  • date—Speak the number as a date in YYYYMMDD format, or a format specified by the format attribute.

  • time—Speak the number as a measurement of time (for example, 1’20” will be said as “one minute, twenty seconds”).

  • address—Interpret the number as part of an address.
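To illustrate a few of these values together, here’s a made-up snippet (not one of the chapter’s examples) that mixes ordinal, fraction, and unit interpretations:

 <speak>
  She finished in <say-as interpret-as="ordinal">3</say-as> place,
  winning by <say-as interpret-as="fraction">1/5</say-as> of a second
  in the <say-as interpret-as="unit">100m</say-as> dash.
 </speak>

Here, “3” should be spoken as “third,” “1/5” as “one fifth,” and “100m” as “one hundred meters.”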

Setting interpret-as to “date” is especially interesting. Normally, Alexa is really good at interpreting dates without the <say-as> tag when they are presented in a format that suggests a date. For example, “2019-11-12” and “11-12-2019” will both be interpreted automatically as dates by Alexa and spoken as “November twelfth twenty nineteen.” The phrase “20191112”, however, will be spoken as “twenty million, one hundred ninety-one thousand, one hundred twelve.” Using <say-as> we can coerce Alexa into speaking it as a date:

 <say-as interpret-as="date">20191112</say-as>

You can also use question marks (?) as placeholders in a date for parts of a date you don’t know or don’t want Alexa to speak. For example, the following SSML snippet causes Alexa to say, “November twenty nineteen”:

 <say-as interpret-as="date">201911??</say-as>

Even when a date is formatted, it might be ambiguous which date format is intended. For example, consider the following SSML excerpt:

 <say-as interpret-as="date">1/10/71</say-as>

In this case, Alexa will say, “January tenth nineteen seventy one,” because she will assume that the date format is month-day-year (or “mdy”). But what if the intention was for her to say, “October first nineteen seventy one”? In that case, you can use the format attribute to guide her in saying the date:

 <say-as interpret-as="date" format="dmy">1/10/71</say-as>

Any combination of “m”, “d”, and “y” in the format attribute will help her understand the arrangement of date components. You can even specify only one or two components in format to have her speak the partial date:

 <say-as interpret-as="date" format="dm">1/10/71</say-as>

Here, Alexa will say “October first” but will not speak the year.

Applying Phonetics

Although tags like <w> and <lang> are helpful in guiding Alexa to speak words with alternate pronunciation, they may still not offer the exact pronunciation we want.

For example, the seventh planet from the sun in our solar system, Uranus, has two commonly used pronunciations—one where the “a” makes the schwa sound (that odd upside-down “e” that makes an “uh” sound) and another where it has a long “a” sound. Although one of those pronunciations may sound a little crude, both are technically correct.[31] By default, Alexa uses the schwa pronunciation of “Uranus” and there’s no obvious way to have her say it the other way. By applying phonetics, however, we can have her pronounce Uranus (or any other word, for that matter) in any way we like.

The <phoneme> tag can be wrapped around words to provide a very specific pronunciation, based on the values given in the alphabet and ph attributes. The alphabet attribute specifies the phonetic alphabet to use and may either be “ipa” (for International Phonetic Alphabet) or “x-sampa” (for the X-SAMPA alphabet). The ph attribute specifies the phonetic spelling of the word.

Applying the <phoneme> tag to the word “Uranus”, we can specify that Alexa speak either form of the word:

 <speak>
  You say <phoneme alphabet="ipa" ph="ˈjʊərənəs">Uranus</phoneme>,
  I say <phoneme alphabet="ipa" ph="jʊˈreɪnəs">Uranus</phoneme>
 </speak>

We’re using the IPA alphabet in both cases. In the first, she says “Uranus” with the schwa pronunciation (which is also the default). But in the second, she says it with a long “a” sound.

Alternatively, you could use X-SAMPA spelling to achieve the same effect:

 <speak>
  You say <phoneme alphabet="x-sampa" ph="jUr@n@s">Uranus</phoneme>,
  I say <phoneme alphabet="x-sampa" ph="jUrein@s">Uranus</phoneme>
 </speak>

Unless you’re a linguistics fanatic, you’ll probably find phonetic spelling and its strange set of characters incredibly difficult to grasp. Fortunately, you won’t likely need to use phonetic spelling often in responses from your Alexa skills. And when you do, you can usually find the IPA spelling on the Wikipedia page for the word you need to spell and just copy-and-paste it into your SSML.
