Chapter 7
Mixing Audio

Whether you realize it or not, a lot of your favorite movies and TV shows aren’t recorded in a single consecutive session. A scene in a movie or a TV show may be one clip chosen by the producer from multiple takes. What’s more, the scenes probably aren’t even captured in the order in which you view them. Sound effects and soundtrack music are most likely mixed in later during the production process. Yet once everything has been combined into the final product, it is seamless and coherent.

When creating responses from a skill’s request handlers, the easy thing to do is to just return plain text for Alexa to speak. Or, if you want to add a little more flavor to the response, SSML can be used. But both of those options are limited on their own. You can’t pick from multiple responses, mix speech with background sounds and music, or easily sequence different sounds and speech into a rich and immersive response.

In this chapter, we’ll look at the Alexa Presentation Language for Audio (commonly referred to as APL-A), a way of combining sound and speech (both plain text and SSML) into vivid and inviting audio to be played on a user’s Alexa-enabled devices.

