
4. Overview of Linux Sound Architecture

Jan Newmarch

Oakleigh, Victoria, Australia

The Linux sound system, like most of Linux, has evolved from a simple system to a much more complex one. This chapter gives a high-level overview of the components of the Linux sound system and which bits are best used for which use cases.

Resources

Here are some resources:

Components

Figure 4-1 indicates the different layers of the Linux sound system.

Figure 4-1. Layers of audio tools and devices

Device Drivers

At the bottom layer is the hardware itself, the audio device. These devices are the audio cards made by a variety of manufacturers, all with different capabilities, interfaces, and prices. Just like any piece of hardware, in order for it to be visible and useful to the operating system, there must be a device driver. There are, of course, thousands of device drivers written for Linux. Writing Linux device drivers is a specialty in itself, and there are dedicated sources for this, such as Linux Device Drivers, Third Edition (http://lwn.net/Kernel/LDD3/) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.

Device drivers must have standardized APIs “at the top” so that users of the device have a known interface to code to. The OSS device driver API was used for audio devices until it was made closed source, at which point developers switched to the ALSA API. Although OSS v4 has since become open source again, ALSA is the interface supported in the kernel; OSS is not.
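To make “a known interface to code to” concrete, the following is a minimal sketch (an illustrative example under assumed defaults, not code from the ALSA documentation) that plays one second of a 440 Hz tone through the default device using the alsa-lib user-space API. Compile it with gcc tone.c -o tone -lasound -lm.

/* Minimal ALSA playback sketch: open the default PCM device,
 * configure it, and write one second of a 440 Hz sine wave.
 * The device name "default" and all parameters are illustrative. */
#include <alsa/asoundlib.h>
#include <math.h>

int main(void) {
    snd_pcm_t *pcm;
    /* Open the default playback device in blocking mode. */
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;
    /* Interleaved 16-bit mono at 44100 Hz, 500 ms maximum latency,
     * with software resampling allowed. */
    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           1, 44100, 1, 500000) < 0)
        return 1;

    short buf[44100];
    for (int i = 0; i < 44100; i++)
        buf[i] = (short)(32767 * sin(2 * M_PI * 440 * i / 44100.0));

    snd_pcm_writei(pcm, buf, 44100); /* count is in frames, not bytes */
    snd_pcm_drain(pcm);              /* wait for playback to complete */
    snd_pcm_close(pcm);
    return 0;
}

This call sequence (open, set parameters, write, drain, close) is essentially the “safe ALSA subset” recommended later in this chapter.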

Ideally, a device driver API should expose all of the features of hardware while not adding additional baggage. For audio, it is not always so easy to set boundaries for what an audio driver should do. For example, some sound cards will support the mixing of analog signals from different sources, while others will not, and some sound cards will have MIDI synthesizers, while others will not. If the API is to expose these capabilities for sound cards that support them, then it might have to supply them in software for those sound cards that do not.

There is a limited amount of documentation on writing ALSA device drivers. The “ALSA Driver Documentation” page at www.alsa-project.org/main/index.php/ALSA_Driver_Documentation points to some documents, including Takashi Iwai's 2005 document on writing ALSA device drivers (www.alsa-project.org/~tiwai/writing-an-alsa-driver/). There is also a 2010 blog post by Ben Collins, “Writing an ALSA driver,” at http://ben-collins.blogspot.com.au/2010/05/writing-alsa-driver-basics.html. Otherwise, there seems to be little help.

Sound Servers

Linux is a multitasking, multithreaded operating system, so several processes may want to write sound to the audio card at the same time. For example, a mail reader might want to “ding” the user to report new mail, even if the user is in the middle of a noisy computer game. This is distinct from a sound card's capability to mix signals from different ports, such as an HDMI input port and an analog input port; it requires the ability to mix (or otherwise manage) sounds from different processes. As an example of the subtlety involved: should the volume of each process be individually controllable? Should the destination port (headphones or speaker) be selectable per process?

Such capabilities are beyond the scope of a device driver. Linux resolves this by having “sound servers,” which run above the device drivers and manage these more complex tasks. Above these sound servers sit applications that talk to the sound server, which in turn will pass the resultant digital signal to the device driver.
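As a concrete illustration of an application talking to a sound server rather than to a device driver, here is a hedged sketch using the simple blocking API of PulseAudio (one of the sound servers discussed next); the application hands buffers to the server, and the server decides how they are mixed and routed. The application and stream names are illustrative. Compile with gcc simple.c -o simple -lpulse-simple.

/* Sketch of pushing audio to the PulseAudio sound server via its
 * "simple" blocking API; names and formats are illustrative. */
#include <pulse/simple.h>
#include <string.h>

int main(void) {
    /* The sample format the server should expect from us. */
    pa_sample_spec ss = {
        .format   = PA_SAMPLE_S16LE,
        .rate     = 44100,
        .channels = 1
    };
    int error;
    /* Connect a playback stream; mixing with other applications'
     * audio and routing to a device are the server's job. */
    pa_simple *s = pa_simple_new(NULL, "demo-app", PA_STREAM_PLAYBACK,
                                 NULL, "demo-stream", &ss,
                                 NULL, NULL, &error);
    if (!s)
        return 1;

    short buf[44100] = {0};                       /* one second of silence */
    pa_simple_write(s, buf, sizeof(buf), &error); /* size is in bytes */
    pa_simple_drain(s, &error);                   /* wait until played */
    pa_simple_free(s);
    return 0;
}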

Here is where sound servers differ significantly. For professional audio systems, the sound server must be able to process and route audio with a minimum of latency and other negative effects. For consumer audio, control over volumes and destinations may be more important than latency; you probably won’t care if a new-message “ding” takes an extra half-second. Between these extremes sit other cases, such as games requiring synchronization of audio and visual effects and karaoke players requiring synchronization of analog and digital sources.

The two major sound servers under Linux are Jack for professional audio and PulseAudio for consumer systems. They are designed for different use cases and consequently offer different features.
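The design difference shows up directly in the client APIs. A PulseAudio client typically pushes buffers at its own pace, as in the previous sketch, whereas a Jack client registers a callback that the Jack server invokes once per period on a real-time thread; keeping that callback short and non-blocking is what keeps latency low. Here is a minimal hedged sketch of a Jack client (client and port names are illustrative); compile with gcc client.c -o client -ljack.

/* Sketch of a minimal Jack client: register an output port and fill
 * it with silence each time the server calls process(). */
#include <jack/jack.h>
#include <string.h>
#include <unistd.h>

static jack_port_t *out_port;

/* Called by the Jack server once per period on its real-time thread;
 * it must not block or allocate. */
static int process(jack_nframes_t nframes, void *arg) {
    jack_default_audio_sample_t *out =
        jack_port_get_buffer(out_port, nframes);
    memset(out, 0, nframes * sizeof(jack_default_audio_sample_t));
    return 0;
}

int main(void) {
    jack_client_t *client =
        jack_client_open("demo-client", JackNullOption, NULL);
    if (!client)
        return 1;
    jack_set_process_callback(client, process, NULL);
    out_port = jack_port_register(client, "out",
                                  JACK_DEFAULT_AUDIO_TYPE,
                                  JackPortIsOutput, 0);
    if (jack_activate(client))  /* the server now calls process() */
        return 1;
    sleep(5);                   /* run for a few seconds, then exit */
    jack_client_close(client);
    return 0;
}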

Lennart Poettering, in “A Guide Through the Linux Sound API Jungle” (http://0pointer.de/blog/projects/guide-to-sound-apis.html), offers a good summary of these different use cases (a libcanberra sketch for the event-sound case follows the list):

  • “I want to write a media-player-like application!”

    Use GStreamer (unless your focus is only KDE, in which case Phonon might be an alternative).

  • “I want to add event sounds to my application!”

    Use libcanberra, and install your sound files according to the XDG sound theming/naming specifications (unless your focus is only KDE, in which case KNotify might be an alternative, although it has a different focus).

  • “I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!”

    Use Jack and/or the full ALSA interface.

  • “I want to do basic PCM audio playback/capturing!”

    Use the safe ALSA subset.

  • “I want to add sound to my game!”

    Use the audio API of SDL for full-screen games, and use libcanberra for simple games with standard UIs such as Gtk+.

  • “I want to write a mixer application!”

    Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.

  • “I want to write audio software for the plumbing layer!”

    Use the full ALSA stack.

  • “I want to write audio software for embedded applications!”

    For technical appliances, usually the safe ALSA subset is a good choice. This, however, depends highly on your use case.
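For the event-sound case in the list above, here is a small hedged sketch using libcanberra; it assumes a desktop with an XDG sound theme installed, and the event ID comes from the XDG sound naming specification. Compile with gcc event.c -o event -lcanberra.

/* Sketch of playing a themed event sound through libcanberra.
 * The event ID "message-new-instant" is a standard XDG sound name. */
#include <canberra.h>
#include <unistd.h>

int main(void) {
    ca_context *c = NULL;
    if (ca_context_create(&c) < 0)
        return 1;
    /* Request the sound by event ID; the installed sound theme and
     * the underlying sound server decide what is actually played. */
    ca_context_play(c, 0,
                    CA_PROP_EVENT_ID, "message-new-instant",
                    CA_PROP_EVENT_DESCRIPTION, "New mail arrived",
                    NULL);
    sleep(1);  /* ca_context_play() is asynchronous */
    ca_context_destroy(c);
    return 0;
}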

Complexities

Figure 4-1 hides the real complexities of Linux sound. Mike Melanson (an Adobe engineer at the time) produced the diagram shown in Figure 4-2 in 2007.

Figure 4-2. Linux audio relationships

The figure is not up-to-date. For example, OSS is no longer a major part of Linux. There are also special-case complexities; for example, PulseAudio sits above ALSA and also below it, as in Figure 4-3 (based on the diagram at http://insanecoding.blogspot.com.au/2009/06/state-of-sound-in-linux-not-so-sorry.html).

Figure 4-3. ALSA and PulseAudio. (This diagram is upside down compared to Figure 4-1.)

The explanation is as follows:

  • PulseAudio is able to do things such as mixing application sounds that ALSA cannot do.

  • PulseAudio installs itself as the default ALSA output device (see the configuration sketch after this list).

  • An application sends audio to the ALSA default device, which sends it to PulseAudio.

  • PulseAudio mixes it with any other audio and then sends it back to a particular device in ALSA.

  • ALSA then plays the mixed sound.
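Concretely, the “default ALSA output device” step is usually implemented by the pulse plugin from the alsa-plugins package. An ALSA configuration along the following lines (a sketch of what the system-wide configuration, or a user's ~/.asoundrc, typically contains) routes the ALSA default device to PulseAudio:

# Route the ALSA "default" device through the PulseAudio plugin, so
# applications that use plain ALSA are transparently mixed by PulseAudio.
pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}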

Complex, yes, but it accomplishes tasks that would be difficult otherwise.

Conclusion

The architecture of the Linux sound system is complex, and new wrinkles are being added on a regular basis. However, this is the same for any audio system. Successive chapters will flesh out the details of many of these components.
