Linux is a system on which devices notify the kernel about events by means of interrupt requests (IRQs), though some devices are polled. The CPU exposes IRQ lines, shared or not, used by connected devices so that when a device needs the CPU, it sends a request to the CPU. When the CPU gets this request, it stops its actual job and saves its context, in order to serve the request issued by the device. After serving the device, its state is restored back to exactly where it stopped when the interruption occurred.
In this chapter, we will deal with the APIs that the kernel offers to manage IRQs and the ways in which multiplexing can be done. Moreover, we will analyze and look closer at interrupt controller driver writing.
To summarize, in this chapter, the following topics will be covered:
On many platforms, a special device is responsible for managing IRQ lines. That device is the interrupt controller and it stands between the CPU and the interrupt lines it manages. The following is a diagram that shows the interactions that take place:
Not only can devices raise interrupts, but some processor operations can do that too. There are then two different kinds of interrupts:
Before getting deeper into interrupt management in the Linux kernel, let's talk a bit more about exceptions.
Exceptions are consequences of programming errors, handled by the kernel, which sends a signal to the program and tries to recover from the error. These are classified into two categories, enumerated as follows:
Now that we have introduced the different families of interrupts, let's learn how they are implemented from within the interrupt controller.
Having a single interrupt from the CPU is usually not enough. Most systems have tens or hundreds of them. Now comes interrupt controller, which allows them to be multiplexed. Very often, architecture or platform-specific implementations offer specific facilities, such as the following:
IRQ management and interrupt controller drivers both rely on the concept of the IRQ domain, which is built on top of the following structures:
All the preceding data structures are part of the IRQ domain API. An interrupt controller is represented in the kernel by an instance of the struct irq_chip structure, which describes the actual hardware device, and some methods used by the IRQ core. The following code block shows its definition:
struct irq_chip {
struct device *parent_device;
const char *name;
void (*irq_enable)(struct irq_data *data);
void (*irq_disable)(struct irq_data *data);
void (*irq_ack)(struct irq_data *data);
void (*irq_mask)(struct irq_data *data);
void (*irq_unmask)(struct irq_data *data);
void (*irq_eoi)(struct irq_data *data);
int (*irq_set_affinity)(struct irq_data *data,
const struct cpumask *dest, bool force);
int (*irq_retrigger)(struct irq_data *data);
int (*irq_set_type)(struct irq_data *data,
unsigned int flow_type);
int (*irq_set_wake)(struct irq_data *data,
unsigned int on);
void (*irq_bus_lock)(struct irq_data *data);
void (*irq_bus_sync_unlock)(struct irq_data *data);
int (*irq_get_irqchip_state)(struct irq_data *data,
enum irqchip_irq_state which, bool *state);
int (*irq_set_irqchip_state)(struct irq_data *data,
enum irqchip_irq_state which, bool state);
void (*ipi_send_single)(struct irq_data *data,
unsigned int cpu);
void (*ipi_send_mask)(struct irq_data *data,
const struct cpumask *dest);
unsigned long flags;
};
The following list explains the meanings of the elements in the structure:
Each interrupt controller is given a domain, which is to the controller what an address space is to a process (see Chapter 10, Understanding the Linux Kernel Memory Allocation). The interrupt controller domain is described in the kernel with a struct irq_domain structure. It manages mappings between hardware IRQ numbers and Linux IRQ numbers (that is, virtual IRQs). It is the hardware interrupt number translation object. The following code block shows its definition:
struct irq_domain {
const char *name;
const struct irq_domain_ops *ops;
void *host_data;
unsigned int flags;
unsigned int mapcount;
/* Optional data */
struct fwnode_handle *fwnode;
[...]
};
For the sake of readability, only elements that are relevant to us have been listed. The following list tells us their meanings:
An interrupt controller driver creates and registers an IRQ domain by calling one of the irq_domain_add_<mapping_method>() functions, where <mapping_method> is the method by which hwirq should be mapped to Linux virq. These functions are described in the following list:
struct irq_domain *irq_domain_add_linear(
struct device_node *of_node,
unsigned int size,
const struct irq_domain_ops *ops,
void *host_data)
struct irq_domain *irq_domain_add_tree(
struct device_node *of_node,
const struct irq_domain_ops *ops,
void *host_data)
struct irq_domain *irq_domain_add_nomap(
struct device_node *of_node,
unsigned int max_irq,
const struct irq_domain_ops *ops,
void *host_data)
In these functions, of_node is a pointer to the interrupt controller's DT node. size corresponds to the number of interrupts in the domain. ops represent map/unmap domain callbacks, and host_data is the controller's private data pointer.
When it is initially created, the IRQ domain is empty (no mapping). A mapping is created and added as and when the IRQ chip driver calls irq_create_mapping(), which has the following prototype:
unsigned int irq_create_mapping(struct irq_domain
*domain, irq_hw_number_t hwirq)
In the preceding function, domain is the domain to which this hardware interrupt belongs, or NULL for the default domain; hwirq represents the hardware interrupt number in that domain space.
If a mapping for the hwirq number doesn't already exist in the IRQ domain, the function will allocate a new Linux IRQ descriptor (struct irq_desc) structure, returning a virtual interrupt number at the same time. Then, it will associate it with the hwirq number (by means of the irq_domain_associate() function, which in turn invokes the irq_domain_ops.map callback so that the driver can perform any required hardware setup). To understand this paragraph, we need to describe the IRQ domain operation data structure (struct irq_domain_ops), which is defined in the following code block:
struct irq_domain_ops {
int (*map)(struct irq_domain *d, unsigned int virq,
irq_hw_number_t hw);
void (*unmap)(struct irq_domain *d,
unsigned int virq);
int (*xlate)(struct irq_domain *d,
struct device_node *node,
const u32 *intspec,
unsigned int intsize,
unsigned long *out_hwirq,
unsigned int *out_type);
[...]
};
Elements in the data structure have been limited to the scope of this chapter. Nonetheless, the complete data structure can be found in include/linux/irqdomain.h in the kernel source. The following list tells us the meanings of the elements we have enumerated:
void irq_set_chip_and_handler(unsigned int irq,
struct irq_chip *chip,
irq_flow_handler_t handle)
In this function, irq is the Linux IRQ given as a parameter to the map() function, and chip is your IRQ chip. There are, however, dummy controllers that need almost nothing in their irq_chip structure. In this case, the driver passes dummy_irq_chip, defined in kernel/irq/dummychip.c, which is a kernel-predefined irq_chip structure defined for such controllers. handle determines the interrupt flow handler, the one that calls the real handler registered using request_irq(). Its value depends on the IRQ being edge- or level-triggered. In either case, handle should be set to handle_edge_irq or handle_level_irq. Both are kernel helper functions that do some operations before and after calling the real IRQ handler. An example is shown in this code block:
static int ativic32_irq_domain_map(
struct irq_domain *id,
unsigned int virq,
irq_hw_number_t hw)
{
[...]
if (int_trigger_type & (BIT(hw))) {
irq_set_chip_and_handler(virq,
&ativic32_chip,
handle_edge_irq);
type = IRQ_TYPE_EDGE_RISING;
} else {
irq_set_chip_and_handler(virq,
&ativic32_chip,
handle_level_irq);
type = IRQ_TYPE_LEVEL_HIGH;
}
irqd_set_trigger_type(irq_data, type);
return 0;
}
An example of domain operation is given in the following code block:
static struct irq_domain_ops mcp23016_irq_domain_ops = {
.map = mcp23016_irq_domain_map,
.xlate = irq_domain_xlate_twocell,
};
When an interrupt is received, the irq_find_mapping() function is used to find the Linux IRQ number from the hwirq number. Of course, the mapping must exist prior to being returned. A Linux IRQ number is always tied to a struct irq_desc structure, which is the structure by which Linux describes an IRQ and has the following definition:
struct irq_desc {
struct irq_data irq_data;
unsigned int __percpu *kstat_irqs;
irq_flow_handler_t handle_irq;
struct irqaction *action;
unsigned int irqs_unhandled;
raw_spinlock_t lock;
struct cpumask *percpu_enabled;
atomic_t threads_active;
wait_queue_head_t wait_for_threads;
#ifdef CONFIG_PM_SLEEP
unsigned int nr_actions;
unsigned int no_suspend_depth;
unsigned int force_resume_depth;
#endif
#ifdef CONFIG_PROC_FS
struct proc_dir_entry *dir;
#endif
Int parent_irq;
struct module *owner;
const char *name;
};
Some fields in this data structure are intentionally missing. For the remainder, the following list gives us their definitions:
When registering an interrupt handler, this handler is added to the end of the irq_desc.action list associated with that interrupt line. For instance, each call to request_irq() (or the threaded version, request_threaded_irq()) creates and adds one struct irqaction structure to the end of the irq_desc.action list (knowing that irq_desc is the descriptor for this interrupt). For a shared interrupt, this field will contain as many irqaction objects as there are handlers registered. An IRQ action data structure has the following definition:
struct irqaction {
irq_handler_t handler;
void *dev_id;
void __percpu *percpu_dev_id;
struct irqaction *next;
irq_handler_t thread_fn;
struct task_struct *thread;
unsigned int irq;
unsigned int flags;
unsigned long thread_flags;
unsigned long thread_mask;
const char *name;
struct proc_dir_entry *dir;
};
The meanings of each element in this data structure are as follows:
The following is the definition of important fields in the struct irq_data structure, which is per-IRQ chip data passed down to chip functions:
struct irq_data {
[...]
unsigned int irq;
unsigned long hwirq;
struct irq_chip *chip;
struct irq_domain *domain;
void *chip_data;
};
The following list gives the meanings of elements in this data structure:
Now that we are familiar with the data structures of the IRQ framework, we can go a bit further and study how interrupts are requested and propagated all along the processing chain.
In Chapter 3, Dealing with Kernel Core Helpers, we introduced peripheral IRQs, using request_irq() and request_threaded_irq(). With the former, you register a handler (top half) that will be executed in an atomic context, from which you can schedule a bottom half using one of the mechanisms discussed in that same chapter. On the other hand, with the _threaded variant, you can provide top and bottom halves to the function, so that the former will be run as the hard IRQ handler, which may decide to raise the second and threaded handler or not, which will be run in a kernel thread.
The problem with those approaches is that sometimes, drivers requesting an IRQ do not know about the nature of the interrupt controller that provides this IRQ line, especially when the interrupt controller is a discrete chip (typically a GPIO expander connected over SPI or I2C buses). Now comes the request_any_context_irq()function with which drivers requesting an IRQ know whether the handler will run in a thread context, and call request_threaded_irq() or request_irq() accordingly. This means that whether the IRQ associated with our device comes from an interrupt controller that may not sleep (memory-mapped one) or from one that can sleep (behind an I2C/SPI bus), there will be no need to change the code. Its prototype is shown in the following code block:
int request_any_context_irq(unsigned int irq,
irq_handler_t handler,
unsigned long flags,
const char * name,
void * dev_id);
Here are the meanings of each parameter in the function:
request_any_context_irq() means that you can either get a hard IRQ or a threaded one. It works in the same way as the usual request_irq(), except that it checks whether the IRQ is configured as nested or not, and calls the right backend. In other words, it selects either a hard IRQ or threaded handling method depending on the context. This function returns a negative value on failure. On success, it returns either IRQC_IS_HARDIRQ or IRQC_IS_NESTED. A use case is shown in the following code block:
static irqreturn_t packt_btn_interrupt(int irq,
void *dev_id)
{
struct btn_data *priv = dev_id;
input_report_key(priv->i_dev, BTN_0,
gpiod_get_value(priv->btn_gpiod) & 1);
input_sync(priv->i_dev);
return IRQ_HANDLED;
}
static int btn_probe(struct platform_device *pdev)
{
struct gpio_desc *gpiod;
int ret, irq;
[...]
gpiod = gpiod_get(&pdev->dev, "button", GPIOD_IN);
if (IS_ERR(gpiod))
return -ENODEV;
priv->irq = gpiod_to_irq(priv->btn_gpiod);
priv->btn_gpiod = gpiod;
[...]
ret = request_any_context_irq(
priv->irq,
packt_btn_interrupt,
(IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING),
"packt-input-button", priv);
if (ret < 0) {
dev_err(&pdev->dev,
"Unable to request GPIO interrupt line ");
goto err_btn;
}
return ret;
}
The preceding code is an excerpt of the driver sample of an input device driver. The advantage of using request_any_context_irq() is that you do not need to care about what can be done in the IRQ handler, since the context in which the handler will run depends on the interrupt controller that provides the IRQ line. In our example, if the GPIO belongs to a controller sitting on an I2C or SPI bus, the handler will be threaded. Otherwise (memory mapped), the handler will run in a hard IRQ context.
Let's consider the following diagram with a GPIO controller whose interrupt line is connected to a native GPIO on the SoC:
IRQs are always processed based on the Linux IRQ number (not hwirq). The general function to request an IRQ on a Linux system is request_threaded_irq(). request_irq() is a wrapper on request_threaded_irq() which just don't provide the bottom half. The following code block shows its prototype:
int request_threaded_irq(unsigned int irq,
irq_handler_t handler,
irq_handler_t thread_fn,
unsigned long irqflags,
const char *devname, void *dev_id)
When called, the function extracts struct irq_desc associated with the IRQ using the irq_to_desc() macro. It then allocates a new struct irqaction structure and sets it up, filling parameters such as handler and flags. The following code block is an excerpt:
action->handler = handler;
action->thread_fn = thread_fn;
action->flags = irqflags;
action->name = devname;
action->dev_id = dev_id;
That same function finally inserts/registers the descriptor in the proper IRQ list by invoking the __setup_irq() (by means of setup_irq()) function, defined in kernel/irq/manage.c.
Now, when an IRQ is raised, the kernel executes some assembler code in order to save the current state and jumps to the arch-specific handler, handle_arch_irq. For ARM architectures, this handler is set with the value of the handle_irq field in struct machine_desc of the platform in the setup_arch() function implemented in arch/arm/kernel/setup.c. The assignation is done as follows:
handle_arch_irq = mdesc->handle_irq
For SoCs that use the ARM Generic Interrupt Controller (GIC), the handle_irq callback is set with gic_handle_irq, in either drivers/irqchip/irq-gic.c or drivers/irqchip/irq-gic-v3.c:
set_handle_irq(gic_handle_irq);
gic_handle_irq() calls handle_domain_irq(), which executes generic_handle_irq(), in turn calling generic_handle_irq_desc(), which ends by calling desc->handle_irq(). The whole chain can be seen in arch/arm/kernel/irq.c. Now, handle_irq is the actual call for the flow handler, which we registered as mcp23016_irq_handler in the diagram.
gic_hande_irq() is a GIC interrupt handler. generic_handle_irq() will execute the handler of the SoC's GPIO4 IRQ, which will look for GPIO pins that issued the interrupt, and call generic_handle_irq_desc().
This section describes how the interrupt handlers of a parent call its children's interrupt handlers, in turn calling their children's interrupt handlers, and so on. The kernel offers two approaches on how to call interrupt handlers for child devices in the IRQ handler of the parent (interrupt controller) device. These are the chained and nested methods.
This approach is used for SoC's internal GPIO controllers, which are memory-mapped and which do not put the caller to sleep when these are accessed. Chained means that those interrupts are just chains of function calls (for example, SoC's GPIO module interrupt handler is being called from the GIC interrupt handler, just as a function call). generic_handle_irq() is used for interrupts chaining. Child IRQ handlers are called from inside the parent's hard IRQ handler. This means that even from within the child interrupt handlers, we are still in an atomic context (HW interrupt), and the driver must not call functions that may sleep.
With this flow, function calls are nested, which means interrupt handlers are not invoked in the parent's handler. handle_nested_irq() is used for creating nested interrupt child IRQs. Handlers are called inside a new thread created for this purpose. This method is used by controllers that sit on slow buses such as SPI or I2C (such as GPIO expanders), and whose access may sleep (I2C and SPI access routines may sleep). Nested interrupt handlers that run in a process context can call any sleeping function.
The most common ARM interrupt controller, GIC in the ARM multi-core processor, supports three types of interrupts:
In systems with an interrupt controller that supports private interrupts per core, some of the IRQ controller registers will be banked so that they're only accessible from one core (for example, a core will only be able to read/write its own interrupt configuration). Usually, to be able to do so, some interrupt controller registers are banked per CPU; a CPU can enable its local interrupt by writing to its banked registers.
The distributor block and the CPU interface block are logically partitioned in the GIC. Interacting with interrupt sources, the distributor block prioritizes interrupts and delivers them to the CPU interface block. The CPU interface block links to the system's processors and manages priority masking and preemption for the processors to which it is linked.
The GIC can support up to 8 CPU interfaces, each of which can handle up to 1,020 interrupts. Interrupt ID numbers 0–1019 are assigned by the GIC as follows:
Now that we are familiar with ARM GIC interrupt families, we can focus on the family we are interested in, that is, SGIs.
In ARM processors, there are 16 SGIs, numbered from 0 to 15, but the Linux kernel registers only a few of them: eight (from 0 to 7) to be precise. SGI8 to SGI15 are free for now. Registered SGIs are those defined in enum ipi_msg_type, which is defined as the following:
enum ipi_msg_type {
IPI_WAKEUP,
IPI_TIMER,
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
IPI_IRQ_WORK,
IPI_COMPLETION,
NR_IPI,
[...]
MAX_IPI
};
Their respective descriptions can be found in an array of strings, or ipi_types, defined in the following code block:
static const char *ipi_types[NR_IPI] = {
[IPI_WAKEUP] = "CPU wakeup interrupts",
[IPI_TIMER] = "Timer broadcast interrupts",
[IPI_RESCHEDULE] = "Rescheduling interrupts",
[IPI_CALL_FUNC] = "Function call interrupts",
[IPI_CPU_STOP] = "CPU stop interrupts",
[IPI_IRQ_WORK] = "IRQ work interrupts",
[IPI_COMPLETION] = "completion interrupts",
};
IPIs are registered in the set_smp_ipi_range() function, defined in the following code block:
void __init set_smp_ipi_range(int ipi_base, int n)
{
int i;
WARN_ON(n < MAX_IPI);
nr_ipi = min(n, MAX_IPI);
for (i = 0; i < nr_ipi; i++) {
int err;
err = request_percpu_irq(ipi_base + i,
ipi_handler, "IPI", &irq_stat);
WARN_ON(err);
ipi_desc[i] = irq_to_desc(ipi_base + i);
irq_set_status_flags(ipi_base + i, IRQ_HIDDEN);
}
ipi_irq_base = ipi_base;
/* Setup the boot CPU immediately */
ipi_setup(smp_processor_id());
}
In the preceding code block, each IPI is registered with request_percpu_irq() on a per-CPU basis. We can see that IPIs have the same handler, ipi_handler(), defined as follows:
static irqreturn_t ipi_handler(int irq, void *data)
{
do_handle_IPI(irq - ipi_irq_base);
return IRQ_HANDLED;
}
The underlying function executed in the handler is do_handle_IPI(), defined as follows:
static void do_handle_IPI(int ipinr)
{
unsigned int cpu = smp_processor_id();
if ((unsigned)ipinr < NR_IPI)
trace_ipi_entry_rcuidle(ipi_types[ipinr]);
switch (ipinr) {
case IPI_WAKEUP:
break;
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
case IPI_TIMER:
tick_receive_broadcast();
break;
#endif
case IPI_RESCHEDULE:
scheduler_ipi();
break;
case IPI_CPU_STOP:
ipi_cpu_stop(cpu);
break;
[...]
default:
pr_crit("CPU%u: Unknown IPI message 0x%x ",
cpu, ipinr);
break;
}
if ((unsigned)ipinr < NR_IPI)
trace_ipi_exit_rcuidle(ipi_types[ipinr]);
}
From the preceding function,
On a running system, you can look for available IPIs from the /proc/interrupt file, as shown in the following code block:
root@udoo-labcsmart:~# cat /proc/interrupts | grep IPI
IPI0: 0 0 CPU wakeup interrupts
IPI1: 29 22 Timer broadcast interrupts
IPI2: 84306 322774 Rescheduling interrupts
IPI3: 970 1264 Function call interruptsIPI4: 0 0 CPU stop interrupts
IPI5: 2505436 4064821 IRQ work interrupts
IPI6: 0 0 completion interrupts
root@udoo-labcsmart:~#
In the command output shown here, the first column is the IPI identifier and the last one is the description of the IPI. The columns in between are their respective numbers of executions on each CPU.
Now, IRQ multiplexing has no more secrets from you. We have discussed the most important element of IRQ management in Linux systems: the IRQ domain API. You have the basics to understand existing interrupt controller drivers, as well as their binding from within the DT. IRQ propagation has been discussed in order to explore what happens between the request and the handler invocation.
In the next chapter, we deal with a completely different topic: the Linux device model.