A timer is a software facility that allows functions to be invoked at some future moment, after a given time interval has elapsed; a time-out denotes a moment at which the time interval associated with a timer has elapsed.
Timers are widely used both by the kernel and by processes. Most device drivers use timers to detect anomalous conditions — floppy disk drivers, for instance, use timers to switch off the device motor after the floppy has not been accessed for a while, and parallel printer drivers use them to detect erroneous printer conditions.
Timers are also used quite often by programmers to force the execution of specific functions at some future time (see the later section Section 6.7.3).
Implementing a timer is relatively easy. Each timer contains a field
that indicates how far in the future the timer should expire. This
field is initially calculated by adding the right number of ticks to
the current value of
jiffies
. The field does not
change. Every time the kernel checks a timer, it compares the
expiration field to the value of jiffies
at the
current moment, and the timer expires when jiffies
is greater or equal to the stored value. This comparison is made via
the time_after
, time_before
,
time_after_eq
, and
time_before_eq
macros, which take care of possible
overflows of jiffies
.
Linux considers two types of timers called dynamic timers and interval timers. The first type is used by the kernel, while interval timers may be created by processes in User Mode.[51]
One word of caution about Linux timers: since checking for timer functions is always done by bottom halves that may be executed a long time after they have been activated, the kernel cannot ensure that timer functions will start right at their expiration times. It can only ensure that they are executed either at the proper time or after with a delay of up to a few hundreds of milliseconds. For this reason, timers are not appropriate for real-time applications in which expiration times must be strictly enforced.
Dynamic timers may be dynamically created and destroyed. No limit is placed on the number of currently active dynamic timers.
A dynamic timer is stored in the following
timer_list
structure:
struct timer_list { struct list_head list; unsigned long expires; unsigned long data; void (*function)(unsigned long); };
The function
field contains the address of the
function to be executed when the timer expires. The
data
field specifies a parameter to be passed to
this timer function. Thanks to the data
field, it
is possible to define a single general-purpose function that handles
the time-outs of several device drivers; the data
field could store the device ID or other meaningful data that could
be used by the function to differentiate the device.
The expires
field specifies when the timer
expires; the time is expressed as the number of ticks that have
elapsed since the system started up. All timers that have an
expires
value smaller than or equal to the value
of jiffies
are considered to be expired or
decayed.
The list
field includes the links for a doubly
linked circular list. There are 512 doubly linked circular lists to
hold dynamic timers. Each timer is inserted into one of the lists
based on the value of the expires
field. The
algorithm that uses this list is described later in this chapter.
To create and activate a dynamic timer, the kernel must:
Create a new timer_list
object — for
example, t
. This can be done in several ways by:
Defining a static global variable in the code.
Defining a local variable inside a function; in this case, the object is stored on the Kernel Mode stack.
Including the object in a dynamically allocated descriptor.
Initialize the object by invoking the
init_timer(&t)
function. This simply sets the
links in the list
field to
NULL
.
Load the function
field with the address of the
function to be activated when the timer decays. If required, load the
data
field with a parameter value to be passed to
the function.
If the dynamic timer is not already inserted in a list, assign a
proper value to the expires
field. Otherwise,
update the expires
field by invoking the
mod_timer( )
function, which also takes care of
moving the object into the proper list (discussed shortly).
If the dynamic timer is not already inserted in a list, insert the
t
element in the proper list by invoking the
add_timer(&t)
function.
Once the timer has decayed, the kernel automatically removes the
t
element from its list. Sometimes, however, a
process should explicitly remove a timer from its list using the
del_timer( )
or del_timer_sync( )
functions. Indeed, a sleeping process may be woken up
before the time-out is over; in this case, the process may choose to
destroy the timer. Invoking del_timer( )
or
del_timer_sync( )
on a timer already removed from
a list does no harm, so removing the timer within the timer function
is considered a good practice.
Being asynchronously activated, dynamic timers are prone to race conditions. For instance, consider a dynamic timer whose function acts on a discardable resource (e.g., a kernel module or a file data structure). Releasing the resource without stopping the timer may lead to data corruption if the timer function got activated when the resource no longer exists. Thus, a rule of thumb is to stop the timer before releasing the resource:
... del_timer(&t); X_Release_Resources( ); ...
In multiprocessor systems, however, this code is not safe because the
timer function might already be running on another CPU when
del_timer( )
is invoked. As a result, resources
may be released while the timer function is still acting on them. To
avoid this kind of race condition, the kernel offers the
del_timer_sync( )
function. It removes the timer
from the list, and then it checks whether the timer function is
executed on another CPU; in such a case, del_timer_sync( )
waits until the timer function terminates.
Other types of race conditions exist, of course. For instance, the
right way to modify the expires
field of an
already activated timer consists of using mod_timer( )
, rather than deleting the timer and recreating it
thereafter. In the latter approach, two kernel control paths that
want to modify the expires
field of the same timer
may mix each other up badly. The implementation of the timer
functions is made SMP-safe by means of the
timerlist_lock
spin lock: every time the kernel
must access the lists of dynamic timers, it disables the interrupts
and acquires this spin lock.
Choosing the proper data structure to implement dynamic timers is not easy. Stringing together all timers in a single list would degrade system performances, since scanning a long list of timers at every tick is costly. On the other hand, maintaining a sorted list would not be much more efficient, since the insertion and deletion operations would also be costly.
The adopted solution is based on a clever data structure that
partitions the expires
values into blocks of ticks
and allows dynamic timers to percolate efficiently from lists with
larger expires
values to lists with smaller ones.
The main data structure is an array called tvecs
,
whose elements point to five groups of lists identified by the
tv1
, tv2
,
tv3
, tv4
, and
tv5
structures (see Figure 6-2).
The tv1
structure is of type struct
timer_vec_root
, which includes an
index
field and a vec
array of
256 list_head
elements — that is, lists of
dynamic timers. It contains all dynamic timers that will decay within
the next 255 ticks.
The index
field specifies the currently scanned
list; it is initialized to 0 and incremented by 1 (modulo 256) at
every tick. The list referenced by index
contains
all dynamic timers that expired during the current tick; the next
list contains all dynamic timers that will expire in the next tick;
the (index
+k)th list contains
all dynamic timers that will expire in exactly k
ticks. When index
returns to 0, it signifies that
all the timers in tv1
have been scanned. In this
case, the list pointed to by tv2.vec[tv2.index]
is
used to replenish tv1
.
The tv2
, tv3
, and
tv4
structures of type struct
timer_vec
contain all dynamic timers
that will decay within the next 214-1,
220-1, and
226-1 ticks, respectively.
The tv5
structure is identical to the previous
ones, except that the last entry of the vec
array
includes dynamic timers with extremely large
expires
fields. It never needs to be replenished
from another array.
The timer_vec
structure is very similar to
timer_vec_root
: it contains an
index
field and a vec
array of
64 pointers to dynamic timer lists. The index
field specifies the currently scanned list; it is incremented by 1
(modulo 64) every 256*64i-2 ticks, where
i, ranging between 2 and 5, is the
tv
i group number. As in the
case of tv1
, when index
returns
to 0, the list pointed to by
tv
j
.vec[tv
j
.index]
is used to replenish tv
i
(i ranges between 2 and 4,
j is equal to i+1).
Thus, the first element of tv2
holds a list of all
timers that expire in the 256 ticks following the
tv1
timers; the timers in this list are sufficient
to replenish the whole array tv1
. The second
element of tv2
holds all timers that expire in the
following 256 ticks, and so on. Similarly, a single entry of
tv3
is sufficient to replenish the whole array
tv2
.
Figure 6-2 shows how these data structures are connected.
The timer_bh( )
function associated with the
TIMER_BH
bottom half invokes the
run_timer_list( )
auxiliary function to check for
decayed dynamic timers. The function relies on a variable similar to
jiffies
that is called
timer_jiffies
. This new variable is needed because
a few timer interrupts might occur before the activated
TIMER_BH
bottom half has a chance to run; this
happens typically when several interrupts of different types are
issued in a short interval of time.
The value of timer_jiffies
represents the
expiration time of the dynamic timer list yet to be checked: if it
coincides with the value of jiffies
, no backlog of
bottom half functions has accumulated; if it is smaller than
jiffies
, then bottom half functions that refer to
previous ticks must be dealt with. The variable is set to 0 at system
startup and is incremented only by run_timer_list( )
. Its value can never be greater than
jiffies
.
The run_timer_list( )
function is essentially
equivalent the following C fragment:
struct list_head *head, *curr; struct timer_list *timer; void (*fn)(unsigned long); unsigned long data; spin_lock_irq(&timerlist_lock); while ((long)(jiffies - timer_jiffies) >= 0) { if (!tv1.index) { int n = 1; do { cascade_timers(tvecs[n]); } while (tvecs[n]->index == 1 && ++n < 5)); } head = &tv1.vec[tv1.index]; for (curr = head->next; curr != head; curr = head->next) { timer = list_entry(curr, struct timer_list, list); fn = timer->function; data= timer->data; detach_timer(timer); timer->list.next = timer->list.prev = NULL; running_timer = timer; spin_unlock_irq(&timerlist_lock); fn(data); spin_lock_irq(&timerlist_lock); running_timer = NULL; } ++timer_jiffies; tv1.index = (tv1.index + 1) & 0xff; } spin_unlock_irq(&timerlist_lock);
The outermost while
loop ends when
timer_jiffies
becomes greater than the value of
jiffies
. Since the values of
jiffies
and timer_jiffies
usually coincide, the outermost while
cycle is
often executed only once. In general, the outermost loop is executed
jiffies
-
timer_jiffies
+
1
consecutive times. Moreover, if a timer
interrupt occurs while run_timer_list( )
is being
executed, dynamic timers that decay at this tick occurrence are also
considered, since the jiffies
variable is
asynchronously incremented by the PIT’s interrupt
handler (see the earlier section Section 6.2.1.1).
During a single execution of the outermost while
cycle, the dynamic timer functions included in the
tv1.vec[tv1.index]
list are executed. Before
executing a dynamic timer function, the loop invokes the
detach_timer( )
function to remove the dynamic
timer from the list. Once the list is emptied, the value of
tv1.index
is incremented (modulo 256) and the
value of timer_jiffies
is incremented.
When tv1.index
becomes equal to 0, all the lists
of tv1
have been checked; in this case, it is
necessary to refill the tv1
structure. This is
accomplished by the cascade_timers( )
function,
which transfers the dynamic timers included in
tv2.vec[tv2.index]
into
tv1.vec
, since they will necessarily decay within
the next 256 ticks. If tv2.index
is equal to 0,
the tv2
array of lists must be refilled with the
elements of tv3.vec[tv3.index]
.
Notice that run_timer_list( )
disables interrupts
and acquires the timerlist_lock
spin lock just
before entering the outermost loop; interrupts are enabled and the
spin lock is released right before invoking each dynamic timer
function, until its termination. This ensures that the dynamic timer
data structures are not corrupted by interleaved kernel control
paths.
To sum up, this rather complex algorithm ensures excellent
performance. To see why, assume for the sake of simplicity that the
TIMER_BH
bottom half is executed right after the
corresponding timer interrupt occurs. Then, in 255 timer interrupt
occurrences out of 256 (in 99.6% of the cases), the
run_timer_list( )
function just runs the functions
of the decayed timers, if any. To replenish
tv1.vec
periodically, it is sufficient 63 times
out of 64 to partition the list pointed to by
tv2.vec[tv2.index]
into the 256 lists pointed to
by tv1.vec
. The tv2.vec
array,
in turn, must be replenished in 0.02 percent of the cases (that is,
once every 163 seconds). Similarly, tv3
is
replenished every 2 hours and 54 minutes, and tv4
is replenished every 7 days and 18 hours. tv5
doesn’t need to be replenished.
To show how the outcome of all the previous activities are actually used in the kernel, we’ll show an example of the creation and use of a process time-out.
Let’s assume that the kernel decides to suspend the current process for two seconds. It does this by executing the following code:
timeou = 2 * HZ; set_current_state(TASK_INTERRUPTIBLE); /* or TASK_UNINTERRUPTIBLE */ remaining = schedule_timeout(timeout);
The kernel implements process time-outs by using dynamic timers. They
appear in the schedule_timeout( )
function, which
essentially executes the following statements:
struct timer_list timer; expire = timeout + jiffies; init_timer(&timer); timer.expires = expire; timer.data = (unsigned long) current; timer.function = process_timeout; add_timer(&timer); schedule( ); /* process suspended until timer expires */ del_timer_sync(&timer); timeout = expire - jiffies; return (timeout < 0 ? 0 : timeout);
When schedule( )
is invoked, another process is
selected for execution; when the former process resumes its
execution, the function removes the dynamic timer. In the last
statement, the function either returns 0, if the time-out is expired,
or it returns the number of ticks left to the time-out expiration if
the process was awoken for some other reason.
When the time-out expires, the kernel executes the following function:
void process_timeout(unsigned long data) { struct task_struct * p = (struct task_struct *) data; wake_up_process(p); }
The run_timer_list( )
function invokes
process_timeout( )
, passing as its parameter the
process descriptor pointer stored in the data
field of the timer
object. As a result, the
suspended process is woken up.
[51] Earlier versions of Linux use a third type of kernel timers: the so-called static timers . Static timers are very rudimental because they cannot be dynamically allocated or destroyed, and at most there could be 32 of them. Static timers were replaced by dynamic timers, and new kernels (starting from Version 2.4) no longer support them.