Several system calls have been introduced to allow processes to change their priorities and scheduling policies. As a general rule, users are always allowed to lower the priorities of their processes. However, if they want to modify the priorities of processes belonging to some other user or if they want to increase the priorities of their own processes, they must have superuser privileges.
The nice( )
[79] system call allows
processes to change their base priority. The integer value contained
in the increment
parameter is used to modify the
nice
field of the process descriptor. The
nice Unix command, which allows users to run
programs with modified scheduling priority, is based on this system
call.
The sys_nice( )
service routine handles the
nice( )
system call. Although the
increment
parameter may have any value, absolute
values larger than 40 are trimmed down to 40. Traditionally, negative
values correspond to requests for priority increments and require
superuser privileges, while positive ones correspond to requests for
priority decrements. In the case of a negative increment, the
function invokes the capable( )
function to verify
whether the process has a CAP_SYS_NICE
capability.
We discuss that function, together with the notion of capability, in
Chapter 20. If the user turns out to have the
capability required to change priorities, sys_nice( )
adds the value of increment
to the
nice
field of current. If necessary, the value of this field is trimmed so that it is not less than -20 or greater than +19.
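The checks just described can be summarized in a few lines of C. What follows is only a user-space sketch of the logic, not the kernel's code: the function name model_nice( ), the MODEL_EPERM constant, and the has_cap_sys_nice flag (which stands in for the result of capable( )) are hypothetical names introduced for illustration.

```c
/* Hypothetical user-space model of the checks described in the text. */
#define MODEL_EPERM (-1)

/*
 * Applies `increment` to `*nice` following the rules in the text:
 *   1. increments whose absolute value exceeds 40 are trimmed to 40;
 *   2. a negative increment requires the CAP_SYS_NICE capability
 *      (represented here by the has_cap_sys_nice flag);
 *   3. the resulting value is clamped to the range -20..+19.
 * Returns 0 on success, MODEL_EPERM if the capability is missing.
 */
int model_nice(int *nice, int increment, int has_cap_sys_nice)
{
    if (increment < -40)
        increment = -40;
    if (increment > 40)
        increment = 40;

    if (increment < 0 && !has_cap_sys_nice)
        return MODEL_EPERM;

    int result = *nice + increment;
    if (result < -20)
        result = -20;
    if (result > 19)
        result = 19;

    *nice = result;
    return 0;
}
```

Under these assumptions, an unprivileged request with a negative increment leaves the nice value untouched, while an oversized positive increment simply saturates at +19.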
The nice( )
system call is maintained for backward
compatibility only; it has been replaced by the setpriority( )
system call described next.
The nice( )
system call affects only the process
that invokes it. Two other system calls, denoted as
getpriority( ) and setpriority( ), act on the base priorities of all processes in a given group. getpriority( ) returns 20 minus the lowest nice field value among all processes in a given group, that is, the highest priority among those processes; setpriority( ) sets the base priority of all processes in a given group to a given value.
The kernel implements these system calls by means of the
sys_getpriority( )
and sys_setpriority( )
service routines. Both of them act essentially on the
same group of parameters:
which
    The value that identifies the group of processes; it can assume one of the following:
    PRIO_PROCESS
        Selects the processes according to their process ID (pid field of the process descriptor).
    PRIO_PGRP
        Selects the processes according to their group ID (pgrp field of the process descriptor).
    PRIO_USER
        Selects the processes according to their user ID (uid field of the process descriptor).
who
    The value of the pid, pgrp, or uid field (depending on the value of which) to be used for selecting the processes. If who is 0, its value is set to that of the corresponding field of the current process.
niceval
    The new base priority value (needed only by sys_setpriority( )). It should range between -20 (highest priority) and +19 (lowest priority).
As stated before, only processes with a
CAP_SYS_NICE
capability are allowed to increase
their own base priority or to modify that of other processes.
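From User Mode, the three parameters map directly onto the C library wrappers. The following is a minimal sketch, assuming a Linux system with the standard <sys/resource.h> wrappers; the helper name set_own_nice( ) and its -100 error value are hypothetical. It uses PRIO_PROCESS with who equal to 0, so it acts on the calling process only.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Sets the nice value of the calling process (which = PRIO_PROCESS,
 * who = 0) and reads it back.
 * Returns the nice value now in effect, or -100 on failure
 * (-100 is an arbitrary sentinel outside the -20..+19 range).
 */
int set_own_nice(int niceval)
{
    if (setpriority(PRIO_PROCESS, 0, niceval) == -1)
        return -100;

    /* -1 is a legitimate nice value, so errno must disambiguate. */
    errno = 0;
    int now = getpriority(PRIO_PROCESS, 0);
    if (now == -1 && errno != 0)
        return -100;

    return now;
}
```

Calling set_own_nice(19) lowers the priority of the caller and needs no special capability; a call that asks for a higher priority than the current one is expected to fail for a process without CAP_SYS_NICE.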
As we saw in Chapter 9, system calls return a
negative value only if some error occurred. For this reason,
getpriority( )
does not return a normal nice value ranging between -20 and +19, but rather a positive value (20 minus the nice value) ranging between 1 and 40.
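The encoding is a simple reflection around 20, so the same arithmetic converts in both directions. The two helper names below are hypothetical and serve only to make the mapping explicit.

```c
/*
 * Value reported by the system call for a given nice value:
 * nice -20 (highest priority) maps to 40, nice +19 (lowest) maps to 1.
 */
int nice_to_syscall_value(int nice)
{
    return 20 - nice;
}

/* Inverse mapping: recovers the nice value from the reported value. */
int syscall_value_to_nice(int value)
{
    return 20 - value;
}
```

Because every value in the range 1 to 40 is positive, a caller of the raw system call can always tell a valid result from an error code.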
We now
introduce a group of system calls that allow processes to change
their scheduling discipline and, in particular, to become real-time
processes. As usual, a process must have a
CAP_SYS_NICE
capability to modify the values of
the rt_priority
and policy
process descriptor fields of any process, including itself.
The sched_getscheduler( )
system call queries the
scheduling policy currently applied to the process identified by the
pid
parameter. If pid
equals 0,
the policy of the calling process is retrieved. On success, the
system call returns the policy for the process:
SCHED_FIFO, SCHED_RR, or SCHED_OTHER. The corresponding sys_sched_getscheduler( ) service routine invokes find_process_by_pid( ) to locate the process descriptor corresponding to the given pid, and then returns the value of that descriptor's policy field.
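A User Mode program can query the policy through the wrapper declared in <sched.h>. The sketch below assumes a Linux C library that exposes sched_getscheduler( ); the helper names policy_name( ) and current_policy_name( ) are hypothetical.

```c
#include <sched.h>
#include <sys/types.h>

/* Maps a policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:
        return "SCHED_FIFO";
    case SCHED_RR:
        return "SCHED_RR";
    case SCHED_OTHER:
        return "SCHED_OTHER";
    default:
        return "unknown";
    }
}

/*
 * Returns the policy name of the process identified by pid
 * (0 selects the calling process), or "error" if the call fails.
 */
const char *current_policy_name(pid_t pid)
{
    int policy = sched_getscheduler(pid);

    return policy == -1 ? "error" : policy_name(policy);
}
```

For an ordinary process that has never changed its scheduling discipline, current_policy_name(0) would normally be expected to yield "SCHED_OTHER".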
The sched_setscheduler( )
system call sets both
the scheduling policy and the associated parameters for the process
identified by the parameter pid. If pid is equal to 0, the scheduler parameters of the calling process are set.
The corresponding sys_sched_setscheduler( )
function checks whether the scheduling policy specified by the
policy
parameter and the new static priority
specified by the param->sched_priority
parameter are valid. It also checks whether the process has
CAP_SYS_NICE
capability or whether its owner has
superuser rights. If everything is OK, it executes the following
statements:
    p->policy = policy;
    p->rt_priority = param->sched_priority;
    if (task_on_runqueue(p))
        move_first_runqueue(p);
    current->need_resched = 1;
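The validity check that precedes these statements is not spelled out in the text. The following user-space sketch assumes rules consistent with the priority ranges given later in this section: a real-time policy takes a static priority between 1 and 99, while SCHED_OTHER takes 0. The function names sched_params_valid( ) and request_policy( ) are hypothetical, and the second one merely wraps the standard sched_setscheduler( ) library call.

```c
#include <errno.h>
#include <sched.h>

/*
 * Model of the parameter check: returns 1 if the (policy, priority)
 * pair is acceptable under the assumed rules, 0 otherwise.
 */
int sched_params_valid(int policy, int priority)
{
    if (policy != SCHED_FIFO && policy != SCHED_RR && policy != SCHED_OTHER)
        return 0;
    if (priority < 0 || priority > 99)
        return 0;
    /* Real-time policies need a nonzero priority; SCHED_OTHER needs 0. */
    return (policy == SCHED_OTHER) == (priority == 0);
}

/*
 * Requests `policy` for the calling process (pid 0).
 * Returns 0 on success, -1 on failure with errno set.
 */
int request_policy(int policy, int priority)
{
    struct sched_param param = { .sched_priority = priority };

    if (!sched_params_valid(policy, priority)) {
        errno = EINVAL;
        return -1;
    }
    return sched_setscheduler(0, policy, &param);
}
```

A call such as request_policy(SCHED_FIFO, 10) is expected to fail with a permission error unless the caller has the CAP_SYS_NICE capability.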
The sched_getparam( ) system call retrieves the scheduling parameters for the process identified by pid. If pid is 0, the parameters of the current process are retrieved.
The corresponding sys_sched_getparam( ) service routine, as one would expect, finds the process descriptor pointer associated with pid, stores its rt_priority field in a local variable of type sched_param, and invokes copy_to_user( ) to copy it into the process address space at the address specified by the param parameter.
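Seen from User Mode, the param parameter is simply the address of a sched_param structure that the kernel fills in. A minimal sketch follows, assuming the standard <sched.h> wrapper; the helper name get_rt_priority( ) is hypothetical.

```c
#include <sched.h>
#include <sys/types.h>

/*
 * Returns the real-time priority of the process identified by pid
 * (0 selects the calling process), or -1 if the call fails.
 */
int get_rt_priority(pid_t pid)
{
    struct sched_param param;

    /* The kernel copies the priority into `param` on success. */
    if (sched_getparam(pid, &param) == -1)
        return -1;

    return param.sched_priority;
}
```

For a conventional (non-real-time) process, the value obtained this way would normally be 0.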
The sched_setparam( ) system call is similar to sched_setscheduler( ). The difference is that sched_setparam( ) does not let the caller set the policy field's value.[80] The corresponding sys_sched_setparam( ) service routine is almost identical to sys_sched_setscheduler( ), but the policy of the affected process is never changed.
The sched_yield( )
system call allows a process
to relinquish the CPU voluntarily without being suspended; the
process remains in a TASK_RUNNING
state, but the
scheduler puts it at the end of the runqueue list. In this way, other
processes that have the same dynamic priority have a chance to run.
The call is used mainly by SCHED_FIFO
processes.
The corresponding sys_sched_yield( ) service routine first checks whether any process in the system is runnable, other than the process executing the system call and the swapper kernel threads. If there is no such process, sched_yield( ) returns without performing any action because no process would be able to use the freed processor. Otherwise, the function executes the following statements:
    if (current->policy == SCHED_OTHER)
        current->policy |= SCHED_YIELD;
    current->need_resched = 1;
    spin_lock_irq(&runqueue_lock);
    move_last_runqueue(current);
    spin_unlock_irq(&runqueue_lock);
As a result, schedule( )
is invoked when returning
from the sys_sched_yield( )
service routine (see
Section 4.8), and the current process will most
likely be replaced.
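A typical User Mode use of sched_yield( ) is a polling loop that gives other runnable processes a chance between checks. The sketch below is one possible shape for such a loop; the helper name yield_until_set( ) and its max_tries bound are hypothetical.

```c
#include <sched.h>

/*
 * Polls `*flag`, relinquishing the CPU between checks, and gives up
 * after max_tries attempts. Returns the number of yields performed.
 */
int yield_until_set(volatile int *flag, int max_tries)
{
    int yields = 0;

    while (!*flag && yields < max_tries) {
        sched_yield();   /* stay runnable, but let others run first */
        yields++;
    }
    return yields;
}
```

The caller never sleeps in this loop; it stays in the TASK_RUNNING state throughout, which is exactly the behavior the system call is designed to provide.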
The sched_get_priority_min( )
and
sched_get_priority_max( )
system calls return,
respectively, the minimum and the maximum real-time static priority
value that can be used with the scheduling policy identified by the
policy
parameter.
The sys_sched_get_priority_min( ) service routine returns 1 if policy denotes a real-time policy (SCHED_FIFO or SCHED_RR), and 0 if it denotes SCHED_OTHER.
The sys_sched_get_priority_max( ) service routine returns 99 (the highest priority) if policy denotes a real-time policy, and 0 if it denotes SCHED_OTHER.
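Programs usually query these limits rather than hard-coding 1 and 99. The following sketch assumes the standard <sched.h> wrappers on Linux; the helper name priority_in_range( ) is hypothetical.

```c
#include <sched.h>

/*
 * Returns 1 if `priority` lies within the range the kernel reports
 * for `policy`, 0 otherwise (including when the policy is rejected).
 */
int priority_in_range(int policy, int priority)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return 0;

    return priority >= lo && priority <= hi;
}
```

A check of this kind can be performed before invoking sched_setscheduler( ), so that an out-of-range priority is caught without a round trip through the kernel's own validation.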
The sched_rr_get_interval( ) system call writes into a structure stored in the User Mode address space the Round Robin time quantum for the real-time process identified by the pid parameter. If pid is zero, the system call writes the time quantum of the current process.
The corresponding sys_sched_rr_get_interval( ) service routine invokes, as usual, find_process_by_pid( ) to retrieve the process descriptor associated with pid. It then converts the number of ticks derived from the nice field of the selected process descriptor into seconds and nanoseconds and copies the numbers into the User Mode structure.
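The conversion from ticks to seconds and nanoseconds is plain integer arithmetic. The sketch below is a user-space illustration, not the kernel's routine: the function name ticks_to_timespec( ) is hypothetical, and the tick frequency is passed as the hz argument on the assumption that it divides one billion evenly (as 100 does).

```c
#include <time.h>

/*
 * Converts a number of timer ticks into a timespec, given the tick
 * frequency `hz` in ticks per second. Assumes hz divides 1000000000.
 */
struct timespec ticks_to_timespec(unsigned long ticks, unsigned long hz)
{
    struct timespec ts;

    /* Whole seconds, then the leftover ticks scaled to nanoseconds. */
    ts.tv_sec = (time_t)(ticks / hz);
    ts.tv_nsec = (long)((ticks % hz) * (1000000000UL / hz));

    return ts;
}
```

With hz equal to 100, a quantum of 5 ticks corresponds to 0 seconds and 50,000,000 nanoseconds, that is, 50 milliseconds.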