This article continues the topic of work deferral I started in "Kernel APIs, Part 2: Deferrable functions, kernel tasklets, and work queues" (developerWorks, March 2010). This time, I discuss the timer application programming interface (API) and a core element to all work deferral schemes, the kernel list construct. I also explore the kernel list API, which timers and other work deferral mechanisms (such as work queues) use.
Timers are an integral part of any operating system, and you'll find multiple timer mechanisms. Let's begin with a quick overview of the Linux timer schemes before delving further into their operation.
In the Linux kernel, time is measured by a global variable named
jiffies
, which identifies the number of ticks
that have occurred since the system was booted. The manner in which ticks
are counted depends, at its lowest level, on the particular hardware
platform on which you're running; however, it is typically incremented
through an interrupt. The tick rate (jiffies
's
least significant bit) is configurable, but in a recent 2.6 kernel for
x86, a tick equals 4ms (250Hz). The jiffies
global variable is used broadly in the kernel for a number of purposes,
one of which is the current absolute time to calculate the time-out value
for a timer (you'll see examples of this later).
There are few different schemes for timers in recent 2.6 kernels. The
simplest and least accurate of all timers (though suitable for most
instances) is the timer API. This API permits the construction of timers
that operate in the jiffies
domain (minimum 4ms
time-out). There's also the high-resolution timer API, which
permits timer constructions in which time is defined in nanoseconds.
Depending upon your processor and the speed at which it operates, your
mileage may vary, but the API does offer a way to schedule time-outs below
the jiffies
tick interval.
The standard timer API has been part of the Linux kernel for quite some time (since the early versions of the Linux kernel). Although it offers less accuracy than high-resolution timers, it is ideal for traditional driver time-outs that provide coverage of error cases when dealing with physical devices. In many cases, these time-outs never actually fire but instead are started and subsequently removed.
Simple kernel timers are implemented using the timer wheel. The idea
was originally introduced by Finn Arne Gangstad in 1997. It ignores the
problem of managing a large number of timers but does a good job of
managing a reasonable number of timers—the typical case. (The
original timer implementation simply kept timers doubly linked in
expiration order. Although conceptually simple, the approach was not
scalable.) The timer wheel is a collection of buckets in which each bucket
represents a chunk of time in the future for timer expiration. The buckets
are defined using logarithmic time based on five buckets. Using
jiffies
as the granularity of time, a number of
groups is defined that represents future periods of expiration (where each
group is represented by a list of timers). Timer insertion occurs using
list operations that are O(1) complexity, with expiration occurring in
O(N) time. Expiration of timers occurs in a cascading
operation, where timers are removed from higher-granularity buckets, and
then inserted into lower-granularity buckets as their expiration time
decreases. Let's now look at the API for this timer implementation.
Linux provides a simple API for the construction and management of timers. It consists of functions (and helper functions) for timer creation, cancellation, and management.
Timers are defined by the timer_list
structure,
which includes all of the data necessary to implement a timer (including
list pointers and optional timer statistics that are configured at compile
time). From a user's perspective, timer_list
contains an expiration time, a callback function (when/if the timer
expires), and a user-provided context. The user must then initialize the
timer, which he or she can do in a few ways. The simplest method is a call
to setup_timer
, which initializes the timer and
sets the user-provided callback function and context. Otherwise, the user
can set these values (function and data) in the timer and simply call
init_timer
. Note that
init_timer
is called internally by
setup_timer
"
void init_timer( struct timer_list *timer ); void setup_timer( struct timer_list *timer, void (*function)(unsigned long), unsigned long data ); |
With an initialized timer, the user now needs to set the expiration time,
which is done through a call to mod_timer
. As
users commonly provide an expiration in the future, they typically add
jiffies
here to offset from the current time.
Users can also delete a timer (if it has not expired) through a call to
del_timer
:
int mod_timer( struct timer_list *timer, unsigned long expires ); void del_timer( struct timer_list *timer ); |
Finally, users can determine whether the timer is pending (yet to fire)
through a call to timer_pending
(1
is returned if the timer is pending):
int timer_pending( const struct timer_list *timer ); |
Let's look at some of these API functions in practice. Listing 1 provides a simple kernel module that demonstrates the
core aspects of the simple timer API. Within
init_module
, you initialize a timer with
setup_timer
and then kick it off with a call to
mod_timer
. When the timer expires, the callback
function (my_timer_callback
) is invoked.
Finally, the timer deletion (via del_timer
)
occurs when you remove the module. (Note the return check from
del_timer
, which identifies whether the timer
is still in use.)
Listing 1. Exploring the simple timer API
#include <linux/kernel.h> #include <linux/module.h> #include <linux/timer.h> MODULE_LICENSE("GPL"); static struct timer_list my_timer; void my_timer_callback( unsigned long data ) { printk( "my_timer_callback called (%ld).\n", jiffies ); } int init_module( void ) { int ret; printk("Timer module installing\n"); // my_timer.function, my_timer.data setup_timer( &my_timer, my_timer_callback, 0 ); printk( "Starting timer to fire in 200ms (%ld)\n", jiffies ); ret = mod_timer( &my_timer, jiffies + msecs_to_jiffies(200) ); if (ret) printk("Error in mod_timer\n"); return 0; } void cleanup_module( void ) { int ret; ret = del_timer( &my_timer ); if (ret) printk("The timer is still in use...\n"); printk("Timer module uninstalling\n"); return; } |
You can learn more about the timer API in ./include/linux/timer.h. Although the simple timer API is easy and efficient, it does not offer the accuracy required for real-time applications. For that, let's look at a recent addition to Linux that supports higher-resolution timers.
High-resolution timers (or hrtimers) provide a high-precision
framework for timer management independent of the previously discussed
timer framework because of complexities in merging the two frameworks.
Although timers operate on the granularity of
jiffies
, hrtimers operate at the granularity of
nanoseconds.
The hrtimer framework is implemented differently from the traditional timer API. Instead of buckets and timer cascading, hrtimers maintain a time-ordered data structure of timers (timers are inserted in time order to minimize processing at activation time). The data structure used is a red-black tree, which is ideal for performance-focused applications (and happens to be available generically as a library within the kernel).
The hrtimer framework is available as an API within the kernel and is also
used by user space applications through
nanosleep
, itimers
,
and the Portable Operating System Interface (POSIX)-timers interface. It
was mainlined into the 2.6.21 kernel.
The hrtimer API has some similarities to the traditional API as well as
some fundamental differences to account for the additional timing control.
The first thing you'll notice is that time is represented not in
jiffies
but in a special data type called
ktime
. This representation hides some of the
details of efficiently managing time at this granularity. The API
formalizes the distinction between absolute and relative times, requiring
the caller to specify the type.
Like the traditional timer API, timers are represented by a
structure—in this case, hrtimer
. This
structure defines the timer from a user perspective (callback function,
expiration time, and so on) and also incorporates the management
information (where the timer exists in the red-black tree, optional
statistics, and so on).
The process begins with the initialization of a timer through
hrtimer_init
. This call includes the timer,
clock definition, and timer mode (one-shot or restart). The clock to use
is defined in ./include/linux/time.h and represents the various clocks
that the system supports (such as the real-time clock or a monotonic clock
that simply represents time from a starting point, such as system boot).
Once a timer has been initialized, it can be started with
hrtimer_start
. This call includes the
expiration time (in ktime_t
) and the mode of
the time value (absolute or relative value).
void hrtimer_init( struct hrtimer *time, clockid_t which_clock, enum hrtimer_mode mode ); int hrtimer_start(struct hrtimer *timer, ktime_t time, const enum hrtimer_mode mode); |
Once an hrtimer has started, it can be cancelled through a call to
hrtimer_cancel
or
hrtimer_try_to_cancel
. Each function includes
the hrtimer reference as the timer to be stopped. These functions differ
in that the hrtimer_cancel
function attempts to
cancel the timer, but if it has already fired, it will wait for the
callback function to finish. The
hrtimer_try_to_cancel
function differs in that
it also attempts to cancel the timer but will return failure if the timer
has fired.
int hrtimer_cancel(struct hrtimer *timer); int hrtimer_try_to_cancel(struct hrtimer *timer); |
You can check to see if the hrtimer has activated its callback through a
call to hrtimer_callback_running
. Note that
this function is called internally by
hrtimer_try_to_cancel
in order to return an
error if the timer's callback function was called.
int hrtimer_callback_running(struct hrtimer *timer); |
Use of the hrtimer API is quite simple, as shown in Listing 2. Within init_module
, you
start by defining your relative time to time-out (in this case, 200ms).
You initialize your hrtimer with a call to
hrtimer_init
(using the monotonic clock), and
then set the callback function. Finally, you start the timer using your
previously created ktime
value. When the timer
fires, the my_hrtimer_callback
function is
called, which returns HRTIMER_NORESTART
so that
the timer is not automatically restarted. In the
cleanup_module
function, you clean up by
cancelling the timer with hrtimer_cancel
.
Listing 2. Exploring the hrtimer API
#include <linux/kernel.h> #include <linux/module.h> #include <linux/hrtimer.h> #include <linux/ktime.h> MODULE_LICENSE("GPL"); #define MS_TO_NS(x) (x * 1E6L) static struct hrtimer hr_timer; enum hrtimer_restart my_hrtimer_callback( struct hrtimer *timer ) { printk( "my_hrtimer_callback called (%ld).\n", jiffies ); return HRTIMER_NORESTART; } int init_module( void ) { ktime_t ktime; unsigned long delay_in_ms = 200L; printk("HR Timer module installing\n"); ktime = ktime_set( 0, MS_TO_NS(delay_in_ms) ); hrtimer_init( &hr_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL ); hr_timer.function = &my_hrtimer_callback; printk( "Starting timer to fire in %ldms (%ld)\n", delay_in_ms, jiffies ); hrtimer_start( &hr_timer, ktime, HRTIMER_MODE_REL ); return 0; } void cleanup_module( void ) { int ret; ret = hrtimer_cancel( &hr_timer ); if (ret) printk("The timer was still in use...\n"); printk("HR Timer module uninstalling\n"); return; } |
There's much more to the hrtimer API than has been touched on here. One interesting aspect is the ability to define the execution context of the callback function (such as in softirq or hardiirq context). You can learn more about the hrtimer API from the include file in ./include/linux/hrtimer.h.
As mentioned early in this article, lists are useful structures, and the kernel provides an efficient implementation for generic use. Additionally, you'll find lists underneath the APIs that we've explored thus far. Understanding the doubly linked list API will help you develop with this efficient data structure as well as understand the plethora of code in the kernel that utilizes lists. Let's now take a quick tour of the kernel list API.
The API provides a list_head
structure that is
used to represent not only the list head (anchor) but also the
in-structure list pointers. Let's look at a sample structure that includes
list capabilities (see Listing 3). Note the addition
of the list_head
structure, which is used for
object linkage. And note that you can add the
list_head
structure anywhere in your structure,
and—though some GCC magic (list_entry
and container_of
, defined in
./include/kernel/kernel.h)—you can dereference from the list
pointer to the super-object.
Listing 3. Sample structure with list references
struct my_data_structure { int value; struct list_head list; }; |
Like any list implementation, you need a list head that serves as the
anchor for the list. This is commonly done with the
LIST_HEAD
macro, which provides the declaration
and initialization of the list. This macro creates a struct
list_head
object onto which you can add
objects.
LIST_HEAD( new_list ) |
You can also create a list head manually (for example, if your list head is
in another structure) through the use of the
LIST_HEAD_INIT
macro.
With the primary initialization complete, you can manipulate the list using
the list_add
and
list_del
functions (in addition to many more).
Now, let's jump into example code, which better serves to illustrate the
use of the API.
Listing 4 provides a simple kernel module that
explores a number of the list API functions (though many more can be found
in ./include/linux/list.h). This example creates two lists, populates them
in the init_module
function, and then
manipulates the lists in the cleanup_module
function.
At the start, you create your data structure
(my_data_struct
), which includes some data and
then two list heads. This example demonstrates that you can insert an
object into multiple lists at the same time. You then create two list
heads (my_full_list
and
my_odd_list
).
In the init_module
function, you create 10 data
objects and load them onto the lists (all objects on the
my_full_list
, and all odd-valued objects onto
the my_odd_list
) using the
list_add
function. Note here that
list_add
takes two arguments, the first being
the list reference within the object to be used and the second being the
list anchor. This demonstrates the ability of a data object to be on
multiple lists, using the kernel's internal tricks for identifying the
super-object that contains the list reference.
The cleanup_module
function illustrates a few
more capabilities of the list API, the first of which is the
list_for_each
macro, which simplifies list
iteration. For this macro, you provide a reference to the current object
(pos
) and the list reference to be iterated.
For each iteration, you receive a list_head
reference, which can be provided to list_entry
to identify the container object (your data structure). Specify your
structure and the list variable within your structure, which is used
internally to dereference back to the container.
To emit the odd list, you use another iteration macro called
list_for_each_entry
. This macro is simpler in
that it automatically provides your data structure, obviating the need to
perform a list_entry
.
Finally, you use list_for_each_safe
to iterate
the list for purposes of freeing the allocated elements. This macro
permits iteration over the list with safeguards against removal of a list
entry (which you'll do as part of the iteration). You use
list_entry
to get at your data object (to free
it back to the kernel pool), and then use
list_del
to free the entry on the list.
Listing 4. Exploring the list API
#include <linux/kernel.h> #include <linux/module.h> #include <linux/list.h> MODULE_LICENSE("GPL"); struct my_data_struct { int value; struct list_head full_list; struct list_head odd_list; }; LIST_HEAD( my_full_list ); LIST_HEAD( my_odd_list ); int init_module( void ) { int count; struct my_data_struct *obj; for (count = 1 ; count < 11 ; count++) { obj = (struct my_data_struct *) kmalloc( sizeof(struct my_data_struct), GFP_KERNEL ); obj->value = count; list_add( &obj->full_list, &my_full_list ); if (obj->value & 0x1) { list_add( &obj->odd_list, &my_odd_list ); } } return 0; } void cleanup_module( void ) { struct list_head *pos, *q; struct my_data_struct *my_obj; printk("Emit full list\n"); list_for_each( pos, &my_full_list ) { my_obj = list_entry( pos, struct my_data_struct, full_list ); printk( "%d\n", my_obj->value ); } printk("Emit odd list\n"); list_for_each_entry( my_obj, &my_odd_list, odd_list ) { printk( "%d\n", my_obj->value ); } printk("Cleaning up\n"); list_for_each_safe( pos, q, &my_full_list ) { struct my_data_struct *tmp; tmp = list_entry( pos, struct my_data_struct, full_list ); list_del( pos ); kfree( tmp ); } return; } |
Numerous other functions exist for adding to the tail of a list instead of
the head (list_add_tail
), joining lists
(list_splice
), and testing the contents of a
list (list_empty
). See Resources for more details on kernel list functions.
This article explored a few APIs that demonstrate the ability to segregate functionality where necessary (timer and high-precision hrtimer APIs) as well as commonize code for code reuse (list API). Traditional timers provide an efficient mechanism for typical driver time-outs, where hrtimers provide another level of quality of service for more precise timer capabilities. The list API provides a very generic but efficient and richly functional interface. If you write kernel code, you'll run across one or all three of these APIs, so they're definitely worth exploring.
Learn
- Learn about other work deferral schemes
(tasklets and work queues) in "Kernel APIs, Part 2: Deferrable functions, kernel tasklets, and work queues"
(developerWorks, March 2010).
- For more information on the traditional
timer implementation, including details on the cascading timer wheel,
check out this Linux kernel mailing list post from Ingo Molnar.
It provides a detailed view of the timer API, its rationale, and
considerable details of its implementation.
- Thomas Gleixner provides a great
introduction to the timer subsystem and the split of timers and
high-resolution timers in this Linux kernel mailing list post of the
ktimers
patch. It also provides details on the use of timers by higher-level subsystems. - For a detailed look at the changes to the
Linux time system (which provides the basis for timers in the kernel),
check out the paper Hrtimers and Beyond: Transforming the Linux Time Subsystems (PDF). This
paper details the design of the overall time subsystem and the changes
required to support multiple timer APIs.
- The Linux list API uses GCC extensions to
identify the container of an object that includes a list pointer structure
(using
typeof
andoffsetof
). To learn about other GCC extensions, check out "GCC hacks in the Linux kernel" (developerworks, November 2008). - The FAQ/LinkedLists at
KernelNewbies provides a good introduction to lists and some of the
internals.
-
In the developerWorks Linux zone,
find hundreds of how-to
articles
and tutorials, as well as downloads, discussion forums,
and a wealth other resources for Linux developers and administrators.
-
Stay current with
developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
-
Attend a free developerWorks Live!
briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
-
Watch developerWorks on-demand demos
ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
-
Follow developerWorks on Twitter.
Get products and technologies
-
Evaluate IBM products
in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the
SOA Sandbox
learning how to implement Service Oriented Architecture efficiently.
Discuss
-
Get involved in the My developerWorks community.
Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
M. Tim Jones is an embedded firmware architect and the author of Artificial Intelligence: A Systems Approach, GNU/Linux Application Programming (now in its second edition), AI Application Programming (in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.