Kernel APIs, Part 3: Timers and lists in the ...

青松卓然 2012-10-29

展开全文

Kernel APIs, Part 3: Timers and lists in the 2.6 kernel

Efficient processing with work-deferral APIs

M. Tim Jones, Independent author

Summary: The Linux? kernel includes a variety of APIs intended to help developers build simpler and more efficient driver and kernel applications. Two of the more common APIs that can be used for work deferral are the list management and timer APIs. Discover these APIs, and learn how to develop kernel applications with timers and lists.

View more content in this series

Date: 30 Mar 2010
Level: Intermediate
PDF: A4 and Letter (50KB | 12 pages)Get Adobe? Reader?
Also available in: Korean Portuguese Spanish

Activity: 75658 views
Comments: 2 (View | Add comment - Sign in)

Average rating 4 stars based on 34 votes

Average rating (34 votes)
Rate this article

Connect with Tim

Tim is one of our most popular and prolific authors. Browse all of Tim's articles on developerWorks. Check out Tim's profile and connect with him, other authors, and fellow readers in My developerWorks.

This article continues the topic of work deferral I started in "Kernel APIs, Part 2: Deferrable functions, kernel tasklets, and work queues" (developerWorks, March 2010). This time, I discuss the timer application programming interface (API) and a core element to all work deferral schemes, the kernel list construct. I also explore the kernel list API, which timers and other work deferral mechanisms (such as work queues) use.

Timers are an integral part of any operating system, and you'll find multiple timer mechanisms. Let's begin with a quick overview of the Linux timer schemes before delving further into their operation.

Origin of (Linux) time

In the Linux kernel, time is measured by a global variable named jiffies, which identifies the number of ticks that have occurred since the system was booted. The manner in which ticks are counted depends, at its lowest level, on the particular hardware platform on which you're running; however, it is typically incremented through an interrupt. The tick rate (jiffies's least significant bit) is configurable, but in a recent 2.6 kernel for x86, a tick equals 4ms (250Hz). The jiffies global variable is used broadly in the kernel for a number of purposes, one of which is the current absolute time to calculate the time-out value for a timer (you'll see examples of this later).

Kernel timers

There are few different schemes for timers in recent 2.6 kernels. The simplest and least accurate of all timers (though suitable for most instances) is the timer API. This API permits the construction of timers that operate in the jiffies domain (minimum 4ms time-out). There's also the high-resolution timer API, which permits timer constructions in which time is defined in nanoseconds. Depending upon your processor and the speed at which it operates, your mileage may vary, but the API does offer a way to schedule time-outs below the jiffies tick interval.

Standard timers

The standard timer API has been part of the Linux kernel for quite some time (since the early versions of the Linux kernel). Although it offers less accuracy than high-resolution timers, it is ideal for traditional driver time-outs that provide coverage of error cases when dealing with physical devices. In many cases, these time-outs never actually fire but instead are started and subsequently removed.

Simple kernel timers are implemented using the timer wheel. The idea was originally introduced by Finn Arne Gangstad in 1997. It ignores the problem of managing a large number of timers but does a good job of managing a reasonable number of timers—the typical case. (The original timer implementation simply kept timers doubly linked in expiration order. Although conceptually simple, the approach was not scalable.) The timer wheel is a collection of buckets in which each bucket represents a chunk of time in the future for timer expiration. The buckets are defined using logarithmic time based on five buckets. Using jiffies as the granularity of time, a number of groups is defined that represents future periods of expiration (where each group is represented by a list of timers). Timer insertion occurs using list operations that are O(1) complexity, with expiration occurring in O(N) time. Expiration of timers occurs in a cascading operation, where timers are removed from higher-granularity buckets, and then inserted into lower-granularity buckets as their expiration time decreases. Let's now look at the API for this timer implementation.

The timer API

Linux provides a simple API for the construction and management of timers. It consists of functions (and helper functions) for timer creation, cancellation, and management.

Timers are defined by the timer_list structure, which includes all of the data necessary to implement a timer (including list pointers and optional timer statistics that are configured at compile time). From a user's perspective, timer_list contains an expiration time, a callback function (when/if the timer expires), and a user-provided context. The user must then initialize the timer, which he or she can do in a few ways. The simplest method is a call to setup_timer, which initializes the timer and sets the user-provided callback function and context. Otherwise, the user can set these values (function and data) in the timer and simply call init_timer. Note that init_timer is called internally by setup_timer"

void init_timer( struct timer_list *timer );
void setup_timer( struct timer_list *timer, 
                     void (*function)(unsigned long), unsigned long data );

With an initialized timer, the user now needs to set the expiration time, which is done through a call to mod_timer. As users commonly provide an expiration in the future, they typically add jiffies here to offset from the current time. Users can also delete a timer (if it has not expired) through a call to del_timer:

int mod_timer( struct timer_list *timer, unsigned long expires );
void del_timer( struct timer_list *timer );

Finally, users can determine whether the timer is pending (yet to fire) through a call to timer_pending (1 is returned if the timer is pending):

int timer_pending( const struct timer_list *timer );

Timer example

Let's look at some of these API functions in practice. Listing 1 provides a simple kernel module that demonstrates the core aspects of the simple timer API. Within init_module, you initialize a timer with setup_timer and then kick it off with a call to mod_timer. When the timer expires, the callback function (my_timer_callback) is invoked. Finally, the timer deletion (via del_timer) occurs when you remove the module. (Note the return check from del_timer, which identifies whether the timer is still in use.)

Listing 1. Exploring the simple timer API

	
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/timer.h>

MODULE_LICENSE("GPL");

static struct timer_list my_timer;

void my_timer_callback( unsigned long data )
{
  printk( "my_timer_callback called (%ld).\n", jiffies );
}

int init_module( void )
{
  int ret;

  printk("Timer module installing\n");

  // my_timer.function, my_timer.data
  setup_timer( &my_timer, my_timer_callback, 0 );

  printk( "Starting timer to fire in 200ms (%ld)\n", jiffies );
  ret = mod_timer( &my_timer, jiffies + msecs_to_jiffies(200) );
  if (ret) printk("Error in mod_timer\n");

  return 0;
}

void cleanup_module( void )
{
  int ret;

  ret = del_timer( &my_timer );
  if (ret) printk("The timer is still in use...\n");

  printk("Timer module uninstalling\n");

  return;
}

You can learn more about the timer API in ./include/linux/timer.h. Although the simple timer API is easy and efficient, it does not offer the accuracy required for real-time applications. For that, let's look at a recent addition to Linux that supports higher-resolution timers.

High-resolution timers

High-resolution timers (or hrtimers) provide a high-precision framework for timer management independent of the previously discussed timer framework because of complexities in merging the two frameworks. Although timers operate on the granularity of jiffies, hrtimers operate at the granularity of nanoseconds.

The hrtimer framework is implemented differently from the traditional timer API. Instead of buckets and timer cascading, hrtimers maintain a time-ordered data structure of timers (timers are inserted in time order to minimize processing at activation time). The data structure used is a red-black tree, which is ideal for performance-focused applications (and happens to be available generically as a library within the kernel).

The hrtimer framework is available as an API within the kernel and is also used by user space applications through nanosleep, itimers, and the Portable Operating System Interface (POSIX)-timers interface. It was mainlined into the 2.6.21 kernel.

High-resolution timer API

The hrtimer API has some similarities to the traditional API as well as some fundamental differences to account for the additional timing control. The first thing you'll notice is that time is represented not in jiffies but in a special data type called ktime. This representation hides some of the details of efficiently managing time at this granularity. The API formalizes the distinction between absolute and relative times, requiring the caller to specify the type.

Like the traditional timer API, timers are represented by a structure—in this case, hrtimer. This structure defines the timer from a user perspective (callback function, expiration time, and so on) and also incorporates the management information (where the timer exists in the red-black tree, optional statistics, and so on).

The process begins with the initialization of a timer through hrtimer_init. This call includes the timer, clock definition, and timer mode (one-shot or restart). The clock to use is defined in ./include/linux/time.h and represents the various clocks that the system supports (such as the real-time clock or a monotonic clock that simply represents time from a starting point, such as system boot). Once a timer has been initialized, it can be started with hrtimer_start. This call includes the expiration time (in ktime_t) and the mode of the time value (absolute or relative value).

void hrtimer_init( struct hrtimer *time, clockid_t which_clock, 
			enum hrtimer_mode mode );
int hrtimer_start(struct hrtimer *timer, ktime_t time, const 
			enum hrtimer_mode mode);

Once an hrtimer has started, it can be cancelled through a call to hrtimer_cancel or hrtimer_try_to_cancel. Each function includes the hrtimer reference as the timer to be stopped. These functions differ in that the hrtimer_cancel function attempts to cancel the timer, but if it has already fired, it will wait for the callback function to finish. The hrtimer_try_to_cancel function differs in that it also attempts to cancel the timer but will return failure if the timer has fired.

int hrtimer_cancel(struct hrtimer *timer);
int hrtimer_try_to_cancel(struct hrtimer *timer);

You can check to see if the hrtimer has activated its callback through a call to hrtimer_callback_running. Note that this function is called internally by hrtimer_try_to_cancel in order to return an error if the timer's callback function was called.

int hrtimer_callback_running(struct hrtimer *timer);

The ktime API

Not discussed here was the ktime API, which provides a rich set of functions for managing time at high resolution. You can see the ktime API in ./linux/include/ktime.h.

An hrtimer example

Use of the hrtimer API is quite simple, as shown in Listing 2. Within init_module, you start by defining your relative time to time-out (in this case, 200ms). You initialize your hrtimer with a call to hrtimer_init (using the monotonic clock), and then set the callback function. Finally, you start the timer using your previously created ktime value. When the timer fires, the my_hrtimer_callback function is called, which returns HRTIMER_NORESTART so that the timer is not automatically restarted. In the cleanup_module function, you clean up by cancelling the timer with hrtimer_cancel.

Listing 2. Exploring the hrtimer API

	
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

MODULE_LICENSE("GPL");

#define MS_TO_NS(x)	(x * 1E6L)

static struct hrtimer hr_timer;

enum hrtimer_restart my_hrtimer_callback( struct hrtimer *timer )
{
  printk( "my_hrtimer_callback called (%ld).\n", jiffies );

  return HRTIMER_NORESTART;
}

int init_module( void )
{
  ktime_t ktime;
  unsigned long delay_in_ms = 200L;

  printk("HR Timer module installing\n");

  ktime = ktime_set( 0, MS_TO_NS(delay_in_ms) );

  hrtimer_init( &hr_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL );
  
  hr_timer.function = &my_hrtimer_callback;

  printk( "Starting timer to fire in %ldms (%ld)\n", delay_in_ms, jiffies );

  hrtimer_start( &hr_timer, ktime, HRTIMER_MODE_REL );

  return 0;
}

void cleanup_module( void )
{
  int ret;

  ret = hrtimer_cancel( &hr_timer );
  if (ret) printk("The timer was still in use...\n");

  printk("HR Timer module uninstalling\n");

  return;
}

There's much more to the hrtimer API than has been touched on here. One interesting aspect is the ability to define the execution context of the callback function (such as in softirq or hardiirq context). You can learn more about the hrtimer API from the include file in ./include/linux/hrtimer.h.

Kernel lists

As mentioned early in this article, lists are useful structures, and the kernel provides an efficient implementation for generic use. Additionally, you'll find lists underneath the APIs that we've explored thus far. Understanding the doubly linked list API will help you develop with this efficient data structure as well as understand the plethora of code in the kernel that utilizes lists. Let's now take a quick tour of the kernel list API.

The API provides a list_head structure that is used to represent not only the list head (anchor) but also the in-structure list pointers. Let's look at a sample structure that includes list capabilities (see Listing 3). Note the addition of the list_head structure, which is used for object linkage. And note that you can add the list_head structure anywhere in your structure, and—though some GCC magic (list_entry and container_of, defined in ./include/kernel/kernel.h)—you can dereference from the list pointer to the super-object.

Listing 3. Sample structure with list references

	
struct my_data_structure {
	int value;
	struct list_head list;
};

Like any list implementation, you need a list head that serves as the anchor for the list. This is commonly done with the LIST_HEAD macro, which provides the declaration and initialization of the list. This macro creates a struct list_head object onto which you can add objects.

LIST_HEAD( new_list )

You can also create a list head manually (for example, if your list head is in another structure) through the use of the LIST_HEAD_INIT macro.

With the primary initialization complete, you can manipulate the list using the list_add and list_del functions (in addition to many more). Now, let's jump into example code, which better serves to illustrate the use of the API.

List API example

Listing 4 provides a simple kernel module that explores a number of the list API functions (though many more can be found in ./include/linux/list.h). This example creates two lists, populates them in the init_module function, and then manipulates the lists in the cleanup_module function.

At the start, you create your data structure (my_data_struct), which includes some data and then two list heads. This example demonstrates that you can insert an object into multiple lists at the same time. You then create two list heads (my_full_list and my_odd_list).

In the init_module function, you create 10 data objects and load them onto the lists (all objects on the my_full_list, and all odd-valued objects onto the my_odd_list) using the list_add function. Note here that list_add takes two arguments, the first being the list reference within the object to be used and the second being the list anchor. This demonstrates the ability of a data object to be on multiple lists, using the kernel's internal tricks for identifying the super-object that contains the list reference.

The cleanup_module function illustrates a few more capabilities of the list API, the first of which is the list_for_each macro, which simplifies list iteration. For this macro, you provide a reference to the current object (pos) and the list reference to be iterated. For each iteration, you receive a list_head reference, which can be provided to list_entry to identify the container object (your data structure). Specify your structure and the list variable within your structure, which is used internally to dereference back to the container.

To emit the odd list, you use another iteration macro called list_for_each_entry. This macro is simpler in that it automatically provides your data structure, obviating the need to perform a list_entry.

Finally, you use list_for_each_safe to iterate the list for purposes of freeing the allocated elements. This macro permits iteration over the list with safeguards against removal of a list entry (which you'll do as part of the iteration). You use list_entry to get at your data object (to free it back to the kernel pool), and then use list_del to free the entry on the list.

Listing 4. Exploring the list API

	
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/list.h>

MODULE_LICENSE("GPL");

struct my_data_struct {
  int value;
  struct list_head full_list;
  struct list_head odd_list;
};

LIST_HEAD( my_full_list );
LIST_HEAD( my_odd_list );


int init_module( void )
{
  int count;
  struct my_data_struct *obj;

  for (count = 1 ; count < 11 ; count++) {

    obj = (struct my_data_struct *)
            kmalloc( sizeof(struct my_data_struct), GFP_KERNEL );

    obj->value = count;

    list_add( &obj->full_list, &my_full_list );

    if (obj->value & 0x1) {
      list_add( &obj->odd_list, &my_odd_list );
    }

  }

  return 0;
}


void cleanup_module( void )
{
  struct list_head *pos, *q;
  struct my_data_struct *my_obj;

  printk("Emit full list\n");
  list_for_each( pos, &my_full_list ) {
    my_obj = list_entry( pos, struct my_data_struct, full_list );
    printk( "%d\n", my_obj->value );
  }

  printk("Emit odd list\n");
  list_for_each_entry( my_obj, &my_odd_list, odd_list ) {
    printk( "%d\n", my_obj->value );
  }

  printk("Cleaning up\n");
  list_for_each_safe( pos, q, &my_full_list ) {
    struct my_data_struct *tmp;
    tmp = list_entry( pos, struct my_data_struct, full_list );
    list_del( pos );
    kfree( tmp );
  }

  return;
}

Numerous other functions exist for adding to the tail of a list instead of the head (list_add_tail), joining lists (list_splice), and testing the contents of a list (list_empty). See Resources for more details on kernel list functions.

Going further

This article explored a few APIs that demonstrate the ability to segregate functionality where necessary (timer and high-precision hrtimer APIs) as well as commonize code for code reuse (list API). Traditional timers provide an efficient mechanism for typical driver time-outs, where hrtimers provide another level of quality of service for more precise timer capabilities. The list API provides a very generic but efficient and richly functional interface. If you write kernel code, you'll run across one or all three of these APIs, so they're definitely worth exploring.

Resources

Learn

Learn about other work deferral schemes (tasklets and work queues) in "Kernel APIs, Part 2: Deferrable functions, kernel tasklets, and work queues" (developerWorks, March 2010).
For more information on the traditional timer implementation, including details on the cascading timer wheel, check out this Linux kernel mailing list post from Ingo Molnar. It provides a detailed view of the timer API, its rationale, and considerable details of its implementation.
Thomas Gleixner provides a great introduction to the timer subsystem and the split of timers and high-resolution timers in this Linux kernel mailing list post of the ktimers patch. It also provides details on the use of timers by higher-level subsystems.
For a detailed look at the changes to the Linux time system (which provides the basis for timers in the kernel), check out the paper Hrtimers and Beyond: Transforming the Linux Time Subsystems (PDF). This paper details the design of the overall time subsystem and the changes required to support multiple timer APIs.
The Linux list API uses GCC extensions to identify the container of an object that includes a list pointer structure (using typeof and offsetof). To learn about other GCC extensions, check out "GCC hacks in the Linux kernel" (developerworks, November 2008).
The FAQ/LinkedLists at KernelNewbies provides a good introduction to lists and some of the internals.
In the developerWorks Linux zone, find hundreds of how-to articles and tutorials, as well as downloads, discussion forums, and a wealth other resources for Linux developers and administrators.
Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Follow developerWorks on Twitter.

Get products and technologies

Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

About the author

M. Tim Jones is an embedded firmware architect and the author of Artificial Intelligence: A Systems Approach, GNU/Linux Application Programming (now in its second edition), AI Application Programming (in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.