Introduction

In this article, I describe some general and some specific features of memory management in Linux. The focus is on dynamic memory allocation and release, as well as on the management of free memory. The article concerns Linux kernel versions 2.6.X.

Structure of the Linux Memory Management

The term “memory management” refers to the mechanisms implemented by an operating system to provide applications with memory-related services. These services include virtual memory (using a hard disk or other non-RAM storage media to provide additional program memory), protected memory (exclusive access to a region of memory by a process), and shared memory (cooperative access to a region of memory by multiple processes).

Memory management services in Linux are built on a programming foundation that includes a hardware component called the Memory Management Unit (MMU). The MMU translates the linear (virtual) addresses used by the operating system into physical memory addresses, and raises a page fault interrupt when the CPU tries to access memory that it is not entitled to.

Not all processors have MMUs. Therefore, the uClinux distribution (Linux for microcontrollers) supports a single address space of operation. This architecture lacks the protection provided by an MMU but makes it possible for Linux to run on another class of processors.

To understand the structure of the MM services further, we need to know that the basic unit of memory under Linux is the page, a non-overlapping region of contiguous memory. All available physical memory is organized into pages towards the end of the kernel’s boot process. The page size depends on the processor architecture, and processor designs often support two or more page sizes, sometimes simultaneously. The traditional page size used by Linux is 4096 bytes.

But using memory pages “as is” is not very convenient. Often, we need to allocate less than one memory page. There are such possibilities in Linux:
To provide a simple interface for interacting with the Memory Management Unit, and to perform this interaction in a portable way, the Linux subsystem for allocating and releasing memory is split into three layers. These layers are the Slab Allocator, the Zone Allocator, and the Buddy Allocator.
The general scheme of how these layers interact with user mode code and hardware looks as follows:

Figure 1: General scheme of the memory management in Linux
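As a quick check of the page granularity described earlier, the sketch below queries the page size of the running system from user space and rounds a request up to whole pages. It is a minimal illustration in Python, not kernel code. The helper name `pages_needed` is made up for this example, and `mmap.PAGESIZE` is the standard-library constant that reports the page size.

```python
import mmap


def pages_needed(nbytes, page_size=mmap.PAGESIZE):
    """Return how many whole pages are required to hold nbytes bytes."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    # Ceiling division: a partially used page still occupies a full page.
    return (nbytes + page_size - 1) // page_size


if __name__ == "__main__":
    print("page size of this system:", mmap.PAGESIZE, "bytes")
    for size in (1, 4096, 4097, 10_000):
        print(size, "bytes ->", pages_needed(size, 4096), "page(s) of 4096 bytes")
```

Even a one-byte request costs a full page at this level, which is why the finer-grained allocators discussed in this article sit on top of the page allocator.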
Note that in Linux, most programs directly or indirectly use the heap manager of the GNU C Library. As we can see in Figure 1, a user space allocation ultimately leads to a kernel allocation. The kernel allocates memory using the chain of three kernel allocators and maps the allocated pages into the address space of the process that requested the allocation.

Kernel Mode Memory Management Services

The Buddy Allocator is responsible for managing page allocations in the entire system. This code manages lists of physically contiguous pages and maps them into the MMU page tables to provide other kernel subsystems with valid physical address ranges when the kernel requests them (physical-to-virtual address mapping is handled by a higher layer of the VM). The Buddy Allocator splits memory into blocks of 2^n pages, where n is the order of the block, and keeps track of the free blocks in an array of lists:

Figure 2: Array of lists of memory pages in the Buddy Allocator
Each list consists of free, physically contiguous blocks of 2^i memory pages, where i is the list number. Every block, except a block that consists of 1 page, can be split into two halves and used as 2 blocks of half the size. So if the requested list has no entries, an entry from the next upper list is broken into two separate blocks; one is returned to the caller while the other is added to the next lower list.

On the other hand, two blocks of memory of the same size that share a border (that is, they are arranged sequentially in terms of physical addresses) may be united into a single block of the bigger size. Such neighboring blocks are called Buddies. When an allocation is returned to the Buddy Allocator, it checks whether the buddy of the allocation is free, and if so, unites them into a bigger block. This operation is repeated until no more free buddies are found. Note also that the Buddy Allocator can allocate only blocks whose size in pages is a power of 2.

The Buddy Allocator also interacts with kernel threads.

Different ranges of physical pages may have different properties for the purposes of the kernel. For example, on the x86 architecture Direct Memory Access can work only in a specific range of physical addresses, whereas PPC does not have this constraint. The Zone Allocator was created to handle such situations in a hardware-independent way. It is used to allocate pages in a specified zone. Today the Linux kernel supports three memory zones.
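The splitting and merging performed by the Buddy Allocator, as described above, can be sketched with a toy model. This is a simplified illustration in Python and not the kernel's implementation: the class name `BuddyAllocator`, its methods, and the use of sets for the free lists are assumptions made for this example. Blocks are identified by the index of their first page, and the buddy of a block of 2^i pages is found by flipping bit i of that index.

```python
class BuddyAllocator:
    """Toy model of buddy allocation over 2**max_order pages."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[i] holds the first page index of every free block of 2**i pages.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)  # initially one big free block

    def alloc(self, order):
        """Allocate a block of 2**order pages; return its first page index or None."""
        # Find the smallest non-empty list that can satisfy the request.
        current = order
        while current <= self.max_order and not self.free_lists[current]:
            current += 1
        if current > self.max_order:
            return None  # no free block is large enough
        start = min(self.free_lists[current])
        self.free_lists[current].remove(start)
        # Split repeatedly; the upper half goes to the next lower list each time.
        while current > order:
            current -= 1
            self.free_lists[current].add(start + (1 << current))
        return start

    def free(self, start, order):
        """Return a block and merge it with its buddy for as long as possible."""
        while order < self.max_order:
            buddy = start ^ (1 << order)  # the buddy differs only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)
```

For example, with `BuddyAllocator(3)` (8 pages), a request for a single page splits the 8-page block into free blocks of 4, 2, and 1 pages and returns page 0. Freeing every allocated block again merges the buddies step by step until the single 8-page block is restored.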
Note that the Zone Allocator, too, can operate only on whole memory pages. Since we often need to allocate objects that are smaller than a page, we need something that takes pages and hands out smaller chunks of memory. We know the sizes of most objects that are frequently allocated in kernel space, so we can create an allocator that receives pages of memory from the Zone Allocator and allocates small objects within these pages. This subsystem is named the Slab Allocator (An Object-Caching Kernel Memory Allocator).

The Slab Allocator organizes memory in caches, one cache for each object type. The constructor of an object is used only for newly allocated slabs, so you should return an object to its initialized state before releasing it to the Slab Allocator.

The Slab Allocator also makes it possible to allocate memory buffers of one of a set of specially defined sizes; such buffers are obtained through a dedicated kernel function. The kernel can also allocate virtually contiguous memory (memory with contiguous virtual addresses, but not necessarily contiguous physical addresses).

Special Aspects of the Linux Kernel Mechanisms of the Memory Management

Why Linux Developers Rarely Check Results of Memory Allocation

The short answer is: “Because of overcommit”. Under the default memory management strategy, an allocation can succeed even though the memory behind it has not been reserved yet; the pages are assigned only when they are actually used. This approach brings some performance gains for applications that allocate a lot of memory but use only part of it, since swap pages are not allocated before they are used. But sometimes such an approach is not good, because the process can be killed by the OOM killer at any moment. Fortunately, Linux allows changing the default approach to memory allocation and the behavior of the system when an OOM event occurs.

The overcommit policy is set via a sysctl parameter. The Linux kernel supports the following overcommit handling modes, associated with the values of this parameter:

0 - Heuristic overcommit handling.
Obvious overcommits of address space are refused. This type of handling is used for a typical system. It ensures that a seriously wild allocation fails while allowing overcommit to reduce swap usage. Root is allowed to allocate a bit more memory in this mode. This mode is set by default.
1 - Always overcommit. Appropriate for some scientific applications that use a lot of memory.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (50% by default) of physical RAM. Depending on the percentage you use, in most situations this means that a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.

The overcommit percentage is set via a sysctl parameter. The current overcommit limit and the amount committed can be viewed in /proc/meminfo.

How Out-of-memory Killer Mechanism Works

There are three strategies to handle an out-of-memory situation:
The first and the third strategies are fairly simple, unlike the second one, so let’s consider the second in more detail. According to the second strategy, Linux tries to choose an appropriate process to kill when it runs out of memory. “Appropriate” in this context means that:
In order to choose a process to kill, the OOM killer calculates a value named Badness for each process and then selects the process with the maximum Badness to be killed. If the allocating process was chosen, the OOM killer's work is done. If some other process was chosen, the OOM killer can be called more than once in case the previous run did not free enough memory. Badness is calculated according to the following rules:
We can write the badness of process A as the following equation:
Where:
You can see the mechanism of the out-of-memory handler in the source file “/mm/oom_kill.c” in the Linux kernel source tree. You can tune the OOM handling mechanism by setting the following kernel parameters:

vm.oom_dump_tasks
This parameter enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs an OOM kill; the dump includes information about each task. If the parameter is set to zero, this information is suppressed. If it is set to non-zero, this information is shown whenever the OOM killer actually kills a memory-hogging task.

vm.oom_kill_allocating_task
This parameter enables or disables killing the task that triggered the out-of-memory situation. If this parameter is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the resource-expensive tasklist scan and makes the OOM killer very predictable. If it is set to zero, the OOM killer scans through the entire tasklist and selects a task on the basis of the badness calculation.

vm.panic_on_oom
This kernel parameter controls whether the kernel panics when an out-of-memory condition occurs instead of letting the OOM killer terminate a task.
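The current values of the OOM-related parameters named above can be inspected from user space, because every sysctl parameter is also exposed as a file under /proc/sys, with the dots in its name replaced by slashes. The following is a minimal sketch in Python under that assumption. The helper names `sysctl_path` and `read_sysctl` and the `root` argument are hypothetical and exist only for this example; the sketch only reads values, since changing them requires root privileges.

```python
from pathlib import Path

# OOM-related kernel parameters named in this article.
OOM_PARAMETERS = (
    "vm.oom_dump_tasks",
    "vm.oom_kill_allocating_task",
    "vm.panic_on_oom",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it is unavailable."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # The file is missing (for example, on a non-Linux system) or unreadable.
        return None


if __name__ == "__main__":
    for parameter in OOM_PARAMETERS:
        print(parameter, "=", read_sysctl(parameter))
```

The `root` argument is there only so that the helpers can be tried against a scratch directory; on a real system the default of /proc/sys is what you want.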