
Pick up performance with generational garbage collection

 novo_land 2011-10-10

Use the appropriate Java HotSpot VM 1.3.1 parameters to improve throughput

Garbage collection (GC) reclaims the heap space previously allocated to objects no longer needed. The process of locating and removing those dead objects can stall your Java application while consuming as much as 25 percent of throughput.

Sun Microsystems introduced generational GC in the Java HotSpot VM for Solaris. Generational GC keeps older and newer objects in separate heap spaces. With command line parameters, you control how the HotSpot JVM uses that heap space to perform GC. HotSpot's default parameters are effective for most small applications that require faster startup and a smaller footprint. But you can select parameters that activate the Java HotSpot Server VM to improve the throughput of large, server-side applications, like those running under BEA's WebLogic, by 20 percent or more.

This article is written from the perspective of the infrastructure architect, not the Java developer. I don't explain how to modify Java code to achieve better GC. Instead, I show how the HotSpot JVM uses the system resources allocated to it to provide significant throughput improvement with no code modifications.

Pre-HotSpot JVMs

Prior to HotSpot, most JVMs had three main GC problems. First, all objects were scanned during every GC. As the number of objects increased, so did the time each GC took.

Second, partially accurate GC algorithms were conservative when reclaiming memory. These algorithms had difficulty differentiating between pointers and other data types. This often meant the algorithm would fail to collect all garbage for fear of eliminating valid data objects.

Third, many garbage collectors used handles to refer indirectly to objects in memory. Handles were thought to expedite and simplify object relocation during garbage collection; however, they proved to be a significant performance bottleneck. The inability to relocate objects caused significant memory fragmentation and prevented the use of more sophisticated GC algorithms. Other collectors used handleless objects, but whenever objects were relocated during collection, all other objects had to be scanned so that pointers to the relocated objects could be updated.

Post-HotSpot JVM

The Exact VM (JVM 1.2.2) introduced exact garbage collection. Sun then improved the exact GC design in JVM 1.3 and renamed it generational GC. Java HotSpot VM 1.3.1's GC is fully accurate, guaranteeing that:

  • You can reliably reclaim all inaccessible objects' memory
  • You can relocate all objects to compact memory, eliminating object memory fragmentation


The HotSpot JVM uses a two-machine-word object header, rather than the three-word header found in most other JVMs. This saves as much as 10 percent of the heap size for typical applications while accelerating the code to scan all objects.
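One way to see where a figure like 10 percent could come from: each object saves one machine word, so the saving is one word divided by the average object size. The sketch below assumes 4-byte words (a 32-bit VM) and uses hypothetical average object sizes; the function name is also hypothetical. An average object of 40 bytes, for example, would save 4/40, or 10 percent.

```python
def header_saving_fraction(avg_object_bytes_3word: float, word_bytes: int = 4) -> float:
    """Fraction of the heap saved when each object drops one header word.

    avg_object_bytes_3word is a hypothetical average object size in bytes,
    measured with a three-word header; word_bytes assumes a 32-bit VM by default.
    """
    if avg_object_bytes_3word < 3 * word_bytes:
        raise ValueError("an object cannot be smaller than its header")
    return word_bytes / avg_object_bytes_3word


if __name__ == "__main__":
    # Hypothetical average object sizes on a 32-bit VM (4-byte words).
    for size in (40, 64, 128):
        print(f"average object of {size} bytes: {header_saving_fraction(size):.1%} saved")
```

Larger average objects dilute the saving, which is why the article says "as much as" rather than "always".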

The HotSpot JVM also eliminates the concept of handles. This reduces memory usage and speeds processing. In the HotSpot JVM, object references are implemented as direct pointers, providing C-speed access to instance variables.

Three types of collection algorithms

The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.

Copy/scavenge collection

Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.

Mark-compact collection

As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.

The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.
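The two phases can be illustrated with a toy model. In this minimal sketch, which is not HotSpot's implementation, the heap is a Python list, a reference is an index into that list, and the names Obj, mark, compact, and mark_compact are hypothetical. Marking is a depth-first traversal from the roots; compaction slides the live objects to the front of the heap in their original order and rewrites every reference to point at the new positions, which is why no fragmentation remains.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    refs: list[int] = field(default_factory=list)  # indices of the objects this one points to


def mark(heap: list[Obj], roots: list[int]) -> set[int]:
    """Phase 1: mark every object reachable from the roots (depth-first)."""
    marked: set[int] = set()
    stack = list(roots)
    while stack:
        i = stack.pop()
        if i not in marked:  # the check also stops cycles from looping forever
            marked.add(i)
            stack.extend(heap[i].refs)
    return marked


def compact(heap: list[Obj], roots: list[int], marked: set[int]) -> tuple[list[Obj], list[int]]:
    """Phase 2: slide live objects to the front and rewrite every reference."""
    live = [i for i in range(len(heap)) if i in marked]  # keeps allocation order
    forward = {old: new for new, old in enumerate(live)}  # old index -> new index
    new_heap = [Obj(heap[old].name, [forward[r] for r in heap[old].refs]) for old in live]
    return new_heap, [forward[r] for r in roots]


def mark_compact(heap: list[Obj], roots: list[int]) -> tuple[list[Obj], list[int]]:
    return compact(heap, roots, mark(heap, roots))


if __name__ == "__main__":
    # The "dead" objects are unreachable from the root, even though one of them
    # points at a live object.
    heap = [Obj("a", [2]), Obj("dead1"), Obj("b", [3]), Obj("c"), Obj("dead2", [0])]
    new_heap, new_roots = mark_compact(heap, [0])
    print([(o.name, o.refs) for o in new_heap], new_roots)
```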

Incremental (train) collection

The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.

Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.

The -Xincgc and -Xnoincgc parameters turn incremental collection on and off, respectively. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection further, since it will soon change.

Performance factors

JVM performance is usually measured by its GC's effectiveness. "Tuning Garbage Collection with the 1.3.1 Java Virtual Machine" covers performance considerations in more depth; here I cover only the factors that concern this article.

A JVM's throughput is the percentage of total time not spent in garbage collection. Therefore, 80 percent throughput implies that garbage collection consumes 20 percent of the JVM's processing time while your application consumes only 80 percent. A related measure is pauses, the times during which your application stops processing while the JVM collects garbage.
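The arithmetic is simple enough to sketch. The helper names below are hypothetical, and the 100-second run with 20 seconds of GC is an illustrative number chosen to reproduce the 80 percent example above. Throughput and the longest pause are reported separately because a collector can deliver good throughput and still stall the application with one long pause.

```python
def throughput_percent(elapsed_secs: float, gc_pauses_secs: list[float]) -> float:
    """Percentage of elapsed time that was not spent in GC pauses."""
    gc_secs = sum(gc_pauses_secs)
    if elapsed_secs <= 0 or not 0 <= gc_secs <= elapsed_secs:
        raise ValueError("GC time must lie between 0 and a positive elapsed time")
    return 100.0 * (elapsed_secs - gc_secs) / elapsed_secs


def longest_pause(gc_pauses_secs: list[float]) -> float:
    """The worst single stop-the-world pause, the other number to watch."""
    return max(gc_pauses_secs, default=0.0)


if __name__ == "__main__":
    # Illustrative numbers only: a 100-second run with 20 seconds of GC.
    print(throughput_percent(100.0, [5.0, 15.0]), longest_pause([5.0, 15.0]))
```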

Footprint accounts for the JVM's required amount of memory. On computers with limited memory, a large footprint can increase swapping and paging, where the operating system (OS) struggles to find free memory pages for the JVM to use. As OS paging increases, it consumes more processor time and likely decreases the JVM's overall performance.

Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact. If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.

Heap layout

The HotSpot JVM manages heap space in generations -- that is, memory pools for both new and old objects. As these objects accumulate, eventually a low memory condition occurs, forcing garbage collection to take place. Figure 1 illustrates the heap space divided into the old and the new generation.

Figure 1. Heap broken into its components



The new generation includes the new object space (eden), plus two survivor spaces (SS#1 and SS#2), as Figure 1 shows. New objects are allocated in eden. Longer-lived objects are moved from the new generation and tenured to the old generation.

Figure 1 shows another heap section, called the permanent generation, which holds the JVM's class and method objects. The -XX:MaxPermSize command line parameter (for example, -XX:MaxPermSize=64m) controls the permanent generation's maximum size. I won't discuss the permanent generation further in this article.

Control the heap size

You can control the heap size using several parameters. The -Xms and -Xmx parameters define the minimum and maximum heap sizes, respectively. Most large, server-side applications set the values equal to each other for a fixed heap size.

If you set those parameters unequal, the JVM may grow or shrink the heap at each collection; the objective is to keep the proportion of free space to living objects within a specific range. The -Xminf and -Xmaxf parameters define the minimum and maximum proportions of the total heap that should be free, respectively.
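A simplified model of that policy, as a sketch: after a collection, grow the heap if too little of it is free, shrink it if too much is free, and never go below -Xms or above -Xmx. It assumes -Xminf and -Xmaxf are expressed as fractions of the heap that should be free (0.4 and 0.7 in the example are hypothetical values), that all sizes share one unit, and that the JVM jumps straight to the target size. The function name is hypothetical, and the real HotSpot sizing policy is more involved.

```python
def resize_heap(live: float, heap: float, min_free: float, max_free: float,
                xms: float, xmx: float) -> float:
    """Return a heap size that keeps the free fraction within [min_free, max_free].

    Simplified model: sizes share one unit (for example, MB) and the fractions
    lie between 0 and 1.
    """
    free_fraction = (heap - live) / heap
    if free_fraction < min_free:
        target = live / (1.0 - min_free)  # grow until min_free of the heap is free
    elif free_fraction > max_free:
        target = live / (1.0 - max_free)  # shrink until only max_free is free
    else:
        target = heap                     # already inside the target range
    return min(max(target, xms), xmx)     # never leave the -Xms/-Xmx bounds


if __name__ == "__main__":
    # 48 MB live in a 64 MB heap is only 25 percent free, so the heap grows.
    print(resize_heap(48, 64, 0.4, 0.7, xms=32, xmx=256))
```

Setting -Xms equal to -Xmx, as the text recommends for large server applications, makes the clamp pin the heap at one size so no resizing happens.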

If you use expandable heaps, you should bear in mind the impact of changing the old and new generation heap sizes. When the heap grows or shrinks, the JVM must recalculate the old and new generation sizes in order to maintain a predefined ratio (the NewRatio parameter).

The -XX:NewSize and -XX:MaxNewSize parameters control the new generation's minimum and maximum size, respectively. Setting these parameters equal fixes the new generation's size, and they let you tune the new generation at a finer granularity than NewRatio does.

Garbage collections

When the new generation fills up, it triggers a minor collection, in which surviving objects are copied to a survivor space or, once they are old enough, to the old generation. When the old generation fills up, it triggers a major collection, which involves the entire object heap.

Minor collections

The Java HotSpot VM 1.3.1 uses copying collection for all minor collections. Figure 2's top portion shows that newly allocated objects (the blank circles) exist in eden. During a minor collection, the living objects (the dark circles) in eden are copied to the first survivor space. Once the copy is complete, the entire eden space is free for new allocations.

Figure 2. Minor collections

During the next GC, the living objects from eden and from the first survivor space are copied to the second survivor space. This is illustrated in Figure 2's middle portion, where all the living objects are copied, thus leaving only newly allocated objects in eden and the first survivor space.

The minor collection copies objects between survivor spaces until they become tenured; those objects are then copied to the old generation, as Figure 2's bottom portion shows.
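The copying steps above can be modeled in a short sketch. It is a toy, not HotSpot code: objects are strings, the caller supplies the set of objects still alive at each collection, the class and method names are hypothetical, and the tenuring threshold is a fixed number, whereas HotSpot recalculates it at each collection (see "Tenure distribution" later in this article). The sketch also ignores what happens when a survivor space overflows.

```python
from dataclasses import dataclass, field


@dataclass
class NewGeneration:
    """Toy model of eden, two survivor spaces, and promotion to the old generation."""
    tenuring_threshold: int = 2  # hypothetical fixed value; HotSpot recomputes it per GC
    eden: list[str] = field(default_factory=list)
    occupied_survivor: dict[str, int] = field(default_factory=dict)  # object -> age
    old: list[str] = field(default_factory=list)

    def allocate(self, obj: str) -> None:
        self.eden.append(obj)  # new objects are allocated in eden

    def minor_gc(self, live: set[str]) -> None:
        """Copy live objects out of eden and the occupied survivor space."""
        to_space: dict[str, int] = {}
        # Objects in eden have age 0; survivors keep the age they have accumulated.
        candidates = [(obj, 0) for obj in self.eden] + list(self.occupied_survivor.items())
        for obj, age in candidates:
            if obj not in live:
                continue  # dead objects are not copied, so their space is reclaimed
            if age < self.tenuring_threshold:
                to_space[obj] = age + 1  # copied to the other survivor space
            else:
                self.old.append(obj)  # tenured: copied to the old generation
        self.eden.clear()  # all of eden is free for new allocations again
        self.occupied_survivor = to_space  # the two survivor spaces swap roles


if __name__ == "__main__":
    gen = NewGeneration(tenuring_threshold=1)
    gen.allocate("a")
    gen.allocate("b")
    gen.minor_gc(live={"a"})       # "a" moves to a survivor space, "b" is reclaimed
    gen.allocate("c")
    gen.minor_gc(live={"a", "c"})  # "a" is tenured, "c" moves to a survivor space
    print(gen.eden, gen.occupied_survivor, gen.old)
```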

Major collections

The Java HotSpot VM 1.3.1 uses mark-compact collection for all major collections in the old object space. Figure 3 illustrates the two-step process that comprises the mark-compact algorithm. During the first step, garbage collection traverses the entire heap and marks every reachable object; the objects left unmarked are unreachable (the red circles). During the second step, the live objects are compacted into the space the unreachable objects (red circles) occupied, leaving only live objects (the gray circles).

Figure 3. Major collections



Ratio of old to new generations

So far, my diagrams have casually drawn a line to separate the old and the new generations. The actual placement of the dividing line between the old and new generations is the most critical decision influencing HotSpot JVM performance. Every time you start the HotSpot JVM, you determine where to place this line by including or omitting one parameter.

NewRatio

You can divide the heap into old and new generations using the NewRatio parameter. If you use -XX:NewRatio=5, then you create an old-to-new ratio of 5:1; the old generation occupies 5/6 of the heap while the new generation occupies 1/6 of the heap. If you increase the new generation's size, minor collections may occur less often. However, because the -Xmx parameter sets the total heap size, you also decrease the old generation's size. This may increase the frequency of major collections.
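The arithmetic behind NewRatio, as a minimal sketch. It assumes the whole -Xmx heap is split between the two generations, ignoring the permanent generation and any size rounding the JVM performs; the 90 MB heap and the function name are hypothetical. It covers the example ratio of 5 as well as the Client (8) and Server (2) defaults discussed below.

```python
def split_heap(heap_mb: float, new_ratio: int) -> tuple[float, float]:
    """Return (old, new) generation sizes for -XX:NewRatio=new_ratio.

    NewRatio=n means old:new = n:1, so the new generation gets 1/(n + 1)
    of the heap and the old generation gets the rest.
    """
    new = heap_mb / (new_ratio + 1)
    return heap_mb - new, new


if __name__ == "__main__":
    for ratio in (5, 8, 2):  # example above, Client default, Server default
        old, new = split_heap(90, ratio)
        print(f"NewRatio={ratio}: old {old:.0f} MB, new {new:.0f} MB")
```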

Java HotSpot Client VM ratio

The Java HotSpot Client VM 1.3.1 replaces both the classic JVM and the JVM 1.2 just-in-time (JIT) compilers to improve runtime performance for applications and applets. The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The default NewRatio for the HotSpot Client JVM is 8; the old generation occupies 8/9 of the heap while the new generation occupies 1/9, as Figure 4 shows. This allocation is appropriate for client applications, like Java GUIs, that allocate many short-lived objects. Objects created to support a GUI window often do not live beyond the window display's life. In longer-running applications where objects live longer, however, tenuring occurs after a few minor collections and the objects move to the old generation. Once this happens, subsequent collections are commonly major ones.

Figure 4. Impact of NewRatio on generation sizes



Java HotSpot Server VM ratio

The Java HotSpot Server VM 1.3.1 is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time. To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the -server parameter; for example, java -server MyApp.

The default NewRatio for the HotSpot Server JVM is 2; the old generation occupies 2/3 of the heap while the new generation occupies 1/3, as Figure 4 above shows. The larger new generation can accommodate many more short-lived objects, thus decreasing the need for slow major collections. The old generation is still large enough to hold many long-lived objects.

Client JVM vs. Server JVM: Which is right for you?

There is no simple answer to the question of which HotSpot JVM is right for your application. Just because an application is long running doesn't mean it doesn't allocate many short-lived objects. Also, just because an application is a GUI doesn't mean it only allocates short-lived objects. Only you understand how your application creates and destroys objects.

The "Capture GC Statistics" section below explains how to determine GC behavior within your application. Poorly selected JVM parameters can severely degrade your application performance. For applications running under the WebLogic framework, I have often seen 20 to 30 percent performance improvement simply by adding the -server parameter, thereby selecting the HotSpot Server JVM over the default HotSpot Client JVM.

In general, if your Java application is a standalone program, the HotSpot Client JVM will probably give you the best performance. If your Java application executes within a server framework, such as BEA WebLogic, the HotSpot Server JVM will probably give you better performance. Try each and see what works best for your application.

Note: When neither the -client nor the -server parameter is provided, the Java HotSpot VM 1.3.1 uses its default, which is determined by the first noncomment line in the jvm.cfg file, located in the <jvm_dir>/jre/lib directory. Rather than modifying all your startup scripts to add the -server parameter, you can make -server the first noncomment line in the file.

SurvivorRatio

The SurvivorRatio parameter controls the size of the two survivor spaces. For example, if you set the parameter to -XX:SurvivorRatio=10, the ratio between each survivor space and eden is 1:10. Since two survivor spaces exist, each survivor space will be 1/12 of the new generation.
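The same arithmetic for SurvivorRatio, as a sketch that ignores any rounding the JVM applies; the function name and the new generation sizes are hypothetical. With SurvivorRatio=10, a 1,200 KB new generation splits into 1,000 KB of eden and two 100 KB survivor spaces; with the default of 25, each survivor space is 1/27 of the new generation.

```python
def split_new_generation(new_gen_kb: float, survivor_ratio: int) -> tuple[float, float]:
    """Return (eden, one_survivor_space) for -XX:SurvivorRatio=survivor_ratio.

    Each survivor space is 1/survivor_ratio the size of eden, and there are two
    of them, so one survivor space is 1/(survivor_ratio + 2) of the new generation.
    """
    survivor = new_gen_kb / (survivor_ratio + 2)
    return new_gen_kb - 2 * survivor, survivor


if __name__ == "__main__":
    print(split_new_generation(1200, 10))  # the SurvivorRatio=10 example
    print(split_new_generation(2700, 25))  # the default of 25
```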

While SurvivorRatio is generally less important to performance, you should know that the default setting is 25. Figure 5 shows a drawn-to-scale default HotSpot Server JVM heap with a NewRatio of 2 (i.e., the old generation is twice the size of the new generation) and a SurvivorRatio of 25 (i.e., a survivor space is 1/25 the size of eden).

Figure 5. SurvivorRatio default setting of 25



Word to the wise

To improve performance, it's important to keep eden smaller than half the heap size. This ensures that you have enough memory available to complete a minor collection. When you lack enough memory, a major collection will occur, which will bog down performance.

This means that the old generation must typically be larger than the new generation. One reason is that the HotSpot JVM guarantees that, even if every object in eden is alive, all of them can be copied to the old space; if the old generation is too small to absorb that worst case, every collection triggers a full GC. An exception is if you use the -XX:MaxLiveObjectEvacuationRatio=<n> parameter. This ratio lets you declare that you have short-lived objects and that the HotSpot JVM needn't worry about having a large enough old space. This parameter can be 0 if you want almost no old generation; the default value is 100, meaning that 100 percent of eden may contain live objects.

This gives you an idea of the level of control you can have over the way in which the HotSpot JVM uses the heap space you allocate to it. It should also show the level of understanding you should possess before using -XX parameters in production environments. For example, if you set the -XX:MaxLiveObjectEvacuationRatio=<n> parameter too low, you will continually get an out-of-memory error, so use it with extreme caution. (See "Sidebar 1: Support for -XX Parameters" at the end of this article.)
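The guarantee discussed in this section can be written as a one-line check. This is a simplified sketch under stated assumptions: it treats MaxLiveObjectEvacuationRatio as the percentage of eden that might survive and must fit in the old generation's free space, it ignores the survivor spaces, and the function name is hypothetical.

```python
def minor_gc_is_safe(eden_kb: float, old_free_kb: float,
                     max_live_evacuation_ratio: int = 100) -> bool:
    """Simplified check: can the old generation absorb the worst-case survivors?

    Assumes the worst case is that max_live_evacuation_ratio percent of eden
    survives and must be copied into the old generation's free space.
    """
    worst_case_kb = eden_kb * max_live_evacuation_ratio / 100
    return old_free_kb >= worst_case_kb


if __name__ == "__main__":
    print(minor_gc_is_safe(100, 50))      # default ratio: 100 KB might need to move
    print(minor_gc_is_safe(100, 50, 50))  # declaring that at most half survives
```

The second call shows why a low ratio is tempting, and the warning above shows the cost: if more of eden survives than you declared, the old generation cannot hold it.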

Analyze GC behavior

To determine which HotSpot JVM parameters are best for your application, you may need to ask the JVM to display information about its GC behavior.

Capture GC statistics

The -verbosegc command line parameter instructs the JVM to print heap data every time it performs a collection. In the following sample output, there are four minor collections, one major collection, and then three more minor collections. The number before the arrow is the combined size of objects before the GC, and the number after the arrow is the size of the live objects remaining after it. The number in parentheses is the total heap size. In the first GC, 40,549 KB of objects existed before collection and 20,909 KB of objects after collection. This means that 19,640 KB of objects were dead and collected. The total heap size is 64,768 KB. The collection process required 0.0484179 seconds:

[GC 40549K->20909K(64768K), 0.0484179 secs]
[GC 41197K->21405K(64768K), 0.0411095 secs]
[GC 41693K->22995K(64768K), 0.0846190 secs]
[GC 43283K->23672K(64768K), 0.0492838 secs]
[Full GC 43960K->1749K(64768K), 0.1452965 secs]
[GC 22037K->2810K(64768K), 0.0310949 secs]
[GC 23098K->3657K(64768K), 0.0469624 secs]
[GC 23945K->4847K(64768K), 0.0580108 secs]


An awk script to analyze the GC data

The following awk script parses the output from -verbosegc on a Solaris computer, creating a file suitable for importing into Excel for graphing. (Why awk, you ask? As an infrastructure architect, not a Java developer, I felt more comfortable using awk.)

BEGIN {
  printf("Minor\tMajor\tAlive\tFreed\n")
}
{
  if ( substr($0,1,4) == "[GC " )  {
    # break each input line into 4 pieces in array[]
    split($0,array," ");
    # array[1]="[GC"
    # array[2]="20713K->549K(64768K),"
    # array[3]="0.0086130"
    # array[4]="secs]"
    printf("%s\t0.0\t",array[3])
    # break array[2]="43960K->1749K(64768K)," into 4 pieces in barray[]
    split(array[2],barray,"K")
    # barray[1]="43960"
    # barray[2]="->1749"
    # barray[3]="(64768"
    # barray[4]="),"
    before=barray[1]
    after=substr(barray[2],3)
    reclaim=before-after
    printf("%s\t%s\n",after,reclaim)
  }
  if ( substr($0,1,9) == "[Full GC " )  {
    # break each input line into 5 pieces in array[]
    split($0,array," ");
    # array[1]="[Full"
    # array[2]="GC"
    # array[3]="20713K->549K(64768K),"
    # array[4]="0.0086130"
    # array[5]="secs]"
    printf("0.0\t%s\t",array[4]) 
    # break array[3]="43960K->1749K(64768K)," into 4 pieces in barray[]
    split(array[3],barray,"K")
    # barray[1]="43960"
    # barray[2]="->1749"
    # barray[3]="(64768"
    # barray[4]="),"
    before=barray[1]
    after=substr(barray[2],3)
    reclaim=before-after
    printf("%s\t%s\n",after,reclaim)
  }
  # no idea what this line is so skip it
  next;
}


Here is sample output from the above awk script:

Minor       Major       Alive       Freed
0.0484179   0.0         20909       19640
0.0411095   0.0         21405       19792
0.0846190   0.0         22995       18698
0.0492838   0.0         23672       19611
0.0         0.1452965   1749        42211
0.0310949   0.0         2810        19227
0.0469624   0.0         3657        19441
0.0580108   0.0         4847        19098


I imported the awk script's output into MS Excel. Figure 6 illustrates a graph created from the two lefthand columns. It shows the frequency of minor and major collections plus the amount of time required to perform the GC (i.e., the pauses).

Figure 6. Sample graph of minor/major collections



Figure 7 illustrates a graph created from the two righthand columns. It shows the amount of memory live objects consumed following each GC, as well as the amount of memory freed when dead objects were reclaimed.

Figure 7. Sample graph of GC behavior
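Graphs are one way to read these numbers; totals are another. The sketch below, written in Python rather than the article's awk purely for illustration, parses the same single-line -verbosegc records and totals the count, pause time, and memory freed for minor and major collections. The regular expression assumes the exact line format shown in the sample output above, the function and pattern names are hypothetical, and the multi-line records produced by -XX:+PrintTenuringDistribution are skipped.

```python
import re

# One single-line -verbosegc record, for example
#   [GC 40549K->20909K(64768K), 0.0484179 secs]
#   [Full GC 43960K->1749K(64768K), 0.1452965 secs]
GC_RECORD = re.compile(
    r"\[(?P<kind>Full GC|GC) (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<heap>\d+)K\), (?P<secs>[\d.]+) secs\]"
)


def summarize(lines):
    """Total the count, pause time, and KB freed for minor and major collections."""
    summary = {}
    for line in lines:
        m = GC_RECORD.match(line.strip())
        if m is None:
            continue  # not a single-line GC record (e.g. tenuring details): skip it
        kind = "major" if m["kind"] == "Full GC" else "minor"
        totals = summary.setdefault(kind, {"count": 0, "pause_secs": 0.0, "freed_kb": 0})
        totals["count"] += 1
        totals["pause_secs"] += float(m["secs"])
        totals["freed_kb"] += int(m["before"]) - int(m["after"])
    return summary


if __name__ == "__main__":
    sample = [
        "[GC 40549K->20909K(64768K), 0.0484179 secs]",
        "[Full GC 43960K->1749K(64768K), 0.1452965 secs]",
    ]
    for kind, t in summarize(sample).items():
        print(f"{kind}: {t['count']} collections, {t['pause_secs']:.4f} s paused, "
              f"{t['freed_kb']} KB freed")
```

To summarize a real log, pass an open file, for example summarize(open("gc.log")), where gc.log is a hypothetical file holding the -verbosegc output.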



Tenure distribution

After objects have survived several collections, they become tenured and are promoted to the old generation. You can use the -XX:+PrintTenuringDistribution parameter to display the information the JVM uses during its tenure calculations.

The JVM attempts to keep survivor-space occupancy after a collection at a target percentage, defined by the -XX:TargetSurvivorRatio parameter. The default value is 50, meaning that the goal is for the survivor space to be no more than half (50 percent) full following GC. The JVM computes the desired survivor size as shown below, and then computes the tenuring threshold. The threshold is the number of times an object can be copied before it is tenured; it is recalculated at each collection:

    (survivor_capacity * TargetSurvivorRatio) / 100 * sizeof(a pointer)
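The formula can be checked against the sample output below. This minimal sketch assumes survivor_capacity is counted in machine words (hence the multiplication by the pointer size) and a 32-bit VM with 4-byte pointers; the function name is hypothetical. Under those assumptions, a 768 KB survivor space (196,608 words) yields the 393,216-byte desired survivor size that appears in the sample.

```python
def desired_survivor_size(survivor_capacity_words: int,
                          target_survivor_ratio: int = 50,
                          pointer_bytes: int = 4) -> int:
    """Desired survivor occupancy in bytes, following the formula in the text.

    Assumes survivor_capacity is measured in machine words, which is why the
    result is scaled by the size of a pointer (4 bytes on a 32-bit VM).
    """
    return survivor_capacity_words * target_survivor_ratio // 100 * pointer_bytes


if __name__ == "__main__":
    print(desired_survivor_size(196608))  # a 768 KB survivor space on a 32-bit VM
```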


Below is sample output when running java -verbosegc -XX:+PrintTenuringDistribution MyApp. In the first GC, the desired survivor size is 393,216 bytes (384 KB) and the threshold is 1, meaning that objects will be tenured following one collection. The collector then lists each age, that is, the number of collections an object has survived without being promoted, together with the bytes occupied by objects of that age and a running total:

[GC
Desired survivor size 393216 bytes, new threshold 1 (max 32)
- age   1:   509624 bytes,   509624 total
 20288K->497K(64768K), 0.0147963 secs]
[GC
Desired survivor size 393216 bytes, new threshold 32 (max 32)
- age   1:   169616 bytes,   169616 total
 27697K->6997K(64768K), 0.0038858 secs]
[GC
Desired survivor size 393216 bytes, new threshold 32 (max 32)
- age   1:   191392 bytes,   191392 total
- age   2:    52944 bytes,   244336 total
 27285K->7070K(64768K), 0.0046738 secs]
[GC
Desired survivor size 393216 bytes, new threshold 1 (max 32)
- age   1:   733488 bytes,   733488 total
- age   3:    52944 bytes,   786432 total
 27358K->7662K(64768K), 0.0148100 secs]


In the first GC above, objects occupying 509,624 bytes have already been copied once, meaning that they are in the first age bucket (age 1). In the third GC, objects occupying 191,392 bytes are in the first age bucket and objects occupying 52,944 bytes are in the second. Because the threshold is 32 during the third GC, those older objects were not tenured: the fourth GC still shows 52,944 bytes of objects, now in the third age bucket.
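The thresholds in the sample are consistent with a simple rule, sketched below as an assumption rather than a documented specification: add up the bytes at each age, youngest first, and make the first age at which the running total exceeds the desired survivor size the new threshold, capped at the maximum (32 here); if the total never exceeds it, use the maximum. The function name is hypothetical. Applied to the four age tables above with a desired size of 393,216 bytes, the rule reproduces thresholds of 1, 32, 32, and 1.

```python
def tenuring_threshold(age_bytes: dict[int, int], desired_bytes: int,
                       max_threshold: int = 32) -> int:
    """Pick a new threshold from a PrintTenuringDistribution-style age table.

    Walk the ages in increasing order with a running total; the first age at
    which the total exceeds the desired survivor size becomes the threshold.
    Ages missing from the table contribute 0 bytes, so skipping them is harmless.
    """
    total = 0
    for age in sorted(age_bytes):
        total += age_bytes[age]
        if total > desired_bytes:
            return min(age, max_threshold)
    return max_threshold


if __name__ == "__main__":
    desired = 393216  # from the sample output
    for table in ({1: 509624}, {1: 169616}, {1: 191392, 2: 52944}, {1: 733488, 3: 52944}):
        print(table, "->", tenuring_threshold(table, desired))
```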

(See "Sidebar 2: Performance Test Program" at the end of this article for information on Ed Ort's HeapTest program used to test parameters in this article.)

Take out the trash

Ordinarily, the Java developer doesn't have to be concerned with the complexity of memory allocation and GC within the JVM. However, understanding aspects of this hidden implementation can help you ensure effective resource use. Garbage collection algorithms make assumptions about the way applications use objects. The HotSpot JVM's tunable parameters let you adjust the GC algorithms to better meet your application's behavior characteristics. Sometimes just adding the -server parameter to switch from the default HotSpot Client JVM to the HotSpot Server JVM can improve throughput of WebLogic applications by 20 percent or more.

Stay tuned for the Java HotSpot VM 1.4 release, which will extend the train, or incremental, garbage collector algorithm to perform GC in parallel.

