Pick up performance with generational garbage collection

Use the appropriate Java HotSpot VM 1.3.1 parameters to improve throughput

Garbage collection (GC) reclaims the heap space previously allocated to objects that are no longer needed. The process of locating and removing those dead objects can stall your Java application while consuming as much as 25 percent of throughput. Sun Microsystems introduced generational GC in the Java HotSpot VM for Solaris. Generational GC places older and newer objects in separate heap spaces. With command line parameters, you control how the HotSpot JVM uses that heap space to perform GC.

HotSpot's default parameters are effective for most small applications that require faster startup and a smaller footprint. But you can select parameters that activate the Java HotSpot Server VM to improve the throughput of large, server-side applications, like those running under BEA's WebLogic, by 20 percent or more.

This article is written from the perspective of the infrastructure architect, not the Java developer. I don't explain how to modify Java code to achieve better GC. Instead, I show how the HotSpot JVM uses the system resources allocated to it to provide significant throughput improvement with no code modifications.

Pre-HotSpot JVMs

Prior to HotSpot, most JVMs had three main GC problems. First, all objects were scanned during every GC. As the number of objects increased, the time this type of GC took increased as well. Second, partially accurate GC algorithms were conservative when reclaiming memory. These algorithms had difficulty differentiating between pointers and other data types. This often meant the algorithm would fail to collect all garbage for fear of eliminating valid data objects. Third, a garbage collector used handles to refer indirectly to objects in memory. Those handles were thought to expedite and simplify object relocation during garbage collection; however, they proved to be a significant performance bottleneck.
The inability to relocate objects caused significant memory fragmentation and prevented the use of more sophisticated GC algorithms. Other collectors used handleless objects, but when relocated objects were collected, all other objects had to be scanned so that pointers to relocated objects could be updated.

Post-HotSpot JVM

The Exact VM (JVM 1.2.2) introduced exact garbage collection. Sun then improved the exact GC design in JVM 1.3 and renamed it generational GC. Java HotSpot VM 1.3.1's GC is fully accurate, guaranteeing that: [...]

The HotSpot JVM uses a two-machine-word object header, rather than the three-word header found in most other JVMs. This saves as much as 10 percent of the heap size for typical applications while speeding up the code that scans all objects. The HotSpot JVM also eliminates the concept of handles. This reduces memory usage and speeds processing. In the HotSpot JVM, object references are implemented as direct pointers, providing C-speed access to instance variables.

Three types of collection algorithms

The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.

Copy/scavenge collection

Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.

Mark-compact collection

As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space. The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts the heap, closing the gaps left by dead objects. The mark-compact algorithm takes more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.

Incremental (train) collection

The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses.
Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection. Incremental collection breaks up old object collection pauses into many tiny pauses, even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation. The [...]

Performance factors

JVM performance is usually measured by its GC's effectiveness. "Tuning Garbage Collection with the 1.3.1 Java Virtual Machine" covers performance considerations in more depth. I will cover those factors that concern this article.

A JVM's throughput is the percentage of total time not spent in GC. Therefore, 80 percent throughput implies that garbage collection consumes 20 percent of the JVM's processing while your application consumes only 80 percent. Throughput is also measured in pauses, during which your application stops processing while the JVM collects garbage.

Footprint accounts for the JVM's required amount of memory. On computers with limited memory, a large footprint can increase swapping and paging, where the operating system (OS) struggles to find free memory pages for the JVM to use. As OS paging increases, it consumes more processor time and likely decreases the JVM's overall performance.

Command line parameters that divide the heap between new and old generations usually cause the greatest performance impact. If you increase the new generation's size, you often improve the overall throughput; however, you also increase footprint, which may slow down servers with limited memory.

Heap layout

The HotSpot JVM manages heap space in generations -- that is, memory pools for both new and old objects. As these objects accumulate, eventually a low memory condition occurs, forcing garbage collection to take place.
Figure 1 illustrates the heap space divided into the old and the new generation.

Figure 1. Heap broken into its components

The new generation includes the new object space (eden), plus two survivor spaces (SS#1 and SS#2), as Figure 1 shows. New objects are allocated in eden. Longer-lived objects are moved from the new generation and tenured to the old generation.

Figure 1 shows another heap section, called the permanent generation, which holds the JVM's class and method objects. The [...]

Control the heap size

You can control the heap size using several parameters. The [...]

If you set those parameters unequal, then the JVM must increase or decrease the heap size at each collection; the objective is to keep the living object space's proportion within a specific range. The [...]

If you use expandable heaps, you should bear in mind the impact of changing the old and new generation heap sizes. When the heap grows or shrinks, the JVM must recalculate the old and new generation sizes in order to maintain a predefined ratio (the [...]). The [...]

Garbage collections

When the new generation fills up, it triggers a minor collection, in which surviving objects are moved to the old generation. When the old generation fills up, it triggers a major collection, which involves the entire object heap.

Minor collections

The Java HotSpot VM 1.3.1 uses copying collection for all minor collections. Figure 2's top portion shows that newly allocated objects (the blank circles) exist in eden. During a minor collection, the living objects (the dark circles) in eden are copied to the first survivor space. Once the copy is complete, you can use the entire eden space.

Figure 2. Minor collections

During the next GC, the living objects from eden and from the first survivor space are copied to the second survivor space. This is illustrated in Figure 2's middle portion, where all the living objects are copied, thus leaving only newly allocated objects in eden and the first survivor space.
The minor collection copies objects between survivor spaces until they become tenured; those objects are then copied to the old generation, as Figure 2's bottom portion shows.

Major collections

The Java HotSpot VM 1.3.1 uses mark-compact collection for all major collections; therefore, major collections occur in the old object space. Figure 3 illustrates the two-step process that makes up the mark-compact algorithm. During the first step, garbage collection goes through the entire heap, marking reachable objects and thereby identifying the unreachable ones (the red circles). During the second step, the space held by the unreachable objects (red circles) is compacted away, leaving only live objects (the gray circles).

Figure 3. Major collections

Ratio of old to new generations

So far, my diagrams have casually drawn a line to separate the old and the new generations. The actual placement of the dividing line between the old and new generations is the most critical decision influencing HotSpot JVM performance. Every time you start the HotSpot JVM, you determine where to place this line by including or omitting one parameter.

NewRatio

You can divide the heap into old and new generations using the [...]

Java HotSpot Client VM ratio

The Java HotSpot Client VM 1.3.1 replaces both the classic JVM and the JVM 1.2 just-in-time (JIT) compilers to improve runtime performance for applications and applets. The HotSpot Client JVM has been specially tuned to reduce application startup time and memory footprint, making it particularly well suited for client environments. On all platforms, the HotSpot Client JVM is the default.

The default [...]

Figure 4. Impact of NewRatio on generation sizes

Java HotSpot Server VM ratio

The Java HotSpot Server VM 1.3.1 is similar to the HotSpot Client JVM except that it has been specially tuned to maximize peak operating speed. It is intended for long-running server applications, for which the fastest possible operating speed is generally more important than having the fastest startup time.
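
As a rough illustration of how a ratio setting moves the dividing line between generations, the sketch below assumes the commonly documented meaning of HotSpot's NewRatio: the old generation is NewRatio times the size of the new generation, so the new generation receives 1/(NewRatio + 1) of the heap. The function name `split_heap`, the 64 MB heap, and the two ratio values are made up for the example and are not the defaults discussed in this article.

```python
def split_heap(heap_kb, new_ratio):
    """Return (new_kb, old_kb) for a heap split where old:new == new_ratio:1.

    Assumption: this mirrors the documented meaning of NewRatio only.
    A real JVM also applies alignment and minimum-size rules, ignored here.
    """
    if heap_kb <= 0 or new_ratio <= 0:
        raise ValueError("heap_kb and new_ratio must be positive")
    new_kb = heap_kb // (new_ratio + 1)  # new generation's share of the heap
    old_kb = heap_kb - new_kb            # the remainder goes to the old generation
    return new_kb, old_kb


if __name__ == "__main__":
    heap_kb = 64 * 1024  # hypothetical 64 MB heap
    for ratio in (2, 8):  # illustrative ratios only
        new_kb, old_kb = split_heap(heap_kb, ratio)
        print(f"NewRatio={ratio}: new={new_kb}K old={old_kb}K")
```

A smaller ratio gives the new generation a larger share of the same heap, which is the trade-off between throughput and footprint described under the performance factors.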
To invoke the HotSpot Server JVM instead of the default HotSpot Client JVM, use the [...]

The default [...]

Client JVM vs. Server JVM: Which is right for you?

There is no simple answer to the question of which HotSpot JVM is right for your application. Just because an application is long running doesn't mean it doesn't allocate many short-lived objects. Also, just because an application is a GUI doesn't mean it only allocates short-lived objects. Only you understand how your application creates and destroys objects. The "Capture GC Statistics" section below explains how to determine GC behavior within your application.

Poorly selected JVM parameters can severely degrade your application performance. For applications running under the WebLogic framework, I have often seen 20 to 30 percent performance improvement simply by adding the [...]

In general, if your Java application is a standalone program, the HotSpot Client JVM will probably give you the best performance. If your Java application executes within a server framework, such as BEA WebLogic, the HotSpot Server JVM will probably give you better performance. Try each and see what works best for your application.

Note: When no client or server parameter is provided, the Java HotSpot VM 1.3.1 uses its default value. The default is the first line in the [...]

SurvivorRatio

The [...]

While [...]

Figure 5. SurvivorRatio default setting of 25

Word to the wise

To improve performance, it's important to keep eden smaller than half the heap size. This ensures that you have enough memory available to complete a minor collection. When you lack enough memory, a major collection will occur, which will bog down performance. This means that the old generation must typically be larger than the new generation. One reason for this is that the HotSpot JVM guarantees that if everything is alive in eden, it can all be copied to the old space; when the old space is too small to honor that guarantee, every collection triggers a full GC.
An exception is if you use the [...]

This gives you an idea of the level of control you can have over the way in which the HotSpot JVM uses the heap space you allocate to it. It should also show the level of understanding you should possess before using [...]

Analyze GC behavior

To determine which HotSpot JVM parameters are best for your application, you may need to ask the JVM to display information about its GC behavior.

Capture GC statistics

Every time the JVM performs a collection, the command line parameter [...]

    [GC 40549K->20909K(64768K), 0.0484179 secs]
    [GC 41197K->21405K(64768K), 0.0411095 secs]
    [GC 41693K->22995K(64768K), 0.0846190 secs]
    [GC 43283K->23672K(64768K), 0.0492838 secs]
    [Full GC 43960K->1749K(64768K), 0.1452965 secs]
    [GC 22037K->2810K(64768K), 0.0310949 secs]
    [GC 23098K->3657K(64768K), 0.0469624 secs]
    [GC 23945K->4847K(64768K), 0.0580108 secs]

An awk script to analyze the GC data

The following [...]

    BEGIN {
        printf("Minor\tMajor\tAlive\tFreed\n")
    }
    {
        if ( substr($0,1,4) == "[GC " ) {
            # minor collection: break the input line into 4 pieces in array[]
            split($0,array," ");
            # array[1]="[GC"
            # array[2]="20713K->549K(64768K),"
            # array[3]="0.0086130"
            # array[4]="secs]"
            printf("%s\t0.0\t",array[3])
            # break array[2]="43960K->1749K(64768K)," into 4 pieces in barray[]
            split(array[2],barray,"K")
            # barray[1]="43960"
            # barray[2]="->1749"
            # barray[3]="(64768"
            # barray[4]="),"
            before=barray[1]
            after=substr(barray[2],3)
            reclaim=before-after
            printf("%s\t%s\n",after,reclaim)
        }
        if ( substr($0,1,9) == "[Full GC " ) {
            # major collection: break the input line into 5 pieces in array[]
            split($0,array," ");
            # array[1]="[Full"
            # array[2]="GC"
            # array[3]="20713K->549K(64768K),"
            # array[4]="0.0086130"
            # array[5]="secs]"
            printf("0.0\t%s\t",array[4])
            # break array[3]="43960K->1749K(64768K)," into 4 pieces in barray[]
            split(array[3],barray,"K")
            # barray[1]="43960"
            # barray[2]="->1749"
            # barray[3]="(64768"
            # barray[4]="),"
            before=barray[1]
            after=substr(barray[2],3)
            reclaim=before-after
            printf("%s\t%s\n",after,reclaim)
        }
        # no idea what this line is so skip it
        next;
    }

Here is sample output from the above [...]

    Minor      Major      Alive  Freed
    0.0484179  0.0        20909  19640
    0.0411095  0.0        21405  19792
    0.0846190  0.0        22995  18698
    0.0492838  0.0        23672  19611
    0.0        0.1452965  1749   42211
    0.0310949  0.0        2810   19227
    0.0469624  0.0        3657   19441
    0.0580108  0.0        4847   19098

I imported the [...]

Figure 6. Sample graph of minor/major collections

Figure 7 illustrates a graph created from the two righthand columns. It shows the amount of memory live objects consumed following each GC, as well as the amount of memory freed when dead objects were reclaimed.

Figure 7. Sample graph of GC behavior

Tenure distribution

After objects have been collected several times, they become tenured and are promoted to the old generation. You can use the [...]

The JVM attempts to keep a target percentage of the survivor spaces empty, as defined by the [...]

    (survivor_capacity * TargetSurvivorRatio) / 100 * sizeof(a pointer)

Below is sample output when running [...]

    [GC
    Desired survivor size 393216 bytes, new threshold 1 (max 32)
    - age 1: 509624 bytes, 509624 total
    20288K->497K(64768K), 0.0147963 secs]
    [GC
    Desired survivor size 393216 bytes, new threshold 32 (max 32)
    - age 1: 169616 bytes, 169616 total
    27697K->6997K(64768K), 0.0038858 secs]
    [GC
    Desired survivor size 393216 bytes, new threshold 32 (max 32)
    - age 1: 191392 bytes, 191392 total
    - age 2: 52944 bytes, 244336 total
    27285K->7070K(64768K), 0.0046738 secs]
    [GC
    Desired survivor size 393216 bytes, new threshold 1 (max 32)
    - age 1: 733488 bytes, 733488 total
    - age 3: 52944 bytes, 786432 total
    27358K->7662K(64768K), 0.0148100 secs]

In the first GC above, objects occupying 509,624 bytes have already been copied once, meaning that they are in the first age bucket (age 1). In the third GC, there are objects occupying 191,392 bytes in the first age bucket and objects occupying 52,944 bytes in the second age bucket.
Because the threshold is 32 during the third GC, those objects were not tenured: in the fourth GC, objects occupying 52,944 bytes appear in the third age bucket.

(See "Sidebar 2: Performance Test Program" at the end of this article for information on Ed Ort's [...])

Take out the trash

Ordinarily, the Java developer doesn't have to be concerned with the complexity of memory allocation and GC within the JVM. However, understanding aspects of this hidden implementation can help you ensure effective resource use.

Garbage collection algorithms make assumptions about the way applications use objects. The HotSpot JVM's tunable parameters let you adjust the GC algorithms to better match your application's behavior. Sometimes just adding the [...]

Stay tuned for the Java HotSpot VM 1.4 release, which will extend the train, or incremental, garbage collector algorithm to perform GC in parallel.
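
For readers who would rather script the analysis than use a spreadsheet, the GC log lines shown under "Capture GC statistics" can also be summarized in Python. The sketch below is only an illustration: it assumes the log format matches the samples in this article exactly, and the names `parse_gc_log` and `summarize`, as well as the 10-second elapsed time, are hypothetical. It totals minor and major pause times and applies the throughput definition from the performance factors (the share of total time not spent in GC).

```python
import re

# Matches lines such as "[GC 40549K->20909K(64768K), 0.0484179 secs]"
# and "[Full GC 43960K->1749K(64768K), 0.1452965 secs]".
GC_LINE = re.compile(
    r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]"
)


def parse_gc_log(lines):
    """Yield (kind, before_kb, after_kb, pause_secs) for each recognized line."""
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue  # not a single-line GC record, so skip it
        kind, before, after, _capacity, secs = match.groups()
        yield kind, int(before), int(after), float(secs)


def summarize(records, elapsed_secs):
    """Total the pauses and derive throughput as the share of time outside GC."""
    if elapsed_secs <= 0:
        raise ValueError("elapsed_secs must be positive")
    records = list(records)  # allow generators; we iterate more than once
    minor = sum(secs for kind, _, _, secs in records if kind == "GC")
    major = sum(secs for kind, _, _, secs in records if kind == "Full GC")
    freed_kb = sum(before - after for _, before, after, _ in records)
    throughput = 100.0 * (1.0 - (minor + major) / elapsed_secs)
    return {
        "minor_secs": minor,
        "major_secs": major,
        "freed_kb": freed_kb,
        "throughput_pct": throughput,
    }


if __name__ == "__main__":
    sample = [
        "[GC 40549K->20909K(64768K), 0.0484179 secs]",
        "[GC 41197K->21405K(64768K), 0.0411095 secs]",
        "[Full GC 43960K->1749K(64768K), 0.1452965 secs]",
    ]
    # elapsed_secs=10.0 is a made-up wall-clock duration for the example.
    print(summarize(parse_gc_log(sample), elapsed_secs=10.0))
```

The elapsed time has to come from outside the log, for example from timing the whole run, because the log lines shown in this article carry pause durations but no timestamps.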