[Loney05] Chapter 9. Using STATSPACK

Chapter 9. Using STATSPACK

You can use the STATSPACK utility to monitor the performance of your database. In this chapter, you will see how to install STATSPACK, how to manage it, and how to run and interpret the reports generated.

As of Oracle 10g, you can use the Automatic Workload Repository to gather and analyze statistics. The STATSPACK utility provides an ad hoc capability to analyze database statistics in a similar fashion: it takes snapshots of the database statistics at different times and generates reports based on the differences between snapshots.


9.1. Installing STATSPACK

STATSPACK must be installed in every database to be monitored. The installation script, named spcreate.sql, is found in the /rdbms/admin subdirectory under the Oracle software home directory. The spcreate.sql script creates a user named PERFSTAT and creates a number of objects under that schema.

NOTE

You should allocate at least 100MB for the initial creation of the PERFSTAT schema's objects.

To start the spcreate.sql script, change your directory to the ORACLE_HOME/rdbms/admin directory and log into SQL*Plus using an account with SYSDBA privileges:

SQL> connect system/manager as SYSDBA
SQL> @spcreate

During the installation process, you will be prompted for a password for the PERFSTAT user and a default tablespace for the PERFSTAT user (a list of available tablespaces will be displayed along with this prompt). You will also be asked to specify a temporary tablespace for the user. Once you have provided the default and temporary tablespaces, the PERFSTAT account will be created, and the installation script will log in as PERFSTAT and continue to create the required objects. If there is not sufficient space to create the PERFSTAT objects in the specified default tablespace, the script will return an error.

NOTE

Although you start the installation script while logged in as a SYSDBA-privileged user, the conclusion of the installation script will leave you logged in as the PERFSTAT user.

If you want to drop the PERFSTAT user at a later date, you can run the spdusr.sql script located in the ORACLE_HOME/rdbms/admin directory.

9.1.1. Security of the PERFSTAT Account

The PERFSTAT account is created with the password you specify during STATSPACK installation. You can change the PERFSTAT account password at any time.

The PERFSTAT account is granted the SELECT_CATALOG_ROLE role and SELECT access on a large number of V$ views, along with several system privileges (CREATE/ALTER SESSION, CREATE TABLE, CREATE/DROP PUBLIC SYNONYM, CREATE SEQUENCE, and CREATE PROCEDURE). Any user who can access your PERFSTAT account can select from all the dictionary views. For example, such a user could query all the database account usernames from DBA_USERS, all the segment owners from DBA_SEGMENTS, and the currently logged-in sessions from V$SESSION. The PERFSTAT account, if left unprotected, provides a security hole that allows intruders to browse through your data dictionary and select targets for further intrusion.

In addition to the privileges it receives during the installation process, the PERFSTAT account will also have any privileges that have been granted to PUBLIC. If you use PUBLIC grants instead of roles for application privileges, you must secure the PERFSTAT account. You can lock database accounts and unlock them as needed; see Chapter 10 for details.
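For example, the following is a minimal sketch of locking the account when it is not in use and unlocking it before gathering statistics (the password shown is a placeholder, not a recommendation):

-- Lock PERFSTAT while it is not needed:
alter user PERFSTAT account lock;

-- Later, before taking snapshots, reset the password and unlock:
alter user PERFSTAT identified by "NewPassword1" account unlock;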

9.1.2. Post-installation

Once the installation process is complete, the PERFSTAT account will own tables, indexes, a sequence, and a package. You will use the package, named STATSPACK, to manage the statistics-collection process and the data in the tables. The collection tables, whose names all begin with "STATS$," will have column definitions based on the V$ view definitions. For example, the columns in STATS$WAITSTAT are the ones found in V$WAITSTAT, with three identification columns added at the top:

desc stats$waitstat

 Name                            Null?    Type
 ------------------------------- -------- ------------
 SNAP_ID                         NOT NULL NUMBER(6)
 DBID                            NOT NULL NUMBER
 INSTANCE_NUMBER                 NOT NULL NUMBER
 CLASS                           NOT NULL VARCHAR2(18)
 WAIT_COUNT                               NUMBER
 TIME                                     NUMBER

The Class, Wait_Count, and Time columns are based on the Class, Count, and Time columns from V$WAITSTAT. STATSPACK has added the following three identification columns:

SNAP_ID            An identification number for the collection. Each collection is called a "snapshot" and is assigned an integer value.
DBID               A numeric identifier for the database.
INSTANCE_NUMBER    A numeric identifier for the instance, for Real Application Clusters (RAC) installations.

Each collection you perform is given a new Snap_ID value that is consistent across the collection tables. When you run a STATSPACK report, you will see a list of all available snapshots.
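You can also list the available snapshots yourself by querying the STATS$SNAPSHOT table directly; the following is a minimal sketch, assuming the SNAP_ID and SNAP_TIME columns:

select SNAP_ID, SNAP_TIME
  from STATS$SNAPSHOT
 order by SNAP_ID;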


9.2. Gathering Statistics

Each collection of statistics is called a snapshot. Snapshots are a point-in-time collection of the statistics available via the V$ views, and they are given a Snap_ID value to identify them. You can generate reports on the changes in the statistics between any two snapshots.

NOTE

As with the UTLBSTAT/UTLESTAT reports, the STATSPACK report will only be valid if the database was not shut down and restarted between the snapshots evaluated.

Between snapshots, STATSPACK performs no processing and adds no burden to your database performance. STATSPACK only impacts performance when you take snapshots and when you generate reports on the changes in statistics between two snapshots.

NOTE

Be sure the TIMED_STATISTICS database initialization parameter is set to TRUE prior to gathering statistics.

To generate a snapshot of the statistics, execute the SNAP procedure of the STATSPACK package, as shown in the following listing. You must be logged in as the PERFSTAT user to execute this procedure.

execute STATSPACK.SNAP;

PL/SQL procedure successfully completed.

When the SNAP procedure is executed, Oracle populates the STATS$ collection tables with the current statistics. You can then query those tables directly, or you can use the standard STATSPACK report (to see the change in statistics between snapshots).

Snapshots should be taken for the following reasons:

  • To evaluate performance during specific tests of the system. For these tests, you can execute the SNAP procedure manually, as shown in the prior example.

  • To evaluate performance changes over a long period of time. To establish a baseline of the system performance, you may generate statistics snapshots on a scheduled basis. For these snapshots, you should schedule the SNAP procedure execution via Oracle's internal DBMS_JOB scheduler or via an operating system scheduler. You can use the spauto.sql script in the $ORACLE_HOME/rdbms/admin directory to schedule snapshots, as illustrated in the sketch following this list.
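The following is a minimal sketch of the kind of hourly job spauto.sql submits via DBMS_JOB; run it as PERFSTAT, and treat the start time and interval expressions as illustrative assumptions:

variable jobno number;
begin
  DBMS_JOB.SUBMIT(
    job       => :jobno,
    what      => 'STATSPACK.SNAP;',              -- the call the job executes
    next_date => TRUNC(SYSDATE + 1/24, 'HH'),    -- start at the top of the next hour
    interval  => 'TRUNC(SYSDATE + 1/24, ''HH'')' -- re-run hourly, on the hour
  );
  commit;
end;
/
print jobno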

For the snapshots related to specific tests, you may wish to increase the collection level, which lets you gather more statistics. As noted in the "Managing the STATSPACK Data" section, later in this chapter, each snapshot has a cost in terms of space usage and query performance. Avoid generating thousands of rows of statistical data with each snapshot unless you plan to use them.

To support differing collection levels, STATSPACK provides the level parameter. By default, level is set to a value of 5. Prior to changing the level value, you should generate several snapshots and evaluate the reports generated. The default level value is adequate for most reports. Alternative level values are listed in the following table:

Level  Description
-----  -----------
0      General performance statistics on all memory areas, latches, pools, and events
5      Same statistics from the lower levels, plus high-resource-usage SQL statements
6      Same statistics from the lower levels, plus SQL plan and SQL plan usage data
7      Same statistics from the lower levels, plus segment-level statistics, including logical and physical reads, row lock waits, and buffer-busy waits
10     Same statistics from the lower levels, plus parent/child latch data
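For example, to take a one-off snapshot at a higher level without changing the saved default, you can pass the level directly to the SNAP procedure (a hedged sketch using the i_snap_level parameter listed in Table 9-1):

execute STATSPACK.SNAP(i_snap_level=>6);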

The greater the collection level, the longer the snapshot will take. The default value (5) captures the most resource-intensive SQL statements while keeping the volume of collected data manageable. The parameters used for the resource-intensive SQL portion of the snapshot are stored in a table named STATS$STATSPACK_PARAMETER. You can query STATS$STATSPACK_PARAMETER to see the settings for the different thresholds used during SQL statement gathering. Its columns include Snap_Level (the snapshot level), Executions_Th (the threshold value for the number of executions), Disk_Reads_Th (the threshold value for the number of disk reads), and Buffer_Gets_Th (the threshold value for the number of buffer gets).
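For instance, a minimal query sketch using the columns just described (Parse_Calls_Th is assumed from the corresponding threshold in Table 9-1):

select SNAP_LEVEL, EXECUTIONS_TH, DISK_READS_TH,
       PARSE_CALLS_TH, BUFFER_GETS_TH
  from STATS$STATSPACK_PARAMETER;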

For a level 5 snapshot using the default thresholds, SQL statements are stored if they meet any of the following criteria:

  • The SQL statement has been executed at least 100 times.

  • The number of disk reads performed by the SQL statement exceeds 1000.

  • The number of parse calls performed by the SQL statement exceeds 1000.

  • The number of buffer gets performed by the SQL statement exceeds 10,000.

  • The sharable memory used by the SQL statement exceeds 1MB.

  • The version count for the SQL statement exceeds 20.

When evaluating the snapshot's data and the performance report, keep in mind that the SQL threshold parameter values are cumulative. A very efficient query, if executed enough times, will exceed 10,000 buffer gets. Compare the number of buffer gets and disk reads to the number of executions to determine the activity each time the query is executed.
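One hedged way to make that comparison is to compute per-execution figures from the collected data. The following sketch assumes the EXECUTIONS, BUFFER_GETS, and DISK_READS columns of STATS$SQL_SUMMARY and an illustrative snapshot ID of 2:

select EXECUTIONS,
       BUFFER_GETS / GREATEST(EXECUTIONS, 1) as GETS_PER_EXEC,
       DISK_READS  / GREATEST(EXECUTIONS, 1) as READS_PER_EXEC
  from STATS$SQL_SUMMARY
 where SNAP_ID = 2
 order by GETS_PER_EXEC desc;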

To modify the default settings for the thresholds, use the MODIFY_STATSPACK_PARAMETER procedure of the STATSPACK package. Specify the snapshot level via the i_snap_level parameter, along with the parameters to change. Table 9-1 lists the available parameters for the MODIFY_STATSPACK_PARAMETER procedure.

Table 9-1. Modification Parameters

Parameter Name      Range of Values           Default  Description
------------------  ------------------------  -------  -----------
i_snap_level        0, 5, 6, 7, 10            5        Snapshot level
i_ucomment          Any text                  blank    Comment for the snapshot
i_executions_th     Integer >= 0              100      Threshold for the cumulative number of executions
i_disk_reads_th     Integer >= 0              1000     Threshold for the cumulative number of disk reads
i_parse_calls_th    Integer >= 0              1000     Threshold for the cumulative number of parse calls
i_buffer_gets_th    Integer >= 0              10000    Threshold for the cumulative number of buffer gets
i_session_id        Valid SID from V$SESSION  0        Session ID of an Oracle session, if you wish to gather session-level statistics
i_modify_parameter  True or False             False    Set to True if you wish to save your changes for future snapshots

To increase the Buffer_Gets threshold for a level 5 snapshot to 100,000, issue the following command:

execute STATSPACK.MODIFY_STATSPACK_PARAMETER -
(i_snap_level=>5, i_buffer_gets_th=>100000);
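To save the new threshold for future snapshots as well, you would also set i_modify_parameter, shown as True/False in Table 9-1 (a hedged variation of the same command):

execute STATSPACK.MODIFY_STATSPACK_PARAMETER -
(i_snap_level=>5, i_buffer_gets_th=>100000, i_modify_parameter=>'true');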

If you plan to run the SNAP procedure more frequently than hourly, you should pin the STATSPACK package in the shared pool following database startup. The following listing shows a trigger that will be executed each time the database is started. The KEEP procedure of the DBMS_SHARED_POOL package pins the package in the shared pool.

create or replace trigger PIN_ON_STARTUP
after startup on database
begin
  DBMS_SHARED_POOL.KEEP ('PERFSTAT.STATSPACK', 'P');
end;
/

9.3. Running the Statistics Report

If you have generated more than one snapshot, you can report on the statistics for the period between the two snapshots. The database must not have been shut down between the times the two snapshots were taken. When you execute the report, you will need to know the Snap_ID values for the snapshots. If you run the report interactively, Oracle will provide a list of the available snapshots and the times they were created.

To execute the report, go to the /rdbms/admin directory under the Oracle software home directory. Log into SQL*Plus as the PERFSTAT user and run the spreport.sql file found there, like so:

SQL> @spreport

Oracle will display the database and instance identification information from V$INSTANCE and V$DATABASE and will then call a second SQL file, sprepins.sql. The sprepins.sql file generates the report of the changes in the statistics during the snapshot time interval. The available snapshots will be listed, and you will be prompted to enter a beginning and ending snapshot ID. Unless you specify otherwise, the output will be written to a file named sp_beginning_ending.lst (sp_1_2.lst for a report between the Snap_ID values 1 and 2).
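If you prefer to run the report non-interactively (from a batch script, for example), you can pre-define the substitution variables the report prompts for. This sketch assumes the begin_snap, end_snap, and report_name variable names used by the report scripts:

SQL> define begin_snap=1
SQL> define end_snap=2
SQL> define report_name=sp_1_2.lst
SQL> @spreport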

The first portion of the report output provides an overview of the cache areas and their usage. The following listing provides sample output for this section, showing the cache sizes and the load profile:

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     1,536M      Std Block Size:          8K
           Shared Pool Size:     1,648M          Log Buffer:     10,240K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:             85,921.59              1,924.45
              Logical reads:             23,431.10                524.80
              Block changes:                520.99                 11.67
             Physical reads:                457.96                 10.26
            Physical writes:                 61.19                  1.37
                 User calls:              2,000.92                 44.82
                     Parses:                430.78                  9.65
                Hard parses:                  0.04                  0.00
                      Sorts:                 57.11                  1.28
                     Logons:                  0.53                  0.01
                   Executes:                699.70                 15.67
               Transactions:                 44.65

  % Blocks changed per Read:    2.22    Recursive Call %:    31.02
 Rollback per transaction %:    0.06       Rows per Sort:    64.62

The load profile helps to identify the type of activity being performed. In this example, the activity recorded includes both queries and transactions. On average, this database supports 44.65 transactions per second while also supporting logical reads and sort operations. The database is actively performing physical reads and writes. In a read-intensive application, the number of physical writes per second should be far below the number of physical reads per second.

Note that the load profile shows the per-second averages in the database; however, if your STATSPACK reporting interval is too large, there may not be such a thing as an average second. For example, if your interval includes both your data-loading processes and your online user activity, the averages would reflect the combined impact of both types of usage, obscuring the true load profile of each separate activity.

NOTE

When using the load profile statistics, remember that these are presented as "per second" statistics—the rest of the STATSPACK report values are presented as values for the full duration of the snapshot.

The next sections of the report show the instance efficiency percentages (such as the buffer hit ratio and library cache hit ratio) followed by the shared pool statistics. The shared pool statistics show the percentage of the shared pool in use and the percentage of SQL statements that have been executed multiple times (as desired). The following listing shows sample shared pool statistics from the report:

Shared Pool Statistics         Begin    End
                               ------   ------
              Memory Usage %:  100.00   100.00
     % SQL with executions>1:   71.68    70.60
   % Memory for SQL w/exec>1:   64.63    62.85

Based on the data in the preceding listing, at the time of the second snapshot, 100 percent of the shared pool's memory was in use. Of the statements in the shared pool, only about 71 percent had been executed more than once, indicating a potential need to improve cursor sharing in the application. Because the shared pool is completely used, you should consider increasing its size.

The next portion of the generated report shows the top five wait events, the full list of wait events, and the background wait events. Identifying major wait events may help to target your tuning efforts.

Let's consider two common wait events: db file scattered read (waits encountered during multiblock reads, such as full table scans) and db file sequential read (waits during single-block reads). For this same database, the statistics (truncated for display here) were as follows:

Event                               Waits   Timeouts   Time (s)
---------------------------- ------------ ---------- ----------
db file sequential read         1,410,528          0      3,631
db file scattered read             20,503          0         36

Adding them together, we find that for the interval there were a total of 1,431,031 waits during reads from datafiles—single-block reads of indexes, single-block reads of tables, plus multiblock reads. How many waits were there per second? In this report, the time interval was one hour—3600 seconds—so the number of waits per second is

1,431,031 waits / 3600 seconds = 397.5 waits per second

How does that compare with the number of reads? From the load profile, we already know the number of physical reads per second:

~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
             Physical reads:                457.96                 10.26

So, in an average second, how many waits occur per physical read? To answer this, divide the waits per second by the reads per second:

397.5 waits per second / 457.96 physical reads per second = 0.868

Based on these statistics, roughly 87 percent of all reads during the interval encountered wait events. That is a very high percentage of waits per read, and you should examine the I/O environment to see if there are opportunities to improve its performance. Don't look for full table scans as the culprits—remember the source of the waits:

Event                               Waits   Timeouts   Time (s)
---------------------------- ------------ ---------- ----------
db file sequential read         1,410,528          0      3,631
db file scattered read             20,503          0         36

Single-block reads accounted for 98.5 percent—that is, 1,410,528/(1,410,528+20,503)—of the waits. Look for inefficient indexes that are being scanned repeatedly. If you eliminate all full table scans in the database, you will reduce the number of physical waits by at most 1.5 percent.

The most resource-intensive SQL statements in the database are listed in the next section of the report, in descending order of buffer gets. Because the buffer gets statistic is cumulative, the query with the most buffer gets may not be the worst-performing query in the database; it may just have been executed enough times to earn the highest ranking. Compare the cumulative number of buffer gets to the cumulative number of disk reads for the queries; if the numbers are close, you should evaluate the explain plan for the query.

NOTE

If the shared pool is flushed between the execution times of the two snapshots, the SQL portion of the output report will not necessarily contain the most resource-intensive SQL executed during the period.

The SQL statements are listed three separate times—ordered by buffer gets, then by physical reads, then by executions. It is common to find resource-intensive SQL commands that show up in only one or two of these three listings. For example, if your application constantly executes a query such as

select TRUNC(SYSDATE) from DUAL;

then it will not generate many physical reads. However, the number of executions of this command may cause it to be one of the most expensive queries in the database. Some applications run queries like this millions of times per day—to the point at which one or more CPUs are constantly busy doing nothing more than returning the current system date to the user. Even if the commands are efficient in terms of physical I/O by themselves, consider their number of executions and their buffer gets—and the corresponding CPU burdens they incur.

Following the SQL statement listing, you will see the list of changes to statistics from V$SYSSTAT, titled "Instance Activity Stats." The V$SYSSTAT statistics are useful for identifying performance issues not shown in the prior sections. For example, you should compare the number of sorts performed on disk to the number performed in memory; increase the PGA_AGGREGATE_TARGET value to reduce disk sorts. If there is a significant number of full table scans of large tables, evaluate the most-used queries. The following listing shows four rows from this section of the report:

Statistic                         Total   per Second    per Trans
--------------------------- ----------- ------------ ------------
sorts (disk)                         13          0.0          0.0
sorts (memory)                  205,651         57.1          1.3
table scans (long tables)            23          0.0          0.0
table scans (short tables)      602,813        167.4          3.8

In this case, there are full table scans of long tables (those with more than five blocks), but they are a minority of the full table scans performed. You should check to see if the small tables are being properly cached in memory (such as in the KEEP pool). The sorts to disk are small in number, but every sort to disk is wasted effort. You should increase the sort area size and avoid writing to the temporary tablespace unless absolutely necessary.
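The following are hedged sketches of both remedies; SMALL_LOOKUP is a hypothetical table name, and the 500M target is purely illustrative:

-- Cache a small, frequently scanned table in the KEEP buffer pool:
alter table SMALL_LOOKUP storage (buffer_pool KEEP);

-- Give the PGA more room for in-memory sorts:
alter system set PGA_AGGREGATE_TARGET = 500M;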

The next section of the report provides the I/O statistics by tablespace and by datafile. If the I/O is not properly distributed among your files, you may encounter performance bottlenecks during periods of high activity. You can use this section of the report to identify such bottlenecks and to measure how effectively you have resolved those problems. See Chapter 4 for further details on I/O distribution across files.

Following the I/O statistics, the report lists the buffer cache statistics by pool (DEFAULT, KEEP, and RECYCLE), instance recovery statistics (the number of redo blocks), and the buffer pool advisory. The buffer pool advisory shows an estimated physical read factor so you can judge how much impact increasing the data block buffer cache will have on the number of physical reads required. The buffer pool section is followed by an advisory on the PGA aggregate settings.

The next sections of the report show the buffer statistics related to waits during write events:

                                  Tot Wait        Avg
Class                    Waits    Time (s)   Time (ms)
------------------ ----------- ----------- -----------
data block                 119           0           1
undo header                 86           0           0
undo block                   6           0           0

In general, using Automated Undo Management will reduce or eliminate waits for undo headers and undo blocks. Data block waits may be reduced by using asynchronous I/O or by improving the efficiency of the I/O environment. In this example, a few milliseconds are being spent waiting for data blocks.

The next section of the report lists the enqueue waits, including TX locks for individual rows, TC locks for thread checkpoints, and others. Enqueue waits can be caused by application architecture decisions and database configuration issues, so evaluating their cause may require additional testing.

After those sections, the report provides undo segment statistics. First, it lists the activity in the undo segments (writes, wraps, shrinks, extends) and the waits encountered. Following that, the report shows the number of undo blocks written per time interval.

Latch activity and dictionary cache statistics are then presented, followed by the library cache activity. If your "Pct Miss" value is high, you may need to improve cursor sharing in your application or increase the size of the shared pool.

A shared pool advisory is displayed in the next section of the report. In the advisory, Oracle estimates the number of library object hits expected in each incremental increase in the size of the shared pool.

Following an SGA memory summary (from V$SGA) and a listing of the memory changes during the snapshot interval, the report lists the database initialization parameters in use at the beginning and end of the snapshot interval.

Taken as a whole, the report generates a significant amount of data, allowing you to develop a profile of the database and its usage. Based on the initialization, file I/O, and SGA data, you can develop an understanding of the major components in the database configuration. Because this report generates so much data, you should be careful not to generate more statistics than you plan to use. The next section of this chapter addresses the management of the gathered data.

9.4. Managing the STATSPACK Data

You should manage the data generated by STATSPACK to guarantee that the space usage and performance of the STATSPACK application meets your requirements as the application data grows. Managing STATSPACK data includes the following steps:

  1. Regularly analyze the STATSPACK data. At a minimum, you should analyze the STATSPACK tables prior to running the spreport.sql report:

    execute DBMS_UTILITY.ANALYZE_SCHEMA('PERFSTAT','COMPUTE');

  2. Purge old data. Because you cannot generate valid interval reports across database shutdown/startup actions, data prior to the last database startup may not be as useful as the most current data. When the data is no longer needed, purge it from the tables. Oracle provides a script, sppurge.sql, to facilitate purges. The sppurge.sql script, located in the /rdbms/admin directory under the Oracle software home directory, lists the currently stored snapshots and prompts you for two input parameters: the beginning and ending snapshot numbers for the purge. The related records in the STATS$ tables will then be deleted. The sppurge script prompts you to back up your old statistics before purging them. You can back up the data by exporting the PERFSTAT schema. You may wish to keep old statistics for baseline measurements.

  3. Truncate the STATSPACK tables when the data is not needed. Old statistical data may no longer be relevant, or you may have imported the old statistics during database migrations or creations. To truncate the old tables, execute the sptrunc.sql SQL*Plus script from within the PERFSTAT account. The script is located in the /rdbms/admin directory under the Oracle software home directory.

9.5. Deinstalling STATSPACK

Because STATSPACK includes public synonyms as well as private objects, you should remove the application via a SYSDBA privileged account. Oracle provides a script, spdrop.sql, to automate the deinstallation process. From within the /rdbms/admin directory under the Oracle software home directory, log into SQL*Plus and execute the script as shown here:

SQL> connect system/manager as SYSDBA
SQL> @spdrop

The spdrop.sql script calls the scripts that drop the tables, package, public synonyms, and the PERFSTAT user. To reinstall STATSPACK, execute the spcreate.sql script as shown earlier in the chapter.

