
8.3. Databases

There is one cardinal rule when designing backup solutions for databases: database administrators (DBAs) always want local copies of their database backups, so it is important to understand why. DBAs, as custodians of the databases they support, need total control of many aspects of the systems on which those databases run, backups included. Part of that responsibility is to ensure that databases are restored as quickly as possible and as frequently as requested; the data needs to be restored "instantly". When DBAs talk about restoring data, they mean having the data back in the database structure and available to end users, including all the steps required to re-integrate backups provided by the backup service into the database in question. Frequently, DBAs only need to restore small portions of the database, not the database as a whole. They are typically restoring data as the result of mistaken entries, not corruption of the entire database.

DBAs, especially ones with more career experience, have had poor experiences with external backup software, whether in terms of the ability to restore small pieces of the database, performance impacts on the databases during backup, or the availability of backups from which to restore data. Given all these factors, many DBAs prefer to have native database tools perform local backups of the database(s) onto local disk and then have the backup tools back up the resulting file set. From the DBA perspective, if there are problems with the backup or the resulting restore, the DBA only needs to engage the database vendor, not a group of people consisting of the backup administrator, backup vendor, and database vendor.

While this is one strategy, and it may seem efficient from the DBA perspective and solve the tactical issues at some level, as the backup architect your view must be more strategic. Modern backup software does not impact the database any more than native tools do, and in many cases it simply uses the same API as the native tools. Additionally, while backup software does not provide record-level recoverability of databases, both NetBackup and CommVault can perform incremental and differential copies of databases, removing the need to restore entire databases.

Local backups to disk also consume large amounts of resources. The disk space alone will require at least as much space as the database as a whole, and may actually require more, depending on the rotation schedule needed to keep active copies available during backup operations. Native tool backups also create compressed copies of the data in order to consume as little space as possible. The compression techniques used are no different from those used by standard folder compression, and come with all the same CPU, memory, and I/O overhead. So, when running in a compressed mode, the native tools can actually introduce more load on the server than the backup software would. In addition, the compression hinders the ability of the backup software to make copies of the native tool backups. Both NetBackup and CommVault use the hardware compression on tape drives. Compressed source data defeats the hardware compression on the media, slows down the media transfer rates, and increases the amount of media consumed. If the target is a deduplicated target, compressed data effectively defeats deduplication altogether.

8.3.1. Database Log Backups

Local backups do have alternatives. Both Oracle and SQL Server maintain records of the transactions that have been committed to the database over a specific period of time. These records are called transaction logs. Oracle has two types of these logs: redo logs and archive logs, and SQL Server has transaction logs. These logs can be replayed against a database to bring the database back to the last transaction within the database. As will be shown later in the chapter, it is strongly recommended, and in some cases required, that these logs be backed up as part of a regular backup scheme.

If the database logs are backed up more frequently than the database, the logs can be used to restore the database to a point in time between normal backups. This effectively provides the equivalent of an incremental database backup occurring between regular backup cycles. So instead of committing disk space to a local backup, the database could be backed up daily by the backup software, with periodic backups of the transaction logs taken in between, which can then be replayed during a restore. This provides roughly the same functionality as the local backup plus record-restore approach. Discuss this option with the DBA as an alternative to local backup.
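As a concrete illustration, here is a minimal T-SQL sketch of this approach; the database name Sales and the file paths are hypothetical, and the database must be using the FULL recovery model (described later in this chapter) for log backups to be possible:

    -- Daily full backup, taken by or alongside the backup software
    BACKUP DATABASE Sales
        TO DISK = 'D:\backup\Sales_full.bak'
        WITH INIT;

    -- Periodic transaction log backups between full backups; each one
    -- captures only the transactions since the previous log backup
    BACKUP LOG Sales
        TO DISK = 'D:\backup\Sales_log_1200.trn';

At restore time, the full backup is restored first and the log backups are replayed in sequence, stopping at the desired point in time.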

8.3.2. Locally Controlled Database Backups

If DBAs do not want or need local backup, they often still want a locally controlled backup. This type of backup is essentially a user-initiated backup and provides control of exactly when the backup occurs. While both NetBackup and CommVault provide scheduled backups, they are not deterministic schedulers: they only provide a window in which the backup may be executed, not an exact time of execution. This may not work for environments in which data processing occurs on the database during specific hours and backups cannot be run at those times. Locally controlled backups may go to disk, followed by a backup software pass, or may directly execute the backup software. A locally controlled backup typically requires some level of scripting to integrate the native backup tools and the backup software into a single process. Since the process is controlled locally, the backup can inject a load into the backup system that may not be accounted for in the normal scheduling process. This load may delay the database backup further, as the backup job waits for resources that were not "scheduled" for use, or may impact the performance of other backup jobs by competing for resources that were allocated to scheduled backups.

8.3.3. Pre- and Post-Scripts

The alternatives to locally scheduled database backups do not completely solve the problem. Both CommVault and NetBackup provide a means to execute pre- and post-backup scripts. These are optional components that can perform tasks prior to (pre-) or after (post-) the execution of the backup. The ability of the backup software to start a script before the backup actually runs gives DBAs control over the exact time a backup will run, through the use of a trigger. A trigger script is one that executes and then waits for a specific event, known as a trigger, before exiting. DBAs use their own scripts to place the database in the appropriate state and issue the trigger. The pre-backup script then exits and the backup begins. The post-backup script puts the database back into normal operation at the conclusion of the backup.

In general, a trigger could be designed to fire in response to a number of events: time of day, completion of processing, and so on. The trigger script would be included in a pre- and post-backup process that is executed well before the window in which the application is required to be backed up. While this solves the problem of a nondeterministic scheduler not executing a backup at a specific time, these types of scripts commit backup resources and idle them until the trigger is pulled. If the script fails, the resources can remain committed until the error is detected, often well after the backup was supposed to have completed. Additionally, these scripts have to be maintained: if there are changes to the backup software, database software, or both, the scripts have to be revalidated to ensure that the process and functionality survive the software change, a process that can be time-consuming and error-prone. When looking at local control of database backups, balance the ability to forecast resources for the backup against the need to schedule around critical business processes. If the business process requires non-backup time to complete its tasks, local control may be the answer. Simply plan around the resource constraint as best as possible and monitor the backups of both the database and other jobs around that time to ensure that no issues are introduced by using this method.

8.3.4. Snapshot Backups

But what if you could have the best of both worlds: the ability to provide a local copy of the database while still controlling the backup environment? Split-mirror and snapshot backups provide this type of functionality. A split-mirror backup is one in which a mirror copy of the disks is created, most often at the disk array level, and made available for use independently of the original source disks. The split mirror is a full clone of every block on every physical disk that is used for the database and participated in the mirror split.

8.3.4.1. Snapshots

A snapshot backup is similar. Snapshots can be created in the operating system, using products such as VSS or FlashSnap, or on the disk array with TimeFinder or NetApp Snapshot. Instead of creating a full clone of every block, the snapshot starts at a point in time, tracks the blocks that have changed by copying the changed blocks onto a separate set of disks, and then references the unchanged blocks plus the changed blocks to present a usable copy of the disks. The snapshot process, in most instances, uses a method called copy on write (CoW). A CoW snapshot, unless carefully managed and used for only a short period of time, can have a significant impact on performance. CoW intercepts all writes to a set of disks and makes copies of blocks that have changed since the start of the snapshot session. These blocks are copied to a separate set of disks, while the original blocks are committed to their intended location on disk. The write is not acknowledged back to the host until both copies of the block have been fully written, introducing a second write and thus additional write latency. When data is read from the snapshot, blocks that changed during the snapshot session are read from the separate location to which they were copied, while unchanged blocks are read from the original location. If another process attempts to read the same block at the same time as the process reading the snapshot, the two requests contend and one must wait, introducing latency for the other. While this may seem like an unlikely occurrence, with a relatively small number of blocks and only a moderately active database, such contention can occur frequently enough to slow the overall performance of the database.

8.3.4.2. Split Mirror

The split mirror/snapshot backup works by using a combination of operating system and/or array functionality that creates this point-in-time copy of all the files on the disks occupied by the database. This point-in-time copy can then be backed up independently of the database, at a time of the backup software's choosing, and without impacting the database as a whole. This copy could even be backed up by an alternative system that would completely offload the whole backup process from the database server—effectively making the backup look to the database like it was almost instantaneous, depending on the requirements of the database. The process looks something like this:

  1. The database is quiesced and/or placed into a backup mode. This mode may allow the continuation of processing during the backup.

  2. The snapshot is created from the quiesced database. This snapshot represents a full point-in-time copy of the database as of the point at which the database was placed into backup mode. The snapshot must include all portions of the database that allow it to run, such as transaction logs, trace logs, and the like.

  3. The database is then taken out of backup mode. This effectively ends the backup cycle for the database.

  4. The snapshot is then available for backup, offline processing (as another database), or restore purposes (record-level restores—just like a local copy). All this functionality can be local to the source server, or operated on a secondary server, set aside for such purposes (such as a Media Server or MediaAgent).

However, as with all things backup, there are trade-offs. In using this type of backup method, there still can be a significant amount of disk required. The storage requirement can range from a full copy of the disk space, in the case of a split mirror, to the amount of change that occurs between backups—the requirement for a snapshot backup. The process described previously is not automatic; in order to function in a scheduled backup environment, it must be automated. This automation can be in the form of a set of scripts that implement the functionality or in additional agents that need to be installed to extend the backup software capabilities to include integration with the split mirror/snapshot provider. Additionally, if it is desirable to back up the mirror/snapshot on a server separate from the source, shared storage is required, and additional servers of the same OS type and level need to be in place to host the mirror/snapshot. This secondary server is known as a proxy server. Finally, if any database operations are desired to be performed on the proxy server, such as record-level restores, a copy of the database software must also be installed and maintained on the proxy server. Depending on the database vendor, this proxy server may need to be isolated from the production network as the database may try to advertise itself as the primary database. This is most often the case when working with SQL Server—SQL registers itself with Active Directory at start time.

The previous discussion talked about the need to put the database into a "backup mode" prior to making copies of the database. Prior to backing up, all databases need to ensure that pending transactions are committed to the database, which creates what is known as a consistent database. This is necessary to ensure that all data in the database is correctly stored and referenced by the internal database structures. If this step is not performed, the resulting database copy can be unusable. There are generally three methods of preparing a database for backup: shutting down the database completely ("cold"); placing the database into a log-tracking-only state in which new transactions are placed only into the transaction log and not committed to the database until backup mode is exited ("hot"); or using a data arbitrator or API specifically designed around database backup ("programmatic"). Mirror/snapshot backups operate using either the cold or hot method; native backup tools use the programmatic method (sometimes in combination with hot or cold), and the backup software can use a combination of all three.

So what are the differences? A cold backup requires that access to the database be suspended and that all transactions residing in the transaction logs be fully committed to the database. This usually requires completely shutting down the database software. Obviously, this represents downtime for the database, during which end users cannot perform any work. However, a cold backup is useful as a baseline copy of the database from which to work, and it is the easiest to back up: simply make a copy of the data files that are part of the database. Once the copy is made (via a mirror/snap, or as a simple direct backup of the data), the database can be restarted and brought back online. This method is well suited to databases that have no off-hours activity, or that only require periodic backups as archive copies of structured data.
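Using Oracle's SQL*Plus as an example, a cold backup reduces to a very short sketch; the file copies themselves happen at the OS or array level and are shown only as comments:

    -- Shut down cleanly so all pending transactions are committed
    SHUTDOWN IMMEDIATE;
    -- At this point, copy (or snapshot) all data files, control files,
    -- and configuration files at the OS or array level.
    -- Then bring the database back online:
    STARTUP;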

A hot backup, on the other hand, allows access to the database during backup. During a hot backup, the database software is commanded to place the database into a specialized mode, commit any pending transactions, and capture all further transactions exclusively in the transaction logs until the database is removed from hot backup mode. This allows end users to continue to query, insert, and update records while maintaining consistency of the database files for backup. Table 8-1 shows the differences between cold and hot backups.

Table 8-1. Comparison of Cold and Hot Database Backups

Cold backup

  Advantages:

  • Provides a complete, guaranteed-consistent copy of the database.

  • Easiest to execute.

  • Does not require any special configuration of the database to run.

  Disadvantages:

  • Database is offline during the backup.

  • Largest backup; every backup is a full backup.

  • Only full database restores are possible without proxy servers.

Hot backup

  Advantages:

  • Database continues to function during the backup.

  Disadvantages:

  • Performance of the database can degrade during the backup.

  • Records are not committed during the backup, so the database is not truly updated.

  • Transaction logs are of fixed size and can quickly fill during backups, causing database outages.

A hot backup sounds like the perfect solution; however, it comes at a price. Since all new transactions are placed into the transaction logs and not committed to the database during hot backup mode, the database is in a state where two records of the same data can exist at the same time: one in the transaction logs and one in the database. This is reconciled after backup mode is released, as the transaction logs are replayed against the database, making the records consistent again.

Additionally, the transaction logs are of fixed size and occupy only a small amount of fast disk space. (See Figure 8-4.) If backup mode is maintained for too long, the transaction logs can run out of space and the database can crash, requiring DBAs to perform database recovery and possibly restoration. Therefore, it is important to keep a database in backup mode for only short periods of time, just long enough to get a backup copy. Enter mirror/snap backups. By combining hot backup mode with a mirror/snap, backups of very large databases can be accomplished, at least from the database perspective, in a matter of minutes.

Figure 8-4. Size comparison between data and log files

The programmatic method of creating backups is used by software vendors and script authors to create customized methods of backup. Database vendor developers use these programmatic methods to selectively ensure that parts of the database are consistent during backup and that individual records, or in some cases database blocks, are retrieved and backed up in a manner that allows for selective or full restores. These methods can take the form of software APIs that allow standard, low-level methods of database access, or vendor-provided tools that provide access and backup outside of the traditional cold and hot backup methods. Third-party products such as SQL LiteSpeed use these methods to provide optimized backups outside of both the "official" SQL Server backup tools and NetBackup and CommVault, as will be discussed later in the chapter.

Of course, the reason backup software providers have database agents is to provide an automated, controlled, and tracked method of backing up database data. Both NetBackup and CommVault have agents that cover a number of different databases, such as Oracle, SQL Server, and DB2, and even some smaller database types such as MySQL and Postgres. The two most commonly found databases will be reviewed here in detail: Oracle and SQL Server.

8.3.5. SQL Server

SQL Server has the advantage of only running on Windows platforms and is able to take advantage of several features of the OS that make backup easier. SQL Server has the ability to perform both hot and cold database backups using native tools, and has several ways that allow for backup application integration.

8.3.5.1. SQL Server Terminology

Before we get into the specifics of how to back up SQL Server, let's talk about terminology. Like all other software, database software does not necessarily use a common lexicon to describe the different pieces that make up a database, and SQL Server is no different. SQL Server is installed on a physical server and creates an instance of SQL Server. The instance is the highest level of granularity on a server. Each instance contains and manages one or more databases. A database consists of multiple filegroups, which are exactly what they sound like: groups of files that contain the data structures created within the database to store the data: tables, triggers, indices, and so on. (See Figure 8-5.)

Figure 8-5. SQL Server hierarchy

The simplest method that SQL Server uses for backup is the native dump. This provides a local, point-in-time copy of the database, and can either be a simple copy of the required files within the filegroups of the database, or a data-level backup, essentially an export of the data contained in the database into a SQL Server-specific backup format. When backing up databases within a SQL Server instance, the method of performing backups is guided by a "recovery model" that is specified during database creation. This model can be changed after the database is created, but it guides both the method of backup and the ability to recover data. The recovery model can be one of the following:

  • SIMPLE—Transaction logs are automatically managed (truncated) during normal database operations. Since the transaction logs do not necessarily contain all transactions since the last backup, no backups of the transaction logs are taken. This means that the database can only be recovered to a specific point in time: the time of the last backup. This model can be used for either file-/filegroup-level backups or database-level backups.

  • FULL—Transaction logs are only truncated after a completed backup, and as such maintain all transactions that occur in the database between backups. Because of this, transaction log backups are taken along with the database. This model can be used for either file-/filegroup-level backups or database-level backups, and it allows for point-in-time recoveries of data at any point between backups by replaying transaction logs. This is typically the most common and recommended recovery model for SQL Server (see the sketch following this list).

  • BULK_LOGGED—Only minimal transaction logs are created during normal operations. This model is designed specifically for databases that have data loaded into them all at once, not over time. Such databases are called bulk- or batch-loaded databases. This recovery model also requires that transaction logs are backed up.
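As a sketch of how the recovery model is set and verified (the database name Sales is hypothetical):

    -- Set the recovery model for an existing database
    ALTER DATABASE Sales SET RECOVERY FULL;
    -- Alternatives: SET RECOVERY SIMPLE or SET RECOVERY BULK_LOGGED

    -- Verify the current recovery model
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = 'Sales';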

SQL Server also allows for the concept of full or differential backups, both at the database and the file-/filegroup-level. The "full" backup discussed here should not be confused with the FULL recovery model described previously. Here, full refers to the level at which the backup is taken, taking all the data and backing it up, not the method of maintaining transaction logs relative to the backup (the recovery model). A full backup at the database level contains all data that resides within the database. A differential database backup puts only the data that has changed since the last backup into the backup container. A full backup must have been taken prior to a differential to act as a baseline. While this may seem like a trivial point, SQL Server does not simply assume that a full backup has been taken; it has no way to determine the differential data without a prior full.
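A minimal T-SQL sketch of the full-plus-differential pattern at the database level (names and paths hypothetical; the differential will fail if no full backup exists as a baseline):

    -- Baseline: full database backup
    BACKUP DATABASE Sales
        TO DISK = 'D:\backup\Sales_full.bak';

    -- Later: back up only what has changed since the last full backup
    BACKUP DATABASE Sales
        TO DISK = 'D:\backup\Sales_diff.bak'
        WITH DIFFERENTIAL;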

SQL Server also allows for full and differential backups of files or filegroups within a database. This can be useful, as it creates a simple method of protecting databases simply by copying files. However, do not be fooled: restoring the database can be complex because multiple files from multiple backups may be required to bring the database back to a consistent state. File-level backups are also larger than their database counterparts, as the files contain unallocated database space in addition to the live data. This unallocated space within the database files, depending on your DBA's level of paranoia, can be more than 50 percent of the total size of the data files, effectively doubling the amount of data to be backed up.
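File-/filegroup-level backups follow the same T-SQL pattern; a brief sketch, with the filegroup name SalesFG1 being hypothetical:

    -- Full backup of a single filegroup
    BACKUP DATABASE Sales
        FILEGROUP = 'SalesFG1'
        TO DISK = 'D:\backup\Sales_fg1.bak';

    -- Differential backup of the same filegroup
    BACKUP DATABASE Sales
        FILEGROUP = 'SalesFG1'
        TO DISK = 'D:\backup\Sales_fg1_diff.bak'
        WITH DIFFERENTIAL;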

SQL Server can also take advantage of a Windows-specific operating system feature for backups: the Volume Shadow Copy Service (VSS). VSS provides a native, operating system-level ability to create snapshots of file systems or volumes on a particular server and make these snapshots available for use by backup or other operations. The VSS snapshot presents what is effectively a full copy of the database at the point in time of the backup, while physically storing only the blocks that have changed since the initiation of the VSS session. This method is good for full backups under the SIMPLE or FULL recovery models.

SQL native tools are good for providing local backup to disk, but both NetBackup and CommVault provide management of backups over time and can utilize different methods of backing up the data within the database to make recoveries more straightforward. NetBackup and CommVault both use another feature of SQL Server, the Virtual Device Interface (VDI), to accomplish database-level backups. VDI is a command and data interface to the databases that allows NetBackup and CommVault to request specific streams of data to be generated and sent through a "device," which is read by the backup software and redirected to the backup media. While this backup method does not track specific rows/records, it does allow data to be played back to a particular point in time. CommVault and NetBackup also allow for file-based backups using the VDI interface: the interface places the database into a quiesced state, which allows for the backup of all files within the database, or just specific files that have changed since the last backup.

NetBackup and CommVault also have the ability to use VSS as a backup method. While this method is not as granular as the VDI method, it can be a faster method of backup from the perspective of the database. During VDI-initiated backups from both NetBackup and CommVault, load is placed on the database, as both pieces of backup software are essentially executing database-level operations while regular operations are occurring, competing for resources along with regular queries. VSS backups, however, work at the operating system level to create a snapshot copy of the database. The only time the database sees any interaction is when the VSS service quiesces the database long enough to create the snapshot, an operation that is measured in minutes or seconds, not hours. From there, the backup software can take a backup of the database directly from the VSS snapshot without any further impact, other than OS-level impact, on database operations. While VSS snapshots cannot play back data the way VDI-based backups can, they can be relatively small and taken several times a day, providing an ability analogous to log replays. Simply apply the VSS snapshot to the database, and the database is restored to the point in time at which the snapshot was taken.

In addition to the SQL Server native and NetBackup/CommVault-based backups, there is also backup software that interacts directly with SQL Server and creates backups that allow row/record-level recoveries from backups stored locally on disk. The most widely used software of this type is SQL LiteSpeed by Quest Software. LiteSpeed provides SQL Server DBAs the ability to do exactly what they want: have locally controlled "backups" that can quickly restore rows/records to a specific database without external assistance, and without requiring a second copy of the database software running to which to restore a native backup. LiteSpeed is a powerful tool, but when it is used to create backups that will be protected by either NetBackup or CommVault, there are some issues to be aware of.

8.3.5.2. SQL LiteSpeed

Since LiteSpeed does not interact directly with either backup software, NetBackup and CommVault will essentially treat the backup of a LiteSpeed backup as a full backup, regardless of what LiteSpeed has generated. To NetBackup or CommVault, the backup generated by LiteSpeed is simply a collection of files; from a backup perspective, there is nothing inherently different between LiteSpeed backup files and any other type of file. This means that the amount of data actually backed up by the backup software can be much greater than is really required.

Secondly, there is an inherent assumption that the LiteSpeed backup actually backed up data that was different from the previous backup. Said another way, there is no way for either NetBackup or CommVault to track what data was actually protected within a LiteSpeed backup. This has two implications:

  • Backup levels—Every backup of a LiteSpeed database dump is by definition a full backup. To both NetBackup and CommVault, all files in a LiteSpeed backup are new, so all files need to be protected.

  • Restore levels—There is no real way for either piece of backup software to determine which backup contains the necessary LiteSpeed backup when restoring to a point in time, without intervention or documentation from the DBA(s) performing the LiteSpeed backups.

This makes the data protection method as a whole dependent on the DBAs manually maintaining a record, either within LiteSpeed or externally, of which backup corresponds to which point of recovery.

Finally, LiteSpeed by default creates compressed copies of SQL Server backups. While this is very efficient for storing backups on local disk, it can play havoc with backups to media via either NetBackup or CommVault. Both pieces of backup software, when using tape media, use the compression that is native to the tape drive. When compressed data is backed up to a tape drive that is using compression, the drive does two things: it slows down as it attempts to recompress already compressed data, and it stores two to three times less data, since no further compression is possible on the data set. Under normal file system backups this is not significant, because in most cases compressed data is not a large percentage of the total data set. With a LiteSpeed backup set, however, 100 percent of the data is compressed, requiring far more media storage than would normally be needed. This issue is even more pronounced when using deduplication: compressed data can require 10 to 20 times more storage and degrade performance compared with uncompressed data.

When deciding which backup method(s) to use, discuss with the DBAs their assessment of how often the data would need to be recovered, and to what level of granularity. Generally, a combination of approaches will cover both the requirements of the backup administrator, to provide detailed, long-term storage of data for protection, and those of the DBA, to provide granular, short-term operational backups of data available for immediate restore. If the DBAs are using LiteSpeed or a similar product, work with them to understand the reasons for its use. If the DBAs are simply using LiteSpeed as a pure backup (they do not require the ability to quickly do record-level restores), work with them to implement a backup based on the backup software modules.

If a local backup is required due to operational or business concerns, then a separate backup, using either VDI or VSS, should also be made daily, independent of the LiteSpeed backup. Why not just back up the LiteSpeed output? There are several reasons. First, as discussed previously, since the backup software does not interact with the local backup, there is no way to determine the point in time at which the backup was taken. While this may not seem like an issue at the point of backup, it can become critical at the point of recovery, particularly when recovering offsite.

Take the following scenario: the backup software backs up the local backups. An offsite recovery is required to allow for auditing of the database at a particular period of time in the past, which lands between backup cycles. There are two LiteSpeed backups tracked by the backup software that bracket the point in time required for the audit. Which one is required for the restore? The only way to tell is to restore both, restore the databases from the resulting LiteSpeed backups, and make the determination at that point. If a NetBackup or CommVault backup of the database had been taken in addition to the LiteSpeed backup, determining the point of backup and restoring the database to satisfy the audit would be a single restore process, with the ability to determine exactly which backup version is required.

If the backup software modules from NetBackup or CommVault are acceptable to the DBAs, determine their true needs. If they require a granular level of restore throughout the day, perform VSS backups of the transaction logs with daily VDI-level backups of the database, with the FULL recovery model implemented for the database. This enables the DBAs to recover the database to a particular point in time, while having the continual record of transactions available for recovery on backup media. This requires moving data to the backup media throughout the day; however, since only the transaction logs are moving, the amount of data actually backed up represents only the change since the last log backup. The daily VDI backup will roll up all the changes during the business day into a single backup. The VDI backup would be implemented on a standard full/differential schedule, as determined by the retention requirements of the business.
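To illustrate why this combination matters at restore time, here is a sketch of a point-in-time recovery under the FULL recovery model; names, paths, and the STOPAT time are hypothetical:

    -- Restore the most recent database backup, leaving it ready for logs
    RESTORE DATABASE Sales
        FROM DISK = 'D:\backup\Sales_full.bak'
        WITH NORECOVERY;

    -- Replay log backups in sequence, stopping at the desired moment
    RESTORE LOG Sales
        FROM DISK = 'D:\backup\Sales_log_1200.trn'
        WITH STOPAT = '2011-03-28 11:45:00', RECOVERY;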

If a lower level of granularity can be tolerated, either a VDI- or VSS-based backup can be taken of the database. Either the FULL or SIMPLE recovery model can be utilized, depending on how much granularity is required during restores. If the FULL recovery model is used, a VDI backup should be used. While this places more load on the database during backup, it allows a smaller amount of data to be moved during a differential backup, because only the changes within the database, not the whole database file, are backed up. However, if a daily snapshot of the database is acceptable, or if the SIMPLE recovery model has been implemented, a VSS-based backup of the database at the file-/filegroup-level should be considered. This still allows for a full/differential backup scheme; but since the SIMPLE recovery model does not allow for the capture of transaction logs, there is no reason to gather database blocks directly—the file or filegroup has already captured all the data needed for a restore.

Native tool backups should be avoided unless there is a very specific operational need. The SQL native tools provide no benefit over the backup software tools, and the resulting backups can consume large amounts of disk space and system resources. The recovery method for native tool backups, while appearing simpler than a backup software-based restore, can actually require more steps than using the backup software's SQL Server client. In addition, native tool backups suffer from the same nondeterministic identity problem described previously for LiteSpeed backups. If there is an absolute requirement for local backups, a backup software-based backup should also be taken to mitigate the issue.

8.3.6. Oracle

Where SQL Server operates on a single platform, Windows, Oracle operates on Windows and multiple flavors of *NIX operating systems. While this may seem to complicate crafting a backup methodology for Oracle, there is a good set of common tools, both on the native Oracle side and on the NetBackup/CommVault side, that makes backups across platforms look very similar.

8.3.6.1. Oracle Terminology

Oracle, like SQL Server, has its own set of terms to describe the various pieces of the implementation. Oracle databases are implemented on physical servers, with instances of Oracle representing the global memory structures and in-memory elements required to run a database. The data within an Oracle database is placed in containers on disk called tablespaces. A tablespace can consist of one or more data files that hold the data content. Oracle also maintains transaction logs, but there are two types that can be generated, depending on the mode in which the database is currently running. In standard, or non-ARCHIVELOG, mode, the Oracle database generates only REDO logs. These logs are used to replay transactions that have already occurred—hence the name. In function they are very similar to the transaction logs in SQL Server. Oracle provides a second mode in which the database can operate: ARCHIVELOG mode. In ARCHIVELOG mode, Oracle generates two types of transaction logs: REDO logs and ARCHIVE logs. While the function of the REDO logs remains the same, the ARCHIVE logs preserve the transactions from filled REDO logs before those logs are overwritten. This allows not only rolling the database back to a point in time, but also rolling it forward to a later point through the application of the preserved logs. This is a very powerful tool and is required for hot backups, as you will see.
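Switching an Oracle database into ARCHIVELOG mode is done while the database is mounted but not open; a minimal SQL*Plus sketch:

    SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    -- Preserve filled REDO logs as ARCHIVE logs from this point forward
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;
    -- Confirm the current mode
    ARCHIVE LOG LIST;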

There are several other types of files that are extremely important to Oracle backups: control files and configuration files. Oracle control files are binary files that tell the various software binaries critical information about how to start and maintain state within the database, and that store the current state of the database in a static area. There are usually at least two copies of the control file: a primary and a backup copy. If the primary copy of the control file is corrupted or missing, the backup can be substituted. Without a control file, the database cannot be started, or mounted in Oracle terminology.

NOTE

It is exceedingly important to back up the control files along with the backup of any Oracle database.
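Two common ways to do this, shown as a hedged sketch (the destination path is hypothetical):

    -- SQL*Plus: write a binary copy of the current control file
    ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control01.bkp';

    # RMAN: automatically back up the control file (and spfile)
    # with every backup taken
    CONFIGURE CONTROLFILE AUTOBACKUP ON;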

Oracle also maintains configuration files that provide information to the binaries regarding options that can be utilized during startup, including information regarding licensing of the options for the database being maintained. The primary configuration file within a database is the init.ora file. This file contains the database name, the location of the primary and backup control files, and other configuration information that is critical to identifying the database to the instance. Additionally, a secondary file, sqlnet.ora, may be present; it defines connection parameters for the database. The sqlnet.ora file contains information regarding the network interface on which to listen for connections, the physical server name and DNS domain, the external name of the databases, and the security method to be used on the databases specified. While not as important as the control files, the init.ora and sqlnet.ora files should always be backed up as part of any backup created.

Oracle has two basic methods of preparing the database for backup and several methods for local backup. When using a manual or external method of backing up the Oracle database, with a copy of the data files serving as the backup, the database must be quiesced, just as in the SQL Server process. Oracle can be quiesced by shutting down the database, but this imposes the same issues on Oracle as it does on SQL Server, namely that the database is unavailable to end users.

8.3.6.2. Oracle Hot Backup

Oracle can also be placed into hot backup mode, or online backup mode, depending on the documentation set being referenced. In this mode, the ARCHIVE logs act as the primary target for transactions while the database is being backed up. Each individual tablespace, which consists of one or more data files, can be discretely placed into hot backup mode and backed up independently; in Oracle version 10g and later, the entire database can also be placed into hot backup mode with a single command. Why would placing the entire database into hot backup mode be an issue?

During the period of time that the database is in hot backup mode, the ARCHIVE logs are receiving the transactions, and the REDO logs are receiving the complete applied changes, not just the pointers to the changes as normally happens. Since the ARCHIVE and REDO logs are simply circular logs located on disk, they do not have the performance of the database as a whole, so the overall performance of the database is greatly reduced while it is in hot backup mode. Placing individual tablespaces into hot backup mode partially solves this problem: only a select group of files, representing the tablespace being backed up, is impacted, allowing better overall database performance during the backup, although possibly at the expense of the speed of the backup.
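A minimal SQL*Plus sketch of both scopes of hot backup; the tablespace name USERS is illustrative, and the file copy or snapshot itself happens outside the database:

    -- Place a single tablespace into hot backup mode
    ALTER TABLESPACE users BEGIN BACKUP;
    -- Copy or snapshot only the data files belonging to USERS here
    ALTER TABLESPACE users END BACKUP;

    -- Or place the entire database into hot backup mode at once
    ALTER DATABASE BEGIN BACKUP;
    -- Copy or snapshot all data files here
    ALTER DATABASE END BACKUP;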

The traditional Oracle hot and cold backups also provide a mechanism to perform quick snapshot backups. The ability to create a snapshot after placing Oracle into either mode greatly speeds the backup process from the Oracle perspective, and also provides an instantaneous copy of the Oracle database that can be mounted, restarted, and used for other purposes. The snapshot can be VSS-based, in the case of Windows implementations of Oracle, or can simply be an array-based snapshot or clone, the functionality of which applies across operating system types.

NOTE

Using array-based snapshots or clones for Oracle database backups requires scripting to integrate Oracle and the backup application so that the snapshots are coordinated with the state of the database.

This also somewhat fixes the issue of the impact of placing all or part of the database into hot backup mode, by shortening the time the database must remain in hot backup mode to the time required to make the snapshot copy, typically measured in minutes, not hours. The method does have the limitation of requiring complex scripting to coordinate the change of database mode with the creation of the snapshot, and then to reverse the mode change back to standard operation.

8.3.6.3. RMAN Backups

The hot and cold backup modes for Oracle have the limitation of only allowing data file-level backups of the database, especially when snapshots are used for the database copies, which effectively look like full backups to the backup software. While the traditional methods are still in use, Oracle introduced a native method that allows for a robust means of performing full and differential backups of an Oracle database: Recovery Manager (RMAN). RMAN is essentially an internal mechanism to track Oracle backups of the data within a particular database. RMAN has two methods of tracking this information: locally, maintaining backup metadata within the database control file, or globally, within a Recovery Catalog—essentially a separate Oracle database specifically set up for RMAN backup metadata. RMAN can perform online and offline (cold) backups of the database being monitored, and it is the primary method of integration for both the NetBackup and CommVault Oracle backup agents.

NOTE

RMAN is required for agent-based backups, such as those provided by NetBackup and CommVault. The Recovery Catalog must be set up by the DBA prior to executing backups using the backup agent. For more detailed information regarding implementing RMAN, consult RMAN Recipes for Oracle Database 11g, by Sam R. Alapati, Darl Kuhn, and Arup Nanda (Apress, 2007).

RMAN provides useful features to the backup software that are not found with the traditional hot/cold file-level backup. Since RMAN maintains state within the control file, differential backups of Oracle data, not data files, can be created—significantly reducing the amount of data required to be backed up and thus the time required for completion. This is similar in function to the VDI-based backups that can be used for SQL Server—a mechanism to track the individual database blocks and only backup those blocks that have changed. If the global Recovery Catalog is used, this metainformation can be maintained for longer periods of time and can cover multiple databases, across multiple servers—providing the ability to manage an entire enterprise and maintain multiple levels of RMAN recovery points, all from a single location.

RMAN can be operated from several points. The first is the Database Control graphical user interface (GUI), which provides an interface very similar in functionality to standard backup software, except that it is specific to Oracle. Second, RMAN provides a command-line interface (CLI) to allow for command-line execution of backups, along with all configuration features. Last, and most important for this discussion, is RMAN's ability to run as a scripted utility. RMAN scripts form the basis both for local Oracle backups and for backups controlled by NetBackup and CommVault. The RMAN script contains all the configuration and command information necessary to execute the backup. By scripting RMAN, it is possible to back up a database consistently, multiple times, which provides the repeatable mechanism needed by backup software. It is by using these scripts that both NetBackup and CommVault perform automated Oracle backups.
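A minimal sketch of such a script, performing a full backup to disk (the format string and channel name are hypothetical):

    RUN {
      ALLOCATE CHANNEL ch1 DEVICE TYPE DISK FORMAT '/backup/%U';
      # Back up the entire database plus the archived logs, deleting
      # logs that have been safely written to the backup
      BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
      RELEASE CHANNEL ch1;
    }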

One of the more interesting features provided by RMAN is the ability to natively multiplex Oracle database backups through the use of RMAN channels. An RMAN channel is a data pipe through which the backup data flows to be written to the defined backup media.

NOTE

Just like the maximum number of streams allowed from a NetBackup or CommVault file system client, the RMAN channel specification must be tuned to the capabilities of the database from which the backup is taken.

For instance, if you have a database with 10 data files, you may want to open 10 channels, one for each data file. However, there may be performance limitations: the system may have I/O constraints, or the number of media targets available for backup may be fewer than the number of channels created. In order to throttle the number of channels backing up at the same time, RMAN provides the PARALLELISM parameter within the RMAN script to limit the number of channels that can operate in parallel.
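A short sketch of how channels and PARALLELISM interact; the numbers are purely illustrative:

    # Persistent setting: at most two channels run concurrently
    CONFIGURE DEVICE TYPE DISK PARALLELISM 2;

    # Or allocate channels explicitly within a script
    RUN {
      ALLOCATE CHANNEL ch1 DEVICE TYPE DISK;
      ALLOCATE CHANNEL ch2 DEVICE TYPE DISK;
      BACKUP DATABASE;  # data files are spread across the two channels
    }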

RMAN Interfaces to NetBackup and CommVault

RMAN provides the interface for both NetBackup and CommVault through the installation of special binaries, called libraries, into the Oracle software. These libraries provide the software "pipe" through which the output of RMAN backups flows to the backup software for tracking and storage on the backup media. In default configurations, Oracle DBAs create RMAN scripts and run them using either the Oracle default library, to create a local disk-based backup, or the provided software libraries, to seamlessly send backups to the backup media. Here is where the differences between the NetBackup and CommVault implementations are seen. NetBackup requires the manual creation of RMAN scripts by the DBAs, which are then called by the Oracle agent and executed at the time of backup. CommVault, on the other hand, generates RMAN scripts based on the parameters specified during subclient configuration. The RMAN script is generated at backup time and executed using the native RMAN tools.
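Once one of these libraries is installed, the RMAN script simply allocates its channels against the media management layer rather than disk. A hedged sketch (SBT_TAPE is the standard media management device type; any vendor-specific policy parameters are omitted here):

    RUN {
      # Route the backup through the vendor-supplied media
      # management library instead of local disk
      ALLOCATE CHANNEL t1 DEVICE TYPE 'SBT_TAPE';
      BACKUP DATABASE PLUS ARCHIVELOG;
      RELEASE CHANNEL t1;
    }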

There are advantages and disadvantages to each approach. The NetBackup approach allows for a great deal of customization and tuning, using the large number of configurable parameters within RMAN to optimize the backup stream or to provide a method of quickly altering how the RMAN backups are completed, simply by changing the RMAN script being used. The CommVault strategy takes the opposite approach. When a subclient is created within CommVault, the backup software already knows what common parameters are changed within the RMAN script and allows for an easy method of adjusting the parameters without physically logging into the client and manually adjusting the script. CommVault also allows for a script "preview" where the backup administrator or DBA can take the generated copy of the RMAN script and make changes.

This feature of CommVault brings up some interesting side notes:

  • If the RMAN script is essentially running the backup, with only minimal involvement from the backup software, what would prevent the script from being run at will from the database server?

  • If RMAN is so powerful, why use backup software at all to capture the backup output?

To the first point, there is absolutely nothing that inherently prevents the RMAN script used by either NetBackup or CommVault from being executed manually, except any exclusion windows created on the Master Server/CommServe that restrict the times in which backups can run. In fact, this feature of RMAN is what is used to run the RMAN scripts for CommVault that have been manually altered in the preview process. The power of RMAN is to provide a very flexible method of performing backups of Oracle databases, run either from the backup software or from the client as a user-initiated backup.

But why run backup software at all? RMAN has a scheduling utility and a method of tracking the backup status of the various databases under its control, and it is able to restore quite well. The answer goes back to a subject discussed in the SQL Server section above: the ability to track backups at an enterprise level, not just a tactical one. While RMAN does have all these abilities, the backup administrator must be concerned with protection of data across the entire enterprise, not just within the specific set of clients that represent the Oracle databases. The backup software provides a means by which backups can be tracked and recovered outside the purview of the DBA. Using backup software to perform the Oracle backups also provides the ability to restore servers related to the particular Oracle databases that are not backed up by RMAN. Just as with SQL Server, a variety of approaches, depending on the needs of the DBA(s), including RMAN-scheduled backups, is entirely appropriate.

Again, just as with SQL Server, it is important to assess the needs of the particular database and make some determinations regarding the criticality of the data, the required granularity of restore, and the required RPO/RTO (recovery point objective/recovery time objective). Fully automated RMAN backups to local disk may be a good solution for DBAs who need to restore data to a point in time very quickly, without waiting for backup media. After this type of backup is taken, it is possible to perform a backup software backup of the resulting RMAN backup. However, if this second-pass method is used, the same issues around tracking the backup and its status relative to the rest of the environment remain.

RMAN, combined with the backup software, provides a solution to the problem. Instead of having RMAN define the backup target and provide the interface, create a local backup disk target within the backup software (essentially making the database server its own media writer) and allow user-directed backups to that disk target. Duplicate copies of the now backup-software-tracked backup can then be made using either Vault/bpduplicate (in the case of NetBackup) or an AUX copy process (in the case of CommVault). Very short-term retention periods can be set using this method, again working with the DBA(s) to determine how many copies they are required to keep, which allows the backup software to automatically maintain the backup space, offloading that task from the DBAs as well.

During a restore, the backup media is always available (it is the local disk). If it is not, RMAN will seamlessly request the backup from the backup software, which will provide the appropriate media stream if it is currently available, or throw an alert to the backup administrator to retrieve the media and make it available for recovery.

However, this is an expensive process—there are additional licenses required for the media writer implementation, along with the disk storage required to physically store the backups that are generated by the process. So this method should only be used for critical systems that really require very tight restore timelines.

Sounds great! So, why can't this process be used for SQL Server? While it could in theory, the integration between SQL Server and either NetBackup or CommVault is not as seamless as RMAN: a restore call from the native SQL utility only recognizes backups generated by SQL Server; the restore is not seamlessly passed to the backup software for restoration, regardless of location. The RMAN interface allows for this seamlessness through the library-level integration, the pipe that moves the data from RMAN to the backup software for tracking, storage, and retention maintenance.

The alternative is to simply have RMAN back up to the standard backup media, as defined within the backup software. While the backup is not "local," it is still fully accessible to the RMAN recovery methods; it is simply located on media that is not on the same physical system. There are several issues that DBA(s) cite when presented with this method:

  • The performance of the restore will be slow. The restore may indeed be slower than a local restore, depending on a number of factors on the Media Server/MediaAgent side and the network connection used to perform the backup/restore of the database. If the initial backup is stored on a Disk Storage Unit (DSU)/magnetic library (MagLib) and duplicate copies are made for longer-term storage, the overall performance will be gated only by the network.

  • The backup will not be accessible because the tape will be inaccessible for some reason. One way to avoid this argument is to use disk storage as the backup target instead of tape and then make a copy of the backup image to external storage. Since the backup will remain on storage that is directly attached to the media writer, the backup will be accessible for use at any time. As a side benefit, potentially more RMAN backups can be accommodated by more effectively using the total storage allocated for RMAN backups and centralizing it on one or two Media Servers/MediaAgents.

  • Backups/recoveries cannot be done at a specific time due to batch processing, database maintenance, and so on. Since the backups run to disk and are then duplicated to longer-term media as a secondary step, they can essentially be scheduled at almost any time. However, this does present challenges for the backup administrator in terms of resource allocation and network congestion.

Using standard media on a "remote" Media Server/MediaAgent provides the ability to fully manage the complete backup process without sacrificing DBA direct access to the backups for operational purposes.

An alternative to using RMAN backups is to create a snapshot copy of the database. This requires that the database and all related files, including control files and configuration files, reside on disk that supports snapshots. These can be array-based snapshots, volume manager snapshots (created by software on the operating system), or (in the case of Windows) VSS snapshots. The procedure for taking backup copies based on snapshots is as follows (a minimal command sketch follows the list):

  • Quiesce the database, either hot or cold.

  • Create the snapshot.

  • Restart the database or take it out of hot backup mode.

  • Mount the snapshot database.

  • Perform the backup.
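Using Oracle hot backup mode as an example, the database-facing portion of this procedure reduces to a short sketch; the snapshot command itself is array- or OS-specific and appears only as a comment:

    -- Steps 1 and 3: quiesce, snapshot, and resume
    ALTER DATABASE BEGIN BACKUP;
    -- Step 2: create the array, volume manager, or VSS snapshot here
    ALTER DATABASE END BACKUP;
    -- Steps 4 and 5: mount the snapshot copy (on the proxy server,
    -- if one is used) and back it up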

The advantage of doing Oracle database backups in this way is that a complete copy of the database is made during the backup process. This copy can be used for reporting, testing, and development; extraction for data warehousing; and just about any other means needed to have a replica database available. The disadvantage is, of course, the requirement for a relatively large amount of storage to perform the snapshot—either the amount needed for the change when using snaps or the complete size of the database when using a clone.

Oracle provides a great deal of flexibility when it comes to protection of the data within the database. Through the use of RMAN, Oracle can both meet the needs of the DBA for operational protection and the needs of the backup administrator in strategic protection of the data over time.
