这本99年的书,作者根据自己咨询经验,总结提炼了很多 Tips(窍门),如果你生活在高可用运行环境中,你会碰到那些作者提到的问题,作者不保守,乐于分享知识 图书英文目录 (Table of contents) Part I: Introduction. Chapter 1. Identifying our Uptime Requirements. Understanding What 24X7 Means for Your Organization Working Toward an "Ideal" 24X7 System Tips for Semi-Technical Managers and Supervisors Summary Chapter 2. Understanding and Dealing with Emergencies. What is an Emergency? Summary Part II: Understanding Your Environment. Chapter 3. Hardware Configuration. Hardware Configuration Know real well what's inside the box choose your disk-array size with caution Do Not Use Read-Ahead Caches for OLTP Applications Do Not Rely on Write Caches to Eliminate I/O Hotspots Use multi-level RAIDs Ensure Your Stripe-Sizes in line with Your OS and Database Block-Sizes Ensure that your disk and tape I/O sizes match Architectural Alternatives for System Redundancy and Performance Summary Chapter 4. Operating Systems. Oracle on Unix Versus Oracle on Windows NT Kernel and Block Size Learn to customize your kernel Know your OS logical and physical block sizes Raw Devices ...... Pre-create abundant raw partitions Choose multiple standard sizes for raw partitions Do not use cylinder 0 of the disk for creating raw partitions Create symbolic links for all raw devices Be aware of other file system options System Capacity and Bottlenecks Proactively ensure that system capacity is not reached at any time Distribute all resource-intensive applications among multiple servers Set up limits on CPU and memory use for non-database servers Discourage non-production jobs during peak production hours Avoid resource-intensive commands that compete with Oracle processes Do not set priority levels for Oracle-related processes Do not use processor affinity on the database server Check frequently for memory leaks Set swap space between two and four times larger than physical memory Distribute swap area across your fast disks Know whether your OS can address more than 2GB of RAM If possible, lock your shared-memory area in physical memory Understand logical and physical drives mapping Enable file system journaling on all production boxes Check the availability of free disk space on a regular basis Keep your file systems and directories slim Maintain a spare root file system If available, turn on large file support Look at important OS logs at regular intervals Invest in automated tools to monitor system bottlenecks Chapter 5. Networks. Managing your network ...... Ensure that your network is not being overloaded Configure your network to use sub-networking effectively Customizing SQL*Net and Net8 for Your Network Configuration Increase the network queue size Disable the NAGLE algorithm Set the SQL*Net or Net8 packet size to match the protocol MTU Deploy homogeneous servers throughout your enterprise Tips for Semi-Technical Managers and Supervisors Chapter 6. Applications and Data. Applications ...... Make application code independent of future schemetic changes Evaluate using transaction splitting to clone data Evaluate using pro*C or OCI for all mission-critical applications Implement failover capabilities within all mission-critical applications Be familiar with various automatic failover options Have MAXEXTENTS unlimited for all volatile application segments Expedite your data loads (加速数据装载过程) Version control your application source code Manage your indexes efficiently Use threads,rather than processes,in your applications Use shared libraries in your applications Invest time in physical database design Some quick application-realted tips Data Take time to understand the schema model Categorize your data Part III: Database Setup and Configuration. Chapter 7. Installing, Configuring and Customizing our Database Environment. Server Configuration Follow the OFA standard Always use a config.ora file Use the crdb_SID.sql and crdb2_SID.sql creation scripts Ensure sufficient disk space for at least two weeks' worth of trace files and alert-log Consider OS block size and application characteristics when setting the database block size Ensure that contiguous(相邻) blocks per write are set high For a large number of data-files in Oracle7.x, enable CKPT Use LOG_CHECKPOINT_TIMEOUT and/or LOG_CHECKPOINT_INTERVAL to ensure instance-recovery times specified in your SLA are met Create more than three mirrored online redo-log groups for high-actvity databases Place redo-logs such that ARCH and LGWR do not contend with each other Configure adequate redo latches to minimize contention Ensure that dbwr can keep up with database loads Partition your buffer cache into multiple buffer pools according to segment usage patterns Use process slaves to alleviate I/O bottlenecks during high database activity Enable vector posting Set up dedicated temporary tablespaces Set up sort area efficiently Use direct writes for sorts Use profiles to prevent runaway processes from hogging all resources Evaluate using vectorized reads for cooked file systems Enable direct I/O when using cooked file systems Use intimate shared memory(ISM)for databases with high concurrent user access Use post-wait drivers for high IPC thrughout Pre-page and "lock" the SGA in main memory Always install SQL*Plus help Other Self-Explanatory(Well,Almost)Server Configuration Tips SQL*Net/Net8 Configuration Use Oracle names when dealing with a large number of clients Use out-of-band breaks with SQL*Net/Net8 or set poll frequency high when out-of-band breaks are unavailable Keep dead-connection detection to a minimum Manage large user populations effectively via SQL*Net and Net8 features Pre-spawn dedicated server processes Chapter 8. Database Upgrades, Downgrades, Reorganizations, and Migrations. Upgrades Downgrades Reorganizations Migrations When to Upgrade Practice the migration to a test environment Analyze each step and construct strategies to expedite it Gotchas to watch for while upgrading from Oracle7 to Oracle8/8i Quick tips to enhance load/unload performance Real-Life Migration Case Study Chapter 9. Backup, Recover, and Archiving Policies and Procedures. Backups Select at least two backup methods after adequate analysis Perform hot backups during period of low DML activity Do not place all tablespaces in hot backup mode simultaneously If possible, back up to disk first and then to tape Do not back up online redo-logs during OS hot backups Evaluate the use of tripe mirrors Make sure that hot backup commands are executed synchronously Check the backup logs regularly Avoid performing backups manually Ensure that security is not compromised when using backup tools With RMAN, periodically resynchronize the recovery catalog Ensure that your backup strategy accounts for all Oracle file-types Back up the control-files immediately after all database changes Address export-related comcerns before you start performing exports Form "export-groups" for inter-segment consistency Plan sufficient space for export dumps Perform exports using direct path, rather than conventional path Ensure compatibility and usability of all backed-up export-dumps Perform a backup immediately after any system clock changes Check for database corruption during and after backups Set optimal I/O size during backups Make use of incremental backups available in Oracle8 Archiving Evaluate whether ARCHIVELOG is necessary for your environment Implement antomatic archival as the primary archival mode Allocate adequate space for ARCHIVE_LOG_DEST ........ Take a complete backup as soon as ARCHIVELOG mode is enabled Use ARCHIVE LOG CURRENT rather than a variation Do not create holes in the archived log sequence If possible, keep dual copies of archived logs on disk Create more online redo-log groups when archiving to offline media Be aware of special archiving considerations, when using OPS Recovery Understand the need for recovery and take proactive steps to avoid failure Understand factors affecting recovery time Ensure that recovery timings are consistent with your SLA MTTR specifications Consider alternative backup/recovery approaches for point-in-time recovery Ensure that checkpoints occur at sufficient intervals and redo-logs are sized appropriately to prevent recovery delays Always consider the lowest unit of recovery Always examine v$datafile prior to recovery Maintain a special init.ora file for recovery operations Always use RESETLOGS as the last resort Evaluate all UNRECOVERABLE and NOLOGGING operations Catergorize all segments and tablespaces for recovery Keep tablespaces without write activity READONLY to expedite backup and recovery Set AUTORECOVERY on during recovery Tips for semi-technical managers and supervisors ....... Document all database structural changes Chapter 10. Startup and Shutdown Procedures. Startup and Shutdown Procedures Always Automate Database Startup and Shutdown Use OEM for Starting UP/Shutting Down Multiple Databases For Urgent Database Bounce Requirements, Consider a "Shutdown Abort" Just Prior to a SHUTDOWN ABORT do an ALTER SYSTEM CHECKPOINT Part IV: Database Maintenance. Chapter 11. General Maintenance. Proactive and Reactive Maintenance Implement robust security at all possible levels Understand the "ORA-1555: Snapshot too old" error and take steps to avoid it Understand, prevent, and resolve latch and enqueue contention Be familiar with different wait events and take steps to reduce them Periodically watch for and resolve locking conflicts Monitor shared pool effectiveness and tune, if necessary Other Critical Areas That Your General Maintenance Routines Need to Address Chapter 12. Space and Growth Management. Understanding and managing space and growth Understand the business drivers influencing segment growth Understand the units of space consumption Understand the various segments that comprise your database Be familiar with the errors and effects that result from inadequate space management Enable the AUTOEXTEND feature for all application tablespaces and SYSTEM Use standardized file-sizes Accommodate the data-file header block in your standardized file-sizes Use standardized storage clauses for each segment-type to reduce fragmentation and use space more optimally Set INITIAL and NEXT keeping DB_BLOCK_SIZE in mind Set PCTINCREASE optimally to reduce fragmentation Understand FREELISTS and set them optimally to reduce latch contention Periodically coalesce your application tablespaces Understand the effect of an erroneous high-water mark Periodically deallocate unused blocks Avoid dropping data-files explicitly Learn the "tablespace shuffle" for emergency unanticipated space requirements In a DSS environment, do not blindly accept unrestricted segment growth Part V: Troubleshooting. Chapter 13. Interpreting the Alert-Log and Trace-Files. The Alert-Log Whenever possible, continually tail the alert-log Be familiar with events/messages in the alert-log Edit repetitive error messages to be listed just once Archive and trim the alert-log once every two weeks to three months Trace-Files Know what to look for in a trace-file Know how to identify trace-file belonging to a specific event/session Know what causes trace-files to be generated Tracing SQL*Net and Net8 Tracing PL/SQL Tracing ODBC Tracing Core Dumps Oracle Debugging Tools Whenever necessary, make your trace-files readable by developers - 让开发人员拥有读到跟踪文件的权限 Chapter 14. Identifying and Repairing Database Corruption. Understanding, Preventing, and Repairing Corruption Understand corruption and the ways it can manifest itself Proactively check for and detect media corruption Understand what you can do to prevent media corruption Be familiar with techniques to deal with memory corruption Be familiar with techniques to deal with logical corruption Identify and list all components susceptible to corruption Identify and list measures to prevent, detect and repair corruption in your environment Retain relevant information regarding the corruption Train your administrators to handle corruption Part VI: High Availability Solutions. Chapter 15. Standby Options. Standby Options from Oracle Consider using a standby database for disaster recovery Consider using standby instances in environments where true 24X7 uptime is not mandatory Consider combining multiple solution to form a comprehensive one - 多方案组合 Strategies for availability During Routine Maintenance and Emergencies - 例行维护和紧急情况下的可用性策略 Evaluate using a Custom Standby Database(CSD) Consider using standby tables to cater to segment-level failures Be aware of caveats with the custom solutions desccribed here Chapter 16. Oracle Parallel Server. Understanding and managing an OPS environment Know what OPS is and how it enhences availability Understand, detect, and eliminate false pinging Partitioning applications prior to deploying them on OPS Chapter 17. Advanced Replication. Understanding AR Be familiar with AR operations and basic functioning Understand your options for highly available AR implementations Be aware of possible AR architectural alternatives Chapter 18. Third-Party HA Solutions. Hardware/OS Based Consider EMC SRDF for disaster-recovery purposes Evaluate using EMC TIMEFINDER for higher return on investment Database-Based Consider Quest Software's Shareplex for Oracle for higher standby flexibility and concurrent reporting access Part VII: Building A Real-World Arsenal. Chapter 19. The 24x7 Toolkit. Using Scripts to Be Alerted to Disruptive Problems Monitor for specific failures by running poll-scripts periodically Be familiar with techniques for application failover when calling PL/SQL Part VIII: New Features. Chapter 20. New HA Features in Oracle8i. Maintenance-Oriented Features Consider transportable tablespaces for quick movement of data subsets Consider locally managed tablespaces to prevent fragmentation Be familiar with new change management features Create, rebuild and defragment indexes online Be familiar with non-partitioned table reorganization features Drop columns, when necessary, with greater ease Troubleshooting Features Be familiar with new techniques to repair corruption Backup and Recovery-Oriented Features Utilize methods to facilitate and expedite RMAN backups Control and expedite recovery time Miscellaneous HA Features Use newer listener failover and load-balancing services Be familiar with LogMiner functionality Be aware of an option to eliminate "fat" OS overhead A: References. Index |
|