一本面向 Oracle 运行的好书《Oracle 24x7 Tips and Techniques》（Venkat S. Devraj, Osborne/McGraw-Hill, November 1

Stefen 2010-07-14

展开全文

这本99年的书，作者根据自己咨询经验，总结提炼了很多 Tips（窍门），如果你生活在高可用运行环境中，你会碰到那些作者提到的问题，作者不保守，乐于分享知识

图书英文目录 （Table of contents）

Part I: Introduction.

Chapter 1. Identifying our Uptime Requirements.

Understanding What 24X7 Means for Your Organization

Working Toward an "Ideal" 24X7 System

Tips for Semi-Technical Managers and Supervisors

Summary

Chapter 2. Understanding and Dealing with Emergencies.

What is an Emergency?

Summary

Part II: Understanding Your Environment.

Chapter 3. Hardware Configuration.

Hardware Configuration

Know real well what's inside the box

choose your disk-array size with caution

Do Not Use Read-Ahead Caches for OLTP Applications

Do Not Rely on Write Caches to Eliminate I/O Hotspots

Use multi-level RAIDs

Ensure Your Stripe-Sizes in line with Your OS and Database Block-Sizes

Ensure that your disk and tape I/O sizes match

Architectural Alternatives for System Redundancy and Performance

Summary

Chapter 4. Operating Systems.

Oracle on Unix Versus Oracle on Windows NT

Kernel and Block Size

Learn to customize your kernel

Know your OS logical and physical block sizes

Raw Devices

......

Pre-create abundant raw partitions

Choose multiple standard sizes for raw partitions

Do not use cylinder 0 of the disk for creating raw partitions

Create symbolic links for all raw devices

Be aware of other file system options

System Capacity and Bottlenecks

Proactively ensure that system capacity is not reached at any time

Distribute all resource-intensive applications among multiple servers

Set up limits on CPU and memory use for non-database servers

Discourage non-production jobs during peak production hours

Avoid resource-intensive commands that compete with Oracle processes

Do not set priority levels for Oracle-related processes

Do not use processor affinity on the database server

Check frequently for memory leaks

Set swap space between two and four times larger than physical memory

Distribute swap area across your fast disks

Know whether your OS can address more than 2GB of RAM

If possible, lock your shared-memory area in physical memory

Understand logical and physical drives mapping

Enable file system journaling on all production boxes

Check the availability of free disk space on a regular basis

Keep your file systems and directories slim

Maintain a spare root file system

If available, turn on large file support

Look at important OS logs at regular intervals

Invest in automated tools to monitor system bottlenecks

Chapter 5. Networks.

Managing your network

......

Ensure that your network is not being overloaded

Configure your network to use sub-networking effectively

Customizing SQL*Net and Net8 for Your Network Configuration

Increase the network queue size

Disable the NAGLE algorithm

Set the SQL*Net or Net8 packet size to match the protocol MTU

Deploy homogeneous servers throughout your enterprise

Tips for Semi-Technical Managers and Supervisors

Chapter 6. Applications and Data.

Applications

......

Make application code independent of future schemetic changes

Evaluate using transaction splitting to clone data

Evaluate using pro*C or OCI for all mission-critical applications

Implement failover capabilities within all mission-critical applications

Be familiar with various automatic failover options

Have MAXEXTENTS unlimited for all volatile application segments

Expedite your data loads （加速数据装载过程）

Version control your application source code

Manage your indexes efficiently

Use threads,rather than processes,in your applications

Use shared libraries in your applications

Invest time in physical database design

Some quick application-realted tips

Data

Take time to understand the schema model

Categorize your data

Part III: Database Setup and Configuration.

Chapter 7. Installing, Configuring and Customizing our Database Environment.

Server Configuration

Follow the OFA standard

Always use a config.ora file

Use the crdb_SID.sql and crdb2_SID.sql creation scripts

Ensure sufficient disk space for at least two weeks' worth of trace files and alert-log

Consider OS block size and application characteristics when setting the database block size

Ensure that contiguous（相邻） blocks per write are set high

For a large number of data-files in Oracle7.x, enable CKPT

Use LOG_CHECKPOINT_TIMEOUT and/or LOG_CHECKPOINT_INTERVAL to ensure instance-recovery times specified in your SLA are met

Create more than three mirrored online redo-log groups for high-actvity databases

Place redo-logs such that ARCH and LGWR do not contend with each other

Configure adequate redo latches to minimize contention

Ensure that dbwr can keep up with database loads

Partition your buffer cache into multiple buffer pools according to segment usage patterns

Use process slaves to alleviate I/O bottlenecks during high database activity

Enable vector posting

Set up dedicated temporary tablespaces

Set up sort area efficiently

Use direct writes for sorts

Use profiles to prevent runaway processes from hogging all resources

Evaluate using vectorized reads for cooked file systems

Enable direct I/O when using cooked file systems

Use intimate shared memory（ISM）for databases with high concurrent user access

Use post-wait drivers for high IPC thrughout

Pre-page and "lock" the SGA in main memory

Always install SQL*Plus help

Other Self-Explanatory（Well,Almost）Server Configuration Tips

SQL*Net/Net8 Configuration

Use Oracle names when dealing with a large number of clients

Use out-of-band breaks with SQL*Net/Net8 or set poll frequency high when out-of-band breaks are unavailable

Keep dead-connection detection to a minimum

Manage large user populations effectively via SQL*Net and Net8 features

Pre-spawn dedicated server processes

Chapter 8. Database Upgrades, Downgrades, Reorganizations, and Migrations.

Upgrades

Downgrades

Reorganizations

Migrations

When to Upgrade

Practice the migration to a test environment

Analyze each step and construct strategies to expedite it

Gotchas to watch for while upgrading from Oracle7 to Oracle8/8i

Quick tips to enhance load/unload performance

Real-Life Migration Case Study

Chapter 9. Backup, Recover, and Archiving Policies and Procedures.

Backups

Select at least two backup methods after adequate analysis

Perform hot backups during period of low DML activity

Do not place all tablespaces in hot backup mode simultaneously

If possible, back up to disk first and then to tape

Do not back up online redo-logs during OS hot backups

Evaluate the use of tripe mirrors

Make sure that hot backup commands are executed synchronously

Check the backup logs regularly

Avoid performing backups manually

Ensure that security is not compromised when using backup tools

With RMAN, periodically resynchronize the recovery catalog

Ensure that your backup strategy accounts for all Oracle file-types

Back up the control-files immediately after all database changes

Address export-related comcerns before you start performing exports

Form "export-groups" for inter-segment consistency

Plan sufficient space for export dumps

Perform exports using direct path, rather than conventional path

Ensure compatibility and usability of all backed-up export-dumps

Perform a backup immediately after any system clock changes

Check for database corruption during and after backups

Set optimal I/O size during backups

Make use of incremental backups available in Oracle8

Archiving

Evaluate whether ARCHIVELOG is necessary for your environment

Implement antomatic archival as the primary archival mode

Allocate adequate space for ARCHIVE_LOG_DEST

........

Take a complete backup as soon as ARCHIVELOG mode is enabled

Use ARCHIVE LOG CURRENT rather than a variation

Do not create holes in the archived log sequence

If possible, keep dual copies of archived logs on disk

Create more online redo-log groups when archiving to offline media

Be aware of special archiving considerations, when using OPS

Recovery

Understand the need for recovery and take proactive steps to avoid failure

Understand factors affecting recovery time

Ensure that recovery timings are consistent with your SLA MTTR specifications

Consider alternative backup/recovery approaches for point-in-time recovery

Ensure that checkpoints occur at sufficient intervals and redo-logs are sized appropriately to prevent recovery delays

Always consider the lowest unit of recovery

Always examine v$datafile prior to recovery

Maintain a special init.ora file for recovery operations

Always use RESETLOGS as the last resort

Evaluate all UNRECOVERABLE and NOLOGGING operations

Catergorize all segments and tablespaces for recovery

Keep tablespaces without write activity READONLY to expedite backup and recovery

Set AUTORECOVERY on during recovery

Tips for semi-technical managers and supervisors

.......

Document all database structural changes

Chapter 10. Startup and Shutdown Procedures.

Startup and Shutdown Procedures

Always Automate Database Startup and Shutdown

Use OEM for Starting UP/Shutting Down Multiple Databases

For Urgent Database Bounce Requirements, Consider a "Shutdown Abort"

Just Prior to a SHUTDOWN ABORT do an ALTER SYSTEM CHECKPOINT

Part IV: Database Maintenance.

Chapter 11. General Maintenance.

Proactive and Reactive Maintenance

Implement robust security at all possible levels

Understand the "ORA-1555: Snapshot too old" error and take steps to avoid it

Understand, prevent, and resolve latch and enqueue contention

Be familiar with different wait events and take steps to reduce them

Periodically watch for and resolve locking conflicts

Monitor shared pool effectiveness and tune, if necessary

Other Critical Areas That Your General Maintenance Routines Need to Address

Chapter 12. Space and Growth Management.

Understanding and managing space and growth

Understand the business drivers influencing segment growth

Understand the units of space consumption

Understand the various segments that comprise your database

Be familiar with the errors and effects that result from inadequate space management

Enable the AUTOEXTEND feature for all application tablespaces and SYSTEM

Use standardized file-sizes

Accommodate the data-file header block in your standardized file-sizes

Use standardized storage clauses for each segment-type to reduce fragmentation and use space more optimally

Set INITIAL and NEXT keeping DB_BLOCK_SIZE in mind

Set PCTINCREASE optimally to reduce fragmentation

Understand FREELISTS and set them optimally to reduce latch contention

Periodically coalesce your application tablespaces

Understand the effect of an erroneous high-water mark

Periodically deallocate unused blocks

Avoid dropping data-files explicitly

Learn the "tablespace shuffle" for emergency unanticipated space requirements

In a DSS environment, do not blindly accept unrestricted segment growth

Part V: Troubleshooting.

Chapter 13. Interpreting the Alert-Log and Trace-Files.

The Alert-Log

Whenever possible, continually tail the alert-log

Be familiar with events/messages in the alert-log

Edit repetitive error messages to be listed just once

Archive and trim the alert-log once every two weeks to three months

Trace-Files

Know what to look for in a trace-file

Know how to identify trace-file belonging to a specific event/session

Know what causes trace-files to be generated

Tracing

SQL*Net and Net8 Tracing

PL/SQL Tracing

ODBC Tracing

Core Dumps

Oracle Debugging Tools

Whenever necessary, make your trace-files readable by developers - 让开发人员拥有读到跟踪文件的权限

Chapter 14. Identifying and Repairing Database Corruption.

Understanding, Preventing, and Repairing Corruption

Understand corruption and the ways it can manifest itself

Proactively check for and detect media corruption

Understand what you can do to prevent media corruption

Be familiar with techniques to deal with memory corruption

Be familiar with techniques to deal with logical corruption

Identify and list all components susceptible to corruption

Identify and list measures to prevent, detect and repair corruption in your environment

Retain relevant information regarding the corruption

Train your administrators to handle corruption

Part VI: High Availability Solutions.

Chapter 15. Standby Options.

Standby Options from Oracle

Consider using a standby database for disaster recovery

Consider using standby instances in environments where true 24X7 uptime is not mandatory

Consider combining multiple solution to form a comprehensive one - 多方案组合

Strategies for availability During Routine Maintenance and Emergencies - 例行维护和紧急情况下的可用性策略

Evaluate using a Custom Standby Database（CSD）

Consider using standby tables to cater to segment-level failures

Be aware of caveats with the custom solutions desccribed here

Chapter 16. Oracle Parallel Server.

Understanding and managing an OPS environment

Know what OPS is and how it enhences availability

Understand, detect, and eliminate false pinging

Partitioning applications prior to deploying them on OPS

Chapter 17. Advanced Replication.

Understanding AR

Be familiar with AR operations and basic functioning

Understand your options for highly available AR implementations

Be aware of possible AR architectural alternatives

Chapter 18. Third-Party HA Solutions.

Hardware/OS Based

Consider EMC SRDF for disaster-recovery purposes

Evaluate using EMC TIMEFINDER for higher return on investment

Database-Based

Consider Quest Software's Shareplex for Oracle for higher standby flexibility and concurrent reporting access

Part VII: Building A Real-World Arsenal.

Chapter 19. The 24x7 Toolkit.

Using Scripts to Be Alerted to Disruptive Problems

Monitor for specific failures by running poll-scripts periodically

Be familiar with techniques for application failover when calling PL/SQL

Part VIII: New Features.

Chapter 20. New HA Features in Oracle8i.

Maintenance-Oriented Features

Consider transportable tablespaces for quick movement of data subsets

Consider locally managed tablespaces to prevent fragmentation

Be familiar with new change management features

Create, rebuild and defragment indexes online

Be familiar with non-partitioned table reorganization features

Drop columns, when necessary, with greater ease

Troubleshooting Features

Be familiar with new techniques to repair corruption

Backup and Recovery-Oriented Features

Utilize methods to facilitate and expedite RMAN backups

Control and expedite recovery time

Miscellaneous HA Features

Use newer listener failover and load-balancing services

Be familiar with LogMiner functionality

Be aware of an option to eliminate "fat" OS overhead

A: References.

Index

本站是提供个人知识管理的网络存储空间，所有内容均由用户发布，不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息，谨防诈骗。如发现有害或侵权内容，请点击一键举报。

转藏分享

QQ空间 QQ好友新浪微博微信

献花（0） +1

来自： Stefen > 《IT Operations Management》

举报/认领

0条评论

发表

请遵守用户评论公约

类似文章 更多

Stefen

关注对话

TA的最新馆藏

[转] 懒兔子第十七讲便秘之热秘的辨症治疗
冰心诗歌
AIX 中 hostname 命令与 uname -n 命令的区别
冯大辉：十年莽撞IT路（程序员杂志，2011-02-16）
未来这五年，软件工程师如何安身立命？（上） ——Ruby之父松本行弘谈《代码的未来》
未来这五年，软件工程师如何安身立命？（下） ——Ruby之父松本行弘谈《代码的未来》

喜欢该文的人也喜欢更多

热门阅读换一换