A small case study: a severe performance problem triggered by an 11g new feature [AWR excerpt attached]

qrzhcd 2012-01-20


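A RAC database started showing performance problems after it was migrated from 10g to 11g.
The symptom: the same SQL, with the same execution plan, went from about 1 second of elapsed time to several tens of seconds.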

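I pulled an AWR report and the symptom was obvious: the I/O subsystem of the whole system was close to collapse.
I/O throughput was 350 MB/s, and a single read I/O took as long as 400 ms.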

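Looking at the wait events, 85 percent of the I/O was caused by direct path read.
Using direct path read for serial full table scans is new behavior in 11g (the wait event itself predates 11g, where it was mostly seen with parallel query). When a full table scan is needed, Oracle applies an internal heuristic and, if it judges direct path read to be the better choice, uses it in place of the traditional db file scattered read.
On its own, this is a reasonable feature.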
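
The exact rules behind that decision are not publicly documented in detail. It is commonly described as depending largely on how big the segment is relative to the buffer cache. As a rough sketch under that assumption, the two sizes can be put side by side like this (the table name A comes from the post; no owner filter is applied, which is a simplification):

```sql
-- Size of table A in blocks and MB (every schema that owns a table named A).
SELECT owner,
       segment_name,
       blocks,
       ROUND(bytes / 1024 / 1024) AS size_mb
FROM   dba_segments
WHERE  segment_name = 'A'
AND    segment_type = 'TABLE';

-- Current size of the default buffer cache, for comparison.
SELECT component,
       ROUND(current_size / 1024 / 1024) AS size_mb
FROM   v$sga_dynamic_components
WHERE  component = 'DEFAULT buffer cache';
```

This only shows the inputs that are usually said to matter; it does not reproduce Oracle's actual threshold.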

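But here is where the trouble starts. There is one very simple SQL that reads a simple table A; it has no WHERE filter, so it necessarily does a full table scan.
This SQL is executed very heavily, with 200 sessions running it at the same moment. And after its calculation, Oracle chose direct path read for this SQL.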

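Then disaster struck.
direct path read by itself is not a problem.
But more than 200 sessions doing direct path read at the same time is a problem. The I/O subsystem simply collapsed.
It also dragged down the sessions doing ordinary scattered reads and sequential reads: whenever they needed physical I/O, the overloaded I/O subsystem pushed their response times up by a factor of several tens as well.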
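
One way to confirm this kind of pile-up while it is happening is to count the sessions currently waiting on direct path read, grouped by SQL. The query below is a minimal sketch for a RAC system, using gv$session; it was not part of the original diagnosis, which relied on AWR:

```sql
-- Sessions currently waiting on 'direct path read', per instance and SQL.
SELECT inst_id,
       sql_id,
       COUNT(*) AS waiting_sessions
FROM   gv$session
WHERE  event = 'direct path read'
AND    state = 'WAITING'
GROUP  BY inst_id, sql_id
ORDER  BY waiting_sessions DESC;
```

A single sql_id with a couple of hundred waiting sessions would match the pattern described above.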

Once the cause was found, the fix was simple. Two options, for reference:
1. Event 10049
2. alter table A cache

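With more than 200 sessions executing the same SQL, I worry that alter table A cache would cut physical I/O so sharply that the contention would move into the library cache and introduce a new risk, so for now I am leaning toward the 10049 event.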
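
A minimal sketch of how either option might be tried at session scope before touching the whole system. The event number widely cited for turning off serial direct path reads in 11g is 10949 (the replies below raise the same point), so that is what the sketch uses. Whether the CACHE attribute actually keeps scans of A off the direct path depends on the version and the table size, so treat that part as something to verify rather than a guarantee:

```sql
-- Option 1: disable serial direct path read for this session only.
ALTER SESSION SET EVENTS '10949 trace name context forever, level 1';

-- Re-run the full scan of A in this session, then check which read
-- events the session has accumulated.
SELECT event, total_waits, time_waited_micro
FROM   v$session_event
WHERE  sid = SYS_CONTEXT('USERENV', 'SID')
AND    event IN ('direct path read', 'db file scattered read');

-- Switch the event off again for this session.
ALTER SESSION SET EVENTS '10949 trace name context off';

-- Option 2: mark the table as CACHE (revert with NOCACHE).
ALTER TABLE A CACHE;
-- ALTER TABLE A NOCACHE;
```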

AWR excerpt, for reference.

I#	Wait Class	Event	Waits	%Timeouts	Total Wait Time(s)	Avg Wait(ms)	%DB time	Summary Avg(ms)	Summary Min(ms)	Summary Max(ms)	Summary Std Dev	Summary Cnt
*	User I/O	direct path read	1,373,931	0.00	481,162.28	350.21	81.58	350.26	348.14	353.65	2.97	3
	Other	reliable message	32,847	21.54	9,074.56	276.27	1.54	369.12	152.35	495.44	188.58	3
	User I/O	db file sequential read	21,558	0.00	8,367.24	388.13	1.42	387.98	379.14	394.27	7.88	3
	System I/O	control file sequential read	26,328	0.00	5,801.76	220.36	0.98	218.09	211.52	228.97	9.49	3
	Commit	log file sync	13,094	0.00	5,346.94	408.35	0.91	447.49	251.90	658.57	203.78	3
		DB CPU			4,250.92		0.72					3
	User I/O	direct path write temp	12,742	0.00	3,505.86	275.14	0.59	399.88	124.09	539.61	238.85	3
	System I/O	log file parallel write	13,040	0.00	2,580.21	197.87	0.44	209.98	142.42	304.40	84.26	3
	System I/O	db file parallel write	3,896	0.00	800.70	205.52	0.14	208.68	172.50	256.24	43.01	3
	Other	gcs log flush sync	70,782	86.16	691.43	9.77	0.12	9.81	9.51	9.99	0.26	3
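
As a quick sanity check on how to read the table, the Avg(ms) figure for direct path read is just the total wait time divided by the number of waits. The numbers are copied from the first row above:

```sql
-- 481,162.28 s of total wait over 1,373,931 waits, expressed in ms.
SELECT ROUND(481162.28 / 1373931 * 1000, 2) AS avg_wait_ms
FROM   dual;
-- avg_wait_ms = 350.21
```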

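You could consider caching this in the middleware tier; 11g also has the result cache.
We hit this problem during testing too, and disabled all of these.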

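On a mature production database, problems are generally found and fixed within the existing framework; it is not a good idea to introduce new things casually.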

10949 or 10049?

alter session/system set events '10949 level 1' ?

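-- Both settings are written to the spfile and take effect after an instance restart.
alter system set event= '10949 trace name context forever, level 1' scope=spfile;
-- audit_trail is a static parameter, so it also needs scope=spfile.
alter system set audit_trail=none scope=spfile;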

Disable the SQL Tuning Advisor:
BEGIN
  DBMS_AUTO_TASK_ADMIN.disable(
    client_name => 'sql tuning advisor',
    operation   => NULL,
    window_name => NULL);
END;
/

I would recommend disabling all of these.