使用exists、not eixsts不能用到semi-join、anti-join的一些情况讨论
在Oralce中我们经常会使用exists、not exists来进行一些表连接操作,在普通情况下Oracle的
执行计划如果使用到semi-join、anti-join方式来进行表连接的话,速度一遍很快,否则很慢。通 过使用提示hash_sj hash_aj也可使执行计划走semi-join、anti-join。 但是在有些情况下,即使增加了提示,或者对表进行了分析后,执行计划也不走semi-join、anti-join 方式,下面就是对这些情况的讨论。 附注:本文中的例子均是在Oracle9.2.0.6 for linux中测试的。 1、在where条件中多个exists、not exists是or的关系 SQL> create table a1 as select * from dba_objects where rownum <= 10000; SQL> create table a2 as select * from (select t.*,rownum rd from dba_objects t) p where rd>5000 and rd<= 15000; SQL> create table a3 as select * from (select t.*,rownum rd from dba_objects t) p where rd>7500 and rd<= 17500; SQL> analyze table a1 compute statistics; SQL> analyze table a2 compute statistics; SQL> analyze table a3 compute statistics; SQL> set autotrace on exp SQL> set timing on ----- exists or exists 情况 SQL> select a1.owner,count(*) 2 from a1 3 where exists 4 (select /*+ hash_sj(a1 a2) */* from a2 5 where a2.object_id=a1.object_id) 6 or 7 exists 8 (select /*+ hash_sj(a1 a2) */* from a3 9 where a3.object_id=a1.object_id) 10 group by a1.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 5000 已用时间: 00: 00: 26.66 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7) 1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825) 4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4) 5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4) 执行计划中显然没有用到semi-jobin,这时候只有改写SQL,通过union all的方式来达到使用到 semi-join的目的。 SQL> select x.owner,count(*) 2 from (select * from a1 3 where exists 4 (select /*+ hash_sj(a1 a2)*/* from a2 5 where a2.object_id= a1.object_id ) 6 union all 7 select * from a1 8 where not exists 9 (select /*+ hash_aj(a1 p)*/* from a2 p 10 where p.object_id=a1.object_id) 11 and exists 12 (select /*+ hash_sj(a1 a3)*/* from a3 13 where a3.object_id=a1.object_id) 14 ) x 15 group by x.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 5000 已用时间: 00: 00: 00.31 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=10001 Bytes= 170017) 1 0 SORT (GROUP BY) (Cost=90 Card=10001 Bytes=170017) 2 1 VIEW (Cost=88 Card=10001 Bytes=170017) 3 2 UNION-ALL 4 3 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000) 5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 6 4 VIEW OF 'VW_SQ_2' (Cost=17 Card=10000 Bytes=130000) 7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000Bytes=40000) 8 3 HASH JOIN (ANTI) (Cost=53 Card=1 Bytes=24) 9 8 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000) 10 9 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 11 9 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000) 12 11 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000) 13 8 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) ----- not exists or not exists 情况 SQL> select a1.owner,count(*) 2 from a1 3 where not exists 4 (select /*+ hash_sj(a1 a2) */* from a2 5 where a2.object_id=a1.object_id) 6 or 7 not exists 8 (select /*+ hash_sj(a1 a2) */* from a3 9 where a3.object_id=a1.object_id) 10 group by a1.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 7500 已用时间: 00: 00: 20.72 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7) 1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825) 4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4) 5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4) SQL> select x.owner,count(*) 2 from (select * from a1 3 where not exists 4 (select /*+ hash_sj(a1 a2)*/* from a2 5 where a2.object_id= a1.object_id ) 6 union all 7 select * from a1 8 where exists 9 (select /*+ hash_aj(a1 p)*/* from a2 p 10 where p.object_id=a1.object_id) 11 and not exists 12 (select /*+ hash_sj(a1 a3)*/* from a3 13 where a3.object_id=a1.object_id) 14 ) x 15 group by x.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 7500 已用时间: 00: 00: 00.31 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=89 Card=2 Bytes=34) 1 0 SORT (GROUP BY) (Cost=89 Card=2 Bytes=34) 2 1 VIEW (Cost=88 Card=2 Bytes=34) 3 2 UNION-ALL 4 3 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11) 5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 6 4 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) 7 3 HASH JOIN (SEMI) (Cost=53 Card=1 Bytes=24) 8 7 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11) 9 8 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 10 8 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000) 11 7 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000) 12 11 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) ----- not exists or exists 情况 SQL> select a1.owner,count(*) 2 from a1 3 where not exists 4 (select /*+ hash_sj(a1 a2) */* from a2 5 where a2.object_id=a1.object_id) 6 or exists 7 (select /*+ hash_sj(a1 a3) */* from a3 8 where a3.object_id=a1.object_id) 9 group by a1.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 7500 已用时间: 00: 00: 20.57 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=7) 1 0 SORT (GROUP BY) (Cost=18 Card=1 Bytes=7) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=975 Bytes=6825) 4 2 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4) 5 2 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=1 Bytes=4) SQL> select x.owner,count(*) 2 from (select * from a1 3 where not exists 4 (select /*+ hash_sj(a1 a2)*/* from a2 5 where a2.object_id= a1.object_id ) 6 union all 7 select * from a1 8 where exists 9 (select /*+ hash_aj(a1 p)*/* from a2 p 10 where p.object_id=a1.object_id) 11 and exists 12 (select /*+ hash_sj(a1 a3)*/* from a3 13 where a3.object_id=a1.object_id) 14 ) x 15 group by x.owner; OWNER COUNT(*) ------------------------------ ---------- SYS 7500 已用时间: 00: 00: 00.32 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=10001 Bytes= 170017) 1 0 SORT (GROUP BY) (Cost=90 Card=10001 Bytes=170017) 2 1 VIEW (Cost=88 Card=10001 Bytes=170017) 3 2 UNION-ALL 4 3 HASH JOIN (ANTI) (Cost=35 Card=1 Bytes=11) 5 4 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 6 4 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) 7 3 HASH JOIN (SEMI) (Cost=53 Card=10000 Bytes=330000) 8 7 HASH JOIN (SEMI) (Cost=35 Card=10000 Bytes=200000) 9 8 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=70000) 10 8 VIEW OF 'VW_SQ_2' (Cost=17 Card=10000 Bytes=130000) 11 10 TABLE ACCESS (FULL) OF 'A3' (Cost=17 Card=10000 Bytes=40000) 12 7 VIEW OF 'VW_SQ_1' (Cost=17 Card=10000 Bytes=130000) 13 12 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) 从上面的例子可以看出速度的提高是非常明显的,当然对于or的情况越多,修改后的SQL也就越复杂,不过相对于效率 的提高来说还是很值得的。 2、如果子句中含有group by这种排序操作,使用in来代替exists操作会大大提高效率,示例如下: 修改一下示例表: SQL> insert into a2 select * from a2; SQL> commit; <1> 使用exists操作,无论添加什么提示都不能使其走semi-join,耗费大量时间 SQL> select count(*) from a1 a 2 where exists 3 ( select /*+ hash_sj(a b) */* 4 from ( select object_id from a2 5 group by object_id having count(*) > 1 6 ) b 7 where a.object_id=b.object_id 8 ); COUNT(*) ---------- 5000 已用时间: 00: 00: 46.22 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=1 Bytes=4) 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=500 Bytes=2000) 4 2 FILTER 5 4 SORT (GROUP BY NOSORT) (Cost=17 Card=1 Bytes=4) 6 5 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4) <2> 将exists改为in句型,就可以用到semi-join,效率大大提高 SQL> select count(*) from a1 a where object_id in 2 ( select object_id from a2 b 3 group by object_id having count(*) > 1 4 ); COUNT(*) ---------- 5000 已用时间: 00: 00: 00.90 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=1 Bytes=17) 1 0 SORT (AGGREGATE) 2 1 HASH JOIN (SEMI) (Cost=37 Card=500 Bytes=8500) 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=40000) 4 2 VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500) 5 4 FILTER 6 5 SORT (GROUP BY) (Cost=19 Card=500 Bytes=2000) 7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) 对于not exists情况也是同样的,需要将语句改写为not in,但是要加上提示以及关联的列要有非空的限制 才能使其走anti-join的执行计划。 SQL> select count(*) from a1 a 2 where not exists 3 ( select /*+ hash_aj(a b)*/* from 4 ( select object_id from a2 5 group by object_id having count(*) > 1 6 ) b 7 where a.object_id=b.object_id 8 ); COUNT(*) ---------- 5000 已用时间: 00: 00: 46.47 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=1 Bytes=4) 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=500 Bytes=2000) 4 2 FILTER 5 4 SORT (GROUP BY NOSORT) (Cost=17 Card=1 Bytes=4) 6 5 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=1 Bytes=4) SQL> select count(*) from a1 a 2 where object_id is not null 3 and object_id not in 4 ( select object_id from a2 b 5 where object_id is not null 6 group by object_id having count(*) > 1 7 ); COUNT(*) ---------- 5000 已用时间: 00: 00: 00.75 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=1 Bytes=17) 1 0 SORT (AGGREGATE) 2 1 HASH JOIN (ANTI) (Cost=37 Card=9500 Bytes=161500) 3 2 TABLE ACCESS (FULL) OF 'A1' (Cost=17 Card=10000 Bytes=40000) 4 2 VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500) 5 4 FILTER 6 5 SORT (GROUP BY) (Cost=19 Card=500 Bytes=2000) 7 6 TABLE ACCESS (FULL) OF 'A2' (Cost=17 Card=10000 Bytes=40000) |
|