墨墨导读:由于执行计划中,对过滤谓词顺序的改变,导致SQL运行报错。 最近,遇到了一个关于ORA-01841的报错,起初,认为这个错误处理起来应该不困难,但实际上折腾了很久,才最终找到问题原因,并解决掉,下面将本次解决和分析的过程用样例来说明。 ORA-01841的错误提示是“(full) year must be between -4713 and +9999, and not be 0”,翻译过来,大意是完整的年份值需在-4712到+9999之间,并且不得为0。出现这个错误,通常都是数据本身存在问题导致的,但本案例中,又不仅仅是数据的问题。 下面就来回顾一下问题处理的过程。为了简化问题,方便理解,以下描述均是在事后构建的模拟环境中进行的: 执行以下SQL时,发生了ora-01841的报错: SQL> select * from ( select * from test_tab1 where c1 not like 'X%' ) where to_date(c1,'yyyy-mm-dd') > date'2020-11-01' ; ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected 结合SQL和报错信息,最初的怀疑是内层查询的结果集的C1列上,有不正常的数据,导致出现了报错。因此,首先检查内层查询的结果: SQL> select * from test_tab1 where c1 not like 'X%' ;
ID C1 ---------- -------------------------------- 1 2020-10-04 2 2020-09-17 3 2020-10-14 4 2020-11-03 5 2020-12-04 我们可以看到,内层查询的结果集中,并没有不正常的数据。 到此时,想了许久,也做了各种测试,但均没有找到问题原因。决定看一下执行计划: SQL> set autot on SQL> select * from ( select * from test_tab1 where c1 not like 'X%' ) where to_date(c1,'yyyy-mm-dd') > date'2020-11-01' ; ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("TEST_TAB1"."C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C1" NOT LIKE 'X%')
Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 419 bytes sent via SQL*Net to client 492 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
SQL> 从执行计划中看,CBO对该SQL做了自动改写,将外层查询的条件,推到了内层查询。而且,从谓词信息部分,我们可以看到SQL中的条件“to_date(c1,‘yyyy-mm-dd’) > date’2020-11-01’”在两个过滤条件中,是位于靠前的位置。 也就是说,当数据库对表中的数据做过滤时,是先用“to_date(c1,‘yyyy-mm-dd’) > date’2020-11-01’”来检查。这样,如果有某行数据的C1列中的值不正常,就会导致这样的报错。 我们来验证一下: SQL> select * from test_tab1;
ID C1 ---------- -------------------------------- 1 2020-10-04 2 2020-09-17 3 2020-10-14 4 2020-11-03 5 2020-12-04 6 XXXXXXXXX1
6 rows selected.
未被CBO自动改写的原始SQL,其内层查询,会将不能正常转换为日期的数据排除掉,然后在外层再去做TO_DATE的转换。如果CBO按照这种方式来处理,就不会报错了。 知道了原因,那我们要如何处理呢? SQL> select * from ( select t.*, rownum rn from test_tab1 t where c1 not like 'X%' ) where to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 4 5 6 7 8
ID C1 RN ---------- -------------------------------- ---------- 4 2020-11-03 4 5 2020-12-04 5
Execution Plan ---------------------------------------------------------- Plan hash value: 4134971776
--------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 220 | 3 (0)| 00:00:01 | |* 1 | VIEW | | 5 | 220 | 3 (0)| 00:00:01 | | 2 | COUNT | | | | | | |* 3 | TABLE ACCESS FULL| TEST_TAB1 | 5 | 70 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 3 - filter("C1" NOT LIKE 'X%')
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 711 bytes sent via SQL*Net to client 492 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
再将TO_DATE转换施加到内层查询的结果之上。 或者,在内层查询上,对C1进行一些不影响结果值的运算。例如: SQL> select * from ( select id, c1||'' c1 from test_tab1 where c1 not like 'X%' ) where to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 4 5 6 7
ID C1 ---------- -------------------------------- 4 2020-11-03 5 2020-12-04
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%' AND TO_DATE("C1"||'','yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 645 bytes sent via SQL*Net to client 492 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed 如上所示,这种处理方法,虽然外层的过滤条件被推入到了内层,但会放到后边执行,这样,当前边的条件已经将不正常的数据过滤掉后,也就不会报错了。 但是,这又引起了我的一个新的疑问,如果初始SQL就是只有一层(如下所示),两个过滤条件在一起时,CBO是先用哪个过滤条件来过滤呢? select * from test_tab1 where c1 not like 'X%' and to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 执行后的结果如下: SQL> set autot on SQL> select * from test_tab1 where c1 not like 'X%' and to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 4 ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C1" NOT LIKE 'X%')
Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 434 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
SQL> select * from test_tab1 where to_date(c1,'yyyy-mm-dd') > date'2020-11-01' and c1 not like 'X%'; 2 3 4 ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C1" NOT LIKE 'X%')
Statistics ---------------------------------------------------------- 1 recursive calls 4 db block gets 4 consistent gets 0 physical reads 0 redo size 434 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed 如上所示,看来和条件出现的顺序是无关的。 这正好给了我们一个启示,在CBO下,在选择先执行哪个过滤条件时,是否会依据统计信息,计算并排序各个过滤条件的选择性,选择性越好的,则越会先被执行呢? 我们测试验证一下。主要测试思路如下: 2、CBO在计算NOT LIKE这类条件时,其计算思路是先计算出LIKE的选择率(类似于相等条件,是条件列中唯一值数量的倒数),然后用1-(like的选择率)就是NOT LIKE的选择率。 3、向表中再插入94行形如‘XXXXXXXXX1’这样的记录。构造一个有100行记录的表,其中c1列上有100个唯一值,然后收集统计信息(注意,不要收集列上的直方图信息,因为在有直方图时,其计算逻辑和方法都要复杂得多,这里,我们只用列上的非直方图的统计信息)。操作过程如下: SQL> insert into test_tab1 select 6+rownum id,lpad(rownum+1,10,'X') c1 from dual connect by rownum<=94;
94 rows created.
SQL> commit; Commit complete.
SQL> exec dbms_stats.gather_table_stats('DEMO','TEST_TAB1',method_opt=>'for columns c1 size 1');
PL/SQL procedure successfully completed.
SQL> select count(*) cnt,count(distinct c1) cnt_c1 from test_tab1;
CNT CNT_C1 ---------- ---------- 100 100 分别来验证一下施加单个条件时,CBO的估算结果 看看是否与前边的理解是吻合的: SQL> set autot on exp SQL> select * from test_tab1 where to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 70 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 5 | 70 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 如上所示,对条件“to_date(c1,‘yyyy-mm-dd’) > date’2020-11-01’”返回行数的估算是5行。由于表中总共有100行,所以,选择率是5/100=5%。与我们的理解是吻合的。 再来看对NOT LIKE的选择率: SQL> set autot traceonly exp SQL> select * from test_tab1 where c1 like 'X%'; 2 3
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" LIKE 'X%')
SQL> select * from test_tab1 where c1 NOT like 'X%'; 2 3
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99 | 1386 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 99 | 1386 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%') 如上所示,我们看到对LIKE和NOT LIKE的估算,与我们的理解也是吻合的。 SQL> set autot traceonly SQL> select * from test_tab1 where c1 not like 'X%' and to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 4 ERROR: ORA-01841: (full) year must be between -4713 and +9999, and not be 0
no rows selected
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 70 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 5 | 70 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "C1" NOT LIKE 'X%')
Statistics ---------------------------------------------------------- 0 recursive calls 4 db block gets 4 consistent gets 0 physical reads 0 redo size 434 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed 那我们再来验证一下,如果可以让条件“c1 NOT like ‘X%’”的选择率低于5%,那么我们就可能让CBO选择先执行该条件了。即1-1/n<0.05,显然,N要小于1.053,由于N表示的是唯一值的数量,所以,一定是个整数,即N只能是1了。 为了满足这个条件,我们将表中C1列的值,全部更新为同一个值:‘XXXXXXXXX1’后,收集统计信息后,如下所示: SQL> set autot off SQL> update test_tab1 set c1='XXXXXXXXX1';
100 rows updated.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats('DEMO','TEST_TAB1',method_opt=>'for columns c1 size 1');
PL/SQL procedure successfully completed. 我们先来验证一下前述两个条件的选择性是否如我们所愿,已经发生了改变: SQL> set autot traceonly exp SQL> select * from test_tab1 where to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 70 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 5 | 70 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter(TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
SQL> select * from test_tab1 where c1 like 'X%'; 2 3
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1400 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 100 | 1400 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" LIKE 'X%')
SQL> select * from test_tab1 where c1 NOT like 'X%'; 2 3
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%') 如上所示,条件“to_date(c1,‘yyyy-mm-dd’) > date’2020-11-01’”的选择率未变,仍然是5%,但条件“c1 NOT like ‘X%’”的选择率已经低于5%,目前估算只有大约1行记录满足该条件。 那么我们再次执行测试SQL,看看结果如何: SQL> select * from test_tab1 where c1 not like 'X%' and to_date(c1,'yyyy-mm-dd') > date'2020-11-01'; 2 3 4
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%' AND TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
SQL> select * from test_tab1 where to_date(c1,'yyyy-mm-dd') > date'2020-11-01' and c1 not like 'X%'; 2 3 4
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%' AND TO_DATE("C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
SQL> 如上所示,这时,CBO已经先执行条件“c1 NOT like ‘X%’”了。 SQL> select * from ( select * from test_tab1 where c1 not like 'X%' ) where to_date(c1,'yyyy-mm-dd') > date'2020-11-01' ; 2 3 4 5 6 7
Execution Plan ---------------------------------------------------------- Plan hash value: 1698440217
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 14 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_TAB1 | 1 | 14 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
1 - filter("C1" NOT LIKE 'X%' AND TO_DATE("TEST_TAB1"."C1",'yyyy-mm-dd')>TO_DATE(' 2020-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 果然是这样。 附录:提供上述模拟数据的生成脚本 SQL> create table test_tab1 (id number,c1 varchar2(32));
Table created.
SQL> insert into test_tab1 select rownum id,to_char(sysdate-dbms_random.value(1,100),'yyyy-mm-dd') c1 from dual connect by rownum<=5;
5 rows created.
SQL> insert into test_tab1 select 5+rownum id,lpad(rownum,10,'X') c1 from dual connect by rownum<=1;
1 row created.
SQL>commit;
SQL> exec dbms_stats.gather_table_stats('DEMO','TEST_TAB1');
PL/SQL procedure successfully completed.
墨天轮原文链接:https://www./db/42008(复制到浏览器中打开或者点击“阅读原文”立即查看) |
|