传统的SQL优化思想认为IN、Exists这2种写法不同数据量的环境中各擅胜场,Developer应当根据实际情况合理运用IN或Exists。
实际我认为,这种认识对于现代SQL优化器Optimizer(10g以后)不再适用。 CBO优化器中本身包含了查询转换(Query Transformation)的功能。
为什么CBO要做查询转换(Query Transformation)?
回答: 用户提交给SQL引擎的语句可能不是执行查询最有效的语句
查询转换(Query Transformation)做了什么?
回答: 查询转换按照固定的语法将语句重写为等意的SQL,但是优化了其SQL结构
查询转换(Query Transformation)包括启发式地查询转换和基于成本的查询转换。
对于IN、Exists , CBO optimizer 的查询装换效果如何?:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for 64-bit Windows: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> select count(1) from joinA A where A.CUST_ID in ( select B.cust_id from joinb B where B.CUST_FIRST_NAME like 'C%'); COUNT(1) ---------- 39381 SQL> alter system flush shared_pool; System altered. SQL> alter system flush shared_pool; System altered. SQL> alter session set events '10053 trace name context forever,level 1'; Session altered. SQL> select count(1) from joinA A where A.CUST_ID in ( select B.cust_id from joinb B where B.CUST_FIRST_NAME like 'C%'); COUNT(1) ---------- 39381 SQL> oradebug setmypid Statement processed. SQL> oradebug tracefile_name D:\APP\ML\diag\rdbms\testem\testem\trace\testem_ora_6776.trc Final query after transformations:******* UNPARSED QUERY IS ******* SELECT COUNT(*) "COUNT(1)" FROM (SELECT "B"."CUST_ID" "CUST_ID" FROM "SYS"."JOINB" "B" WHERE "B"."CUST_FIRST_NAME" LIKE 'C%') "VW_NSO_1", "SYS"."JOINA" "A" WHERE "A"."CUST_ID" = "VW_NSO_1"."CUST_ID" kkoqbc: optimizing query block SEL$683B0107 (#2) SQL> alter session set events '10053 trace name context forever,level 1'; Session altered. SQL> select count(1) from joinA A where exists ( select 1 from joinb B where B.CUST_FIRST_NAME like 'C%' and B.cust_id=A.cust_id ); COUNT(1) ---------- 39381 Final query after transformations:******* UNPARSED QUERY IS ******* SELECT COUNT(*) "COUNT(1)" FROM (SELECT "B"."CUST_ID" "ITEM_1" FROM "SYS"."JOINB" "B" WHERE "B"."CUST_FIRST_NAME" LIKE 'C%') "VW_SQ_1", "SYS"."JOINA" "A" WHERE "VW_SQ_1"."ITEM_1" = "A"."CUST_ID"
如以上10053 trace 所显示 对于2个分别使用了IN和Exists的查询, CBO Optimizer查询转换后得到的结果十分类似:
IN:
SELECT COUNT(*) “COUNT(1)”
FROM (SELECT “B”.”CUST_ID” “CUST_ID”
FROM “SYS”.”JOINB” “B”
WHERE “B”.”CUST_FIRST_NAME” LIKE ‘C%’) “VW_NSO_1”,
“SYS”.”JOINA” “A”
WHERE “A”.”CUST_ID” = “VW_NSO_1″.”CUST_ID”
Exists:
SELECT COUNT(*) “COUNT(1)”
FROM (SELECT “B”.”CUST_ID” “ITEM_1”
FROM “SYS”.”JOINB” “B”
WHERE “B”.”CUST_FIRST_NAME” LIKE ‘C%’) “VW_SQ_1”,
“SYS”.”JOINA” “A”
WHERE “VW_SQ_1”.”ITEM_1″ = “A”.”CUST_ID”
CBO在完成Query transformations之后才会对转换后的语句进行APA(access path analysis)和后续的优化树构建、成本比较等等,一般来说如果你使用10g以后的CBO ,可以认为语义等价的IN或Exists在性能上也不会有太大的区别。
请教ML,如果sql语句写法在exist 和表关联的情况比较下,是否总是选择表关联很好呢?因为之前遇到过同一张大表被exist 2次,效率极低导致cpu 100% like :
select a.col_attr
from slowsoul.big_table a, slowsoul.big_table t, slowsoul.big_table b
where a.col_dishi = 11
and t.col_dishi = a.col_dishi
and a.col1 = t.col_attr
and t.col1 = 1199300017720 — :col1
and t.col_attid = ‘VNetProdcol1’
and a.col_attid = ‘smpGroupid’ — :col_attid
and a.col_attr is not null
AND b.col_dishi = a.col_dishi
and b.col1 = a.col1
and b.col_attid = ‘ShortNumFlag’
and b.col_attr = ‘1’;
=========================2种写法
select col_attr
from slowsoul.big_table a
where a.col_dishi = 12
and EXISTS (select 1
from slowsoul.big_table t
where t.col_dishi = a.col_dishi
and t.col1 = ‘12101517017’
and col_attid = ‘VNetProdcol1’
and t.col_attr = a.col1)
and a.col_attid = ‘ShortNumFlag’
and a.col_attr is not null
AND EXISTS (select 1
from slowsoul.big_table b
where b.col_dishi = a.col_dishi
and b.col1 = a.col1
and b.col_attid = ‘ShortNumFlag’
and b.col_attr = ‘1’);
我试了一下,在10.2.0.4中,trace文件中没有找到”Final query after transformations”
在11.2.0.2 中有。的确如楼主所说。
说错了,10g也有的。在******* UNPARSED QUERY IS *******位置