Common approaches to de-duplicating a table:
1. distinct — constrained by the size of the table and the machine configuration.
2. The rowid method:
delete from table_name where rowid in (
  select a.rowid from table_name a, table_name b
  where a.col1 = b.col1 and a.col2 = b.col2 ... and a.rowid < b.rowid);
This is comparatively efficient.
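The rowid self-join delete can be sketched outside Oracle as well. SQLite also exposes a rowid pseudocolumn, so the following Python snippet (table and column names are made up for the demo) illustrates the same idea: for each group of rows that agree on every compared column, every row except the one with the largest rowid is deleted.

```python
import sqlite3

# Demo table with two duplicate groups and one unique row.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1, col2)")
conn.executemany("insert into t values (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")])

# Self-join on the compared columns; a.rowid < b.rowid deletes every row
# in a duplicate group except the one with the largest rowid.
conn.execute("""
    delete from t where rowid in (
        select a.rowid from t a, t b
        where a.col1 = b.col1 and a.col2 = b.col2 and a.rowid < b.rowid)
""")
print(conn.execute("select count(*) from t").fetchone()[0])  # 3
```

The join compares every row against every other, which is why this approach slows down as the duplicate count grows.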
Recently, while reading, I found that the analytic functions available in 9i and later are even more efficient.
I tested this with the following example.
The sms_sys_param_test table holds 2.6 million rows; finding the distinct ones took only 12 s (machine 10.40.95.105).
sms_sys_param_test contains 91 distinct rows, sourced from sms_sys_param.
The table was populated by repeatedly running insert into sms_sys_param_test select * from sms_sys_param_test; no indexes were created.
select * from
(select sms_sys_param_test.*,row_number() over(partition by PARAID order by rowid) rn from sms_sys_param_test)
where rn=1;
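The same row_number() technique can be tried on a small scale with SQLite (window functions need SQLite 3.25 or later, which ships with current Python builds). This is a sketch only; the paraid/val columns and the tiny data set are invented stand-ins for the real sms_sys_param_test table:

```python
import sqlite3

# Build a small table and duplicate its rows four times over,
# mimicking the repeated insert-select used to inflate the test table.
conn = sqlite3.connect(":memory:")
conn.execute("create table sms_sys_param_test (paraid, val)")
rows = [(1, "x"), (2, "y"), (3, "z")]
conn.executemany("insert into sms_sys_param_test values (?, ?)", rows * 4)

# row_number() numbers the rows inside each paraid group by rowid;
# keeping rn = 1 keeps exactly one representative per group.
dedup = conn.execute("""
    select paraid, val from
      (select t.*, row_number() over (partition by paraid order by rowid) rn
         from sms_sys_param_test t)
    where rn = 1
""").fetchall()
print(sorted(dedup))  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Unlike the self-join delete, each row is read once and numbered within its partition, which is why the analytic version scales much better.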
If there are many duplicates, de-duplicate as follows:
create table newtable_name
as select * from
(select sms_sys_param_test.*,row_number() over(partition by PARAID order by rowid) rn from sms_sys_param_test)
where rn=1;
truncate table sms_sys_param_test;
insert into sms_sys_param_test select .... from newtable_name;
drop table newtable_name;
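The create/truncate/reinsert workflow above can also be sketched with SQLite. One caveat: SQLite has no TRUNCATE statement, so a plain DELETE stands in for it below (in Oracle, TRUNCATE is DDL and is much cheaper than DELETE for emptying a large table):

```python
import sqlite3

# Invented stand-in table, heavily duplicated: 2 distinct rows x 1000 copies.
conn = sqlite3.connect(":memory:")
conn.execute("create table sms_sys_param_test (paraid, val)")
conn.executemany("insert into sms_sys_param_test values (?, ?)",
                 [(1, "x"), (2, "y")] * 1000)

# 1. Materialize the distinct rows into a new table.
conn.execute("""
    create table newtable_name as
    select paraid, val from
      (select t.*, row_number() over (partition by paraid order by rowid) rn
         from sms_sys_param_test t)
    where rn = 1
""")
# 2. Empty the original (DELETE stands in for Oracle's TRUNCATE here).
conn.execute("delete from sms_sys_param_test")
# 3. Copy the distinct rows back and drop the scratch table.
conn.execute("insert into sms_sys_param_test select paraid, val from newtable_name")
conn.execute("drop table newtable_name")
print(conn.execute("select count(*) from sms_sys_param_test").fetchone()[0])  # 2
```

This path wins when most rows are duplicates: the surviving rows are written once, instead of deleting the bulk of the table row by row.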
If there are only a few duplicates, change rn=1 above to rn>1 and delete the duplicate records directly. Note that the inner rowid needs an alias (rid below) so the outer query can reference it:
delete from sms_sys_param_test where rowid in (
  select rid from
    (select rowid rid,
            row_number() over (partition by PARAID order by rowid) rn
       from sms_sys_param_test)
  where rn > 1);
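The few-duplicates variant can likewise be checked with SQLite. The inner query aliases rowid as rid so the outer IN-list has an unambiguous column to select; the data below is an invented stand-in with a single duplicated row:

```python
import sqlite3

# Mostly distinct data with one duplicate of (1, "x").
conn = sqlite3.connect(":memory:")
conn.execute("create table sms_sys_param_test (paraid, val)")
conn.executemany("insert into sms_sys_param_test values (?, ?)",
                 [(1, "x"), (1, "x"), (2, "y"), (3, "z")])

# Number rows within each paraid group; rows with rn > 1 are the
# duplicates, and only those are deleted.
conn.execute("""
    delete from sms_sys_param_test where rowid in (
        select rid from
          (select rowid rid,
                  row_number() over (partition by paraid order by rowid) rn
             from sms_sys_param_test)
        where rn > 1)
""")
print(conn.execute("select count(*) from sms_sys_param_test").fetchone()[0])  # 3
```

When only a handful of rows are duplicated, deleting just the rn > 1 rows touches far less data than rebuilding the whole table.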