最新项目稍有空隙,开始研究SQL Server 2012和2014的一些BI特性,参照(Matt)的一个示例,我们开始体验SSIS中的CDC(Change Data Capture,变更数据捕获)。 注:如果需要了解关于SQL Server 2008中的CDC,请看这里http://www.cnblogs.com/downmoon/archive/2012/04/10/2439462.html),本文假定读者对CDC的工作方式已有所了解。^_^。 我们分三步完成实例: 1、准备基础数据; 2、设计一个初始包; 3、在2的基础上设计一个增量包。 首先请完成以下准备安装: (1)Visual studio 2012或Visual Studio 2012 Shell (Isolated) Redistributable Package http://www.microsoft.com/en-us/download/details.aspx?id=30678 http://www.microsoft.com/en-us/download/details.aspx?id=30670 (2)SQL Server Data Tools - Business Intelligence for Visual Studio 2012 http://www.microsoft.com/zh-cn/download/details.aspx?id=36843 (2)SQL Server 2012企业版或开发版 http://www.microsoft.com/en-us/download/details.aspx?id=29066 (3)示例数据库AdventureWorksDW2012(本文必须,如果自建表则不必) http://msftdbprodsamples./releases/view/55330
好了,开始第一步: /*-- =============================================-- 创建测试数据库及数据表,借助AdventureWorksDW2012示例数据库---Generate By downmoon(邀月),3w@live.cn-- =============================================*/--Create database CDCTest--GO--USE [CDCTest]--GO--SELECT * INTO DimCustomer_CDC--FROM [AdventureWorksDW2012].[dbo].[DimCustomer]--WHERE CustomerKey < 11500;--select * from DimCustomer_CDC; /*-- =============================================-- 启用数据库级别CDC,只对企业版和开发版有效---Generate By downmoon(邀月),3w@live.cn-- =============================================*/USE [CDCTest]GOEXEC sys.sp_cdc_enable_dbGO-- add a primary key to the DimCustomer_CDC table so we can enable support for net changesIF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[DimCustomer_CDC]') AND name = N'PK_DimCustomer_CDC') ALTER TABLE [dbo].[DimCustomer_CDC] ADD CONSTRAINT [PK_DimCustomer_CDC] PRIMARY KEY CLUSTERED ( [CustomerKey] ASC)GO/*-- =============================================-- 启用表级别CDC---Generate By downmoon(邀月),3w@live.cn-- =============================================*/EXEC sys.sp_cdc_enable_table @source_schema = N'dbo',@source_name = N'DimCustomer_CDC',@role_name = N'cdc_admin',@supports_net_changes = 1GO /*-- =============================================-- 创建一个目标表,与源表(Source)有相同的表结构--注意,在生产环境中,完全可以是不同的实例或服务器,本例为了方便,在同一个数据库实例的同一个数据库中演示---Generate By downmoon(邀月),3w@live.cn-- =============================================*/SELECT TOP 0 * INTO DimCustomer_DestinationFROM DimCustomer_CDC--select @@version;select * from DimCustomer_Destination; 第二步:创建初始包 -- =============================================-- 我们使用两个包来完成示例,一个初始包完成数据的初始加载,一个增量包完成数据的变更捕获---Generate By downmoon(邀月),3w@live.cn-- ============================================= 初始包包含如下逻辑: 新建一个SSIS项目,创建一个包“Initial Load”,如下图: 新建两个CDC Control Task,分别命名为“CDC Control Task Start”和“CDC Control Task End”,分别对应属性为“Mark initial load start”和""Mark initial load end" 连接管理器均为ADO.NET方式,其他属性如下图: 中间加入一个“Data Flow Task”,属性默认。 此时,运行包,可见CDC_States有初始标记。
第三步:创建增量包 增量包包含如下逻辑: 在项目中创建一个新包,命名为“Incremental Load” 在包的"Control Flow"视图中,自上而下分别手动6个Task,顺序如下图,除去上面用到的三个Task,其余均为Execute SQL Task 注意:CDC Control Task End的CDC运算符为MARK Process Range,CDC Control Task Start的CDC运算符为Get Process Range 其余4个Execute SQL Task的SQL语句如下: --Create stage TablesIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_UPDATES]') AND type in (N'U'))BEGIN SELECT TOP 0 * INTO stg_DimCustomer_UPDATES FROM DimCustomer_DestinationENDIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_DELETES]') AND type in (N'U'))BEGIN SELECT TOP 0 * INTO stg_DimCustomer_DELETES FROM DimCustomer_DestinationEND -- batch updateUPDATE destSET dest.FirstName = stg.FirstName, dest.MiddleName = stg.MiddleName, dest.LastName = stg.LastName, dest.YearlyIncome = stg.YearlyIncomeFROM [DimCustomer_Destination] dest, [stg_DimCustomer_UPDATES] stgWHERE stg.[CustomerKey] = dest.[CustomerKey] -- batch deleteDELETE FROM [DimCustomer_Destination] WHERE[CustomerKey] IN ( SELECT [CustomerKey] FROM [dbo].[stg_DimCustomer_DELETES]) -- truncate table truncate table [dbo].[stg_DimCustomer_DELETES]truncate table [dbo].[stg_DimCustomer_UPDATES]
最关键的一步,选中CDC Control Task Start,并切换到Data Flow,自上而下分别拖动CDC Source,CDC Splitter Transformer,三个ADO.NET Destination,如下图: 其中三个的目标表分别为:[DimCustomer_Destination],stg_DimCustomer_DELETES,stg_DimCustomer_UPDATES。 而CDC Source的连接管理器属性如下图: 此时,可运行增量包,但我们不会看到任何运行结果,因为此时我们还没有进行数据的Insert或Update操作。 下来我们提供一个脚本,测试下效果: -- =============================================-- 更新一些数据,以显示SSIS 2012中CDC的效果---Generate By downmoon(邀月),3w@live.cn-- =============================================USE [CDCTest]GO -- Transfer the remaining customer rowsSET IDENTITY_INSERT DimCustomer_CDC ON INSERT INTO DimCustomer_CDC( CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, NumberChildrenAtHome, EnglishEducation, SpanishEducation, FrenchEducation, EnglishOccupation, SpanishOccupation, FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CommuteDistance)SELECT CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, NumberChildrenAtHome, EnglishEducation, SpanishEducation, FrenchEducation, EnglishOccupation, SpanishOccupation, FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CommuteDistanceFROM [AdventureWorksDW2012].[dbo].[DimCustomer]WHERE CustomerKey =11502 SET IDENTITY_INSERT DimCustomer_CDC OFFGO -- give 10 people a raiseUPDATE DimCustomer_CDC SET YearlyIncome = YearlyIncome + 10WHERE CustomerKey >= 11000 AND CustomerKey <= 11010 GO 此时,我们可以看到变更捕获的结果: 如果您觉得还不够直观,请"Enable Data Viewer", 至此,一个SSIS 2012中CDC的实例演示结束,如果还有进一步的研究,请移驾MSDN,下面有链接。本文也提供示例项目包,以作研究之用。 项目文件下载1,项目文件下载2
本文参考: http://msdn.microsoft.com/en-us/library/bb895315.aspx http://www./index.php/2011/12/cdc-in-ssis-for-sql-server-2012-2/?utm_source=rss&utm_medium=rss&utm_campaign=cdc-in-ssis-for-sql-server-2012-2
|
|