IntroductionSAP BI provides an Enterprise Data Warehouse solution for SAP customers to consolidate and analyze their business data. Enterprises can use the SAP BI to read into their business, react to the market changes, and gain competitive advantages. In SAP NetWeaver 2004, SAP BI was called SAP Business Information Warehouse (SAP BW). Building an SAP BI data warehouse is a complex project. It includes various activities such as planning, data modeling, data sizing, ETL job design, and performance tuning. Designing the ETL jobs to load data into and extract data from SAP BI is often the most time consuming task. IBM Information Server is a unified and comprehensive information integration platform. Businesses can use IBM Information Server to connect to various data sources, retrieve and process data contents, and deliver cleansed and high quality information. In the SAP BI projects, IBM Information Server can be leveraged as an efficient ETL tool to process a large volume of data and build the enterprise data warehouse. Appendix A explains the terminology used in this article. Appendix B lists the tools for creating the examples shown in this article. Product prerequisites and installationIBM Information Server includes many software products for data integration and analysis tasks. Those products include WebSphere? DataStage?, WebSphere QualityStage?, WebSphere Information Analyzer, WebSphere Federation Server, and other companion products. Depending on the specific project requirements, you can choose to install a subset of the products in the IBM Information Server. Figure 1. Software products needed for designing ETL jobs for SAP BI and SAP BWFigure 1 shows the minimum set of IBM Information Server products needed to design ETL jobs for the SAP BI or BW data warehouse.
Architecture overviewThe software components in Figure 1 play different roles in designing and executing the ETL jobs for SAP BI.
The DataStage BW Pack includes four major components:
Load data into SAP BIDataStage jobs can be designed to retrieve, cleanse, and consolidate the data from non-SAP sources and load the data into SAP BI systems. For example, you can extract the customer data from your CRM applications and then look up the purchase orders for your customers in your purchase order applications. The consolidated purchase orders can be loaded into your SAP BI system for analysis. This section uses a simple ETL job to illustrate the steps necessary for loading data into an SAP BI system. Figure 2 shows the sample job. The job extracts customer data from an Oracle? database table using an ODBC Stage. It passes the extracted data to the BW Load Stage LoadDataToSAPBI, which loads the processed data into the CUSTOMER Characteristic in an SAP BI system. Table 1 shows the sample data in the Oracle database table. The CUSTOMER Characteristic is created in the SAP BI using the SAP Data Warehousing Workbench, which is shown in Figure 3 and Figure 4. Figure 2. DataStage job for loading data into SAP BITable 1. Sample data in the Oracle database table
Figure 3. CUSTOMER Characteristic in SAP BIFigure 4. CUSTOMER Characteristic in SAP BI — Attribute tabFigure 5. Data flow diagram of BI data load operationFigure 5 illustrates the data flow diagram of a BI data load operation.
The BW Load Stage provides a Stage Editor, as shown in Figure 6. The Stage Editor contains several tabs for setting various parameters of the BI load operation.
Figure 6. BW Load Stage EditorThe BW Load Stage LoadDataToSAPBI in Figure 2 loads the data in Table 1 into the SAP BI. Multiple steps are involved in setting up the BW Load Stage. Those steps are illustrated in Figure 7. They are described in detail in the following sections. Figure 7. Setting up the BW Load Stage LoadDataToSAPBIBW connection setupBI connection properties are set in the General tab, shown in Figure 6. Menu items are provided to create, select, or modify BI connections. Figure 8 shows the new connection dialog window. The new BI connection DEMOCONN is saved on the DataStage server, and can be reused to design other DataStage jobs to perform BI data operations. Figure 8. Connection property dialog windowSource system setupThe source system property is set in the Transfer Structure tab, shown in Figure 9. Menu items are provided to create, select, or view source systems. In SAP BI, a source system can be a flat file, an SAP system, a database system, a multidimensional data source, a Web service, or a staging BAPI interface-based external application. BW Load Stage uses the staging BAPI APIs to exchange metadata with SAP BI and load data into SAP BI. Figure 9. Transfer Structure tabThe Newa€| menu item in Figure 9 creates a new source system. When a new source system is created, BW RFC Manager starts a new BW RFC Server process. The RFC Server process waits for data-load requests from SAP BI and requests the corresponding DataStage job to start the data-load operation. The new source system is saved on the DataStage server for future reuse. Figure 10 shows that a new source system DEMODSSRC is created. The BW RFC Server process for the new source system is automatically created on the DataStage server and registered with the SAP BI as the program <hostname>.DEMODSSRC. As shown in Figure 11, the SAP transaction sm59 can be used to see and test the RFC destination for the new source system. Figure 10. Create a source systemFigure 11. RFC destinationInfoSource setupAn InfoSource is set on the Transfer Structure tab, shown in Figure 12. Menu items are provided on the tab to create, update, view, and search SAP BI characteristics, key figures, and InfoSources. Figure 12. InfoSource menu itemsThe menu item Create Master InfoSource from Existing Characteristica€| creates an InfoSource based on an existing characteristic. Two subsequent user actions are needed when this menu item is selected:
As shown in Figure 15, the BW Load Stage creates the specified InfoSource in the SAP BI. The Stage also selects the InfoSource on the Transfer Structure tab in Figure 16. Figure 15. New InfoSourceFigure 16. Select new InfoSourceDataStage table definition setupWhen an InfoSource is selected on the Transfer Structure tab, a DataStage table definition is created based on the transfer structure of the InfoSource. Figure 17 shows the table definition. Table 2 shows how the SAP data types are mapped to the DataStage data types. The table definition can be validated and synchronized with the InfoSource fields using the Validate Columns and Synchronize Columns buttons. Figure 17. Columns tabTable 2. Data type mapping table
InfoPackage setupThe InfoPackage is set on the InfoPackage tab, shown in Figure 18. The InfoPackage is an entry point for SAP BI to request data from a source system. An InfoPackage defines when and how a DataStage job loads data into an SAP BI system. The InfoPackage tab creates and selects an InfoPackage. As shown in Figure 19, the tab also allows you to set the InfoPackage properties. Figure 18. InfoPackage tabFigure 19. InfoPackage property dialog windowBW Load Stage supports three data load mechanisms:
The InfoPackage third-party parameters are defined to support third-party integration tools like the DataStage BW Pack. As shown in Figure 20, the DataStage job BILoadJob is automatically set as a third-party parameter for the new pull InfoPackage DEMO Pull InfoPacakge. Figure 20. InfoPackage third party parametersThe use of third-party parameters in the BI data loading process is described as follows:
Process Chain SetupThe process chain is set on the Process Chain tab. This step is optional. The BW Load Stage can run with or without a process chain. A process chain defines a sequence of dependent processes linked together. The execution of an InfoPackage is one of the process types that SAP BI defines. As shown in Figure 21, the execution of the InfoPackage DEMO Pull InfoPacakge is added as a process in the process chain Demo Load Chain. Figure 22 selects the process chain Demo Load Chain. Figure 21. Run the data loading job as part of a process chainFigure 22. Select process chainRun data load operationThe process chain Demo Load Chain must be scheduled to run your BI data load operation. The process chain can be started in two ways:
In this example, the DataStage job is first started to invoke the process chain. Figure 25 shows that the DataStage job runs successfully. Figure 26 shows that the source data is loaded into the SAP BI. Figure 25. Run DataStage jobFigure 26. Data loaded into SAP BIExtract data from SAP BIThe BW Extract Stage extracts data from an SAP BI system. The extracted data can be fed into non-SAP applications. The BW Extract Stage is based on the SAP Open Hub Service, which defines a controlled and monitored data exporting process. This section uses a simple ETL job to illustrate the steps necessary to extract data from an SAP BI system. Figure 27 shows the sample job. The job extracts the data from the SAP BI characteristic CUSTOMER, as shown in Figure 4. The job processes the extracted data using a DataStage Transformer Stage and then saves the processed results into a flat file. Figure 27. DataStage job for extracting data from the SAP BIThe BW Extract Stage has a Stage Editor, shown in Figure 28. The Stage Editor contains four tabs for setting up various properties for the BI data extraction operation. Figure 28. BW Extract Stage GUI
The BW Extract Stage ExtractDataFromSAPBI, in Figure 27, extracts data from an SAP BI system. Multiple steps are involved in setting up the BW Extract Stage. Those steps are illustrated in Figure 29 and described in detail in the following sections. Figure 29. Setting up the BW Extract Stage ExtractDataFromSAPBIBI Open Hub Destination and data transfer process setupThe BW Extract Stage supports both BW InfoSpoke and BI Open Hub Destination for data extraction. InfoSpoke is a central piece of the Open Hub Service Pack in SAP BW. An InfoSpoke specifies three properties:
An InfoSpoke must be created first before it can be selected on the Open Hub Destination tab. BW Extract Stage supports creating an InfoSpoke and using the InfoSpoke for data extraction. In the SAP BI, Open Hub Destination has been integrated into the new BI data transfer process and is no longer tightly coupled with InfoSpoke. As illustrated in Figure 30, a data transfer process transforms the data from an InfoProvider to an Open Hub Destination. When the data is ready in the Open Hub Destination, SAP BI notifies the DataStage RFC Server process, which starts a DataStage job to extract the data from the Open Hub Destination. A process chain is created to control the whole data extraction process. Figure 30. BI data extraction diagramThe Extract Stage ExtractDataFromSAPBI, in Figure 27, is set up based on the BI Open Hub Destination. The use of a traditional BW InfoSpoke is not discussed in this article. The following steps summarize how to create the BI artifacts for the stage:
Source system and process chain setupThe source system and process chain are set on the Process Chain tab. A source system is the destination for the extracted data. A process chain controls the data extraction process. In Figure 35, DEMODSSRC is selected as the source system. The process chain CUSTCHAIN is selected for the stage ExtractDataFromSAPBI. Figure 35. Process Chain tabOpen Hub Destination setupAn Open Hub Destination is set on the Open Hub Destination tab, as shown in Figure 37. Two user actions are required:
Run data extraction operationSimilar to the data load operation, the process chain CUSTCHAIN must be scheduled to run the data extraction operation. It can be started either by the DataStage job BIExtractJob or by the SAP Data Warehousing Workbench. In this example, the DataStage job is started to invoke the process chain. Figure 40 shows that the DataStage job runs successfully and Figure 41 shows the data extracted from the SAP BI. Figure 40. Run the DataStage jobFigure 41. Data extracted from SAP BIConclusionThis article demonstrated how to integrate SAP BI data with non-SAP BI data using IBM Information Server and the WebSphere DataStage SAP BW pack. It explained the SAP BI data loading and extraction processes and the new SAP BI feature: the data transfer process. Two examples illustrated the step-by-step design processes. IBM Information Server provides leading technology and integration solutions to two other critical issues in the SAP BI Data Warehouse environment:
Currently work is underway on other SAP BI integration solutions, including direct data accesses and metadata management. The work focuses on developing a fast and efficient ETL solution to build corporate transactional data stores with massive document-level data items. AcknowledgementsI would like to thank Wendi Nusbickel, Nicolas Cominetti, and Susan Shay for their feedback and reviews of this article. Appendix A: Terminology
Appendix B: Tools
|
|