分享

Data integration with SAP NetWeaver Business Intelligence using IBM Information Server

 BI之路 2014-05-28
Codename: BlueMix. IBM's go-to-cloud platform. Join the beta.

Introduction

SAP BI provides an Enterprise Data Warehouse solution for SAP customers to consolidate and analyze their business data. Enterprises can use the SAP BI to read into their business, react to the market changes, and gain competitive advantages. In SAP NetWeaver 2004, SAP BI was called SAP Business Information Warehouse (SAP BW).

Building an SAP BI data warehouse is a complex project. It includes various activities such as planning, data modeling, data sizing, ETL job design, and performance tuning. Designing the ETL jobs to load data into and extract data from SAP BI is often the most time consuming task.

IBM Information Server is a unified and comprehensive information integration platform. Businesses can use IBM Information Server to connect to various data sources, retrieve and process data contents, and deliver cleansed and high quality information. In the SAP BI projects, IBM Information Server can be leveraged as an efficient ETL tool to process a large volume of data and build the enterprise data warehouse.

Appendix A explains the terminology used in this article. Appendix B lists the tools for creating the examples shown in this article.

Back to top

Product prerequisites and installation

IBM Information Server includes many software products for data integration and analysis tasks. Those products include WebSphere? DataStage?, WebSphere QualityStage?, WebSphere Information Analyzer, WebSphere Federation Server, and other companion products. Depending on the specific project requirements, you can choose to install a subset of the products in the IBM Information Server.

Figure 1. Software products needed for designing ETL jobs for SAP BI and SAP BW
Software products needed for designing ETL jobs for SAP BI and SAP BW

Figure 1 shows the minimum set of IBM Information Server products needed to design ETL jobs for the SAP BI or BW data warehouse.

  • WebSphere DataStage, which includes:
    • DataStage Client
    • DataStage Server
    • DataStage Metadata Repository
    • DataStage Domain Server
    WebSphere DataStage products can be installed separately on different hosts or installed on the same host.
  • WebSphere DataStage Pack for SAP BW (DataStage BW Pack) The DataStage BW Pack is a companion product of the IBM Information Server. The pack was originally developed to support SAP BW and currently supports both SAP BW and SAP BI. The GUIs of the DataStage BW Pack are installed on the DataStage Client. The runtime part of the Pack is installed on the DataStage Server.
  • SAP Remote Function Call (RFC) Library The SAP RFC library is an external component to the IBM Information Server. The DataStage BW Pack uses the SAP RFC interface to call SAP BI and SAP BW functions. The SAP RFC library is a prerequisite for using the DataStage BW Pack and must be installed on both the DataStage Client and Server.

Back to top

Architecture overview

The software components in Figure 1 play different roles in designing and executing the ETL jobs for SAP BI.

  • The DataStage Client and DataStage BW Pack GUI components provide a friendly user interface to design ETL jobs and to set up the data operations to be performed on SAP BI systems.
  • The DataStage Server and DataStage BW Pack Server components enable users to schedule and run the ETL jobs.
  • The DataStage Domain Server manages user accounts and authorizes users to use different features of the IBM Information Server.
  • The DataStage Metadata Repository is a database for storing and sharing tables, fields, or object definitions.

The DataStage BW Pack includes four major components:

  • BW Load Stage: Loads data from non-SAP data sources to an SAP BI system. BW Load Stage is an SAP-certified, data-loading integration solution implemented using the SAP Staging BAPI interface.
  • BW Extract Stage: Extracts data from an SAP BI system. It is an SAP-certified, data-extraction integration solution based on the SAP Open Hub Service interface.
  • BW RFC Server: Implements various functions that are invoked by an SAP BI system. It accepts the SAP BI initiated data-loading or data-extraction requests and triggers the DataStage jobs to execute the corresponding data operations.
  • BW RFC Manager: Manages the BW RFC Server processes. It creates one BW RFC Server process per source system. It also provides the functions to start or stop BW RFC Server processes. A source system represents a logical or physical system that is external to an SAP BI system. A source system provides source data to an SAP BI system or accepts extracted data from an SAP BI system.

Back to top

Load data into SAP BI

DataStage jobs can be designed to retrieve, cleanse, and consolidate the data from non-SAP sources and load the data into SAP BI systems. For example, you can extract the customer data from your CRM applications and then look up the purchase orders for your customers in your purchase order applications. The consolidated purchase orders can be loaded into your SAP BI system for analysis.

This section uses a simple ETL job to illustrate the steps necessary for loading data into an SAP BI system. Figure 2 shows the sample job. The job extracts customer data from an Oracle? database table using an ODBC Stage. It passes the extracted data to the BW Load Stage LoadDataToSAPBI, which loads the processed data into the CUSTOMER Characteristic in an SAP BI system. Table 1 shows the sample data in the Oracle database table. The CUSTOMER Characteristic is created in the SAP BI using the SAP Data Warehousing Workbench, which is shown in Figure 3 and Figure 4.

Figure 2. DataStage job for loading data into SAP BI
DataStage job for loading data into SAP BI
Table 1. Sample data in the Oracle database table
ID NAME STREET CITY REGION
0000001 IBM New Orchard Road Armonk NY
0000002 Microsoft One Microsoft Way Redmond WA
0000003 SAP America 3999 West Chester Pike Newtown Square PA
POSTALID COUNTRY PHONE FAX TAXNUM
10504 USA 800-426-4968 866-722-9226 000000000
98052 USA 800-642-7676 425-936-7329 000000000
19073 USA 610-661-1000 404-943-2950 000000000
Figure 3. CUSTOMER Characteristic in SAP BI
CUSTOMER Characteristic in SAP BI
Figure 4. CUSTOMER Characteristic in SAP BI — Attribute tab
Figure 4. CUSTOMER Characteristic in SAP BI -- Attribute tab
Figure 5. Data flow diagram of BI data load operation
Data flow diagram of BI data load operation

Figure 5 illustrates the data flow diagram of a BI data load operation.

  1. A source system is defined to represent one or more DataStage jobs loading data into the SAP BI. A data transfer structure describes the data available in the source system. A DataStage job loads the data from external data sources into an SAP BI Persistent Staging Area (PSA) staging table.
  2. Transfer rules are defined to transfer data from the staging table into an InfoSource. An InfoSource is a collection of data fields treated as a single unit. The communication structure defines the data fields of the InfoSource.
  3. Update rules are created to transform the data from the InfoSource to one or more BI data targets (InfoObjects, DataStore objects, or BI InfoCubes).

The BW Load Stage provides a Stage Editor, as shown in Figure 6. The Stage Editor contains several tabs for setting various parameters of the BI load operation.

  • The General tab creates or selects an RFC connection to the SAP BI.
  • The Transfer Structure tab creates or selects a source system and an InfoSource.
  • The Columns tab displays the column definitions of the data being sent to the SAP BI.
  • The InfoPackage tab defines how and when the DataStage job loads data into the SAP BI.
  • The Process Chain tab enables you to run the data load operation as a process within a BI process chain. A process chain provides the workflow function. It is used to design and schedule a series of dependent data-processing processes.
Figure 6. BW Load Stage Editor
BW Load Stage Editor

The BW Load Stage LoadDataToSAPBI in Figure 2 loads the data in Table 1 into the SAP BI. Multiple steps are involved in setting up the BW Load Stage. Those steps are illustrated in Figure 7. They are described in detail in the following sections.

Figure 7. Setting up the BW Load Stage LoadDataToSAPBI
Setting up the BW Load Stage LoadDataToSAPBI

BW connection setup

BI connection properties are set in the General tab, shown in Figure 6. Menu items are provided to create, select, or modify BI connections.

Figure 8 shows the new connection dialog window. The new BI connection DEMOCONN is saved on the DataStage server, and can be reused to design other DataStage jobs to perform BI data operations.

Figure 8. Connection property dialog window
Connection property dialog window

Source system setup

The source system property is set in the Transfer Structure tab, shown in Figure 9. Menu items are provided to create, select, or view source systems.

In SAP BI, a source system can be a flat file, an SAP system, a database system, a multidimensional data source, a Web service, or a staging BAPI interface-based external application. BW Load Stage uses the staging BAPI APIs to exchange metadata with SAP BI and load data into SAP BI.

Figure 9. Transfer Structure tab
Transfer Structure tab

The Newa€| menu item in Figure 9 creates a new source system. When a new source system is created, BW RFC Manager starts a new BW RFC Server process. The RFC Server process waits for data-load requests from SAP BI and requests the corresponding DataStage job to start the data-load operation. The new source system is saved on the DataStage server for future reuse.

Figure 10 shows that a new source system DEMODSSRC is created. The BW RFC Server process for the new source system is automatically created on the DataStage server and registered with the SAP BI as the program <hostname>.DEMODSSRC. As shown in Figure 11, the SAP transaction sm59 can be used to see and test the RFC destination for the new source system.

Figure 10. Create a source system
Create a source system
Figure 11. RFC destination
RFC destination

InfoSource setup

An InfoSource is set on the Transfer Structure tab, shown in Figure 12. Menu items are provided on the tab to create, update, view, and search SAP BI characteristics, key figures, and InfoSources.

Figure 12. InfoSource menu items
InfoSource menu items

The menu item Create Master InfoSource from Existing Characteristica€| creates an InfoSource based on an existing characteristic. Two subsequent user actions are needed when this menu item is selected:

  1. Select an existing Characteristic. BW Load Stage shows the BI characteristics matching the search condition and allows the selection of an existing characteristic. Figure 13 selects the CUSTOMER characteristic.
    Figure 13. Select an existing characteristic
    Select an existing characteristic
  2. Specify the properties of the new BI InfoSource object, shown in Figure 14.
    Figure 14. Specify the properties of new InfoSource
    Specify the properties of new InfoSource

As shown in Figure 15, the BW Load Stage creates the specified InfoSource in the SAP BI. The Stage also selects the InfoSource on the Transfer Structure tab in Figure 16.

Figure 15. New InfoSource
New InfoSource
Figure 16. Select new InfoSource
Select new InfoSource

DataStage table definition setup

When an InfoSource is selected on the Transfer Structure tab, a DataStage table definition is created based on the transfer structure of the InfoSource. Figure 17 shows the table definition. Table 2 shows how the SAP data types are mapped to the DataStage data types. The table definition can be validated and synchronized with the InfoSource fields using the Validate Columns and Synchronize Columns buttons.

Figure 17. Columns tab
Columns tab
Table 2. Data type mapping table
SAP data type DataStage data type
DATS SQL DATE
CURR SQL CHAR
TIMS SQL TIME
FLTP SQL FLOAT
CHAR (no more than 256 characters) SQL CHAR
CHAR (more than 256 characters) SQL VARCHAR

InfoPackage setup

The InfoPackage is set on the InfoPackage tab, shown in Figure 18. The InfoPackage is an entry point for SAP BI to request data from a source system. An InfoPackage defines when and how a DataStage job loads data into an SAP BI system. The InfoPackage tab creates and selects an InfoPackage. As shown in Figure 19, the tab also allows you to set the InfoPackage properties.

Figure 18. InfoPackage tab
InfoPackage tab
Figure 19. InfoPackage property dialog window
InfoPackage property dialog window

BW Load Stage supports three data load mechanisms:

  • Push mode: A DataStage job is started first. The DataStage job schedules the InfoPackage for the job to start the data loading operation.
  • Pull mode: An InfoPackage is scheduled first using SAP Data Warehousing Workbench. When the SAP BI is ready to receive data, it notifies the RFC Server process. The RFC server process launches the DataStage job to send data to the SAP BI.
  • File mode: A DataStage job runs first. The DataStage job saves the data for SAP BI to a temporary file. An InfoPackage is then scheduled to load the data in the file into SAP BI.

The InfoPackage third-party parameters are defined to support third-party integration tools like the DataStage BW Pack. As shown in Figure 20, the DataStage job BILoadJob is automatically set as a third-party parameter for the new pull InfoPackage DEMO Pull InfoPacakge.

Figure 20. InfoPackage third party parameters
InfoPackage third party parameters

The use of third-party parameters in the BI data loading process is described as follows:

  • SAP BI schedules and runs the InfoPackage DEMO PULL InfoPackage.
  • When it is ready to receive data, the InfoPackage sends the loading request to the source system DEMODSSRC. It also passes the third-party parameter DSJob and its value BILoadJob to the source system.
  • The RFC Server process for the source system receives the request and starts the DataStage job BILoadJob to send data packages to the SAP BI.

Process Chain Setup

The process chain is set on the Process Chain tab. This step is optional. The BW Load Stage can run with or without a process chain.

A process chain defines a sequence of dependent processes linked together. The execution of an InfoPackage is one of the process types that SAP BI defines. As shown in Figure 21, the execution of the InfoPackage DEMO Pull InfoPacakge is added as a process in the process chain Demo Load Chain. Figure 22 selects the process chain Demo Load Chain.

Figure 21. Run the data loading job as part of a process chain
Run the data loading job as part of a process chain
Figure 22. Select process chain
Select process chain

Run data load operation

The process chain Demo Load Chain must be scheduled to run your BI data load operation. The process chain can be started in two ways:

  • Use the Data Warehousing Workbench. Figure 23 shows the sequence of activities.
    Figure 23. Start the process chain using the SAP Data Warehousing Workbench
    Start the process chain using the SAP Data Warehousing Workbench
  • Use the DataStage job BILoadJob. Figure 24 shows the sequence of activities.
    Figure 24. Start the process chain using the DataStage job
    Start the process chain using the DataStage job

In this example, the DataStage job is first started to invoke the process chain. Figure 25 shows that the DataStage job runs successfully. Figure 26 shows that the source data is loaded into the SAP BI.

Figure 25. Run DataStage job
Run DataStage job
Figure 26. Data loaded into SAP BI
Data loaded into SAP BI

Back to top

Extract data from SAP BI

The BW Extract Stage extracts data from an SAP BI system. The extracted data can be fed into non-SAP applications. The BW Extract Stage is based on the SAP Open Hub Service, which defines a controlled and monitored data exporting process.

This section uses a simple ETL job to illustrate the steps necessary to extract data from an SAP BI system. Figure 27 shows the sample job. The job extracts the data from the SAP BI characteristic CUSTOMER, as shown in Figure 4. The job processes the extracted data using a DataStage Transformer Stage and then saves the processed results into a flat file.

Figure 27. DataStage job for extracting data from the SAP BI
DataStage job for extracting data from the SAP BI

The BW Extract Stage has a Stage Editor, shown in Figure 28. The Stage Editor contains four tabs for setting up various properties for the BI data extraction operation.

Figure 28. BW Extract Stage GUI
BW Extract Stage GUI
  • The General tab creates or selects an RFC connection to an SAP BI.
  • The Process Chain tab selects a source system and a process chain. The BI data extraction operation is run as a process within a process chain.
  • The Open Hub Destination tab selects an InfoSpoke or an Open Hub Destination.
  • The Columns tab displays the column definitions for the extracted data from the SAP BI.

The BW Extract Stage ExtractDataFromSAPBI, in Figure 27, extracts data from an SAP BI system. Multiple steps are involved in setting up the BW Extract Stage. Those steps are illustrated in Figure 29 and described in detail in the following sections.

Figure 29. Setting up the BW Extract Stage ExtractDataFromSAPBI
Setting up the BW Extract Stage ExtractDataFromSAPBI

BI Open Hub Destination and data transfer process setup

The BW Extract Stage supports both BW InfoSpoke and BI Open Hub Destination for data extraction.

InfoSpoke is a central piece of the Open Hub Service Pack in SAP BW. An InfoSpoke specifies three properties:

  • An InfoProvider that provides the original data. An InfoProvider can be an InfoCube, a DataStore object, or an InfoObject.
  • An Open Hub Destination that defines the targets to receive the extracted data. An Open Hub Destination can be a flat file, a database table, or a RFC destination.
  • Transformation that converts the data from its original form to the destination form.

An InfoSpoke must be created first before it can be selected on the Open Hub Destination tab. BW Extract Stage supports creating an InfoSpoke and using the InfoSpoke for data extraction.

In the SAP BI, Open Hub Destination has been integrated into the new BI data transfer process and is no longer tightly coupled with InfoSpoke. As illustrated in Figure 30, a data transfer process transforms the data from an InfoProvider to an Open Hub Destination. When the data is ready in the Open Hub Destination, SAP BI notifies the DataStage RFC Server process, which starts a DataStage job to extract the data from the Open Hub Destination. A process chain is created to control the whole data extraction process.

Figure 30. BI data extraction diagram
BI data extraction diagram

The Extract Stage ExtractDataFromSAPBI, in Figure 27, is set up based on the BI Open Hub Destination. The use of a traditional BW InfoSpoke is not discussed in this article. The following steps summarize how to create the BI artifacts for the stage:

  1. Create and activate a new Open Hub Destination using the SAP transaction RSBO or the Data Warehousing Workbench GUI. Figure 31 shows the dialog window for creating a new Open Hub Destination DEMODEST. The attributes of the CUSTOMER characteristic are selected to create the field definitions of the new Open Hub Destination. As shown in Figure 32, the DEMODSSRC RFC destination is selected as the data receiver for the Open Hub Destination.
    Figure 31. Create new Open Hub Destination
    Create new Open Hub Destination
    Figure 32. Select RFC destination
    Select RFC destination
  2. Create and activate a new data transfer process to transform the data from an InfoProvider to the Open Hub Destination. Figure 33 shows the dialog window for creating a new data transfer process. The data transfer process transfers the data from the Customer attributes to the Open Hub Destination DEMODEST.
    Figure 33. Create a new data transfer process
    Create a new data transfer process
  3. Create and activate a new process chain to invoke the new data transfer process. Figure 34 shows that the data transfer process CUSTOMER > DEMODEST is added as a process to the process chain CUSTCHAIN.
    Figure 34. Create and activate a process chain
    Create and activate a process chain

Source system and process chain setup

The source system and process chain are set on the Process Chain tab. A source system is the destination for the extracted data. A process chain controls the data extraction process.

In Figure 35, DEMODSSRC is selected as the source system. The process chain CUSTCHAIN is selected for the stage ExtractDataFromSAPBI.

Figure 35. Process Chain tab
Process Chain tab

Open Hub Destination setup

An Open Hub Destination is set on the Open Hub Destination tab, as shown in Figure 37. Two user actions are required:

  1. Select an Open Hub Destination.

    In Figure 36, the Open Hub Destination DEMODEST is selected for the stage ExtractDataFromSAPBI. The stage retrieves the DEMODEST definitions from BI and automatically populates various GUI controls shown in Figure 37.

    The table fields shown in Figure 37 are a part of the DEMODEST definitions. Those fields are converted to a DataStage tab definition that is displayed on the Column tab. The Column tab for the BW Extract Stage is the same as the Column tab for the BW Load Stage shown in Figure 17.

    Figure 36. Select an Open Hub Destination
    Select an Open Hub Destination
    Figure 37. Open Hub Destination tab
    Open Hub Destination tab
  2. Update third-party parameters for the Open Hub Destination.

    An Open Hub Destination supports third-party parameters in the same way as InfoPackage. The button Update BW, in Figure 37, sets the third-party parameters of the selected Open Hub Destination. Figure 38 shows the result of when the button is clicked. Figure 39 shows that the job name BIExtractJob, the process chain CUSTCHAIN and the source system DEMODSSRC are set as the third-party parameters of the Open Hub Destination DEMODEST. The usage of the third-party parameters in the BW Extract Stage is similar to the usage of the parameters in the BW Load Stage.

    Figure 38. Update the third-party parameters
    Update the third-party parameters
    Figure 39. DEMODEST third-party parameters
    DEMODEST third-party parameters

Run data extraction operation

Similar to the data load operation, the process chain CUSTCHAIN must be scheduled to run the data extraction operation. It can be started either by the DataStage job BIExtractJob or by the SAP Data Warehousing Workbench.

In this example, the DataStage job is started to invoke the process chain. Figure 40 shows that the DataStage job runs successfully and Figure 41 shows the data extracted from the SAP BI.

Figure 40. Run the DataStage job
Run the DataStage job
Figure 41. Data extracted from SAP BI
Data extracted from SAP BI

Back to top

Conclusion

This article demonstrated how to integrate SAP BI data with non-SAP BI data using IBM Information Server and the WebSphere DataStage SAP BW pack. It explained the SAP BI data loading and extraction processes and the new SAP BI feature: the data transfer process. Two examples illustrated the step-by-step design processes.

IBM Information Server provides leading technology and integration solutions to two other critical issues in the SAP BI Data Warehouse environment:

  • Data Quality: The data that builds a data warehouse often comes from various data sources. The structure of the legacy data is often not documented and the data quality is poor. The WebSphere Information Analyzer product analyzes your data and determines the data structure and quality. It helps you understand your data. The WebSphere QualityStage product standardizes and matches any type of information to create high quality data.
  • Data Volume: There is often a huge amount of data that needs to be processed regularly for a data warehouse environment. Sometime the data volume grows beyond expectations. The issue needs to be addressed with a scalable ETL architecture. IBM Information Server leverages the pipeline and partition technologies to support high data throughput. IBM Information Server can be deployed on symmetric multiprocessing (SMP) and massively parallel processing (MPP) computer systems to achieve the maximum scalability.

Currently work is underway on other SAP BI integration solutions, including direct data accesses and metadata management. The work focuses on developing a fast and efficient ETL solution to build corporate transactional data stores with massive document-level data items.

Acknowledgements

I would like to thank Wendi Nusbickel, Nicolas Cominetti, and Susan Shay for their feedback and reviews of this article.

Back to top

Appendix A: Terminology

Terminology Description
ETL Extract, Transform, and Load
SAP BI Business Intelligence
SAP BW SAP Business Information Warehouse
GUI Graphical User Interface
CRM Customer Relationship Management
ODBC Open Database Connectivity
DataStage job A sequence of data operations performed by IBM Information Server.
RFC SAP term, Remote Function Call
PSA SAP BI term, Persistent Staging Area
Staging BAPI SAP BI term, an open interface for third party ETL tools
Open Hub Service SAP BI term, an SAP BW/BI data exporting mechanism
Source System SAP BI term, a logical or physical system external to an SAP BI system.
InfoObject SAP BI term, a lowest level information provider
DataStore Object SAP BI term, a storage location for consolidated transaction and master data at document level.
InfoCube SAP BI term, several relational tables arranged in a star schema
InfoSource SAP BI term, a quantity of information that logically belongs together
InfoPackage SAP BI term, an entry point for requesting data from a source system
InfoSpoke SAP BI term, an extraction object that exports data within the Open Hub Service
Process Chain SAP BI term, a sequence of processes linked together
Transfer Structure SAP BI term, a selection of data fields from a source system

Back to top

Appendix B: Tools

Figure Tool
Figure 2, 25, 27, 40 DataStage Designer
Figure 3, 4, 11, 15, 20, 21, 26, 31, 32, 33, 34, 39
SAP Data Warehousing Workbench
Figure 6, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 22 BW Load Stage Editor
Figure 28, 35, 36, 37, 38 BW Extract Stage Editor
Figure 41 DataStage File Stage Editor

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多