Classroom Schedule

There are no classes currently scheduled

Course Description

This IBM InfoSphere DataStage v11.3 training course gives project administrators and ETL developers the skills needed to develop parallel jobs in DataStage. You will gain a deeper understanding of the DataStage architecture, including its development and runtime environments, so that you can design parallel jobs that are robust, less error-prone, reusable, and optimized for performance. You will also learn advanced techniques for processing data, including masking data and validating data using data rules. Finally, you will learn techniques for updating data in a star schema data warehouse using the DataStage SCD (Slowly Changing Dimensions) stage.

Objectives

Audience

Prerequisites

  • Able to design and develop a scalable complex solution using an optimum number of stages.
  • Able to select the optimal data partitioning methodology.
  • Able to configure a distributed or non-symmetric environment.
  • Should be proficient with BuildOps and wrappers.
  • Should understand how to tune a parallel application to determine where bottlenecks exist and how to eliminate them.
  • Should understand basic configuration issues for all relational databases and be highly proficient in at least one.
  • Able to improve job design by implementing new product features in DataStage v11.3.
  • Able to monitor DataStage jobs via the Job Log and the Operations Console.

Content

1. Configuration

• Describe how to properly configure DataStage.

• Identify tasks required to create and configure a project to be used for jobs.

• Given a configuration file, identify its components and its overall intended purpose.

• Demonstrate proper use of node pools.

 

2. Metadata

• Demonstrate knowledge of framework schema.

• Identify the method of importing, sharing, and managing metadata.

• Demonstrate knowledge of runtime column propagation (RCP).

 

3. Persistent Storage

• Explain the process of importing data into and exporting data from the framework.

• Demonstrate proper use of a Sequential File stage.

• Demonstrate proper use of the Complex Flat File stage.

• Demonstrate proper usage of FileSets and DataSets.

• Demonstrate use of FTP stage for remote data.

• Demonstrate use of restructure stages.

• Identify importing/exporting of XML data.

• Demonstrate knowledge of balanced optimization for Hadoop and integration of Oozie workflows.

• Demonstrate proper use of File Connector stage.

• Demonstrate use of DataStage to handle various types of data including unstructured, hierarchical, Cloud, and Hadoop.

 

4. Parallel Architecture

• Demonstrate proper use of data partitioning and collecting.

• Demonstrate knowledge of parallel execution.
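
As a rough illustration of the partitioning and collecting idea named above, the sketch below hash-partitions rows on a key and then collects them back into one stream. It is plain Python, not DataStage code; the function names, the `customer_id` column, and the choice of CRC32 as the hash are assumptions made only for this example.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using a stable hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def collect(partitions):
    """Gather all partitions back into a single stream, in partition order."""
    return [row for part in partitions for row in part]


# Hypothetical sample data: nine rows spread over three customer ids.
rows = [{"customer_id": i % 3, "amount": 10 * i} for i in range(9)]
parts = hash_partition(rows, "customer_id", num_partitions=2)
merged = collect(parts)
```

Because the partition is chosen from the key alone, all rows with the same `customer_id` land in the same partition, which is the property that key-based operations such as joins and aggregations rely on.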

 

5. Databases

• Demonstrate proper selection of database stages and database specific stage properties.

• Identify source database options.

• Demonstrate knowledge of target database options.

• Demonstrate knowledge of the different SQL input/creation options and when to use each.

 

6. Data Transformation

• Demonstrate knowledge of default type conversions, output mappings, and associated warnings.

• Demonstrate proper selection of the Transformer stage vs. other stages.

• Describe Transformer stage capabilities.

• Demonstrate the use of Transformer stage variables.

• Identify process to add functionality not provided by existing DataStage stages.

• Demonstrate proper use of SCD stage.

• Demonstrate job design knowledge of using runtime column propagation (RCP).

• Demonstrate knowledge of Transformer stage input and output loop processing.
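
To make the slowly changing dimension topic above more concrete, here is a minimal sketch of a Type 2 update: when a tracked attribute changes, the current dimension row is expired and a new row with a fresh surrogate key is appended. This is plain Python rather than the SCD stage itself, and every column name (`surrogate_key`, `is_current`, `start_date`, `end_date`) is a hypothetical choice for illustration. It assumes at most one incoming record per business key.

```python
from datetime import date


def apply_scd_type2(dimension, incoming, business_key, tracked, today):
    """Expire changed rows and append new versions (Type 2 history)."""
    # Index the currently active row for each business key.
    current = {row[business_key]: row for row in dimension if row["is_current"]}
    next_key = max((row["surrogate_key"] for row in dimension), default=0) + 1
    for record in incoming:
        existing = current.get(record[business_key])
        if existing and all(existing[col] == record[col] for col in tracked):
            continue  # Tracked columns unchanged: keep the current row as is.
        if existing:
            # Close out the old version instead of overwriting it.
            existing["is_current"] = False
            existing["end_date"] = today
        dimension.append({
            "surrogate_key": next_key,
            **record,
            "start_date": today,
            "end_date": None,
            "is_current": True,
        })
        next_key += 1
    return dimension


# Hypothetical data: one existing customer moves, one new customer arrives.
dim = [{"surrogate_key": 1, "customer_id": "C1", "city": "Austin",
        "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
updates = [{"customer_id": "C1", "city": "Denver"},
           {"customer_id": "C2", "city": "Boston"}]
apply_scd_type2(dim, updates, "customer_id", ["city"], date(2024, 1, 1))
```

After the call, `dim` holds three rows: the expired Austin row, a new current Denver row for `C1`, and a first row for `C2`.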

 

7. Job Components

• Demonstrate knowledge of Join, Lookup and Merge stages.

• Demonstrate knowledge of Sort stage.

• Demonstrate understanding of Aggregator stage.

• Describe proper usage of change capture/change apply.

• Demonstrate knowledge of real-time components.

 

8. Job Design

• Demonstrate knowledge of shared containers.

• Describe how to minimize Sorts and repartitions.

• Demonstrate knowledge of creating restart points and methodologies.

• Explain the process necessary to run multiple copies of the source.

• Demonstrate knowledge of creating DataStage jobs that can be used as a service.

• Demonstrate knowledge of balanced optimization.

• Describe the purpose and uses of parameter sets and how they compare with other approaches for parameterizing jobs.

• Demonstrate the ability to create and use Data Rules using the Data Rules stage to measure the quality of data.

• Demonstrate various methods of using DataStage to handle encrypted data.

 

9. Monitor and Troubleshoot

• Demonstrate knowledge of parallel job score.

• Identify and define environment variables that control DataStage with regard to added functionality and reporting.

• Given a process list, identify the conductor, section leader, and player processes.

• Identify areas that may improve performance.

• Demonstrate knowledge of runtime metadata analysis and performance monitoring.

• Demonstrate the ability to monitor DataStage jobs using the Job Log and Operations Console.

 

10. Job Management and Deployment

• Demonstrate knowledge of DataStage Designer Repository utilities such as advanced find, impact analysis, and job compare.

• Articulate the change control process.

• Demonstrate knowledge of source code control integration.

• Demonstrate the ability to define packages, import, and export using the istool utility.

• Demonstrate the ability to perform admin tasks with tools such as Directory Admin.

 

11. Job Control and Runtime Management

• Demonstrate knowledge of message handlers.

• Demonstrate the ability to use the dsjob command-line utility.

• Demonstrate the ability to use job sequencers.

• Create and manage encrypted passwords and credentials files.