分享

Using Named Parameters in Kettle | Adventures with Open Source BI

 集微笔记 2013-09-12

Kettle’s named parameters often enable very elegant solutions for ETL requirements. This post gives an introduction to the named parameters feature of Kettle.

It is safe to think of named parameters as variables with their initial values assigned before the ETL process starts. In fact, they are available as variables when the ETL process executes. Their values can be specified in the launch dialog inside spoon or on the command line when running kitchen or pan.

How to define named parameters?

Named parameters are defined in the job or transformation settings dialog. Use the parameters tab to specify them. Give each parameter a name and a default value in case the parameter is not explicitly given another value during process start.

How to specify the parameter values?

When launching a job or transformation from the Spoon application the launch dialog allows to enter values for each named parameter.

When launching from the command line, you’ll have to specify them using command line parameters. The exact syntax differs slightly depending on the operating system and shell you are using, but the general idea is to pass in the parameters using the -param:name=value argument for each named parameter.

Example: launch job.kjb with named parameters files.dir=/opt/files and max.date=2010-06-02

Linux: ./kitchen.sh -file:job.kjb -param:files.dir=/opt/files -param:max.date=2010-06-02
Windows: Kitchen.bat -file:job.kjb “-param:files.dir=/opt/files” “-param:max.date=2010-06-02″

Please observe the usual escaping rules of your shell when passing in parameter values. The whole -param:name=value block must appear as a single argument to the shell.

How to pass parameters to sub jobs and transformations?

Often a master job is used for control flow. It coordinates invocations of sub jobs and transformations. Parameters already present in the master job are passed on to the sub jobs and transformations if you check their “pass all parameter values down” checkbox. You can specify more parameters manually by entering the paramerer names and values in the parameters box.

Remember that the sub job or transformation must know about the parameters you are trying to pass in. They must be defined in the settings dialog or they will not be available in the sub job or transformation, no matter what you care to pass in from outside. By the way: using this feature together with the “execute for every input row” feature enables an interesting looping technique.

How to access parameter values?

Parameters are available as variables. Each named parameter is mapped to a variable during runtime. If you have a parameter named foo you can reference it using ${foo} in any place where variables are supported.

How to access parameters in scripts?

Sometimes it is convenient to derive new variables from incoming parameters. An incoming parameter representing a date can be used to calculate variables for the paths and names of files you want to import for example. Scripting access to variables works slightly differently depending on whether the context is a job or a transformation.

Accessing variables in jobs

In Kettle jobs the JavaScript job entry can be used to read and set variables. Use the parent_job.setVariable() and parent_job.getVariable() methods to retrieve and set the values of variables. Please note that Kettle variables always hold string values. If you need to do any arithmetic you should convert the string value to an appropriate type first.

Have a look at the example job which accepts an incoming date in yyyy-MM-dd format on the parameter run.date and calculates the previous week’s date from that, saving it in the last.week variable.

// prepare
var format = new java.text.SimpleDateFormat("yyyy-MM-dd");
var cal = java.util.Calendar.getInstance();

// get variable and subtract seven days
var runDateString = parent_job.getVariable("run.date");
cal.setTime(format.parse(runDateString));
cal.add(java.util.Calendar.DAY_OF_MONTH, -7);

// set new kettle variable
parent_job.setVariable("last.week", format.format(cal.getTime()));

true

In case you are wondering how Java code ends up in the JavaScript job entry, check out this post.

Accessing variables in transformations

If you are working with the JavaScript step in a transformation you can use the predefined functions getVariable() and setVariable() to access variables. View samples of how they are used by right clicking them in the Transform Functions/Special Functions section on the navigator on the left.

In the context of transformations special caution is required: when setting variables in transformations it is important to understand that the multi-threaded nature of transformations makes it impossible to tell exactly when the variable is being set. Therefore it is illegal to set and read the same variable within the same transformation. If you intend to create or manipulate a variable within a transformation, make sure you use its value further down in the control flow of an enclosing job.

Have a look at the example illustrating the technique. A master job runs a transformation which generates random numbers between 1 and 1000. The transformation saves the number closest to 500 in the variable best.match. When the transformation completes the job logs which number was closest.

The following script is used to check for the best match and set the best.match variable.

var bestMatch;

// normalize between 1 and 1000
var randValue = abs(rand) % 1000 + 1;

// check for best match
if (bestMatch == null){
	bestMatch = randValue;
	setVariable("best.match", ""+bestMatch, "s");
}
else if (abs(bestMatch-500) > abs(randValue-500)){
	bestMatch = randValue;
	setVariable("best.match", ""+bestMatch, "s");
}

Conclusion

Named parameters are defined and given default values in the settings dialog of a job or transformation. They offer an elegant way for passing information in and out of jobs and transformations. Named parameters are conveniently available as regular variables so they can be used in any context that supports variable substitution. Named parameters can also be accessed and manipulated using Kettle’s scripting facilities, making them a versatile tool for many ETL needs.

Comments and corrections are welcome :-)

Cheers

Slawo

Related Literature

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多