Kettle’s named parameters often enable very elegant solutions for ETL requirements. This post gives an introduction to the named parameters feature of Kettle. It is safe to think of named parameters as variables with their initial values assigned before the ETL process starts. In fact, they are available as variables when the ETL process executes. Their values can be specified in the launch dialog inside spoon or on the command line when running kitchen or pan. How to define named parameters?Named parameters are defined in the job or transformation settings dialog. Use the parameters tab to specify them. Give each parameter a name and a default value in case the parameter is not explicitly given another value during process start. How to specify the parameter values?When launching a job or transformation from the Spoon application the launch dialog allows to enter values for each named parameter. When launching from the command line, you’ll have to specify them using command line parameters. The exact syntax differs slightly depending on the operating system and shell you are using, but the general idea is to pass in the parameters using the -param:name=value argument for each named parameter. Example: launch job.kjb with named parameters files.dir=/opt/files and max.date=2010-06-02
Please observe the usual escaping rules of your shell when passing in parameter values. The whole -param:name=value block must appear as a single argument to the shell. How to pass parameters to sub jobs and transformations?Often a master job is used for control flow. It coordinates invocations of sub jobs and transformations. Parameters already present in the master job are passed on to the sub jobs and transformations if you check their “pass all parameter values down” checkbox. You can specify more parameters manually by entering the paramerer names and values in the parameters box. Remember that the sub job or transformation must know about the parameters you are trying to pass in. They must be defined in the settings dialog or they will not be available in the sub job or transformation, no matter what you care to pass in from outside. By the way: using this feature together with the “execute for every input row” feature enables an interesting looping technique. How to access parameter values?Parameters are available as variables. Each named parameter is mapped to a variable during runtime. If you have a parameter named foo you can reference it using ${foo} in any place where variables are supported. How to access parameters in scripts?Sometimes it is convenient to derive new variables from incoming parameters. An incoming parameter representing a date can be used to calculate variables for the paths and names of files you want to import for example. Scripting access to variables works slightly differently depending on whether the context is a job or a transformation. Accessing variables in jobsIn Kettle jobs the JavaScript job entry can be used to read and set variables. Use the parent_job.setVariable() and parent_job.getVariable() methods to retrieve and set the values of variables. Please note that Kettle variables always hold string values. If you need to do any arithmetic you should convert the string value to an appropriate type first. Have a look at the example job which accepts an incoming date in yyyy-MM-dd format on the parameter run.date and calculates the previous week’s date from that, saving it in the last.week variable. // prepare var format = new java.text.SimpleDateFormat("yyyy-MM-dd"); var cal = java.util.Calendar.getInstance(); // get variable and subtract seven days var runDateString = parent_job.getVariable("run.date"); cal.setTime(format.parse(runDateString)); cal.add(java.util.Calendar.DAY_OF_MONTH, -7); // set new kettle variable parent_job.setVariable("last.week", format.format(cal.getTime())); true In case you are wondering how Java code ends up in the JavaScript job entry, check out this post. Accessing variables in transformations
In the context of transformations special caution is required: when setting variables in transformations it is important to understand that the multi-threaded nature of transformations makes it impossible to tell exactly when the variable is being set. Therefore it is illegal to set and read the same variable within the same transformation. If you intend to create or manipulate a variable within a transformation, make sure you use its value further down in the control flow of an enclosing job. Have a look at the example illustrating the technique. A master job runs a transformation which generates random numbers between 1 and 1000. The transformation saves the number closest to 500 in the variable best.match. When the transformation completes the job logs which number was closest. The following script is used to check for the best match and set the best.match variable. var bestMatch; // normalize between 1 and 1000 var randValue = abs(rand) % 1000 + 1; // check for best match if (bestMatch == null){ bestMatch = randValue; setVariable("best.match", ""+bestMatch, "s"); } else if (abs(bestMatch-500) > abs(randValue-500)){ bestMatch = randValue; setVariable("best.match", ""+bestMatch, "s"); } ConclusionNamed parameters are defined and given default values in the settings dialog of a job or transformation. They offer an elegant way for passing information in and out of jobs and transformations. Named parameters are conveniently available as regular variables so they can be used in any context that supports variable substitution. Named parameters can also be accessed and manipulated using Kettle’s scripting facilities, making them a versatile tool for many ETL needs. Comments and corrections are welcome Cheers Slawo Related Literature |
|