I hope you enjoyed Part One of our Matillion Orchestration blog series.
In the next part of this series, I am going to look at creating a template that can be reused and depending on which variables are passed in; truncate a table. This technique helps support the ability to choose between performing a full or incremental load.
The whole point of creating a template is to allow Matillion to perform a similar task repeatedly without the need to create a separate task for each execution. This helps cutdown on development time and reduces maintenance going forward since changes can be enacted in fewer places.
In the case of this template there are three things I want to achieve:
- Run a Transformation Task
- Choose whether the target table needs to be truncated
- Log the outcome to Amazon CloudWatch
Having reviewed these three requirements the functionality will be the same other than minor differences such as the name of the job that needs to be ran and whether the target table needs to be truncated. So, the first step is to determine what information needs to be passed in.
You will see in the above image that as well as passing in the task name we also pass in database, schema and table names that allow us control which table gets truncated. You will notice that there is no reference to truncate in the job variable settings, this is because we control that at an environment level and as such is managed as an environment variable. The reason for this is that this setting can affect several different components outside of the scope of this template and as such is managed as an environmental variable.
You will also notice that many of the job variables have a default value set; this to enable Matillion to compile the template.
Creating the Template
The next step is to create a generic template as below:
The idea here being that there should be no hard coded values set (unless you know that the value will always be the same). For example, if you know that the given template would always be used within the same database or schema, but normally if anything were hard coded it would defeat the object of creating the template in the first place
Other things that you will notice in the example are the use of python components, which are a powerful addition to any template. In this instance one of the functions, we are performing is to query a snowflake database table to determine if a full load is required and to set the local variables accordingly.
It is also good practice to give definitions of each variable so that when people other than the author come to use the template it is clear what they need to pass in.
Using job variables within a template is quite simple you just need to refer to it by name in the properties window as below (note that you can use both scalar and grid variables).
Using your new template
Making use of your new template is quite simple, all you need to do is:
- Drop a Run Orchestration task on the canvass
- Select your template from the dropdown
- Fill out your variables
- Run the Job!
As you can see once the template is set up it can be used repeatedly for any new transformation task you create. Here at Crimson Macaw we use templates to orchestrate all of our Matillion projects meaning that once a new transformation task has been created we can add to the pipeline using a single component which takes seconds to do!
Want to know more? Get in touch with us here.