PHP-ETL - Cook Books
Using Sub Chains

In some cases the chain description can become quite repetitive. Take the following example from Chapter 1 - Splitting/Forking.

In that example we split our customer.csv file into 2 files: one with the customers subscribed to the newsletter and one with those not subscribed. We did not apply any additional processing to change the structure of the data.

Let’s now imagine we would like to extract only the FirstName and LastName columns from the csv file for the subscribed customers. The resulting chain would look like:

  branch-out:
    operation: split
    options:
      branches:
        -
          filter-unsubscribed:
            operation: filter
            options:
              rule: [{get : {field: 'IsSubscribed'}}]
              negate: false
          transform:
            operation: rule-engine-transformer
            options:
              add: false # We want to replace all existing columns with our new columns.
              columns:
                FirstName:
                  rules:
                    - get : {field: 'FirstName'}
                LastName:
                  rules:
                    - get : {field: "LastName"}
          write-new-file:
            operation: csv-write
            options:
              file: "subscribed.csv"
        -
          filter-subscribed:
            operation: filter
            options:
              rule: [{get : {field: 'IsSubscribed'}}]
              negate: true

          write-new-file:
            operation: csv-write
            options:
              file: "unsubscribed.csv"

To do the same for both subscribed and unsubscribed customers, we would need to duplicate the whole transform operation. That would be quite inefficient. This is also a very simple case: if we wanted to add grouping and more transforms, the amount of duplication would grow even further.
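
To make the duplication concrete, here is a sketch of what the second branch would look like without a subchain. It only reuses operations and options already shown above; the transform block is an exact copy of the one in the first branch.

        -
          filter-subscribed:
            operation: filter
            options:
              rule: [{get : {field: 'IsSubscribed'}}]
              negate: true
          transform: # Exact copy of the transform used in the first branch.
            operation: rule-engine-transformer
            options:
              add: false # Replace all existing columns with the new columns.
              columns:
                FirstName:
                  rules:
                    - get : {field: 'FirstName'}
                LastName:
                  rules:
                    - get : {field: "LastName"}
          write-new-file:
            operation: csv-write
            options:
              file: "unsubscribed.csv"

Any later change to the column list would have to be made in both branches.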

A subchain can be used in such cases. We can create a subchain that performs the necessary transformations:

subChains:
  customTransform:
    chain:
      -
        operation: rule-engine-transformer
        options:
          add: false # We want to replace all existing columns with our new columns.
          columns:
            FirstName:
              rules:
                - get : {field: 'FirstName'}
            LastName:
              rules:
                - get : {field: "LastName"}

We can then use this subchain as an operation anywhere within our chain:

          transform:
            operation: subchain
            options:
              name: customTransform

The following rules apply to subchains:

  • Subchains can have multiple operations, as with a normal chain.
  • Operations in a subchain are cloned each time it is used, so a grouping operation will not share memory between uses, unless the shared option is set to true.
  • Subchains can use other subchains, so it’s possible to have multiple levels of subchains.
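
As an illustration of the first and last rules, the following sketch defines one subchain with two operations, the second of which calls another subchain. The names keepNames, subscribedNames, keep-name-columns, keep-subscribed and reduce-columns are hypothetical and chosen for this example only; the operations and their options are the ones used elsewhere on this page. The shared option is left out because this page does not show where it is set.

subChains:
  keepNames: # Hypothetical name: keeps only the two name columns.
    chain:
      keep-name-columns:
        operation: rule-engine-transformer
        options:
          add: false # Replace all existing columns with the new columns.
          columns:
            FirstName:
              rules:
                - get : {field: 'FirstName'}
            LastName:
              rules:
                - get : {field: 'LastName'}
  subscribedNames: # Hypothetical name: a subchain with two operations.
    chain:
      keep-subscribed:
        operation: filter
        options:
          rule: [{get : {field: 'IsSubscribed'}}]
          negate: false
      reduce-columns: # A subchain calling another subchain.
        operation: subchain
        options:
          name: keepNames

A branch could then reference subscribedNames with a single subchain operation instead of repeating the filter and the transform.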

Complete Code

subChains:
  customTransform:
    chain:
      generic-subchain-transformation:
        operation: rule-engine-transformer
        options:
          add: false # We want to replace all existing columns with our new columns.
          columns:
            FirstName:
              rules:
                - get : {field: 'FirstName'}
            LastName:
              rules:
                - get : {field: "LastName"}

chain:
  read-file:
    operation: csv-read
    options: [] # Use the default options, including the default delimiter.
  branch-out:
    operation: split
    options:
      branches:
        -
          filter-unsubscribed:
            operation: filter
            options:
              rule: [{get : {field: 'IsSubscribed'}}]
              negate: false
          transform:
            operation: subchain
            options:
              name: customTransform
          write-new-file:
            operation: csv-write
            options:
              file: "subscribed.csv"
        -
          filter-subscribed:
            operation: filter
            options:
              rule: [{get : {field: 'IsSubscribed'}}]
              negate: true
          transform:
            operation: subchain
            options:
              name: customTransform
          write-new-file:
            operation: csv-write
            options:
              file: "unsubscribed.csv"