PHP-ETL - Cook Books
Grouping / Aggregation

As a second example, we will write a JSON file in which customers are grouped by their subscription state. We use JSON for the output because its nested structure makes it easier to see what the grouping does.

Let’s start by reading our CSV file:

read-file:
  operation: csv-read
  options: [] # Use the default delimiter

We will use the simple-grouping operation for this. This operation needs to put all the data in memory and should therefore be used with caution.

Here we use a single grouping-key. More complex groupings are possible, for example by grouping on both subscription status and gender.

The group-identifier option allows us to remove duplicates. If our data contained customer emails, for example, we could have used them as the identifier.

group-per-subscription:
  operation: simple-grouping
  options:
    grouping-key: ['IsSubscribed']
    group-identifier: []
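
To make the roles of grouping-key and group-identifier more concrete, here is a minimal Python sketch of the idea. It is not PHP-ETL's implementation: the function name simple_grouping, the sample rows, the Email column, the "/" separator used for composite keys, and the rule that the last duplicate wins are all assumptions made for illustration. It also shows why memory matters, since every row stays in the groups dictionary until the whole input has been read.

```python
from collections import defaultdict

# Hypothetical sample rows; the real CSV columns may differ.
ROWS = [
    {"Name": "Ann", "Email": "ann@example.com", "IsSubscribed": "1"},
    {"Name": "Bob", "Email": "bob@example.com", "IsSubscribed": "0"},
    {"Name": "Ann B.", "Email": "ann@example.com", "IsSubscribed": "1"},
]


def simple_grouping(rows, grouping_key, group_identifier=()):
    """Group rows in memory by the columns listed in grouping_key.

    When group_identifier is given, rows of the same group that share the
    same identifier values overwrite each other (assumption: last one wins).
    """
    groups = defaultdict(dict)  # holds every row until the input is exhausted
    for row in rows:
        # Assumption: several grouping columns are joined into one key.
        group = "/".join(str(row[column]) for column in grouping_key)
        if group_identifier:
            ident = "/".join(str(row[column]) for column in group_identifier)
        else:
            ident = len(groups[group])  # no identifier: every row is kept
        groups[group][ident] = row
    # Drop the identifiers and return one list of rows per group.
    return {group: list(members.values()) for group, members in groups.items()}


if __name__ == "__main__":
    print(simple_grouping(ROWS, ["IsSubscribed"]))
    print(simple_grouping(ROWS, ["IsSubscribed"], ["Email"]))
```

In this sketch, passing ["Email"] as the identifier collapses the two rows that share ann@example.com into one, while an empty identifier keeps both.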

We will also use the json-write operation.

It works like writing a CSV file, but is better suited to complex, multi-level data such as what we have after the grouping.

write-new-file:
  operation: json-write
  options:
    file: "output.json"
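
As a rough illustration of why JSON suits this kind of data, the Python sketch below serializes a nested structure holding one list of customers per subscription value. The grouped sample and the helper render_grouped_json are hypothetical, and the exact layout PHP-ETL writes to output.json may differ.

```python
import json

# Hypothetical grouped result: one list of customers per IsSubscribed value.
grouped = {
    "1": [{"Name": "Ann", "IsSubscribed": "1"}],
    "0": [{"Name": "Bob", "IsSubscribed": "0"}],
}


def render_grouped_json(data):
    """Serialize nested groups as indented JSON text."""
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(render_grouped_json(grouped))
```

A CSV row is flat, so it could not hold a list of customers under each group without extra conventions; JSON represents that nesting directly.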

Complete YAML

chain:
  read-file:
    operation: csv-read
    options: [] # Use the default delimiter

  group-per-subscription:
    operation: simple-grouping
    options:
      grouping-key: ['IsSubscribed']
      group-identifier: []

  write-new-file:
    operation: json-write
    options:
      file: "output.json"