PHP-ETL - Cook Books
Grouping / Aggregation
As a second example, we will write a JSON file in which customers are grouped by their subscription state. We write the output as JSON because its nested structure makes it easier to see what the grouping does.
Let's start by reading our CSV file:
read-file:
  operation: csv-read
  options: [] # Use the default delimiter
We will use the simple-grouping operation for this. This operation needs to hold all the data in memory and should therefore be used with caution.
Here we have a single grouping-key. More complex groupings are possible, for example grouping by both subscription status and gender.
The group-identifier allows us to remove duplicates. If we had customer emails, for example, we could have used them as the identifier.
group-per-subscription:
  operation: simple-grouping
  options:
    grouping-key: ['IsSubscribed']
    group-identifier: []
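To make the idea concrete, here is a minimal Python sketch of what grouping with an optional identifier amounts to. It is a conceptual illustration only, not PHP-ETL's implementation: the group_rows helper, the Email column, and the sample rows are all made up for this example.

```python
import json


def group_rows(rows, grouping_key, group_identifier=None):
    """Group dict rows by one column, optionally de-duplicating per group."""
    groups = {}  # every row is kept here until the input is exhausted
    for row in rows:
        bucket = groups.setdefault(row[grouping_key], {})
        # Without an identifier each row gets its own slot, so nothing is dropped.
        ident = row[group_identifier] if group_identifier else len(bucket)
        bucket[ident] = row  # a repeated identifier overwrites the earlier row
    return {key: list(bucket.values()) for key, bucket in groups.items()}


# Hypothetical sample rows; only the IsSubscribed column comes from the cookbook.
rows = [
    {"Email": "a@example.com", "IsSubscribed": "1"},
    {"Email": "b@example.com", "IsSubscribed": "0"},
    {"Email": "a@example.com", "IsSubscribed": "1"},  # duplicate of the first row
]

grouped = group_rows(rows, "IsSubscribed", group_identifier="Email")
print(json.dumps(grouped, indent=2))
```

Because the groups dictionary holds every row until the loop ends, memory use grows with the size of the input, which is why a grouping step like this should be used with caution on large files.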
We will also use the json-write operation.
It works like writing a CSV file, but is better suited to complex, multi-level data such as what we have after the grouping.
write-new-file:
  operation: json-write
  options:
    file: "output.json"
Complete YAML
chain:
  read-file:
    operation: csv-read
    options: [] # Use the default delimiter

  group-per-subscription:
    operation: simple-grouping
    options:
      grouping-key: ['IsSubscribed']
      group-identifier: []

  write-new-file:
    operation: json-write
    options:
      file: "output.json"