PHP-ETL - Operations
Aggregation/Grouping - Simple grouping (simple-grouping)
The simple-grouping operation groups items based on a common key, useful for data aggregation (e.g., grouping customers by city). It collects items in memory, then outputs a single GroupedItem with an iterator for the grouped data.
Options
- grouping_key: An array of keys to use for grouping. The values of these keys will be combined to create a unique identifier for each group.
- group_identifier: (Optional) An array of keys to use for identifying individual items within a group. If specified, only the last item with a given identifier will be kept in the group.
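The two options above can be illustrated with a short, language-neutral sketch. This is not PHP-ETL code: the function name simple_grouping, the use of plain dictionaries for items, and the ordering of items within a group are all assumptions made for illustration. It returns plain lists instead of a GroupedItem with an iterator.

```python
def simple_grouping(items, grouping_key, group_identifier=None):
    """Hypothetical helper that mimics the documented option semantics.

    items            -- iterable of dicts (assumed item shape)
    grouping_key     -- list of keys whose combined values identify a group
    group_identifier -- optional list of keys identifying an item within a group;
                        when set, a later item replaces an earlier one with the same id
    """
    groups = {}  # group id -> {item id -> item}; dicts keep insertion order
    for item in items:
        # Combine the values of all grouping keys into one group identifier.
        group_id = tuple(item[key] for key in grouping_key)
        bucket = groups.setdefault(group_id, {})
        if group_identifier:
            # Same identifier again: the newer item overwrites the older one.
            item_id = tuple(item[key] for key in group_identifier)
        else:
            # No identifier configured: use the position so every item is kept.
            item_id = len(bucket)
        bucket[item_id] = item
    return [list(bucket.values()) for bucket in groups.values()]


customers = [
    {"name": "John Doe", "city": "New York"},
    {"name": "Jane Doe", "city": "New York"},
    {"name": "Peter Jones", "city": "London"},
]

# Group by city only: every customer is kept.
by_city = simple_grouping(customers, ["city"])

# Group by city and identify items by city as well: only the last
# customer seen for each city survives.
last_per_city = simple_grouping(customers, ["city"], ["city"])
```

In this sketch, by_city holds two groups (two customers for New York, one for London), while last_per_city holds one customer per city because each later item replaces the earlier one with the same identifier.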
Example
Here’s an example of how to use the simple-grouping operation to group a list of customers by their city.
Input Data (a sequence of items):
[
  { "name": "John Doe", "city": "New York" },
  { "name": "Jane Doe", "city": "New York" },
  { "name": "Peter Jones", "city": "London" }
]
YAML Configuration:
chain:
  - operation: simple-grouping
    options:
      grouping_key: ["city"]
  - operation: rule-transformer
    options:
      # Rules to process the grouped data.
      # The input to this operation will be an iterator of groups.
      # Each group will be an array of customers.
Output:
The rule-transformer will receive an iterator with two groups:
- Group 1 (New York):
[ { "name": "John Doe", "city": "New York" }, { "name": "Jane Doe", "city": "New York" } ]
- Group 2 (London):
[ { "name": "Peter Jones", "city": "London" } ]