PHP-ETL - Getting Started
🐘 Standalone
Introduction
You will probably never run the ETL in standalone mode, but if you intend to create an adapter for a different framework or CMS, you need to understand how to start the ETL from “nothing”. This document focuses on that aspect.
That does not mean the ETL can’t be run standalone; it’s just that the “init” part is not ideal.
Writing some code
- Start by installing the necessary dependencies
composer require oliverde8/php-etl
- We will also need a factory to create execution contexts. You can read more about what an execution context is here.
$executionContextFactory = new ExecutionContextFactory();
TIP
The default provider we use here is very simple and basically bypasses the contexts.
- You will need to initialize all your operations.
// Create the rule applier operation with all the rules we have. Additional rules can be added.
$ruleApplier = new \Oliverde8\Component\RuleEngine\RuleApplier(
    new \Psr\Log\NullLogger(),
    [
        new \Oliverde8\Component\RuleEngine\Rules\Get(new \Psr\Log\NullLogger()),
        new \Oliverde8\Component\RuleEngine\Rules\Implode(new \Psr\Log\NullLogger()),
        new \Oliverde8\Component\RuleEngine\Rules\StrToLower(new \Psr\Log\NullLogger()),
        new \Oliverde8\Component\RuleEngine\Rules\StrToUpper(new \Psr\Log\NullLogger()),
        new \Oliverde8\Component\RuleEngine\Rules\ExpressionLanguage(new \Psr\Log\NullLogger()),
    ]
);

// The builder receives the execution context factory created in the previous step.
$builder = new ChainBuilder($executionContextFactory);

// Register one factory per operation name that can be used in a chain description.
$builder->registerFactory(new RuleTransformFactory('rule-engine-transformer', RuleTransformOperation::class, $ruleApplier));
$builder->registerFactory(new FilterDataFactory('filter', FilterDataOperation::class, $ruleApplier));
$builder->registerFactory(new SimpleGroupingFactory('simple-grouping', SimpleGroupingOperation::class));
$builder->registerFactory(new ChainSplitFactory('split', ChainSplitOperation::class, $builder));
$builder->registerFactory(new CsvFileWriterFactory('csv-write', FileWriterOperation::class));
$builder->registerFactory(new JsonFileWriterFactory('json-write', FileWriterOperation::class));
$builder->registerFactory(new CsvExtractFactory('csv-read', CsvExtractOperation::class));
$builder->registerFactory(new JsonExtractFactory('json-read', JsonExtractOperation::class));
$builder->registerFactory(new SplitItemFactory('split-item', SplitItemOperation::class));
$builder->registerFactory(new SimpleHttpOperationFactory('http', SimpleHttpOperation::class));
$builder->registerFactory(new ExternalFileFinderFactory('external-file-finder-local', ExternalFileFinderOperation::class, new LocalFileSystem("/")));
$builder->registerFactory(new ExternalFileProcessorFactory("external-file-processor", ExternalFileProcessorOperation::class));
TIP
Here we initialize all available operations. Most of the documentation assumes you are using Symfony as your framework; if you are not, you need to register your factories manually.
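To make the idea of registering factories more concrete, here is a minimal sketch of a name-to-factory registry in plain PHP. Every class in it (`SketchOperation`, `SketchFactory`, `SketchBuilder`) is a hypothetical stand-in written for this illustration; this page does not show how the real `ChainBuilder` resolves operation names, so treat the lookup logic as an assumption.

```php
<?php
// Hypothetical stand-ins for illustration only; these are NOT the php-etl classes.
final class SketchOperation
{
    public $options;

    public function __construct(array $options)
    {
        $this->options = $options;
    }
}

final class SketchFactory
{
    private $name;
    private $class;

    public function __construct(string $name, string $class)
    {
        $this->name = $name;
        $this->class = $class;
    }

    public function getName(): string
    {
        return $this->name;
    }

    // Instantiate the operation class this factory was configured with.
    public function build(array $options): object
    {
        $class = $this->class;

        return new $class($options);
    }
}

final class SketchBuilder
{
    private $factories = [];

    // Index each factory by the operation name it handles.
    public function registerFactory(SketchFactory $factory): void
    {
        $this->factories[$factory->getName()] = $factory;
    }

    // Look up the factory for an operation name; fail loudly if none was registered.
    public function buildOperation(string $name, array $options): object
    {
        if (!isset($this->factories[$name])) {
            throw new InvalidArgumentException("No factory registered for operation '$name'");
        }

        return $this->factories[$name]->build($options);
    }
}

$sketchBuilder = new SketchBuilder();
$sketchBuilder->registerFactory(new SketchFactory('csv-read', SketchOperation::class));
$operation = $sketchBuilder->buildOperation('csv-read', ['delimiter' => ';']);
```

Under this assumption, an operation name used in a chain description can only be built if a factory with that name was registered beforehand, which is why the registration step matters outside of a framework integration.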
- We can start describing our ETL in a YAML file. Here we create a single-step chain that just dumps the data.
chain:
  dump-data:
    operation: dump
    options: []
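For reference, the chain description above corresponds to the nested PHP array below, which is the shape a YAML parser produces for it. The array is written by hand here so the snippet runs without any dependency; the variable name `$chainDescription` is only illustrative.

```php
<?php
// Hand-written equivalent of the YAML chain description.
$chainDescription = [
    'chain' => [
        // Step name, chosen freely by the author of the chain.
        'dump-data' => [
            // Name of the operation to run for this step.
            'operation' => 'dump',
            // No options are needed for this step.
            'options' => [],
        ],
    ],
];
```

Seeing the array form can help when debugging a chain: dumping the parsed YAML and comparing it against the expected structure quickly reveals indentation mistakes.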
- We can now build our ETL
$chainProcessor = $builder->buildChainProcessor(Yaml::parse(file_get_contents($fileName)),[]);
- Before starting the ETL we need our input data
$inputData = [['myKey' => "value1"], ['myKey' => "value1"]];
- We can now start it
$chainProcessor->process(new ArrayIterator($inputData));
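The input handed to `process()` above is a plain SPL `ArrayIterator`. The short sketch below, which uses only built-in PHP and none of the ETL classes, shows what iterating over it yields: one associative array per item. The `$rows` variable is only there for illustration.

```php
<?php
// Same rows as in the step above.
$inputData = [['myKey' => 'value1'], ['myKey' => 'value1']];

// ArrayIterator is part of SPL and exposes the array through the Iterator interface.
$iterator = new ArrayIterator($inputData);

$rows = [];
foreach ($iterator as $row) {
    // Each $row is one associative array, for example ['myKey' => 'value1'].
    $rows[] = $row;
}
```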
Conclusion
We have created a very simple ETL that only prints the data we gave it to the console. To go further, you should read: