ETL Artifacts
The Workflow Component ETL artifact consists of three core components as depicted in Figure 1:
- Ingestion
- Processing
- Distribution
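The three stages above can be sketched as a minimal pipeline. This is only an illustration, not the ATTX implementation: the function names, the `id,title` input format, and the in-memory `sink` list standing in for a distribution endpoint are all assumptions made for the example.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def ingest(raw_lines: Iterable[str]) -> List[Record]:
    """Ingestion: parse raw 'id,title' lines (hypothetical format) into records."""
    records: List[Record] = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        record_id, title = line.split(",", 1)
        records.append({"id": record_id.strip(), "title": title.strip()})
    return records


def process(records: List[Record]) -> List[Record]:
    """Processing: apply a simple transformation (title-case the title field)."""
    return [{**record, "title": record["title"].title()} for record in records]


def distribute(records: List[Record], sink: List[Record]) -> int:
    """Distribution: hand records to a sink; a list stands in for a real endpoint."""
    sink.extend(records)
    return len(records)


def run_workflow(raw_lines: Iterable[str], sink: List[Record]) -> int:
    """Chain the three stages in order and return the number of records delivered."""
    return distribute(process(ingest(raw_lines)), sink)
```

In a real deployment each stage would typically be a separate service or workflow step rather than a function call; the chaining order is the part this sketch is meant to show.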
Other components can be added to the ETL stack to enhance the results. Examples include:
- Named-entity recognition (NER) for extracting places, organizations, etc.
- Data converters
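As a rough sketch of how such optional components might plug into the stack, the example below treats each one as a step that takes a list of records and returns a new list. Everything here is hypothetical: the step names, the `text` field, and the tiny lookup set are invented for illustration, and the lookup is a toy stand-in for a real NER component, not an NER algorithm.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

# Hypothetical lookup list; a real NER component would use a trained model.
KNOWN_ORGANIZATIONS = {"Example University"}


def csv_to_records(text: str) -> List[Record]:
    """Data converter: turn CSV text with a header row into a list of records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def tag_organizations(records: List[Record]) -> List[Record]:
    """Toy NER stand-in: list known organization names found in the 'text' field."""
    tagged: List[Record] = []
    for record in records:
        text = str(record.get("text", ""))
        found = sorted(org for org in KNOWN_ORGANIZATIONS if org in text)
        tagged.append({**record, "organizations": found})
    return tagged


def records_to_json_lines(records: List[Record]) -> str:
    """Data converter: serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def run_stack(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each optional step in order; an empty list leaves records unchanged."""
    for step in steps:
        records = step(records)
    return records
```

Keeping every step to the same signature means a component can be added or removed without touching the others, which is the property that makes the stack extensible.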
Uses of the ETL workflow include: inferencing; calling and updating distribution endpoints; processing and transformation; updating Elasticsearch data; and calling functionality of the Graph Component, more precisely the Graph Manager API, which in turn makes use of the available services.
Figure 1 shows a simplified version of the communication and data flow in the Semantic Broker. For examples of specific services, refer to the ATTX Architecture Overview.
Figure 1. Semantic Broker Component and Data Flow
Comparison of ETL Artifacts
The table below gives a basic comparison of several ETL tools against the functionalities required by the ATTX project.
Tool | Workflows | Activities | REST API | Plugins | UI | License |
---|---|---|---|---|---|---|
Wings | Yes | Yes ? | No ? | No ? | Yes | Apache 2.0 |
LinkedPipes | Yes | Yes | Yes | Yes ? | Yes | MIT |
DSwarm | ? Maybe Transformations | ? Maybe Transformations | Yes | No ? | Yes | Apache 2.0 |
Web-Karma | No, although there is a Batch Mode | No, although there is a Batch Mode | Yes | Yes, partially | Yes | Apache 2.0 |
UnifiedViews | Yes | Yes | Yes (Limited) | Yes | Yes | GPL 3.0 |
FluidOps | ? | ? | ? | Yes ? | Yes | Commercial |
Silk Framework | Yes, Tasks | Yes, Workspace | Yes | Yes | Yes | Apache 2.0 |
Pentaho | Maybe ? via Jobs | Maybe ? via Jobs | No ? | Yes | Yes | Apache 2.0 |