ETL Artifacts

The Workflow Component ETL artifact consists of three core components as depicted in Figure 1:

  • Ingestion
  • Processing
  • Distribution

Other components can be added to the ETL stack to enhance the results. Examples include:

  • Named-entity recognition (NER) for extracting places, organizations, etc.
  • Data converters
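
The three core stages and an optional add-on component can be sketched as a small pipeline. This is only an illustrative sketch, not ATTX code: the function names, the `Pipeline` class, and the record fields (`title`, `place`) are all hypothetical, and the ingestion and distribution steps are in-memory stand-ins for real sources and endpoints.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def ingest() -> List[Record]:
    """Ingestion: stand-in for harvesting records from a source (hypothetical data)."""
    return [
        {"title": " Open Data Portal ", "place": "helsinki"},
        {"title": "Research Index", "place": "espoo"},
    ]


def process(records: List[Record]) -> List[Record]:
    """Processing: strip surrounding whitespace from every field."""
    return [{key: value.strip() for key, value in r.items()} for r in records]


def capitalise_places(records: List[Record]) -> List[Record]:
    """Optional data converter (hypothetical): capitalise the 'place' field."""
    return [{**r, "place": r["place"].capitalize()} for r in records]


@dataclass
class Pipeline:
    # Optional components that run between Processing and Distribution.
    extra_steps: List[Step] = field(default_factory=list)
    # In-memory stand-in for a distribution endpoint.
    distributed: List[Record] = field(default_factory=list)

    def distribute(self, records: List[Record]) -> None:
        """Distribution: stand-in for publishing records to an endpoint."""
        self.distributed.extend(records)

    def run(self) -> List[Record]:
        records = process(ingest())
        for step in self.extra_steps:
            records = step(records)
        self.distribute(records)
        return self.distributed


if __name__ == "__main__":
    print(Pipeline().run())
    print(Pipeline(extra_steps=[capitalise_places]).run())
```

Keeping the optional components as a list of plain functions means a converter or an NER step can be added or removed without touching the core stages.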

Typical uses of the ETL workflow include inferencing, calling and updating distribution endpoints, processing and transforming data, updating Elasticsearch data, and calling functionality from the Graph Component, specifically the Graph Manager API, which in turn makes use of the available services.

Figure 1 shows a simplified view of the communication and data flow in the Semantic Broker. For examples of specific services, refer to the ATTX Architecture Overview.

Figure 1. Semantic Broker Component and Data Flow

Comparison of ETL Artifacts

The table below gives a basic comparison of several ETL tools against the functionality required by the ATTX project.

Tool           | Workflows               | Activities              | REST API      | Plugins        | UI    | License
---------------+-------------------------+-------------------------+---------------+----------------+-------+-----------
Wings          | Yes                     | Yes                     | ? No          | ? No           | ? Yes | Apache 2.0
LinkedPipes    | Yes                     | Yes                     | Yes           | Yes            | ? Yes | MIT
DSwarm         | ? Maybe Transformations | ? Maybe Transformations | Yes           | No             | ? Yes | Apache 2.0
Web-Karma      | No (has Batch Mode)     | No (has Batch Mode)     | Yes           | Yes, partially | Yes   | Apache 2.0
UnifiedViews   | Yes                     | Yes                     | Yes (limited) | Yes            | Yes   | GPL 3.0
FluidOps       | ?                       | ?                       | ?             | Yes            | ? Yes | Commercial
Silk Framework | Yes, Tasks              | Yes, Workspace          | Yes           | Yes            | Yes   | Apache 2.0
Pentaho        | Maybe ? via Jobs        | Maybe ? via Jobs        | No            | ? Yes          | Yes   | Apache 2.0


