Current page last modified at: 11-Jan-2018 11:11:37

ATTX Architecture Overview

The following components are part of the ATTX architecture.

Figure 1. ATTX Platform Architecture

Figure 1. ATTX Semantic Broker Architecture

ATTX Semantic Broker is composed of a collection of loosely coupled services, which implement several broker-like capabilities. Example of such services are illustrated in: Microservices Architecture.

Given that there are a collection of such services, we can still identify three core components under which we can cluster the services: Workflow Component, Graph Component, Distribution Component while the Service Discovery and the Semantic Broker Deployment gather all of them together in order to provide them as a individual services or as a platform. The communication and contracts between the components is specified in Message Broker.

Terminology and Acronyms

Acronyms:

  • WF - Workflow Component
  • GC - Graph Component
  • DC - Distribution Component
  • PD - Platform Deployment/Semantic Broker Deployment
  • PD - Service Discovery
  • UV - UnifiedViews (WF component implementation)
  • ES - ElasticSearch (DC component implementation)
  • ETL Artifact - any kind of ETL software (e.g. UnifiedViews)
  • sometimes a workflow can also be referred to as pipeline
  • sometimes a activity can also be referred to as executed job or pipeline execution
  • artifact - any software or application/programming language specific library

Workflow Component

The ATTX Workflow Component provides a configurable framework with the main purpose of managing, scheduling and monitoring data ingestion, processing and distribution related workflows. Workflow Component provides the Semantic Broker information related to provenance of the working data.

Graph Component

The ATTX Graph component associated to this project has the main goal of aggregating the data that flows within the Semantic Broker, types of transformations (and associated workflows), the provenance information (agent and ETL processes performed) and other meta data.

Distribution Component

This ATTX Distribution Component provides the interface between the Workflow Component and/or Graph Component for public consumption of disseminated data.

Semantic Broker Deployment

This ATTX Semantic Broker Deployment describes the necessary steps for setting up the whole ATTX Project or working with individual components.

Service Discovery

Service Discovery is part of the ATTX Semantic Broker core components as it addresses the need to deal with with static and manual configuration of components and at the same time the need to scale up the (micro)services offered by the broker. More details at:

Message Broker

There are two versions to the service discovery/inter-component communication illustrated below, both of them can adhere to a message based communication (mostly asynchronous) or REST-API based communication (mostly synchronous) or a combination of both.

Figure 2. Version 1 of Inter-component Communication

Figure 2. Version 1 of Inter-component Communication

  • Version 1 - represents the base structure for achieving the flow of data between the three core components. At this stage the distinction between the components is rudimentary. For example the Graph Manager consists of a single scheduled event that consumes the UVProvovenance API. Components communication with direct HTTP requests using hard-coded component names.

Figure 3. Version 3 of Inter-component Communication

Figure 3. Version 2 with Service discovery and processing services

  • Version 2 - in this stage the components take shape such as Pipeline2 functionality is provided by the Graph Manager Component, and the Graph Manager also provides an API.

We chose UnifiedViews as a default interface for the Workflow Component however other ETL tools can be considered as long as they provide the necessary framework for extracting provenance(e.g. https://nifi.apache.org/ or https://github.com/spotify/luigi or http://www.dswarm.org/ etc. - see Workflow Management Tools ).

While the communication between components is implemented using messaging middleware component such as ActiveMQ or RabbitMQ - see Message Broker - the communication between components does not have to be real time.

Microservices Example

We would like to illustrate how such an architecture would look in practice and some of the services that might be part of it.

In Figure 4 we illustrate a pipeline that downloads a new version of a dataset and replaces the old one. We identified the following services to aid with this process:

  • RMLService - converting the downloaded dataset from CSV format to RDF and storing it for access at a specified location;
  • GM-API - used for orchestrating access to the Graph Store and generation of provenance information;
  • ProvenanceAPI - based on the collected information and a trigger from GM-API this service will generate provenance data and store it to its own Graph Store or send it to GM-API on a request basis;
  • UVProvenanceAPI - used for extracting provenance information from UnifiedViews, as not all provenance information can be collected (start and end times of a pipeline/workflow, if a workflow is public or private, or if a pipeline execution is successful) by the DPUs and transmitted to GM-API and implicitly to ProvenanceAPI.

Figure 4. ATTX Platform Architecture

Figure 4. Pipeline for downloading and replacing dataset

As a continuation of the result of the pipeline depicted in Figure 4, the pipeline from Figure 5 exhibits the steps necessary for publishing the resulting data. In order to achieve this we identified the following services:

  • ESDistributionService - service for constructing, based on the graph retrieved from Graph Store, JSON/JSON-LD formatted bulk data, that is stored on a volume. This microservice can be further divided into two other services:
    • RDFFramerService - converting the data from RDF to JSON/JSON-LD;
    • indexService - (bulk) indexing at a specific endpoint;
  • ElasticSearch - several versions might be available each providing different indexing mechanisms.

Figure 5. ATTX Platform Architecture

Figure 5. Pipeline for Publishing Dataset

In all the cases presented above the Service Discovery component plays an important role for retrieving and acts as a router for requests - more information available at: Service Discovery Implementation.

results matching ""

    No results matching ""