EdgeServer

From Fusion Registry Wiki

Overview

The Fusion Edge Server is a Java web application whose responsibility is to host SDMX web services for the dissemination of data and related structural and reference metadata. The web services support all Data Formats and Structure Formats supported by Fusion Registry, so Fusion Edge Server can be used as a direct replacement for the dissemination services provided by Fusion Registry.

The purpose of the Fusion Edge Server is to support public or internal dissemination of data and/or metadata, either to clients who use the web service API directly, or to software and web user interfaces which convert the information into a graphical display for data discovery, retrieval, and display. The driver for using Fusion Edge Server as a dissemination solution over Fusion Registry is that Fusion Edge Server is read only and has no database, so it can be horizontally scaled with no single point of failure. Fusion Edge Server can also work independently of Fusion Registry: there is no requirement to run a Fusion Registry instance. This gives options to users who need to host public SDMX datasets, or who want to offer dynamic access to their data from tools such as Excel, Tableau, or other statistical software.

Example applications which make use of these web services are:

  1. Fusion Data Browser, which provides a web user interface for the discovery, display and export of data
  2. FXLData, which provides data connectivity to Microsoft Excel
  3. Tableau Web Data Connector, which provides data connectivity to Tableau
  4. SDMX Connectors, a third-party library which provides connectivity to multiple applications including R, Matlab, and Excel
  5. Fusion Data Portal, which can automate data ingestion into a local database from remote web services

Fusion Edge Application Suite

The Fusion Edge Server suite consists of two applications:

  1. Fusion Edge Compiler. This is used to compile SDMX structure and data files into a format which can be read by the Fusion Edge Server. The output of the Fusion Edge Compiler is an Environment.
  2. Fusion Edge Server. This is the web application responsible for reading the Environment into memory and exposing the information via its SDMX web services.

Design

The Fusion Edge Server reads a pre-compiled Environment into memory. An Environment is built from a collection of SDMX structure and data files, and it is this information which is made available via the public web services of Fusion Edge Server. The Fusion Edge Compiler is used to build the Environment. It does so by reading files on a file system and converting them into read-only stores of information which can be rapidly ingested by Fusion Edge Server. The input of the Fusion Edge Compiler is a set of SDMX files; the output is a collection of files which comprise the Environment.

Publishing new content to the Fusion Edge Server is simply a case of moving the Environment to a location that can be read by Fusion Edge Server. Fusion Edge Server can be configured to automatically poll for updates (dynamic mode), or it can be configured to only read an Environment on application startup (static mode).
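The difference between static and dynamic mode can be sketched as follows. This is a minimal illustration only: the `EnvironmentWatcher` class, the idea of a version marker, and the two callables are hypothetical names introduced here, and the source does not describe how Fusion Edge Server actually detects an updated Environment.

```python
class EnvironmentWatcher:
    """Reloads an Environment when a version marker changes (hypothetical design)."""

    def __init__(self, read_marker, load_environment):
        self._read_marker = read_marker   # returns a value that changes on each publish
        self._load = load_environment     # returns the Environment content
        self._marker = None
        self.environment = None

    def check(self):
        """Load the Environment if the marker changed; return True when reloaded."""
        marker = self._read_marker()
        if marker == self._marker:
            return False
        self.environment = self._load()
        self._marker = marker
        return True


# In-memory stand-in for the location holding the published Environment.
source = {"marker": "v1", "content": {"datasets": 18}}
watcher = EnvironmentWatcher(
    read_marker=lambda: source["marker"],
    load_environment=lambda: dict(source["content"]),
)

watcher.check()                      # static mode: a single load at startup
source.update(marker="v2", content={"datasets": 19})
reloaded = watcher.check()           # dynamic mode: the same call, repeated on a timer
print(reloaded, watcher.environment)
```

In static mode `check()` would be called once at application startup; in dynamic mode it would be called repeatedly at a configured interval.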

Features

Web Services and Formats

The primary purpose of the Fusion Edge Server is to host a read only web service for data and related metadata. These web services are consistent with those offered by Fusion Registry, and include:

In addition all the Data Formats and Structure Formats of Fusion Registry are supported by the Fusion Edge Server.
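As an illustration of how a client might address these web services, the sketch below builds a data query URL following the general SDMX REST pattern (`/data/{dataflow}/{key}` with dot-separated dimensions and `+` between alternative codes). The base URL, dataflow identifier and codes are placeholders, not values taken from a real Fusion Edge Server deployment.

```python
from urllib.parse import urlencode


def build_data_url(base_url, dataflow, key_parts, **params):
    """Build an SDMX REST data query URL.

    key_parts holds one entry per dimension, in dimension order: a list of
    codes (joined with '+'), or an empty list to wildcard that dimension.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{dataflow}/{key}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


url = build_data_url(
    "https://edge.example.org/rest",   # hypothetical base URL
    "EXAMPLE_FLOW",                    # hypothetical dataflow identifier
    [["A"], ["UK", "FR"], []],         # third dimension is wildcarded
    startPeriod="2020",
)
print(url)
# https://edge.example.org/rest/data/EXAMPLE_FLOW/A.UK+FR.?startPeriod=2020
```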

Performance and Durability

Edge Server Performance

The Fusion Edge Server is built to be highly performant, even under extreme load from multiple concurrent users. It uses a purpose built in-memory time series database, augmented with a custom caching layer, both of which have been tuned to work with the SDMX data and metadata model. The caching layer, for instance, knows when two data queries refer to the same data sub-cube even if the queries are expressed differently in the URL. Query optimisation similarly ensures that multiple concurrent queries for the same data are executed only once, directing the subsequent requests to the cached data once it has been built.
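The two ideas in the paragraph above, recognising equivalent queries and executing concurrent identical queries only once, can be sketched as follows. This is not the Fusion Edge Server implementation: `canonical_key` and `QueryCache` are hypothetical names, and the sketch only normalises the ordering of dimensions and codes, which is one of several ways two URLs could describe the same sub-cube.

```python
import threading
from concurrent.futures import Future


def canonical_key(selection):
    """Normalise a {dimension: codes} selection so ordering does not matter."""
    return tuple(
        sorted((dim, tuple(sorted(set(codes)))) for dim, codes in selection.items())
    )


class QueryCache:
    """Runs each distinct query once; later and concurrent callers share the result."""

    def __init__(self, execute):
        self._execute = execute
        self._lock = threading.Lock()
        self._results = {}  # canonical key -> Future holding the result

    def get(self, selection):
        key = canonical_key(selection)
        with self._lock:
            future = self._results.get(key)
            owner = future is None
            if owner:
                future = Future()
                self._results[key] = future
        if owner:
            # Only the first caller executes; a failure is cached too in this sketch.
            try:
                future.set_result(self._execute(key))
            except Exception as exc:
                future.set_exception(exc)
        return future.result()  # other callers block here until the result exists


calls = []

def run_query(key):
    calls.append(key)
    return f"{len(key)} dimensions"

cache = QueryCache(run_query)
first = cache.get({"FREQ": ["A"], "REF_AREA": ["UK", "FR"]})
second = cache.get({"REF_AREA": ["FR", "UK"], "FREQ": ["A"]})
print(first, second, len(calls))
```

Both calls map to the same key, so `run_query` is invoked only for the first one.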

The Fusion Edge Server loads the pre-compiled content into memory on application startup. An Edge Server hosting 18 datasets, with 700k series and 50 million observations, will start up and be ready to serve queries in as little as 20 seconds.

Fusion Edge Server can run in Dynamic Mode, which means an Environment can be served from a central shared location such as Amazon S3 or a private web server, allowing multiple Fusion Edge Server instances to update their content from the same location.

Content can even be set to become available at a specific date and time (embargo). Coupled with the ability to pre-load embargoed data, this makes it possible to release a new Environment across all Fusion Edge Servers spanning multiple regions at exactly the same time (to the nearest millisecond).

This architecture has no single point of failure, making the Fusion Edge Server the perfect solution for horizontal scaling.

Immutable and Secure

Compiled Environments are digitally signed. If they are modified in any way, the modification will be detected by Fusion Edge Server and the Environment will be rejected. Once Fusion Edge Server has loaded an Environment into memory it cannot be modified, as the in-memory objects are immutable. Fusion Edge Server provides only read-only web services. There are no external interfaces to support modification of any kind to either the configuration of the application or the information it holds. Fusion Edge Server is not exposed to SQL injection as it does not use SQL internally. All data stores are purpose built to efficiently process SDMX queries, and are also built so that the content is immutable.
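The tamper detection described above can be illustrated with a short sketch. The source does not say which signature scheme the Fusion Edge Compiler uses, so this example substitutes an HMAC over the file digests (a shared-secret check, not a true asymmetric digital signature); the function names and file names are hypothetical.

```python
import hashlib
import hmac


def sign_environment(files, key):
    """Return a hex tag covering every file name and its content digest.

    files maps a file name to its bytes. HMAC-SHA256 is a stand-in here for
    whatever signature scheme the real compiler applies.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for name in sorted(files):
        mac.update(name.encode("utf-8"))
        mac.update(b"\0")
        mac.update(hashlib.sha256(files[name]).digest())
    return mac.hexdigest()


def verify_environment(files, key, signature):
    """Accept the Environment only if the recomputed tag matches."""
    return hmac.compare_digest(sign_environment(files, key), signature)


key = b"demo-secret"  # placeholder key for the sketch
environment = {"structures.bin": b"codelists", "data.bin": b"observations"}
signature = sign_environment(environment, key)

tampered = dict(environment, **{"data.bin": b"observations!"})
print(verify_environment(environment, key, signature))
print(verify_environment(tampered, key, signature))
```

The unmodified Environment verifies, while any change to a file's content or name produces a different tag and the Environment is rejected.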

This makes Fusion Edge Server the perfect solution for public data dissemination where security is a priority.

Interchangeable and Loosely Coupled

Fusion Edge Server is built on the same core code as Fusion Registry, and so offers the same data and structure formats. When Fusion Registry is upgraded to include new formats, or to expand the web service functionality, these upgrades by default go into the next release of Fusion Edge Server. However, the Fusion Edge Server has been designed such that it does not need a Fusion Registry to run. All the Fusion Edge Server requires is that it is fed with valid SDMX structure files and valid SDMX data files (if data dissemination is a requirement).

This design means Fusion Edge Server can be used to disseminate information from any system which is able to export SDMX files, including but not limited to Fusion Registry.

Scheduled Release of Data (embargo)

Data can be given a specific date and time for release. The Fusion Edge Server can also be set to prepare Environments for release ahead of the embargo time. For example, if an Environment is scheduled for release at 12:00, and the Environment is moved to a secure Amazon S3 file system at 11:50, the Fusion Edge Server can be configured to pull these files into its local file system 5 minutes before go live, to eliminate any risk of network lag when pulling the files. The Fusion Edge Server can also be set to pre-load the Environment into memory 30 seconds before go live. At that point the Environment is ready for dissemination but is disconnected from any process which can get to the data, so it is still fully secure. The Fusion Edge Server then makes the Environment go live by swapping the old Environment with the new, which happens with millisecond precision.
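The timeline in this example can be sketched as below. The lead times mirror the 5 minute and 30 second figures above; `release_schedule` and `EnvironmentHolder` are hypothetical names used only for illustration, not part of the Fusion Edge Server configuration or API.

```python
from datetime import datetime, timedelta, timezone


def release_schedule(go_live,
                     fetch_lead=timedelta(minutes=5),
                     preload_lead=timedelta(seconds=30)):
    """Work out when to pull the files and when to pre-load them into memory."""
    return {
        "fetch": go_live - fetch_lead,
        "preload": go_live - preload_lead,
        "go_live": go_live,
    }


class EnvironmentHolder:
    """Keeps the live Environment and a staged one that queries cannot reach."""

    def __init__(self, live):
        self.live = live
        self._staged = None

    def preload(self, environment):
        self._staged = environment  # in memory, but not yet served

    def go_live(self):
        if self._staged is None:
            raise RuntimeError("no Environment has been pre-loaded")
        # A single reference swap publishes the new Environment.
        self.live, self._staged = self._staged, None


schedule = release_schedule(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
print(schedule["fetch"].isoformat(), schedule["preload"].isoformat())

holder = EnvironmentHolder(live="environment-old")
holder.preload("environment-new")
before = holder.live
holder.go_live()
print(before, "->", holder.live)
```

Because the expensive work (download and load) happens before the embargo time, the release itself reduces to swapping one reference.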

Global collection of servers managed from a single location

Fusion Edge Server is given a location from which to read an Environment for dissemination. This location may be on the local file system, a URL to a collection of files hosted on a web service, or a folder hosted on Amazon S3. When the Environment is placed in a location which can be accessed by more than one Fusion Edge Server, it is possible to update all Fusion Edge Servers by updating one central Environment. This makes it possible to host and manage content in Fusion Edge Servers hosted all over the world from a single location. Combining this feature with Embargo makes it possible to update all Fusion Edge Servers at exactly the same time from a single file system.

Audited Events

Each Fusion Edge Server can be configured to persist a log of events. The log is in JSON format and is structured so that it can be easily processed to determine metrics such as where the queries are coming from, which datasets are popular, which output formats are popular, and which browsers (or other agents) are popular.
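A log of this kind could be summarised with a few lines of script. The sketch below assumes one JSON object per line and uses made-up field names (`dataset`, `format`); the actual layout and field names of the Fusion Edge Server audit log are not given here and may differ.

```python
import json
from collections import Counter


def count_by(log_lines, field):
    """Count events by the value of one field, skipping blank lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event.get(field, "unknown")] += 1
    return counts


# Hypothetical sample events; real field names are an assumption.
sample_log = [
    '{"dataset": "GDP", "format": "csv"}',
    '{"dataset": "GDP", "format": "sdmx-json"}',
    '{"dataset": "CPI", "format": "csv"}',
    '',
]

by_dataset = count_by(sample_log, "dataset")
by_format = count_by(sample_log, "format")
print(by_dataset.most_common(1), by_format.most_common(1))
```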

Automated Data Publication Pipelines

The Fusion Edge Server can be deployed such that scripts move data from internal systems to the Edge Server, providing a fully automated publication pipeline.