Difference between revisions of "EdgeServer"

From Fusion Registry Wiki
(Deployment Architecture)
 
(23 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Fusion Edge =
+
[[Category:Fusion Edge Server]]
 +
[[Category:How_to_FES]]
 +
== Overview ==
 +
The Fusion Edge Server is a Java web application whose responsibility is to host SDMX web services for the dissemination of data and related structural and reference metadata.  The web services support all [https://wiki.sdmxcloud.org/Category:SdmxDataFormat Data Formats] and Structure Formats supported by Fusion Registry; as such, Fusion Edge Server can be used as a direct replacement for the dissemination services provided by Fusion Registry.
  
== Overview ==
+
The purpose of the Fusion Edge Server is to support public or internal dissemination of data and/or metadata, either via clients who use the web service API directly, or via software/web user interfaces which convert the information into a graphical display for data discovery, retrieval, and display.  The driver for using Fusion Edge Server as a dissemination solution over Fusion Registry is that Fusion Edge Server offers a read-only solution, and with no database it can be horizontally scaled with no single point of failure.  Fusion Edge Server can also work independently of Fusion Registry: there is no requirement to run a Fusion Registry instance.  This gives options for users who need to host public SDMX datasets, or who want to offer dynamic access to their data from tools such as Excel, Tableau, or other statistical software.
The Fusion Edge Server is a Java web application whose responsibility is to host SDMX web services for the dissemination of data and related structural and reference metadata.  The web services provided by the Fusion Edge Server can be used by multiple applications, including but not limited to:
 
  
   
+
Example Applications which make use of these web services are:
 
# [https://demo.metadatatechnology.com/FusionDataBrowser/ Fusion Data Browser] to provide a web user interface for the discovery, display and export of data.   
 
# [https://demo.metadatatechnology.com/FusionDataBrowser/ Fusion Data Browser] to provide a web user interface for the discovery, display and export of data.   
 
# [[FXLData]] which provides data connectivity to Microsoft Excel
 
# [[FXLData]] which provides data connectivity to Microsoft Excel
 
# [[Tableau_Connector|Tableau Web Data Connector]] providing data connectivity to Tableau
 
# [[Tableau_Connector|Tableau Web Data Connector]] providing data connectivity to Tableau
 
# [https://github.com/amattioc/SDMX SDMX Connectors] third party library which provides connectivity to multiple applications including R, Matlab, Excel
 
# [https://github.com/amattioc/SDMX SDMX Connectors] third party library which provides connectivity to multiple applications including R, Matlab, Excel
# Custom public data dissemination web sites and ‘data portals’ as run by most national statistics offices, central banks and international organisations
+
# [[Pull_data_from_third_party_REST_API|Fusion Data Portal]], which can automate data ingestion into a local database from remote web services
# Data / metadata API services for data consumers who need direct access
 
  
In addition, consumers can access the web services directly in order to set up automated data pipelines pulling data on demand.
+
== Fusion Edge Application Suite ==
+
The Fusion Edge Server suite consists of two applications:
== Performance and Durability ==
 
The Fusion Edge Server is built to be highly performant, even under extreme load from multiple concurrent users.  It uses a purpose built in-memory time series database, augmented with a custom caching layer, both of which have been tuned to work with the SDMX data and metadata model.  The caching layer, for instance, knows when two data queries refer to the same data sub-cube even if the queries are expressed differently in the URL.  Query optimisation similarly ensures that multiple concurrent queries for the same data are executed only once, directing the subsequent requests to the cached data once it has been built.
 
  
The Fusion Edge Server loads the pre-compiled content into memory on application startup.  An Edge Server hosting 18 datasets, with 700k series and 50 million observations, will start up and be ready to serve queries in as little as 20 seconds.  The content can be served from a central shared location, such as Amazon S3 or a private web server, allowing multiple Fusion Edge Server instances to obtain their content from the same location.  When the content is updated in the central location, each Fusion Edge Server can automatically update its local content.  The content can even be set to be made available at a specific date and time, meaning each Fusion Edge Server can make the content live at exactly the same time.
+
# '''[[Fusion_Edge_Compiler|Fusion Edge Compiler]].''' This is used to compile SDMX structure and data files into a format which can be read by the Fusion Edge Server. The output of the Fusion Edge Compiler is an [[Edge_Server_Environment|Environment]].
 +
# '''Fusion Edge Server.''' This is the web application responsible for reading the Environment into memory and exposing the information via its SDMX web services.
  
This architecture has no single point of failure, making the Fusion Edge Server the perfect solution for horizontal scaling.
+
== Design ==
 +
The Fusion Edge Server reads into memory a pre-compiled Environment.  An [[Edge_Server_Environment|Environment]] is built from a collection of SDMX structure and data files, and it is this information which is made available via the public web services of Fusion Edge Server.  The Fusion Edge Compiler is used to build the [[Edge_Server_Environment|Environment]]; it does so by reading files on a file system and converting these into read-only stores of information which can be rapidly ingested by Fusion Edge Server.  The input of the Fusion Edge Compiler is SDMX files; the output is a collection of files which comprise the [[Edge_Server_Environment|Environment]].
  
== Security ==
+
Publishing new content to the Fusion Edge Server is simply a case of moving the Environment to a location that can be read by Fusion Edge Server.  Fusion Edge Server can be configured to automatically poll for updates (dynamic mode), or it can be configured to only read an Environment on application startup (static mode).
It is important to note that the content in the Fusion Edge Server is all public: as long as the client has been given access to the web service, they can query for any structural metadata, data, or reference metadata content without restriction.  The Fusion Edge Server does not provide any user security or authentication services.  It is possible, however, to host multiple Fusion Edge Server environments which host different content, for example a public dissemination environment and a private internal dissemination environment.  This can be managed by the Fusion Edge Compiler configuration, which can define what content is exported into each environment.  Content can also be controlled in the Fusion Registry by creating multiple user accounts.  In one export configuration the Fusion Edge Compiler can act as a public user with no authentication, and in another configuration the Fusion Edge Compiler can provide a username and password to authenticate itself, and the Fusion Registry will filter content accordingly.
 
 
The Fusion Edge Server has no User Interface for maintaining content, and therefore no web services for editing content.  All content loaded into the Fusion Edge Server is immutable: it cannot be modified once loaded, only replaced.  It is not open to attacks such as SQL injection as it does not make use of SQL; all data stores are custom built specifically to serve immutable content.  The only way to get content into the Fusion Edge Server is via a central Ledger which can be hosted on the local file system, on a secure web server, or on Amazon S3.  The central Ledger is tamper protected: all content is signed with a secret key, known only to the Fusion Edge Server and the Fusion Edge Compiler which compiles the content for publication.  It is not possible to tamper with the content of the ledger without the secret key.  A new entry manually added to the ledger will be rejected by the Fusion Edge Server, as will a compiled file which has been manually edited.
 
  
== Deployment Architecture ==
+
== Features ==
[[images/2/28/FusionEdgeServer_Deployment_Architecture.png|Example Architecture]]
+
=== Web Services and Formats ===
 +
The primary purpose of the Fusion Edge Server is to host a read only web service for data and related metadata. These web services are consistent with those offered by Fusion Registry, and include:
  
The Fusion Edge Server environment consists of:
+
* [[Data_Query_Web_Service|'''Data Query''']] - retrieve datasets
 +
* [[Data_Availability_Web_Service|'''Data Availability''']] - used by applications to determine what data exists
 +
* [[Publication_Table_Web_Services|'''Publication Tables''']] - obtain pre built tables of data
 +
* [[Search_Data_API|'''Free Text Search''']] - Free text search for datasets
 +
* [[Reference_Metadata_API|'''Reference Metadata''']] - Discover reference metadata reported against structural metadata
 +
* [https://github.com/sdmx-twg/sdmx-rest/blob/master/doc/structures.md '''Structural Metadata'''] - Query for structural metadata
  
# The web application (Fusion Edge Server). This is responsible for hosting the data and making it available via web services.
+
In addition all the [https://wiki.sdmxcloud.org/Category:SdmxDataFormat Data Formats] and Structure Formats of Fusion Registry are supported by the Fusion Edge Server.
# The compiler (Fusion Edge Compiler).  This is used to compile the content to be published to the Fusion Edge Server.  It manages a central ledger, and provides updates to the ledger and related indexes as part of the compilation process.
 
  
== Publishing Content to the Fusion Edge Server ==
+
=== Performance and Durability ===
 +
[[File:Edge Server Performance.png|thumb|Edge Server Performance]]
 +
The Fusion Edge Server is built to be highly performant, even under extreme load from multiple concurrent users.  It uses a purpose built in-memory time series database, augmented with a custom caching layer, both of which have been tuned to work with the SDMX data and metadata model.  The caching layer, for instance, knows when two data queries refer to the same data sub-cube even if the queries are expressed differently in the URL.  Query optimisation similarly ensures that multiple concurrent queries for the same data are executed only once, directing the subsequent requests to the cached data once it has been built.
  
=== Compiling Source Data ===
+
The Fusion Edge Server loads the pre-compiled content into memory on application startup.  An Edge Server hosting 18 datasets, with 700k series and 50 million observations, will start up and be ready to serve queries in as little as 20 seconds.
Content is published to the Fusion Edge Server by compiling datasets, structure files, and reference metadata files that are present in a local file system.  The compilation process is run using the Fusion Edge Compiler.  The Fusion Edge Compiler is told the root folder to look in, and it expects to find the following folder structure under the root folder:
 
  
|- data
+
Fusion Edge Server can run in [[Edge_Server_-_Publish_Content#Dynamic_Mode|Dynamic Mode]], which means an [[Edge_Server_Environment|Environment]] can be served from a central shared location such as [[Fusion_Edge_Compiler#Publish_Content|Amazon S3]] or a private web server, allowing multiple Fusion Edge Server instances to update their content from the same location.
|-- [agency id]
 
|---- [dataflow id]
 
|------ [dataflow version] (data files are placed in this folder)
 
|- structure (structure files are placed in this folder)
 
|- metadata (metadata files are placed in this folder)
 
  
Here, agency id, dataflow id, and dataflow version identify the Dataflow that the data are for.  The content can be in any SDMX format.  Each folder can contain multiple files; the compiler will merge the information where required.
+
Content can even be set to be made available at a specific date and time ([[Fusion_Edge_Compiler#Embargo|embargo]]), coupled with the ability to [[Fusion_Edge_Server_Properties#Embargo_Data|pre-load]] embargo data, it is possible to release a new Environment across all Fusion Edge Servers spanning multiple regions at exactly the same time (to the nearest millisecond).
  
'''Note''': The Fusion Edge Compiler can build this local file system automatically from content pulled from compliant SDMX web services such as those provided by Fusion Registry. Information is provided later about how this is achieved.
+
This architecture has no single point of failure, making the Fusion Edge Server the perfect solution for horizontal scaling.
 
 
An example of folder/file content is given below:
 
 
|- data
 
|-- WB
 
|---- POVERTY
 
|------ 1.0
 
|-------- PovertyData.zip
 
|-------- PovertyUpdate.xml
 
|---- EDUCATION
 
|------ 1.0
 
|-------- EduData_1990_2010.json
 
|-------- EduData2010_2020.xml
 
|- structure
 
|-- corestructures.zip
 
|-- categories.xml
 
|-- msds.xml
 
|- metadata
 
|-- metadataset1.zip
 
|-- metadataset2.zip
 
 
 
The files in the file system must be in SDMX format, and may be individually zipped.  Each folder may contain multiple files.  The compilation process will combine all the files in each folder to create a consolidated output.  For example a dataflow folder may contain multiple dataset instances with different series or time periods, the output will be a single compiled dataset instance built from all the dataset files.
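The folder convention above can be illustrated with a short Python sketch. This is not the Fusion Edge Compiler's implementation; the function name group_data_files is hypothetical, and the sketch only shows how files under the data folder map to an agency id, dataflow id, and dataflow version, which is the grouping the compiler consolidates.

```python
from collections import defaultdict
from pathlib import Path


def group_data_files(root):
    """Group file names under <root>/data by (agency id, dataflow id, dataflow version).

    Hypothetical helper for illustration only; it mirrors the documented
    layout data/[agency id]/[dataflow id]/[dataflow version]/<files>.
    """
    groups = defaultdict(list)
    data_dir = Path(root) / "data"
    # Exactly four levels below "data": agency / dataflow / version / file
    for path in sorted(data_dir.glob("*/*/*/*")):
        if path.is_file():
            agency, dataflow, version = path.relative_to(data_dir).parts[:3]
            groups[(agency, dataflow, version)].append(path.name)
    return dict(groups)
```

With the example tree above, the key ("WB", "POVERTY", "1.0") would map to PovertyData.zip and PovertyUpdate.xml, the two files that are combined into a single compiled dataset.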
 
 
 
=== Full Replace vs Updates ===
 
The compiler can be run in full replace mode, in which case it will compile all files in the source directory.  Alternatively the compiler can be run in update mode.  In update mode it will only compile files which have been updated after a specific point in time.  The compiler will compare the timestamp on the file with the update after time it is given to determine whether to include the file in the compiled output or not.
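The timestamp comparison used in update mode can be sketched as follows. This is a minimal illustration, assuming the "update after" time is supplied as Epoch milliseconds; the function name files_updated_after is hypothetical and is not part of the Fusion Edge Compiler.

```python
from pathlib import Path


def files_updated_after(folder, update_after_ms):
    """Return files under folder modified later than update_after_ms.

    update_after_ms is assumed to be Epoch time in milliseconds.
    Files with an older (or equal) modification time are skipped,
    which is the selection rule described for update mode.
    """
    selected = []
    for path in sorted(Path(folder).rglob("*")):
        # st_mtime is in seconds, so convert to milliseconds before comparing
        if path.is_file() and path.stat().st_mtime * 1000 > update_after_ms:
            selected.append(path)
    return selected
```

In full replace mode the equivalent selection is simply every file in the source directory.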
 
 
 
When the compiler is run it can be given the location of the central ledger; this should be the location of the ledger used in the production environment.  The Fusion Edge Compiler uses the ledger information to know what the current live environment looks like, what its current version is, and when it was last built.  It will use this information in update mode to know what files to include in the next compile (it will only compile files that have been modified since the last compile time).  In addition, as the central ledger contains the location of the live compiled datasets, the Fusion Edge Compiler is able to download the current live dataset in order to apply any changes, if series or observations have been updated since the last compile.  Providing the central ledger location is essential if running the Fusion Edge Server in dynamic mode (discussed later), as it allows the Fusion Edge Compiler to create the new live environment by merging the current live environment with any changes, and to update the central ledger correctly, ready for the next release.
 
 
 
=== Compilation Output ===
 
When the compilation process is run, the compiler will generate a target folder in the location specified.  The compiler will create a number of folders under the target folder and in each folder it will write content compiled from the source folders.  It will also create a file in the root of the target folder called ledger.json, which contains the environment information, when it was created, when it should go live and what the version is.  If the target folder is not empty, the compiler will remove any files which are not part of the new environment. 
 
 
 
The final target folder will always contain the complete environment once compilation is complete.  This means the target folder, in its entirety, can be published to a test Fusion Edge Server instance, as it contains the exact environment which is to become the next live environment.
 
 
 
Each environment is versioned in the central ledger.  The first time the compiler is run the version is 1.0.0, and on subsequent compilations the version will increase based on the level of change since the last compilation.  The target folder contains a ledger_indexes folder, with a file that describes what content is in the release for the environment.  The ledger index ensures that the Fusion Edge Server only pulls the files that are required in the environment; any additional files that may exist in the same folders are ignored.
 
 
 
=== Versioning ===
 
The result of a compilation is a new environment for the Fusion Edge Server.  The new environment may contain files that remain unchanged since the last compilation (if running in update mode and nothing was modified), files may have been removed, and new files may have been added.  The ledger file contains an entry for the new environment, including the time the compilation started, the go live time, and the version.  Timestamps are provided as Epoch time in milliseconds.  The version uses a three-part syntax, starting at version 1.0.0.  The three parts are known as the major version, minor version, and patch version.  The patch version is updated if the compiler was run in update mode and the only change was to one or more datasets.  The minor version is updated if the structural metadata changed since the last compilation; this could, for example, be due to new time series requiring new classifications which were not in previous environments.  The major version is updated if there are new datasets for Dataflows that previously did not exist or had no data.  The major version is also updated if new reference metadata are released.
 
 
 
Example
 
Initial Release Version 1.0.0
 
Modify a dataset Version 1.0.1
 
Modify another dataset Version 1.0.2
 
Export a new Codelist Version 1.1.0
 
Add a new dataflow with data Version 2.0.0
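The version rules above can be expressed as a small Python sketch. The function name next_version and the change labels are hypothetical and chosen for illustration; the behaviour (which part is incremented, and lower parts resetting to zero) follows the rules and the example sequence given above.

```python
def next_version(current, change):
    """Return the next environment version for a given kind of change.

    change is one of these hypothetical labels:
      "data"               - only datasets changed (patch version)
      "structure"          - structural metadata changed (minor version)
      "new_dataflow_data"  - data for a Dataflow that had none (major version)
      "reference_metadata" - new reference metadata released (major version)
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if change in ("new_dataflow_data", "reference_metadata"):
        return f"{major + 1}.0.0"
    if change == "structure":
        return f"{major}.{minor + 1}.0"
    if change == "data":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

Applying "data", "data", "structure", and "new_dataflow_data" in turn to 1.0.0 reproduces the example sequence 1.0.1, 1.0.2, 1.1.0, 2.0.0.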
 
 
 
=== Publishing Content ===
 
The Fusion Edge Server can be run in one of two modes: static mode and dynamic mode.  In both modes it must be given access to the Environment so it can load it into memory.
 
 
 
 
 
==== Static Mode ====
 
In static mode, the Fusion Edge Server loads the environment into memory on application startup. The environment can only be updated by restarting the web server.  In static mode, the environment folder is zipped to a file named node.zip.  The zip file is placed in the root folder of the Fusion Edge Server home folder.  The Fusion Edge Server will read the node.zip file on startup. 
 
 
 
==== Dynamic Mode ====
 
In dynamic mode, the environment is placed at a location that the Fusion Edge Server can read (File System, URL, or Amazon S3).  The environment must not be zipped, and the folder structure must not be changed from the compiled output.  The folder that contains the environment may contain additional files and folders, for example files from previous environments.  These will not be read by the Fusion Edge Server, as the ledger and corresponding ledger index tell the Fusion Edge Server which files are part of the environment.
 
  
In terms of where to place the environment, one of three options is supported:
+
=== Immutable and Secure ===
# Environment content can be placed on a private web server and made accessible as a URL.  In this instance the Fusion Edge Server is given the URL to the root folder, for example https://mydomain.org/subfolder/edge-conent.  The Fusion Edge Server will look for the ledger.json file under this URL and then read the ledger index file.  For example, it will look for the following files if the latest environment is version 1.0.0:
+
Compiled Environments are digitally signed: if they are modified in any way, the modification will be detected by Fusion Edge Server and the Environment will be rejected.  When Fusion Edge Server loads an Environment into memory it cannot be modified, as the in-memory objects are immutable.  Fusion Edge Server only provides read-only web services.  There are no external interfaces to support modification of any kind to either the configuration of the application or the information it holds.  Fusion Edge Server is not exposed to SQL injection as it does not use SQL internally; all data stores are purpose built to efficiently process SDMX queries, and are also built so that the content is immutable.
https://mydomain.org/subfolder/edge-conent/ledger.json
 
https://mydomain.org/subfolder/edge-conent/ledger_indexes/1.0.0
 
# Environment content is placed on a file system accessible by the Fusion Edge Server.  In this instance the Fusion Edge Server is given the path to the root folder, for example /home/edge/live-environment
 
# Environment content is uploaded to Amazon S3.  In this instance the Fusion Edge Server is given the name of the S3 bucket which the environment was published to.  It also requires the AWS region, secret key and access key so it can access the content securely.
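For the URL option, the two locations read by the Fusion Edge Server can be derived from the root URL as sketched below. The function name ledger_locations is hypothetical; the path layout (ledger.json in the root, and a ledger_indexes entry named after the version) is taken from the example URLs above.

```python
def ledger_locations(root_url, version):
    """Return the ledger URL and the ledger index URL for an environment.

    root_url is the URL of the environment root folder and version is the
    environment version recorded in the ledger, for example "1.0.0".
    """
    root = root_url.rstrip("/")
    return f"{root}/ledger.json", f"{root}/ledger_indexes/{version}"
```

The same relative layout applies when the root is a file system path or an Amazon S3 bucket, since the compiled folder structure must not be changed.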
 
  
Amazon S3 is a good choice of ledger location, as it provides a central secure location for files and the Fusion Compiler is able to publish content to Amazon S3 automatically by running the publish command.  For the other two options, the environment files must be moved using another process (not provided by the Fusion Compiler).  If moving an environment, ensure the ledger.json file is moved last, as it is this file which tells the Fusion Edge Server that it needs to update its content.  If the ledger.json file is moved before the content is moved, the Fusion Edge Server will fail to load the new environment.
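As the move process for file system and web server locations is not provided, a script along the following lines could be used. This is a sketch under stated assumptions: the function name publish_environment is hypothetical, and it copies to a local or mounted target folder. The only point it demonstrates is the ordering rule above, with ledger.json copied after all other files.

```python
import shutil
from pathlib import Path


def publish_environment(source, target):
    """Copy a compiled environment folder to target, writing ledger.json last.

    Returns the relative paths in the order they were copied.
    """
    source, target = Path(source), Path(target)
    ledger = source / "ledger.json"
    if not ledger.is_file():
        raise FileNotFoundError(f"no ledger.json found in {source}")

    # Everything except the ledger first, so the content is in place
    # before the ledger announces the new environment.
    content = [p for p in sorted(source.rglob("*")) if p.is_file() and p != ledger]
    copied = []
    for path in content + [ledger]:
        destination = target / path.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)
        copied.append(path.relative_to(source).as_posix())
    return copied
```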
+
This makes Fusion Edge Server the perfect solution for public data dissemination where security is a priority.
  
==== Signing Content ====
+
=== Interchangeable and Loosely Coupled ===
To ensure content is not corrupt or tampered with, the Fusion Edge Compiler will give each file a name which is generated from a hash of the file contents, coupled with a secret key.  The secret key is provided by the user at compilation time, and should always be the same for each compilation.  The Fusion Edge Server is given the same secret key as part of its configuration.  When the Fusion Edge Server loads the environment files it will also create a hash of the file contents and couple it with the same secret key, to ensure it matches the file name.  If it does not match, the environment will not be loaded, as this indicates the content was either corrupted or tampered with after it was created.
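The idea of a file name derived from the file contents and a secret key can be illustrated with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; this choice is an assumption for illustration, as the actual algorithm and naming scheme used by the Fusion Edge Compiler are not described here. The function names signed_name and verify_file are hypothetical.

```python
import hashlib
import hmac


def signed_name(content: bytes, secret_key: bytes) -> str:
    """Derive a file name from the file contents and a secret key (HMAC-SHA256, assumed)."""
    return hmac.new(secret_key, content, hashlib.sha256).hexdigest()


def verify_file(file_name: str, content: bytes, secret_key: bytes) -> bool:
    """Recompute the keyed hash and check that it matches the file name."""
    expected = signed_name(content, secret_key)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected.encode("utf-8"), file_name.encode("utf-8"))
```

A file whose contents were edited after compilation, or which was produced with a different key, yields a different hash and so fails the check, which corresponds to the environment being rejected.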
+
Fusion Edge Server is built on the same core code as Fusion Registry; as such, it offers the same data and structure formats.  When Fusion Registry is upgraded to include new formats, or to expand the web service functionality, these upgrades by default go into the next release of Fusion Edge Server.  However, the Fusion Edge Server has been designed such that it does not need a Fusion Registry to run.  All the Fusion Edge Server requires is that it is fed with valid SDMX structure files and valid SDMX data files (if data dissemination is a requirement).
  
==== Specify Go Live Time ====
+
This design means Fusion Edge Server can be used to disseminate information from any system which is able to export SDMX files, including but not limited to Fusion Registry.
The compilation process can take an optional go live time, which ensures the Fusion Edge Server will not make the environment live until that point in time.  The content can be made accessible to the Fusion Edge Server before then.  The Fusion Edge Server can be configured to pre-load the environment into memory before the go live time, so that the environment is released exactly on schedule.  The Fusion Edge Server can also be configured to pull the environment from a URL or S3 into its local file system prior to go live, to remove any risk of network latency delaying the go live time.  An example configuration is to allow the Edge Server to pre-download an environment 5 minutes prior to go live, and pre-load it into memory 30 seconds prior to go live.
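The timing in the example configuration can be worked through with a short sketch. The names ReleaseSchedule, plan_release, and phase are hypothetical, and times are assumed to be Epoch milliseconds, as used in the ledger; the defaults match the 5 minute and 30 second lead times from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseSchedule:
    download_at_ms: int  # when the environment is pulled to the local file system
    preload_at_ms: int   # when the environment is loaded into memory
    go_live_ms: int      # when the environment is made live


def plan_release(go_live_ms, download_lead_s=300, preload_lead_s=30):
    """Compute pre-download and pre-load times from the go live time."""
    if preload_lead_s > download_lead_s:
        raise ValueError("content must be downloaded before it can be pre-loaded")
    return ReleaseSchedule(
        download_at_ms=go_live_ms - download_lead_s * 1000,
        preload_at_ms=go_live_ms - preload_lead_s * 1000,
        go_live_ms=go_live_ms,
    )


def phase(schedule, now_ms):
    """Return which stage of the release applies at now_ms."""
    if now_ms >= schedule.go_live_ms:
        return "live"
    if now_ms >= schedule.preload_at_ms:
        return "preloaded"
    if now_ms >= schedule.download_at_ms:
        return "downloaded"
    return "waiting"
```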
 
  
==== Generating content to publish ====
+
=== Scheduled Release of Data (embargo) ===
The Fusion Edge Compiler expects a specific folder structure which contains the files to compile.  The folder and file content can be created manually, by copying structure files and data files into the correct location.  However, if a Fusion Registry instance or another SDMX compliant web service is available, the file system can be generated automatically using the Fusion Edge Compiler.  The Fusion Edge Compiler queries the SDMX web service for datasets, metadatasets and corresponding structural metadata in order to build the file system.
+
Data can be given a specific date and time for release.  The Fusion Edge Server can also be set to prepare Environments for release prior to the embargo time.  For example, if an Environment is scheduled for release at 12:00pm, and the Environment is moved to a secure Amazon S3 file system at 11:50, the Fusion Edge Server can be configured to pull these files into its local file system 5 minutes before go live, to eliminate any risk of network lag when pulling the files.  The Fusion Edge Server can also be set to pre-load the Environment into memory 30 seconds before go live.  At that point the Environment is ready for dissemination but disconnected from any process which can get to the data, so it is still fully secure.  The Fusion Edge Server is then able to make the Environment go live by simply swapping the old Environment with the new, making it go live with millisecond precision.
The Fusion Edge Compiler is given the configuration of what to include in the output, including which Dataflows to publish data for and which structures to include.  The Fusion Edge Compiler will ensure that whatever content it exports, the corresponding structural metadata will be included, and the structural metadata will be complete.  For example, if a dataset is generated in the output, the corresponding Dataflow and all descendants of the Dataflow will be output in the structural metadata file.  The descendants of a Dataflow include the Data Structure Definition, Codelists, Concept Schemes, and Agency Scheme: everything that is required to read the data.  The Fusion Edge Compiler can be configured to output additional structures, for example Category Schemes, Hierarchical Codelists, or any other structure that is available via the web service.  The Fusion Edge Compiler can be configured to include Reference Metadata, in which case all reference metadata is included in the output, along with the corresponding Metadata Structure Definitions and metadata targets, and all descendant structures of these.
 
  
When running the extract process from the SDMX web service, the Fusion Edge Compiler can be configured to only include datasets updated after a specific point in time, in which case it will query for data using the updatedAfter query parameter.  The Fusion Edge Compiler will not delete datasets in the target folder if it is running in update mode; it will only create new dataset files with the updated series and observations present.  The Fusion Edge Compiler can be given the location of the ledger when running an extract process, in which case it will use the last compile time as the last update date for the extract.
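A data query using updatedAfter can be sketched as below. This follows the SDMX REST data query form (a flow reference of agency, id, and version, the key "all", and an ISO 8601 timestamp); it is not the Fusion Edge Compiler's own code. The function name data_query_url and the base URL used in any call are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def data_query_url(base_url, agency, dataflow, version, updated_after_ms):
    """Build an SDMX REST data query for data updated after a point in time.

    updated_after_ms is Epoch time in milliseconds, for example the last
    compile time read from the ledger.
    """
    timestamp = datetime.fromtimestamp(updated_after_ms / 1000, tz=timezone.utc)
    params = urlencode({"updatedAfter": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ")})
    # "all" requests every series of the Dataflow
    return f"{base_url.rstrip('/')}/data/{agency},{dataflow},{version}/all?{params}"
```

Note that urlencode percent-encodes the colons in the timestamp, which is the expected form for a query parameter value.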
+
=== Global collection of servers managed from a single location ===
 +
Fusion Edge Server is given a location to read an Environment for dissemination.  This location may be on the local file system, or it could be a URL to a collection of files hosted on a web server.  In addition, it could be a folder hosted on Amazon S3.  When the Environment is placed in a location which can be accessed by more than one Fusion Edge Server, it is possible to update all Fusion Edge Servers by simply updating one central Environment.  This makes it possible to host and manage content in Fusion Edge Servers hosted all over the world from a single location.  Combining this feature with Embargo makes it possible to update all Fusion Edge Servers at exactly the same time from a single file system.
  
==== Restricting content to be published ====
+
=== Audited Events ===
The Fusion Edge Compiler can be configured to export sub-cubes for particular Dataflows by providing Dimension filters for the given Dataflow.  For example, it can be configured to only publish UK and French data for a particular Dataflow.  In addition, if using Fusion Registry as the source of the data and structural metadata, the Fusion Registry security rules can be used to restrict content so that it can never be extracted by the Fusion Edge Compiler.
+
Each Fusion Edge Server can be configured to persist a log of events.  The log is in JSON format and is structured so that it can be easily processed to determine metrics such as where the queries are coming from, which datasets are popular, which output formats are popular, and which browsers (or other agents) are popular.
  
The Fusion Edge Compiler can query the SDMX web service either as a public user (no authentication provided) or using HTTP Basic authentication (username and password), which is the mechanism used by the Fusion Registry for authenticating users.  In this way, a Data Consumer user account can be created in the Fusion Registry for the Fusion Edge Compiler to use, and content can then be restricted accordingly.
+
=== Automated Data Publication Pipelines ===
 +
The Fusion Edge Server can be deployed such that data can be moved from internal systems into the edge via scripts which provide a fully automated solution.

Latest revision as of 08:30, 11 September 2023

Overview

The Fusion Edge Server is a Java web application whose responsibility is to host SDMX web services for the dissemination of data and related structural and reference metadata. The web services support all Data Formats and Structure Formats supported by Fusion Registry; as such, Fusion Edge Server can be used as a direct replacement for the dissemination services provided by Fusion Registry.

The purpose of the Fusion Edge Server is to support public or internal dissemination of data and/or metadata, either via clients who use the web service API directly, or via software/web user interfaces which convert the information into a graphical display for data discovery, retrieval, and display. The driver for using Fusion Edge Server as a dissemination solution over Fusion Registry is that Fusion Edge Server offers a read-only solution, and with no database it can be horizontally scaled with no single point of failure. Fusion Edge Server can also work independently of Fusion Registry: there is no requirement to run a Fusion Registry instance. This gives options for users who need to host public SDMX datasets, or who want to offer dynamic access to their data from tools such as Excel, Tableau, or other statistical software.

Example Applications which make use of these web services are:

  1. Fusion Data Browser to provide a web user interface for the discovery, display and export of data.
  2. FXLData which provides data connectivity to Microsoft Excel
  3. Tableau Web Data Connector providing data connectivity to Tableau
  4. SDMX Connectors third party library which provides connectivity to multiple applications including R, Matlab, Excel
  5. Fusion Data Portal, which can automate data ingestion into a local database from remote web services

Fusion Edge Application Suite

The Fusion Edge Server suite consists of two applications:

  1. Fusion Edge Compiler. This is used to compile SDMX structure and data files into a format which can be read by the Fusion Edge Server. The output of the Fusion Edge Compiler is an Environment.
  2. Fusion Edge Server. This is a web application responsible for reading the Environment into memory and exposing the information via its SDMX web services.

Design

The Fusion Edge Server reads a pre-compiled Environment into memory. An Environment is built from a collection of SDMX structure and data files, and it is this information which is made available via the public web services of Fusion Edge Server. The Fusion Edge Compiler is used to build the Environment. It does so by reading files on a file system and converting them into read-only stores of information which can be rapidly ingested by Fusion Edge Server. The input of the Fusion Edge Compiler is a set of SDMX files; its output is the collection of files which comprise the Environment.

Publishing new content to the Fusion Edge Server is simply a case of moving the Environment to a location that can be read by Fusion Edge Server. Fusion Edge Server can be configured to automatically poll for updates (dynamic mode), or it can be configured to only read an Environment on application startup (static mode).

Features

Web Services and Formats

The primary purpose of the Fusion Edge Server is to host a read only web service for data and related metadata. These web services are consistent with those offered by Fusion Registry, and include:

In addition all the Data Formats and Structure Formats of Fusion Registry are supported by the Fusion Edge Server.

Performance and Durability

Edge Server Performance

The Fusion Edge Server is built to be highly performant, even under extreme load from multiple concurrent users. It uses a purpose built in-memory time series database, augmented with a custom caching layer, both of which have been tuned to work with the SDMX data and metadata model. The caching layer, for instance, knows when two data queries refer to the same data sub-cube even if the queries are expressed differently in the URL. Query optimisation similarly ensures that multiple concurrent queries for the same data are executed only once, directing the subsequent requests to the cached data once it has been built.
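The two caching behaviours described above can be illustrated with a small sketch. This is not the Fusion Edge Server implementation, which is Java and not shown on this page. It assumes an SDMX REST style data URL of the form .../data/{dataflow}/{key}, where dimensions are separated by "." and alternative codes by "+". The names canonical_key and SingleFlightCache, and the example.org URLs used with them, are hypothetical.

```python
import threading
from concurrent.futures import Future
from urllib.parse import parse_qsl, urlsplit


def canonical_key(url: str) -> tuple:
    """Reduce a data query URL to an order-independent cache key."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumed layout: .../data/{dataflow}/{series key}
    i = segments.index("data")
    flow = segments[i + 1]
    series_key = segments[i + 2] if len(segments) > i + 2 else "all"
    # One dot-separated position per dimension; '+' separates OR'ed codes.
    # An empty position is treated as a wildcard for that dimension.
    dims = tuple(
        tuple(sorted(set(part.split("+")))) if part else ("*",)
        for part in series_key.split(".")
    )
    # Query parameters are sorted so their order in the URL is irrelevant.
    params = tuple(sorted(parse_qsl(parts.query)))
    return (flow, dims, params)


class SingleFlightCache:
    """Runs the loader once per canonical key; concurrent callers wait for it."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical function: key -> result
        self._lock = threading.Lock()
        self._results = {}         # canonical key -> Future

    def get(self, url: str):
        key = canonical_key(url)
        with self._lock:
            future = self._results.get(key)
            owner = future is None
            if owner:
                future = Future()
                self._results[key] = future
        if owner:
            try:
                future.set_result(self._loader(key))
            except Exception as exc:
                # Forget failed loads so that a later request can retry.
                with self._lock:
                    self._results.pop(key, None)
                future.set_exception(exc)
        # Callers that did not start the load block here until it finishes.
        return future.result()
```

With this sketch, a request for .../data/GDP/A.UK+FR.X?startPeriod=2010&endPeriod=2020 and a request for .../data/GDP/A.FR+UK.X?endPeriod=2020&startPeriod=2010 map to the same key, so the loader runs only once for both.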

The Fusion Edge Server loads the pre-compiled content into memory on application startup. An Edge Server hosting 18 datasets, with 700k series and 50 million observations, will start up and be ready to serve queries in as little as 20 seconds.

Fusion Edge Server can run in Dynamic Mode, which means an Environment can be served from a central shared location such as Amazon S3 or a private web server, allowing multiple Fusion Edge Server instances to update their content from the same location.

Content can even be scheduled for release at a specific date and time (embargo). Coupled with the ability to pre-load embargoed data, this makes it possible to release a new Environment across all Fusion Edge Servers spanning multiple regions at the same time (to the nearest millisecond).

This architecture has no single point of failure, making the Fusion Edge Server well suited to horizontal scaling.

Immutable and Secure

Compiled Environments are digitally signed. If they are modified in any way, the modification is detected by Fusion Edge Server and the Environment is rejected. Once Fusion Edge Server has loaded an Environment into memory it cannot be modified, as the in-memory objects are immutable. Fusion Edge Server only provides read-only web services. There are no external interfaces to support modification of any kind to either the configuration of the application or the information it holds. Fusion Edge Server is not exposed to SQL injection as it does not use SQL internally: all data stores are purpose-built to efficiently process SDMX queries, and are also built so that the content is immutable.

This makes Fusion Edge Server the perfect solution for public data dissemination where security is a priority.

Interchangeable and Loosely Coupled

Fusion Edge Server is built from the same core code as Fusion Registry, so it offers the same data and structure formats. When Fusion Registry is upgraded to include new formats, or to expand the web service functionality, these upgrades by default go into the next release of Fusion Edge Server. However, the Fusion Edge Server has been designed such that it does not need a Fusion Registry to run. All the Fusion Edge Server requires is that it is fed with valid SDMX structure files and valid SDMX data files (if data dissemination is a requirement).

This design means Fusion Edge Server can be used to disseminate information from any system which is able to export SDMX files, including but not limited to Fusion Registry.

Scheduled Release of Data (embargo)

Data can be given a specific date and time for release. The Fusion Edge Server can also be set to prepare Environments for release ahead of the embargo time. For example, if an Environment is scheduled for release at 12:00 and is moved to a secure Amazon S3 file system at 11:50, the Fusion Edge Server can be configured to pull these files into its local file system 5 minutes before go-live, eliminating any risk of network lag when pulling the files. The Fusion Edge Server can also be set to pre-load the Environment into memory 30 seconds before go-live. At that point the Environment is ready for dissemination but is not yet reachable by any process that serves data, so it remains fully secure. The Fusion Edge Server then makes the Environment live by simply swapping the old Environment for the new one, with millisecond precision.
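The timeline in this example (pull files 5 minutes before release, pre-load 30 seconds before release, then swap) can be sketched as follows. This is an illustration only, and it does not show how Fusion Edge Server is configured or implemented. The names EmbargoPlan, plan_release and EnvironmentHolder are hypothetical, and the lead times are simply the values from the example above.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EmbargoPlan:
    fetch_at: datetime     # pull files from the shared location to local disk
    preload_at: datetime   # load the new Environment into memory, still hidden
    go_live_at: datetime   # swap the hidden Environment in for the live one


def plan_release(go_live_at: datetime,
                 fetch_lead: timedelta = timedelta(minutes=5),
                 preload_lead: timedelta = timedelta(seconds=30)) -> EmbargoPlan:
    """Derive the preparation times from the release time."""
    return EmbargoPlan(
        fetch_at=go_live_at - fetch_lead,
        preload_at=go_live_at - preload_lead,
        go_live_at=go_live_at,
    )


class EnvironmentHolder:
    """Holds the live Environment and, optionally, a staged one that no query can see."""

    def __init__(self, live):
        self._live = live
        self._staged = None
        self._lock = threading.Lock()

    def stage(self, environment) -> None:
        # The staged Environment is in memory but is never returned by current().
        with self._lock:
            self._staged = environment

    def current(self):
        # Queries only ever read this single reference.
        return self._live

    def go_live(self) -> None:
        # Going live is one reference swap, which is why it can be near-instant.
        with self._lock:
            if self._staged is None:
                raise RuntimeError("no Environment has been staged")
            self._live, self._staged = self._staged, None
```

Because all of the slow work (network transfer and loading into memory) happens before the release time, the release itself reduces to replacing one reference.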

Global collection of servers managed from a single location

Fusion Edge Server is given a location from which to read an Environment for dissemination. This location may be on the local file system, a URL to a collection of files hosted on a web server, or a folder hosted on Amazon S3. When the Environment is placed in a location which can be accessed by more than one Fusion Edge Server, it is possible to update all Fusion Edge Servers by simply updating one central Environment. This makes it possible to host and manage content in Fusion Edge Servers all over the world from a single location. Combining this feature with embargo makes it possible to update all Fusion Edge Servers at exactly the same time from a single file system.

Audited Events

Each Fusion Edge Server can be configured to persist a log of events. The log is in JSON format and is structured so that it can easily be processed to determine metrics such as where queries are coming from, which datasets are popular, which output formats are popular, and which browsers (or other agents) are popular.
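As an illustration of the kind of processing this enables, the sketch below counts how often each value of a chosen field occurs in a log. The log schema is not documented on this page, so the sketch assumes one JSON object per line, and the field names used with it (such as "dataset", "format" and "userAgent") are hypothetical placeholders to be replaced with the real ones.

```python
import json
from collections import Counter


def summarise(lines, field: str) -> Counter:
    """Count occurrences of each value of `field` across JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        value = event.get(field)
        if value is not None:
            counts[value] += 1
    return counts
```

Calling summarise(open_log_file, "dataset").most_common(10) would then list the ten most requested datasets, assuming the log really does carry a field of that name.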

Automated Data Publication Pipelines

The Fusion Edge Server can be deployed such that data can be moved from internal systems into the edge via scripts which provide a fully automated solution.