Difference between revisions of "Data Store Management"

From Fusion Registry Wiki
Jump to navigation Jump to search
(Fusion Data Store)
Line 45: Line 45:
  
 
It should be noted that as the data queries are processed in memory, it is possible to horizontally scale the data tier by simply launching an additional Fusion Registry instance.  Horizontally scaling the data tier ensures there is not a single point of failure, and increases the supported concurrency and therefore overall performance of the solution.
 
It should be noted that as the data queries are processed in memory, it is possible to horizontally scale the data tier by simply launching an additional Fusion Registry instance.  Horizontally scaling the data tier ensures there is not a single point of failure, and increases the supported concurrency and therefore overall performance of the solution.
 +
 +
====Fusion Registry Managed Relational Database====
 +
The Fusion Registry can be connected to any number of relational databases (MySQL or SQL Server) for the purpose of supporting data publishing.  The Fusion Registry is responsible for building the database tables, which are built on demand when new data are published.  Each Dataflow has its own database tables associated with it, and if they already exist when new data are published then the table is updated if required (if the DSD Concept, or Codelists change in a way which requires a modification).
 +
 +
The database tables in a Fusion Registry managed database are built based on the Data Structure Definition, and associated structures.  Having database tables tailored to each data domain ensures the data load and retrieval is as efficient as possible.
 +
 +
The Fusion Registry managed database is not as fast as the Fusion Data Store and the performance is largely dependent on the database platform used (MySQL or SQL Server) and the installed hardware and network speed.
 +
 +
Unlike the Fusion Data store, the Fusion Registry managed database records modifications made to Observations values.  These modifications are termed ‘revisions’ or ‘vintages’ and are made whenever an observation value is changed (revised) over time.  As revisions are stored it is made available via the SDMX web services, using the ‘includeHistory’ parameter.  This parameter is documented in the Registry Web Services Guide.
 +
 +
 +
[[File:Dsmfig3.PNG|600px]]
 +
 +
====Externally Managed Relational Database====
 +
=====Overview=====
 +
The Fusion Registry can connect to an Oracle, SQL Server or MySQL database in read-only mode.  This can be used when data already exists in a database which is managed by a process or application other than the Fusion Registry.
 +
 +
The Fusion Registry is able to query the database tables or views, provided they conform to the expected structure.  The table or view name and the column names are derived from the Dataflow that is linking to the data store.
 +
 +
There are two possible modes for mapping external database tables: “Default” and “Registry”
 +
 +
=====“Default” External mode=====
 +
In this mode a single table or view should be created, the name of which must use the following pattern:
 +
 +
[Dataflow Agency Id]_[Dataflow Id]_[Dataflow version]
 +
 +
The table name must be in upper case, and any ‘.’ or ‘-’ characters must be replaced with the underscore character ‘_’.  For example:
 +
 +
 +
[[File:DsmfigExt1.PNG|600px]
 +
 +
 +
 +
 +
 +
 +
 +
 +
  
 
Coming soon...
 
Coming soon...

Revision as of 10:19, 24 October 2020

Overview

Fusion Registry provides the facility of allowing Data Providers to report data. Data may be reported in a number of ways: by Registering the URL of a data File; by Registering the URL of a valid SDMX REST web service; by publishing a Dataset to a data store; by re-indexing a database which has data loaded by a process run outside of the Fusion Registry. The Fusion Registry can support any number of data collection methods, and each Dataflow may collect data in a number of ways, depending on the Data Provider. In this way, the Fusion Registry is able to operate as a Data Portal over any number of disparate data stores.


Dsmfig1.PNG

The remainder of this document describes each data store type, how to set up connections to data stores, how to link a Data Provider to a data store, and how to manage reported data.

Note: The Data Reporting guide should be read for information on how to publish data and register the location of external files or web services.

Topics

Data Store Types

Overview

Data Stores can be split into two main types, Registry Managed Data Stores, and External Data Stores. A Registry Managed Data Store is a store for which the Fusion Registry is responsible for creating database tables, importing, managing and deleting content. Registry Managed stores includes the Fusion Data Store, and any MySQL or SQL Server database that has been configured to act as a Data Store. External stores include MySQL, SQL Server, or Oracle database which has its data managed by an external process, and SDMX web services or external files.


Dsmfig2.PNG

The Fusion Registry creates and maintains the link between a data store and the SDMX Structural Metadata at the level of a Provision Agreement. The Provision Agreement defines which data store it is connected to, defaulting to any external URL (File or Web Service). Each data store can have zero to many Provision Agreements linked to it. As both Dataflows and Data Providers can be referenced by multiple Provision Agreements, it is therefore possible to split data for the same Dataflow or same Data Provider over multiple data store types.

The notion that data exists is maintained by the Data Registration. The Data Registration references the Provision Agreement and defines the URL of where the data can be retrieved for the Provision Agreement. For data source type which maps to an external File or Web Service URL, it is the responsibility of the Data Provider to supply the URL of the service or file. For all other data source types, the Fusion Registry is responsible for generating, updating, and deleting the Data Registration, where the URL of these Data Registrations is to the Fusion Registry’s own public web service.

The UML diagram below shows the relationship and cardinality between the SDMX Structures and the Fusion Registry data store types.


Dsmfiguml.PNG

The remainder of this section describes each data store type in more detail.

Fusion Data Store

The Fusion Data Store provides a high-speed in-memory database pre-installed in the Fusion Registry. Data published to this store is persisted to the database that the Fusion Registry is connected to, however all data is served from memory using a highly efficient in memory model.

Metrics for the Fusion Data Store are shown below:


Dsmfigmetrics.PNG

When benchmarked against the same query for a MySQL instance (on a desktop PC) the graph looks as follows. Note: All caching is disabled:


Dsmfigresponse.PNG

It should be noted that as the data queries are processed in memory, it is possible to horizontally scale the data tier by simply launching an additional Fusion Registry instance. Horizontally scaling the data tier ensures there is not a single point of failure, and increases the supported concurrency and therefore overall performance of the solution.

Fusion Registry Managed Relational Database

The Fusion Registry can be connected to any number of relational databases (MySQL or SQL Server) for the purpose of supporting data publishing. The Fusion Registry is responsible for building the database tables, which are built on demand when new data are published. Each Dataflow has its own database tables associated with it, and if they already exist when new data are published then the table is updated if required (if the DSD Concept, or Codelists change in a way which requires a modification).

The database tables in a Fusion Registry managed database are built based on the Data Structure Definition, and associated structures. Having database tables tailored to each data domain ensures the data load and retrieval is as efficient as possible.

The Fusion Registry managed database is not as fast as the Fusion Data Store and the performance is largely dependent on the database platform used (MySQL or SQL Server) and the installed hardware and network speed.

Unlike the Fusion Data store, the Fusion Registry managed database records modifications made to Observations values. These modifications are termed ‘revisions’ or ‘vintages’ and are made whenever an observation value is changed (revised) over time. As revisions are stored it is made available via the SDMX web services, using the ‘includeHistory’ parameter. This parameter is documented in the Registry Web Services Guide.


Dsmfig3.PNG

Externally Managed Relational Database

Overview

The Fusion Registry can connect to an Oracle, SQL Server or MySQL database in read-only mode. This can be used when data already exists in a database which is managed by a process or application other than the Fusion Registry.

The Fusion Registry is able to query the database tables or views, provided they conform to the expected structure. The table or view name and the column names are derived from the Dataflow that is linking to the data store.

There are two possible modes for mapping external database tables: “Default” and “Registry”

“Default” External mode

In this mode a single table or view should be created, the name of which must use the following pattern:

[Dataflow Agency Id]_[Dataflow Id]_[Dataflow version]

The table name must be in upper case, and any ‘.’ or ‘-’ characters must be replaced with the underscore character ‘_’. For example:


[[File:DsmfigExt1.PNG|600px]






Coming soon...

Data Sources Manager

Monitoring Data Publication Activity

Coming soon...