Data Store Management
Overview
Fusion Registry provides the facility of allowing Data Providers to report data. Data may be reported in a number of ways: by Registering the URL of a data File; by Registering the URL of a valid SDMX REST web service; by publishing a Dataset to a data store; by re-indexing a database which has data loaded by a process run outside of the Fusion Registry. The Fusion Registry can support any number of data collection methods, and each Dataflow may collect data in a number of ways, depending on the Data Provider. In this way, the Fusion Registry is able to operate as a Data Portal over any number of disparate data stores.
The remainder of this document describes each data store type, how to set up connections to data stores, how to link a Data Provider to a data store, and how to manage reported data.
Note: The Data Reporting guide should be read for information on how to publish data and register the location of external files or web services.
Topics
Data Store Types
Overview
Data Stores can be split into two main types, Registry Managed Data Stores, and External Data Stores. A Registry Managed Data Store is a store for which the Fusion Registry is responsible for creating database tables, importing, managing and deleting content. Registry Managed stores includes the Fusion Data Store, and any MySQL or SQL Server database that has been configured to act as a Data Store. External stores include MySQL, SQL Server, or Oracle database which has its data managed by an external process, and SDMX web services or external files.
The Fusion Registry creates and maintains the link between a data store and the SDMX Structural Metadata at the level of a Provision Agreement. The Provision Agreement defines which data store it is connected to, defaulting to any external URL (File or Web Service). Each data store can have zero to many Provision Agreements linked to it. As both Dataflows and Data Providers can be referenced by multiple Provision Agreements, it is therefore possible to split data for the same Dataflow or same Data Provider over multiple data store types.
The notion that data exists is maintained by the Data Registration. The Data Registration references the Provision Agreement and defines the URL of where the data can be retrieved for the Provision Agreement. For data source type which maps to an external File or Web Service URL, it is the responsibility of the Data Provider to supply the URL of the service or file. For all other data source types, the Fusion Registry is responsible for generating, updating, and deleting the Data Registration, where the URL of these Data Registrations is to the Fusion Registry’s own public web service.
The UML diagram below shows the relationship and cardinality between the SDMX Structures and the Fusion Registry data store types.
The remainder of this section describes each data store type in more detail.
Fusion Data Store
The Fusion Data Store provides a high-speed in-memory database pre-installed in the Fusion Registry. Data published to this store is persisted to the database that the Fusion Registry is connected to, however all data is served from memory using a highly efficient in memory model.
Metrics for the Fusion Data Store are shown below:
When benchmarked against the same query for a MySQL instance (on a desktop PC) the graph looks as follows. Note: All caching is disabled:
It should be noted that as the data queries are processed in memory, it is possible to horizontally scale the data tier by simply launching an additional Fusion Registry instance. Horizontally scaling the data tier ensures there is not a single point of failure, and increases the supported concurrency and therefore overall performance of the solution.
Coming soon...
Monitoring Data Publication Activity
Coming soon...