Data Stores

From Fusion Registry Wiki
Revision as of 08:52, 20 February 2020 by Mnelson (talk | contribs) (Column Names)
Jump to navigation Jump to search

Overview

The Fusion Registry can act as a Virtual Data Store. This means that the Registry is able to link to datasets accross multiple physical data stores, and each data store can be of a diffent type (database, web service, file). The Fusion Registry is able to query multiple data stores for a single data query and then combine the data at the end, before writing it back to the user in a response format of their choice.

Registry Managed Data Store

A Registry Managed Data Store is one which the Registry maintains. This means as data is published to the Fusion Registry, the Registry wil write the data to the store and manage the transaction. Registry managed data stores come in two flavours, a relational database (MySQL or SQL Server) store, or a Fusion Store which has been custom built by Metadata Technology for rapid storage and dissemination of data.

Fusion Data Store

The Fusion Data Store is an in-memory database, custom built by Metadata Technology for storing time series SDMX datasets. The data is ultimately persisted to the Registry database, so that when the Fusion Registry server is terminated, and re-launched, the Fusion Data Stores will be read back into memory. For the duration of the server up-time, the data store will live in memory.

Registry Managed Database

A Registry Managed Database includes support for SQL Server and MySQL database platforms, MariaDB*. The Fusion Registry is responsible for creating the database tables when new datasets are loaded. The Fusion Registry will also update table definitions if the Data Structure Definition is changed.

*Note Maria DB re-writes a query for both lastNObservation and firstNObservations to optimise away a required part of the SQL query. MariaDB can be configured to run in MySQL compatability mode, where this error does not occur.

The database table structure is a two or three tables per Dataflow. The table names are based on the Dataflow, but can be changed if required. The tables include:

  • Table for Series Definitions
  • Table for Observations
  • Table data dataset attributes (only created if Dataset Attributes exist on the Data Structure Definition)

Column Names

The Series Table will have the following Columns created.

Column Name Column Type Purpose
ID Varchar A concatenation of the Dimension values that make up the series.
The width of the ID column is a combination of all the Dimension Widths plus the number of Dimensions.

An example ID is M:FR:EMP. It is important that the column width for each Dimension of the Data Structure Definition is low, i.e if the Dimenison is data type String then place a Max Width restriction on the Dimension's definition so that the database table can be constructed with a primary key that does not exceed the limit set by the database vendor.

FR_GROUP_ID varchar (optional) Created on if the Data Structure Definition defines a Groups.
This coulmn enable the storage of Group level Attributes
FR_PROVIDER_ACY varchar Data Provider's Agency, used if the data query includes Data Provider filters
FR_PROVIDER_ID varchar Data Provider's Id, used if the data query includes Data Provider filters
FR_PROVISION varchar Provision Agreement short URN e.g ECB:EXR(1.0)
FR_ERROR varchar Internal Use only
FR_UPDATED_DATE varchar Last updated date for the series
DIM ID/ATT_ID text A Column for each Dimension and Series Level Attribute. The Column Name is the ID of the Dimension/Attribute and the Column Type is dependant on the Text Format

The Observation Table will have the following Columns created.

Column Name Column Type Purpose
ID int Auto increment primary key
FR_KEY_ID varchar Foreign key to the series ID column
FR_OBS_DATE datetime A datetime representaion of the TIME_PERIOD column
OBS_VALUE varchar The Observation Value
TIME_PERIOD varchar Reported Time Period
FR_ERROR varchar Internal use only
FR_EMBARGO int Deprecated
FR_IS_PUBLIC int Deprecated
FR_UPDATED_DATE_OBS int Last time the observation was inserted/updated
FR_VALID_FROM datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid from, either taken from the Header of the dataset, or set to the time the observation was inserted
FR_VALID_TO datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid to (if the observation is updated, this is used to capture previous revisions)
FR_CHKSUM varchar a checksum on the column (internal use)
ATT_ID text A Column for each Observation Level Attribute. The Column Name is the ID of the Attribute and the Column Type is dependant on the Text Format

The Dataset Attribute Table, if required, will have the following Columns created.

Column Name Column Type Purpose
ID Varchar The ID of the Dataset Level Attribute.
VALUE varchar The reported value for the attribute
FR_ERROR varchar internal use only

Column Data Types

The column data types that are based on the Data Structure Definitions Dimensions and Attributes are set to Varchar with the width restrictions based on the table below. If the width restriction exceeds 250 characters the column data type is set to TEXT.

Width restrictions are calculated as follows:

Component's Representation Column Width Example Value
String Max Length defined on Component
or 250 if unspecified
This is a series title
Codelist Maximum length of the longest
Code Id (e.g UK would be length 2)
UK
Boolean 5 true
Short 5 32767
Integer 10 123
Long 19 123456
Float 12 12.20
Double 22 12.20
Double 22 12.20
Date 30 2002-01-01T12:15:00.000GMT
Date Time 30 2002-01-01T12:15:00.000GMT
Time Period 30 2002-Q1 (any valid Time Format)
Day 2 31
Month 2 12
Year_Month 7 2002-01
Year 4 2001
Time 10 12:15:00

Note If the dimenison or attribute is numerical and a maximum and minimum value is set, then this will be used to determine the column width. For example if the Attribute is of type Double, with a min value of 0 and a max value of 10 with a Decimals of 3, then the maximum length is 6, allowing for the values such as 10.123 to be reported.

Externally Managed Database

An Externally Managed Database can be MySQL, SQLServer, or Oracle. The database table structure can either consist of a Single database table per Dataflow, or a table structure to mirror that of the Fusion Registry Managed database table structure (Series / Observation)

Flat Table

Multi Table

The Series and Observation table structure is based on the Fusion Registry's own table structure. It is advised to include all the Dimesion and Attribute Columns, with the Primary key of the Series Table, and Foreign Key of the Observation table. Other columns are optional, however ommiting a column can result in reduced support for Data Queries. For example not include FR_OBS_DATE will result in the data query not supporting the startPeriod or endPeriod query parameters.