Difference between revisions of "Data Stores"

From Fusion Registry Wiki
Jump to navigation Jump to search
(Custom Table and Column Names)
m (Mnelson moved page Registry Data Stores to Data Stores)
(No difference)

Revision as of 01:06, 21 February 2020

Overview

The Fusion Registry can act as a Virtual Data Store. This means that the Registry is able to link to datasets accross multiple physical data stores, and each data store can be of a diffent type (database, web service, file). The Fusion Registry is able to query multiple data stores for a single data query and then combine the data at the end, before writing it back to the user in a response format of their choice.

Registry Managed Data Store

A Registry Managed Data Store is one which the Registry maintains. This means as data is published to the Fusion Registry, the Registry wil write the data to the store and manage the transaction. Registry managed data stores come in two flavours, a relational database (MySQL or SQL Server) store, or a Fusion Store which has been custom built by Metadata Technology for rapid storage and dissemination of data.

Fusion Data Store

The Fusion Data Store is an in-memory database, custom built by Metadata Technology for storing time series SDMX datasets. The data is ultimately persisted to the Registry database, so that when the Fusion Registry server is terminated, and re-launched, the Fusion Data Stores will be read back into memory. For the duration of the server up-time, the data store will live in memory.

Registry Managed Database

A Registry Managed Database includes support for SQL Server and MySQL database platforms, MariaDB*. The Fusion Registry is responsible for creating the database tables when new datasets are loaded. The Fusion Registry will also update table definitions if the Data Structure Definition is changed.

*Note Maria DB re-writes a query for both lastNObservation and firstNObservations to optimise away a required part of the SQL query. MariaDB can be configured to run in MySQL compatability mode, where this error does not occur.

The database table structure is a two or three tables per Dataflow. The table names are based on the Dataflow, but can be changed if required. The tables include:

  • Table for Series Definitions
  • Table for Observations
  • Table data dataset attributes (only created if Dataset Attributes exist on the Data Structure Definition)

Column Names

The Series Table will have the following Columns created.

Column Name Column Type Purpose
ID Varchar A concatenation of the Dimension values that make up the series.
The width of the ID column is a combination of all the Dimension Widths plus the number of Dimensions.

An example ID is M:FR:EMP. It is important that the column width for each Dimension of the Data Structure Definition is low, i.e if the Dimenison is data type String then place a Max Width restriction on the Dimension's definition so that the database table can be constructed with a primary key that does not exceed the limit set by the database vendor.

FR_GROUP_ID varchar (optional) Created on if the Data Structure Definition defines a Groups.
This coulmn enable the storage of Group level Attributes
FR_PROVIDER_ACY varchar Data Provider's Agency, used if the data query includes Data Provider filters
FR_PROVIDER_ID varchar Data Provider's Id, used if the data query includes Data Provider filters
FR_PROVISION varchar Provision Agreement short URN e.g ECB:EXR(1.0)
FR_ERROR varchar Internal Use only
FR_UPDATED_DATE varchar Last updated date for the series
DIM ID/ATT_ID text A Column for each Dimension and Series Level Attribute. The Column Name is the ID of the Dimension/Attribute and the Column Type is dependant on the Text Format

The Observation Table will have the following Columns created.

Column Name Column Type Purpose
ID int Auto increment primary key
FR_KEY_ID varchar Foreign key to the series ID column
FR_OBS_DATE datetime A datetime representaion of the TIME_PERIOD column
OBS_VALUE varchar The Observation Value
TIME_PERIOD varchar Reported Time Period
FR_ERROR varchar Internal use only
FR_EMBARGO int Deprecated
FR_IS_PUBLIC int Deprecated
FR_UPDATED_DATE_OBS int Last time the observation was inserted/updated
FR_VALID_FROM datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid from, either taken from the Header of the dataset, or set to the time the observation was inserted
FR_VALID_TO datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid to (if the observation is updated, this is used to capture previous revisions)
FR_CHKSUM varchar a checksum on the column (internal use)
ATT_ID text A Column for each Observation Level Attribute. The Column Name is the ID of the Attribute and the Column Type is dependant on the Text Format

The Dataset Attribute Table, if required, will have the following Columns created.

Column Name Column Type Purpose
ID Varchar The ID of the Dataset Level Attribute.
VALUE varchar The reported value for the attribute
FR_ERROR varchar internal use only

Column Data Types

The column data types that are based on the Data Structure Definitions Dimensions and Attributes are set to Varchar with the width restrictions based on the table below. If the width restriction exceeds 250 characters the column data type is set to TEXT.

Width restrictions are calculated as follows:

Component's Representation Column Width Example Value
String Max Length defined on Component
or 250 if unspecified
This is a series title
Codelist Maximum length of the longest
Code Id (e.g UK would be length 2)
UK
Boolean 5 true
Short 5 32767
Integer 10 123
Long 19 123456
Float 12 12.20
Double 22 12.20
Double 22 12.20
Date 30 2002-01-01T12:15:00.000GMT
Date Time 30 2002-01-01T12:15:00.000GMT
Time Period 30 2002-Q1 (any valid Time Format)
Day 2 31
Month 2 12
Year_Month 7 2002-01
Year 4 2001
Time 10 12:15:00

Note If the dimenison or attribute is numerical and a maximum and minimum value is set, then this will be used to determine the column width. For example if the Attribute is of type Double, with a min value of 0 and a max value of 10 with a Decimals of 3, then the maximum length is 6, allowing for the values such as 10.123 to be reported.

Externally Managed Database

An Externally Managed Database can be MySQL, SQLServer, or Oracle. The database table structure can either consist of a Single database table per Dataflow, or a table structure to mirror that of the Fusion Registry Managed database table structure (Series / Observation)

Flat Table

The flat table structure is a denormalised stracutre where the Series and Observations are in one table. This does lead to a repetiton of dimension values where a series consists of mulitple observations. The flat table structure does not support all the features of a SDMX Query including firstNObservations, lastNObservations, includeHistory.

The table structure is as follows:

Column Name Column Type Required Purpose
DIM _ID varchar Required Column per Dimension where the column name is that of the Dimension Id
ATTR_ID varchar Optional Column per Series or Observation Attribute where the column name is that of the Attribute Id
TIME_PERIOD varchar Required Reported Time Period
FR_VALID_FROM datetime Optional Used to support startPeriod, endPeriod, lastNObservations, firstNObservations query parameters
FR_UPDATED_DATE datetime Optional to support the updatedAfter query parameter

Multi Table

The Series and Observation table structure is based on the Fusion Registry's own table structure. Most the columns are option, however it is advised to include all the Dimesion and Attribute Columns, the Primary key of the Series Table (ID), and Foreign Key of the Observation table (FR_KEY_ID). Other columns are optional, however ommiting a column can result in reduced support for Data Queries. For example not including the column FR_OBS_DATE will result in the data query not supporting the startPeriod or endPeriod query parameters.

The default naming convention is:

Table Type Table Name Example
Dataset Attributes [dataflow agency]_[dataflow id]_[dataflow version]_ds ecb_exr_1_0_ds
Series [dataflow agency]_[dataflow id]_[dataflow version]_key ecb_exr_1_0_key
Observation [dataflow agency]_[dataflow id]_[dataflow version]_obs ecb_exr_1_0_obs

Note 1: The Version number is converted from dot (period) separators to underscore separators. Note 2: If the name exceeds 26 characters, a hashed name will be used where the hash is built from the long name. In this instance it is reccomended to use the Registry Custom Table Name feature, which can be accessed from the UI Data Store Manager, or the web service (described below).

Custom Table and Column Names

It is possible to override the Registrys own table nameing and column naming convention. To do this via a web service, use the following:

Entry Point /ws/secure/datasource/tableMapping
Access Private (admin)
Http Method POST
Accepts application/json
Compression N/A
Content-Type application/json
Response Format application/json
Response Statuses

200 - success

400 - Bad JSON syntax

401 - Unauthorized

500 - Server Error

The JSON request has the following structure:

{
  "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
  "DataSourceId" : "MY_STORE",
  "KeyTable" : "GCI_KEY",
  "ObsTable" : "GCI_OBS",
  "DSAttributeTable" : null,
  "MappedColumns" : {
    "REF_AREA" : "COUNTRY",
    "FR_UPDATE_DATE" : "UPDATED_DATE" 
  }
}

The URN field is the URN of the Dataflow to map the table/colmn names for. The Mapped Columns field is optional, and only required if column names require mapping as well as table names. The same web service can be used to update a mapping, the Dataflow URN is the key for the mapping.

To delete a mapping call the same web service with method DELETE, and post the following JSON:

{
  "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
  "DataSourceId" : "MY_STORE",
}

To retrieve a mapping call the same web service with the following query parameters:

urn urn of the Dataflow to get the table mapping description for
dataSourceId Id of the data store to get the table mapping description for