Data Stores

From Fusion Registry Wiki
Revision as of 09:48, 10 March 2023 by Plazarou (talk | contribs) (Key Table:)
Jump to navigation Jump to search

Overview

The Fusion Registry can act as a Virtual Data Store. This means that the Registry is able to link to datasets accross multiple physical data stores, and each data store can be of a diffent type (database, web service, file). The Fusion Registry is able to query multiple data stores for a single data query and then combine the data at the end, before writing it back to the user in a response format of their choice.

Registry Managed Data Store

A Registry Managed Data Store is one which the Registry maintains. This means as data is published to the Fusion Registry, the Registry wil write the data to the store and manage the transaction. Registry managed data stores come in two flavours, a relational database (MySQL or SQL Server) store, or a Fusion Store which has been custom built by Metadata Technology for rapid storage and dissemination of data.

Fusion Data Store

The Fusion Data Store is an in-memory database, custom built by Metadata Technology for storing time series SDMX datasets. The data is ultimately persisted to the Registry database, so that when the Fusion Registry server is terminated, and re-launched, the Fusion Data Stores will be read back into memory. For the duration of the server up-time, the data store will live in memory.

Registry Managed Database

A Registry Managed Database includes support for SQL Server and MySQL database platforms, MariaDB*. The Fusion Registry is responsible for creating the database tables when new datasets are loaded. The Fusion Registry will also update table definitions if the Data Structure Definition is changed.

*Note Maria DB re-writes a query for both lastNObservation and firstNObservations to optimise away a required part of the SQL query. MariaDB can be configured to run in MySQL compatability mode, where this error does not occur.

The database table structure is a two or three tables per Dataflow. The table names are based on the Dataflow, but can be changed if required. The tables include:

  • Table for Series Definitions
  • Table for Observations
  • Table data dataset attributes (only created if Dataset Attributes exist on the Data Structure Definition)

Column Names

The Series Table will have the following Columns created.

Column Name Column Type Purpose
ID Varchar A concatenation of the Dimension values that make up the series.
The width of the ID column is a combination of all the Dimension Widths plus the number of Dimensions.

An example ID is M:FR:EMP. It is important that the column width for each Dimension of the Data Structure Definition is low, i.e if the Dimenison is data type String then place a Max Width restriction on the Dimension's definition so that the database table can be constructed with a primary key that does not exceed the limit set by the database vendor.

FR_GROUP_ID varchar (optional) Created on if the Data Structure Definition defines a Groups.
This coulmn enable the storage of Group level Attributes
FR_PROVIDER_ACY varchar Data Provider's Agency, used if the data query includes Data Provider filters
FR_PROVIDER_ID varchar Data Provider's Id, used if the data query includes Data Provider filters
FR_PROVISION varchar Provision Agreement short URN e.g ECB:EXR(1.0)
FR_ERROR varchar Internal Use only
FR_UPDATED_DATE varchar Last updated date for the series
DIM ID/ATT_ID text A Column for each Dimension and Series Level Attribute. The Column Name is the ID of the Dimension/Attribute and the Column Type is dependant on the Text Format

The Observation Table will have the following Columns created.

Column Name Column Type Purpose
ID int Auto increment primary key
FR_KEY_ID varchar Foreign key to the series ID column
FR_OBS_DATE datetime A datetime representaion of the TIME_PERIOD column
OBS_VALUE varchar The Observation Value
TIME_PERIOD varchar Reported Time Period
FR_ERROR varchar Internal use only
FR_EMBARGO int Deprecated
FR_IS_PUBLIC int Deprecated
FR_UPDATED_DATE_OBS int Last time the observation was inserted/updated
FR_VALID_FROM datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid from, either taken from the Header of the dataset, or set to the time the observation was inserted
FR_VALID_TO datetime Used to support the includeHistory query parameter.
A time stamp of when the observation is valid to (if the observation is updated, this is used to capture previous revisions)
FR_CHKSUM varchar a checksum on the column (internal use)
ATT_ID text A Column for each Observation Level Attribute. The Column Name is the ID of the Attribute and the Column Type is dependant on the Text Format

The Dataset Attribute Table, if required, will have the following Columns created.

Column Name Column Type Purpose
ID Varchar The ID of the Dataset Level Attribute.
VALUE varchar The reported value for the attribute
FR_ERROR varchar internal use only

Column Data Types

The column data types that are based on the Data Structure Definitions Dimensions and Attributes are set to Varchar with the width restrictions based on the table below. If the width restriction exceeds 250 characters the column data type is set to TEXT.

Width restrictions are calculated as follows:

Component's Representation Column Width Example Value
String Max Length defined on Component
or 250 if unspecified
This is a series title
Codelist Maximum length of the longest
Code Id (e.g UK would be length 2)
UK
Boolean 5 true
Short 5 32767
Integer 10 123
Long 19 123456
Float 12 12.20
Double 22 12.20
Double 22 12.20
Date 30 2002-01-01T12:15:00.000GMT
Date Time 30 2002-01-01T12:15:00.000GMT
Time Period 30 2002-Q1 (any valid Time Format)
Day 2 31
Month 2 12
Year_Month 7 2002-01
Year 4 2001
Time 10 12:15:00

Note If the dimenison or attribute is numerical and a maximum and minimum value is set, then this will be used to determine the column width. For example if the Attribute is of type Double, with a min value of 0 and a max value of 10 with a Decimals of 3, then the maximum length is 6, allowing for the values such as 10.123 to be reported.

Externally Managed Database

An Externally Managed Database can be MySQL, SQLServer, or Oracle.

There are two options for the database table design when linking to an externally managed database, these are:

  1. Mirror the structure used by the Fusion Registry Managed database table structure (Series / Observation).
  2. A single database table for a Dataflow, where the Series and Observations are in one table

Note The Table and column names must all be in UPPER CASE.

When the Provision Agreement is linked to an externally managed data store, the user interface will provide the ability to specify what the database tables/views are used for the Series table and the Observation table. The Series table is optional, and if it is not provided then the Single Table Design will be used to query the database table. If the Series table is provided, the Series/Obs table design will be used.

Series/Obs Table Design

See the registry table design section for the full structure. The full table structure is not required, but there are some rules required for this table/view design, these are:

  1. The Series table must have a primary key OR a Column called ID. If there is a column called ID it will be expected to be the Primary Key column.
  2. If the Series table has an ID column AND it is of type Varchar, the values will be expected to be the concatenated series key (A:UK:EMP for example).
  3. The Obs table must either have a foreign key to the Series Table, OR a column called FR_KEY_ID which can be used to join to the primary key of the Series Table
  4. The Series table is expected to have a Column per Dimension, attribute columns are all optional
  5. The Obs table is expected to have a Column for Obs Value and Time Period, attribute columns are all optional
  6. All columns that are prefixed with 'FR_' are optional

Single Table Design

The flat table structure is a denormalised structure where the Series and Observations are in one table. This does lead to a repetition of dimension values where a series consists of multiple observations. The flat table structure does not support all the features of a SDMX Query including firstNObservations, lastNObservations, includeHistory.

The table structure is as follows:

Column Name Column Type Required Purpose
DIM _ID varchar Required Column per Dimension where the column name is that of the Dimension Id
ATTR_ID varchar Optional Column per Series or Observation Attribute where the column name is that of the Attribute Id
TIME_PERIOD varchar Required Reported Time Period
FR_VALID_FROM datetime Optional Used to support startPeriod, endPeriod, lastNObservations, firstNObservations query parameters
FR_UPDATED_DATE datetime Optional to support the updatedAfter query parameter

Custom Table and Column Names

It is possible to override the Registrys own table nameing and column naming convention. To do this via a web service, use the following:

Entry Point /ws/secure/datasource/tableMapping
Access Private (admin)
Http Method POST
Accepts application/json
Compression N/A
Content-Type application/json
Response Format application/json
Response Statuses

200 - success

400 - Bad JSON syntax

401 - Unauthorized

500 - Server Error

The JSON request has the following structure:

{
  "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
  "DataSourceId" : "MY_STORE",
  "KeyTable" : "GCI_KEY",
  "ObsTable" : "GCI_OBS",
  "DSAttributeTable" : null,
  "MappedColumns" : {
    "REF_AREA" : "COUNTRY",
    "FR_UPDATE_DATE" : "UPDATED_DATE" 
  }
}

The URN field is the URN of the Dataflow to map the table/colmn names for. The Mapped Columns field is optional, and only required if column names require mapping as well as table names. The same web service can be used to update a mapping, the Dataflow URN is the key for the mapping.

To delete a mapping call the same web service with method DELETE, and post the following JSON:

{
  "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
  "DataSourceId" : "MY_STORE",
}

To retrieve a mapping call the same web service with the following query parameters:

urn urn of the Dataflow to get the table mapping description for
dataSourceId Id of the data store to get the table mapping description for

Example Using Oracle

The following is a quick example showing how to set up 2 externally managed tables when using the Oracle Database and connect them to the Fusion Registry. The steps to follow are:

  1. Using the database tools of your choice, create the Key Table and Obs Table and populate both with data
  2. Load the structures (attached below) into your Fusion Registry
  3. Make an association in the Data Sources Manager page

Key Table:

CREATE TABLE WB_WDI_ECON_POL_DEBT_1_0_KEY (
 ID        VARCHAR2(250 BYTE),
 FREQ      VARCHAR2(5 BYTE),
 SERIES    VARCHAR2(25 BYTE),
 REF_AREA  VARCHAR2(3 BYTE)
)

Sample Data for Key Table:

Obs Table:


Sample Data for Obs Table:


Dataflow, DSD and other related structures: