Data Stores
Overview
The Fusion Registry can act as a Virtual Data Store. This means that the Registry is able to link to datasets across multiple physical data stores, and each data store can be of a different type (database, web service, file). The Fusion Registry can query multiple data stores for a single data query and combine the results before writing them back to the user in a response format of their choice.
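The fan-out and combine step can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: the store functions, the query string, and the merge rule (later stores overwrite earlier ones for the same period) are assumptions, not the Registry's actual implementation.

```python
from typing import Callable, Dict, Iterable

# A series key such as "M:FR:EMP" maps to {time period: observation value}.
Series = Dict[str, Dict[str, str]]
# Hypothetical stand-in for a physical store: takes a query, returns matching series.
StoreQuery = Callable[[str], Series]


def query_all(stores: Iterable[StoreQuery], query: str) -> Series:
    """Run the same query against every store and merge the results."""
    combined: Series = {}
    for run_query in stores:
        for key, observations in run_query(query).items():
            # Observations for the same series key are merged; if two stores
            # report the same period, the later store in the list wins (assumed rule).
            combined.setdefault(key, {}).update(observations)
    return combined


def database_store(query: str) -> Series:
    # Hypothetical store backed by a database.
    return {"M:FR:EMP": {"2020-01": "10.5"}}


def file_store(query: str) -> Series:
    # Hypothetical store backed by a file.
    return {"M:FR:EMP": {"2020-02": "10.7"}, "M:UK:EMP": {"2020-01": "8.1"}}


result = query_all([database_store, file_store], "M..EMP")
```

In this sketch the caller sees a single combined result and does not need to know which store supplied which observation.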
Registry Managed Data Store
A Registry Managed Data Store is one which the Registry maintains. This means that as data is published to the Fusion Registry, the Registry will write the data to the store and manage the transaction. Registry managed data stores come in two flavours: a relational database store (MySQL or SQL Server), or a Fusion Store, which has been custom built by Metadata Technology for rapid storage and dissemination of data.
Fusion Data Store
The Fusion Data Store is an in-memory database, custom built by Metadata Technology for storing time series SDMX datasets. The data is ultimately persisted to the Registry database, so that when the Fusion Registry server is terminated, and re-launched, the Fusion Data Stores will be read back into memory. For the duration of the server up-time, the data store will live in memory.
Registry Managed Database
A Registry Managed Database supports the SQL Server and MySQL database platforms, and MariaDB*. The Fusion Registry is responsible for creating the database tables when new datasets are loaded. The Fusion Registry will also update table definitions if the Data Structure Definition is changed.
*Note: MariaDB rewrites queries that use lastNObservations or firstNObservations, optimising away a required part of the SQL query. MariaDB can be configured to run in MySQL compatibility mode, where this problem does not occur.
The database table structure consists of two or three tables per Dataflow. The table names are based on the Dataflow, but can be changed if required. The tables include:
- Table for Series Definitions
- Table for Observations
- Table for Dataset Attributes (only created if Dataset Attributes exist on the Data Structure Definition)
Column Names
The Series Table will have the following Columns created.
Column Name | Column Type | Purpose |
---|---|---|
ID | varchar | A concatenation of the Dimension values that make up the series. The width of the ID column is the sum of all the Dimension widths plus the number of Dimensions. An example ID is M:FR:EMP. It is important that the column width for each Dimension of the Data Structure Definition is low, i.e. if the Dimension is of data type String, then place a Max Width restriction on the Dimension's definition so that the database table can be constructed with a primary key that does not exceed the limit set by the database vendor. |
FR_GROUP_ID | varchar | (optional) Created only if the Data Structure Definition defines Groups. This column enables the storage of Group level Attributes |
FR_PROVIDER_ACY | varchar | Data Provider's Agency, used if the data query includes Data Provider filters |
FR_PROVIDER_ID | varchar | Data Provider's Id, used if the data query includes Data Provider filters |
FR_PROVISION | varchar | Provision Agreement short URN, e.g. ECB:EXR(1.0) |
FR_ERROR | varchar | Internal Use only |
FR_UPDATED_DATE | varchar | Last updated date for the series |
DIM_ID/ATT_ID | text | A Column for each Dimension and Series Level Attribute. The Column Name is the ID of the Dimension/Attribute and the Column Type is dependent on the Text Format |
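As a rough illustration of how the series ID and its column width relate, the following sketch builds an ID and computes the width using the rule stated for the ID column. The `:` separator is inferred from the example M:FR:EMP, and the dimension names and widths are hypothetical.

```python
from typing import Mapping, Sequence


def series_id(dimension_values: Sequence[str]) -> str:
    """Concatenate the dimension values, in order, using ':' as the separator."""
    return ":".join(dimension_values)


def id_column_width(dimension_widths: Mapping[str, int]) -> int:
    """Sum of all dimension widths plus the number of dimensions."""
    return sum(dimension_widths.values()) + len(dimension_widths)


# Hypothetical dimension widths, e.g. derived from the longest code in each codelist.
widths = {"FREQ": 1, "REF_AREA": 2, "INDICATOR": 3}

key = series_id(["M", "FR", "EMP"])
width = id_column_width(widths)
```

With these widths the ID column would be 9 characters wide, which is why an unrestricted String Dimension (250 characters by default) can quickly push the primary key past a vendor's limit.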
The Observation Table will have the following Columns created.
Column Name | Column Type | Purpose |
---|---|---|
ID | int | Auto increment primary key |
FR_KEY_ID | varchar | Foreign key to the series ID column |
FR_OBS_DATE | datetime | A datetime representation of the TIME_PERIOD column |
OBS_VALUE | varchar | The Observation Value |
TIME_PERIOD | varchar | Reported Time Period |
FR_ERROR | varchar | Internal use only |
FR_EMBARGO | int | Deprecated |
FR_IS_PUBLIC | int | Deprecated |
FR_UPDATED_DATE_OBS | int | Last time the observation was inserted/updated |
FR_VALID_FROM | datetime | Used to support the includeHistory query parameter. A time stamp of when the observation is valid from, either taken from the Header of the dataset, or set to the time the observation was inserted |
FR_VALID_TO | datetime | Used to support the includeHistory query parameter. A time stamp of when the observation is valid to (if the observation is updated, this is used to capture previous revisions) |
FR_CHKSUM | varchar | A checksum on the column (internal use only) |
ATT_ID | text | A Column for each Observation Level Attribute. The Column Name is the ID of the Attribute and the Column Type is dependent on the Text Format |
The Dataset Attribute Table, if required, will have the following Columns created.
Column Name | Column Type | Purpose |
---|---|---|
ID | Varchar | The ID of the Dataset Level Attribute. |
VALUE | varchar | The reported value for the attribute |
FR_ERROR | varchar | Internal use only |
Column Data Types
The column data types that are based on the Data Structure Definition's Dimensions and Attributes are set to Varchar, with the width restrictions based on the table below. If the width restriction exceeds 250 characters, the column data type is set to TEXT.
Width restrictions are calculated as follows:
Component's Representation | Column Width | Example Value |
---|---|---|
String | Max Length defined on Component or 250 if unspecified | This is a series title |
Codelist | Maximum length of the longest Code Id (e.g. UK would be length 2) | UK |
Boolean | 5 | true |
Short | 5 | 32767 |
Integer | 10 | 123 |
Long | 19 | 123456 |
Float | 12 | 12.20 |
Double | 22 | 12.20 |
Date | 30 | 2002-01-01T12:15:00.000GMT |
Date Time | 30 | 2002-01-01T12:15:00.000GMT |
Time Period | 30 | 2002-Q1 (any valid Time Format) |
Day | 2 | 31 |
Month | 2 | 12 |
Year_Month | 7 | 2002-01 |
Year | 4 | 2001 |
Time | 10 | 12:15:00 |
Note: If the Dimension or Attribute is numerical and a maximum and minimum value is set, then these are used to determine the column width. For example, if the Attribute is of type Double with a min value of 0, a max value of 10, and Decimals of 3, then the maximum length is 6, allowing values such as 10.123 to be reported.
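The width rules can be summarised in a short sketch. The default widths are copied from the table above; the function names, the `VARCHAR(n)` output syntax, and the exact formula for bounded numeric values (widest bound, plus a decimal point and the decimals) are assumptions chosen to reproduce the worked example, not a description of the Registry's internal code.

```python
from typing import Iterable, Optional

# Default widths for fixed-width representations, taken from the table above.
DEFAULT_WIDTHS = {
    "Boolean": 5, "Short": 5, "Integer": 10, "Long": 19, "Float": 12,
    "Double": 22, "Date": 30, "Date Time": 30, "Time Period": 30,
    "Day": 2, "Month": 2, "Year_Month": 7, "Year": 4, "Time": 10,
}


def codelist_width(code_ids: Iterable[str]) -> int:
    """Width of a coded component: the length of the longest Code Id."""
    return max(len(code_id) for code_id in code_ids)


def numeric_width(min_value: int, max_value: int, decimals: int = 0) -> int:
    """Assumed rule: widest bound, plus a decimal point and the decimals."""
    integer_digits = max(len(str(min_value)), len(str(max_value)))
    return integer_digits + (1 + decimals if decimals > 0 else 0)


def column_type(representation: str, max_length: Optional[int] = None) -> str:
    """Map a component representation to a column type."""
    if representation == "String":
        width = max_length if max_length is not None else 250
    else:
        width = DEFAULT_WIDTHS[representation]
    # Anything wider than 250 characters is stored as TEXT.
    return "TEXT" if width > 250 else f"VARCHAR({width})"


double_width = numeric_width(0, 10, decimals=3)  # room for values such as 10.123
```

For example, `column_type("String")` yields `VARCHAR(250)`, while `column_type("String", 1000)` falls back to `TEXT`.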
Externally Managed Database
An Externally Managed Database can be MySQL, SQL Server, or Oracle.
There are two options for the database table design when linking to an externally managed database:
- Mirror the structure used by the Fusion Registry Managed database table structure (Series / Observation).
- A single database table for a Dataflow, where the Series and Observations are in one table
Note: The table and column names must all be in UPPER CASE.
When the Provision Agreement is linked to an externally managed data store, the user interface provides the ability to specify which database tables/views are used for the Series table and the Observation table. The Series table is optional; if it is not provided, the Single Table Design will be used to query the database table. If the Series table is provided, the Series/Obs table design will be used.
Series/Obs Table Design
See the registry table design section for the full structure. The full table structure is not required, but the table/view design must follow these rules:
- The Series table must have a primary key OR a Column called ID. If there is a column called ID it will be expected to be the Primary Key column.
- If the Series table has an ID column AND it is of type Varchar, the values will be expected to be the concatenated series key (A:UK:EMP for example).
- The Obs table must either have a foreign key to the Series Table, OR a column called FR_KEY_ID which can be used to join to the primary key of the Series Table
- The Series table is expected to have a Column per Dimension; attribute columns are all optional
- The Obs table is expected to have a Column for Obs Value and Time Period; attribute columns are all optional
- All columns that are prefixed with 'FR_' are optional
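A minimal table pair that follows these rules might look like the sketch below. SQLite is used purely as a convenient stand-in for MySQL, SQL Server, or Oracle; the table names DEMO_KEY and DEMO_OBS, the dimension columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.executescript("""
    CREATE TABLE DEMO_KEY (
        ID        VARCHAR(20) PRIMARY KEY,  -- concatenated series key
        FREQ      VARCHAR(1),               -- one column per Dimension
        REF_AREA  VARCHAR(2),
        INDICATOR VARCHAR(3)
    );
    CREATE TABLE DEMO_OBS (
        FR_KEY_ID   VARCHAR(20),            -- joins to DEMO_KEY.ID
        TIME_PERIOD VARCHAR(30),
        OBS_VALUE   VARCHAR(22)
    );
    INSERT INTO DEMO_KEY VALUES ('A:UK:EMP', 'A', 'UK', 'EMP');
    INSERT INTO DEMO_OBS VALUES ('A:UK:EMP', '2019', '31.2');
    INSERT INTO DEMO_OBS VALUES ('A:UK:EMP', '2020', '30.8');
""")

# Dimension filters apply to the Series table; observations are joined by key.
rows = conn.execute("""
    SELECT K.ID, O.TIME_PERIOD, O.OBS_VALUE
    FROM DEMO_KEY K
    JOIN DEMO_OBS O ON O.FR_KEY_ID = K.ID
    WHERE K.REF_AREA = 'UK'
    ORDER BY O.TIME_PERIOD
""").fetchall()
```

The join on FR_KEY_ID is only one way the two tables could be queried; the SQL the Registry actually generates is not described here.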
Single Table Design
The flat table structure is a denormalised structure where the Series and Observations are in one table. This leads to repetition of dimension values where a series consists of multiple observations. The flat table structure does not support all the features of an SDMX query; unsupported features include firstNObservations, lastNObservations, and includeHistory.
The table structure is as follows:
Column Name | Column Type | Required | Purpose |
---|---|---|---|
DIM_ID | varchar | Required | Column per Dimension where the column name is that of the Dimension Id |
ATTR_ID | varchar | Optional | Column per Series or Observation Attribute where the column name is that of the Attribute Id |
TIME_PERIOD | varchar | Required | Reported Time Period |
FR_VALID_FROM | datetime | Optional | Used to support startPeriod, endPeriod, lastNObservations, firstNObservations query parameters |
FR_UPDATED_DATE | datetime | Optional | Used to support the updatedAfter query parameter |
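The repetition of dimension values in the flat design can be seen in a small sketch. SQLite is used only as a stand-in database; the table name DEMO_FLAT, the dimension columns, and the data are hypothetical, and the OBS_VALUE column is an assumption carried over from the Series/Obs design, since it is not listed in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only as a convenient stand-in
conn.executescript("""
    CREATE TABLE DEMO_FLAT (
        FREQ        VARCHAR(1),
        REF_AREA    VARCHAR(2),
        INDICATOR   VARCHAR(3),
        TIME_PERIOD VARCHAR(30),
        OBS_VALUE   VARCHAR(22)   -- assumed column, see note above
    );
    INSERT INTO DEMO_FLAT VALUES ('A', 'UK', 'EMP', '2019', '31.2');
    INSERT INTO DEMO_FLAT VALUES ('A', 'UK', 'EMP', '2020', '30.8');
    INSERT INTO DEMO_FLAT VALUES ('A', 'FR', 'EMP', '2020', '28.4');
""")

# Every row repeats the dimension values of the series it belongs to.
uk_rows = conn.execute("""
    SELECT TIME_PERIOD, OBS_VALUE
    FROM DEMO_FLAT
    WHERE FREQ = 'A' AND REF_AREA = 'UK' AND INDICATOR = 'EMP'
    ORDER BY TIME_PERIOD
""").fetchall()

# Distinct dimension combinations recover the series present in the table.
series_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT FREQ, REF_AREA, INDICATOR FROM DEMO_FLAT)"
).fetchone()[0]
```

Here three rows hold two series, so the values A, UK and EMP are stored once per observation rather than once per series.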
Custom Table and Column Names
It is possible to override the Registry's own table naming and column naming convention. To do this via a web service, use the following:
Entry Point | /ws/secure/datasource/tableMapping |
Access | Private (admin) |
Http Method | POST |
Accepts | application/json |
Compression | N/A |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Success; 400 - Bad JSON syntax; 401 - Unauthorized; 500 - Server Error |
The JSON request has the following structure:
    {
      "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
      "DataSourceId" : "MY_STORE",
      "KeyTable" : "GCI_KEY",
      "ObsTable" : "GCI_OBS",
      "DSAttributeTable" : null,
      "MappedColumns" : {
        "REF_AREA" : "COUNTRY",
        "FR_UPDATE_DATE" : "UPDATED_DATE"
      }
    }
The URN field is the URN of the Dataflow to map the table/column names for. The MappedColumns field is optional, and only required if column names require mapping as well as table names. The same web service can be used to update a mapping; the Dataflow URN is the key for the mapping.
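As a sketch of how such a request could be assembled from a script, the following uses Python's standard library only. The server address is hypothetical, and the request is built but not sent, because the entry point requires admin credentials and the authentication scheme is not described here.

```python
import json
import urllib.request

BASE_URL = "https://registry.example.org"  # hypothetical server address

# Same fields as the JSON request structure described above.
mapping = {
    "URN": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
    "DataSourceId": "MY_STORE",
    "KeyTable": "GCI_KEY",
    "ObsTable": "GCI_OBS",
    "DSAttributeTable": None,  # serialised as JSON null
    "MappedColumns": {
        "REF_AREA": "COUNTRY",
        "FR_UPDATE_DATE": "UPDATED_DATE",
    },
}

request = urllib.request.Request(
    BASE_URL + "/ws/secure/datasource/tableMapping",
    data=json.dumps(mapping).encode("utf-8"),
    headers={"Content-Type": "application/json", "Accept": "application/json"},
    method="POST",
)

# Sending is intentionally left out; with credentials in place it would be:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Posting the same payload again with different table names would update the existing mapping, since the Dataflow URN acts as the key.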
To delete a mapping, call the same web service with the HTTP method DELETE and send the following JSON:
    {
      "URN" : "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
      "DataSourceId" : "MY_STORE"
    }
To retrieve a mapping, call the same web service with the following query parameters:
urn | URN of the Dataflow to get the table mapping description for |
dataSourceId | Id of the data store to get the table mapping description for |
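Because the Dataflow URN contains characters such as `:`, `=` and parentheses, it is safest to URL-encode the query parameters. The sketch below builds the retrieval URL with Python's standard library; the server address is hypothetical, and using HTTP GET for retrieval is an assumption, as the method is not stated above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://registry.example.org"  # hypothetical server address

params = {
    "urn": "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:GCI(1.0)",
    "dataSourceId": "MY_STORE",
}

# urlencode percent-encodes reserved characters in the parameter values.
url = f"{BASE_URL}/ws/secure/datasource/tableMapping?{urlencode(params)}"

# Decoding the query string again returns the original values.
decoded = parse_qs(urlsplit(url).query)
```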
Example Using Oracle
The following is a quick example showing how to set up two externally managed tables in an Oracle database and connect them to the Fusion Registry. The steps to follow are:
- Using the database tools of your choice, create the Key Table and Obs Table and populate both with data
- Load the structures (attached below) into your Fusion Registry
- Make an association in the Data Sources Manager page
Key Table:
    CREATE TABLE WB_WDI_ECON_POL_DEBT_1_0_KEY (
      ID VARCHAR2(250 BYTE),
      FREQ VARCHAR2(5 BYTE),
      SERIES VARCHAR2(25 BYTE),
      REF_AREA VARCHAR2(3 BYTE)
    )