Difference between revisions of "Fusion Data Mapper"

From Fusion Registry Wiki
 
== The Fusion Data Mapper User Interface ==
 
The Fusion Data Mapper is a web user interface providing the following main functions:
 
 
 
'''Authenticated users with sufficient structural metadata maintenance privileges'''
 
 
 
* Add and remove mapped datasets
 
* Add and remove series on mapped datasets
 
* Interactively set and change the metadata values on a series by series basis
 
* Export metadata values for selected series to Excel
 
* Import metadata values for defined series from Excel
 
* Change code names with impact analysis
 
 
 
'''Anonymous or authenticated users with sufficient privileges to view but not change the structural metadata'''
 
 
 
* Browse the catalogue of mapped datasets
 
* Examine the ‘definition’ of a dataset: its dimensionality and the list of possible codes for each Dimension or Attribute
 
* Browse the series in each dataset
 
 
 
== The Fusion Registry Administration Interface ==
 
 
 
The Administration Interface is Fusion Registry’s main web user interface.
 
 
 
For the purposes of managing the metadata on mapped datasets, it provides the following functions:
 
 
 
'''Authenticated users with sufficient structural metadata management privileges'''
 
 
 
* Create and modify SDMX Data Structure Definitions (DSDs)
 
* Create and modify SDMX Concepts
 
* Create and modify SDMX Codelists
 
* Add and remove codes from SDMX Codelists
 
* Register a series by adding its Series Code and Series Name to the relevant SERIES_CODE Codelist (series must be ‘registered’ before they can be mapped into a dataset)
 
 
 
Refer to the ''Fusion Registry Structural Metadata Management Guide'' for general information on using the Fusion Registry Administration Interface for creating and maintaining core SDMX structure metadata artefacts including DSDs, Dataflows, Concepts, Categories and Codelists.
 
 
 
== Operating Procedures ==
 
 
 
=== Add a Mapped Dataset ===
 
 
 
A mapped dataset is an SDMX Dataflow and an associated SDMX Dataflow Map that describes:
 
:::(a) The dataset’s dimensionality using an SDMX Data Structure Definition (DSD)
 
:::(b) The list of series in the dataset
 
:::(c) The metadata values for each series
 
 
 
Use cases:
 
* Creating a new dataset
 
* Creating an alternative version of an existing dataset, perhaps with a different complement of series and / or dimensionality
 
* Creating an alternative version of a dataset with simplified dimensionality for public dissemination
 
  
The Fusion Data Mapper provides a convenient way to manage the process interactively. However, creating, modifying and examining the underlying SDMX artefacts can also be done using the Fusion Registry Administration Interface or the REST API, which may be useful for debugging purposes. Discussion of these topics is outside the scope of this document.

==== Add a Mapped Dataset - Prerequisites ====

# The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
# The Source Dataset that contains the unmapped time series observations must exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.

==== Add a Mapped Dataset - Required Roles and Privileges ====

To add a mapped dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.

Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are members of that Agency. Changes include:

* Removing the dataset
* Adding and removing series
* Maintaining the metadata values on series

==== Add a Mapped Dataset - Procedure ====

Using the Fusion Data Mapper:

# Choose the Add Dataset function from the left-hand menu bar.
# Choose a Source Dataset from those available. All Dataflows in the Fusion Registry with a single dimension are shown in this list; however, the single dimension of the chosen Source Dataset must be the Series Code. If multiple Source Datasets are shown in the list, take care to choose the correct one, otherwise it will be impossible to create the metadata mappings.
# Choose the Dataset Definition for the new dataset. A list of available Data Structure Definitions (DSDs) is shown to choose from.
#:::The DSD chosen for the new dataset must follow these rules:
#:::* the DSD must include a SERIES_CODE dimension
#:::* the SERIES_CODE dimension must be coded (conventionally, the Codelist is named CL_SERIES_CODE)
#:::* the codes of series in the Time Series Database to be included in the dataset must be ‘registered’ by adding them to the SERIES_CODE Codelist (refer to Section 4.5 for more on registering series)
#:::If an invalid DSD is chosen, the dataset will be created but it will be impossible to add series to it.
# Set the name for the new dataset in the chosen language. This is the descriptive name of the mapped dataset’s Dataflow, for instance ‘Employment’, ‘National Accounts’ or ‘Financial Activity’. After the dataset has been created, changes to the name, including adding alternative names in different languages, can be made using the Fusion Registry Administration Interface (Dataflow maintenance). In the example shown in Figure 1, the name has been set in Hebrew.
# Set the SDMX ID for the new dataset. The ID is the unique reference for the dataset’s SDMX Dataflow. You must follow these rules when choosing the ID:
#:::*The ID must be unique.
#:::*The ID must use Latin characters and can contain only letters, numbers and ‘_’ characters.
#::::It cannot contain dots (‘.’) or other special characters such as ‘@’ or ‘$’.
#::::The following are valid:
#::::EMPLOYMENT
#::::FINANCIAL_ACTIVITY2
#::::NATIONAL_ACCOUNTS
#:::*By convention, IDs are in upper case. For example, use ‘NATIONAL_ACCOUNTS’ rather than ‘National_Accounts’.
# Set the Version for the dataset. This will be used to set the version of the dataset’s SDMX Dataflow. Version numbers are of the form <major_number>.<minor_number>. The following are valid:
#:1.0
#:1.1
#:2.1
#:By convention, the first version is 1.0.
#:Create new versions of a dataset when you need to change the dimensionality; refer to Section 4.12 Changing the Dimensionality of a Dataset.
# Choose ‘Add’ to create the new dataset, which should then appear in the left-hand bar.
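The ID, version and DSD rules in the steps above can be checked before choosing ‘Add’. The following Python sketch is illustrative only and is not part of Fusion Registry. `Dimension` and `check_new_dataset` are hypothetical names. The DSD is reduced to a list of dimension IDs, each with an optional codelist. Uniqueness is checked against a set of existing IDs that you would have to supply yourself. The sketch does not check that series are registered.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Dimension:
    """Hypothetical, simplified view of one DSD dimension (not a Fusion Registry API)."""
    id: str
    codelist: Optional[str]  # None means the dimension is un-coded


# Latin letters, numbers and '_' only; no dots or other special characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")
# <major_number>.<minor_number>, e.g. 1.0, 1.1, 2.1
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def check_new_dataset(dataset_id: str, version: str,
                      dimensions: List[Dimension], existing_ids: Set[str]) -> List[str]:
    """Return the rules broken by the proposed dataset settings (empty list if none)."""
    problems = []
    if dataset_id in existing_ids:
        problems.append("ID must be unique")
    if not ID_PATTERN.fullmatch(dataset_id):
        problems.append("ID may contain only Latin letters, numbers and '_'")
    elif dataset_id != dataset_id.upper():
        problems.append("by convention, IDs are upper case")
    if not VERSION_PATTERN.fullmatch(version):
        problems.append("version must have the form <major_number>.<minor_number>")
    series_dim = next((d for d in dimensions if d.id == "SERIES_CODE"), None)
    if series_dim is None:
        problems.append("DSD must include a SERIES_CODE dimension")
    elif series_dim.codelist is None:
        problems.append("SERIES_CODE dimension must be coded")
    return problems


dims = [Dimension("SERIES_CODE", "CL_SERIES_CODE"), Dimension("FREQ", "CL_FREQ")]
print(check_new_dataset("NATIONAL_ACCOUNTS", "1.0", dims, {"EMPLOYMENT"}))  # []
print(check_new_dataset("National.Accounts", "1", dims, set()))
```

The upper-case rule is reported like the other rules even though it is only a convention. A real check could treat it as a warning instead.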
 
=== Clone a Dataset ===
 
Clone a dataset to create a copy of an existing dataset.
 
  
Use cases:
* Creating a copy of a dataset with the same dimensionality for experimentation or other purposes
 
* Creating a copy of a dataset with completely different dimensionality
 
* Adding or removing selected dimensions from a dataset
 
  
All of the series in the existing dataset are copied to the clone.
  
Where a dimension or attribute appears in both the original and clone datasets, the metadata values are copied. However, default values are used where a new dimension or mandatory attribute appears only in the clone dataset. Section 4.9 explains how to define and manage default values.
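The copy rule for cloning can be sketched as follows. The sketch assumes that each series' metadata values are held as a dictionary keyed by component ID. `clone_mappings` is a hypothetical helper and the default values are illustrative; in practice the Fusion Data Mapper performs this copy itself.

```python
from typing import Dict, List, Optional


def clone_mappings(source_rows: List[Dict[str, str]],
                   clone_components: List[str],
                   defaults: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Copy series metadata values from an original dataset into a clone.

    - A component (dimension or attribute) present in both datasets keeps its value.
    - A component present only in the clone takes its default value
      (None here if no default is defined).
    - A component present only in the original is dropped.
    """
    return [
        {c: row[c] if c in row else defaults.get(c) for c in clone_components}
        for row in source_rows
    ]


original = [{"SERIES_CODE": "DEP_Q_N", "FREQ": "Q", "UNITS": "USD"}]
# Hypothetical clone DSD: drops UNITS and adds a new PRICE_BASE dimension.
print(clone_mappings(original, ["SERIES_CODE", "FREQ", "PRICE_BASE"], {"PRICE_BASE": "FIX"}))
# → [{'SERIES_CODE': 'DEP_Q_N', 'FREQ': 'Q', 'PRICE_BASE': 'FIX'}]
```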
==== Clone Dataset - Prerequisites ====
 
#The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
 
#The Source Dataset that contains the unmapped time series observations must exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.
 
  
==== Clone Dataset - Required Roles and Privileges ====
To clone a dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.
 
  
Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are members of that Agency. Changes include:
* Removing the dataset
 
* Adding and removing series
 
* Maintaining the metadata values on series
 
  
==== Clone Dataset - Procedure ====
The procedure for cloning a dataset is the same as that explained in Section 4.1 on how to add a mapped dataset, with the following exceptions:
 
#Choose the Clone Dataset option
 
#Choose a dataset to clone from – a list of existing datasets in the Structure Set is shown.
 
#Choose the Data Structure (DSD) for the new cloned dataset.
 
In the example shown in Figure 2, a clone is being made of the NATIONAL_ACCOUNTS dataset. A new DSD has been created called NATIONAL_ACCOUNTS Version 2.0 which adds new dimensions.
 
===== Use Case – Add a dimension to a dataset using the Clone Method =====
 
#Using the Fusion Registry Administration Interface, create a new DSD based on the original but including the new dimension. Either save the DSD with a new ID, or use the same ID with a different version number. For instance:
 
#:Original:    NATIONAL_ACCOUNTS version 1.0
 
#:New:          NATIONAL_ACCOUNTS version 2.0
 
#:or:              NEW_NATIONAL_ACCOUNTS version 1.0
 
#Using the Fusion Data Mapper, add a new dataset choosing the Clone Dataset option. Choose the existing dataset as the one to clone, and the newly created DSD. The procedure for this is explained above, in the Clone Dataset procedure.
 
#The new dataset will be created by copying all of the series and their metadata values to the cloned dataset. The new dimension will have the default value for every series.
 
#Change the values for the new dimension as required. Section 4.7 explains how to do this interactively using the web user interface. Alternatively, export the mappings to Excel, make the necessary changes and import the results – this is explained in Section 4.8.
 
#Save the mapping for the new dataset.
 
=== Remove a Mapped Dataset ===
 
Mapped datasets created using the procedure described in Section 4.1 can be removed if required.
 
  
Removing a mapped dataset:
  
* Deletes the dataset’s SDMX Dataflow Map; and
* Deletes the dataset’s SDMX Dataflow and associated Provision Agreements.
 
Use cases:
 
* Removing a redundant dataset, one which is no longer required
 
* Removing a dataset that has been created in error
 
The Source Dataset, the observation data, the DSD and the Codelists are not affected.
 
  
:If the mapping rules (the list of series and the metadata values for each) may be needed later, save them using the Excel Export function before removing the dataset.
  
Under certain conditions, datasets removed in error can be restored using the Fusion Registry metadata rollback function. This can only be done by a System Administrator, and guidance should be sought from Metadata Technology Technical Support.
 
  
==== Remove a Mapped Dataset - Prerequisites ====
#The mapped dataset has been created using the procedure in Section 4.1.
 
==== Remove a Mapped Dataset - Required Roles and Privileges ====
 
To remove a mapped dataset, the user must be a member of the Agency that owns the SDMX Structure Set and Dataflow Map for the dataset, or a member of a parent Agency if a hierarchical agency structure is in place.
 
==== Remove a Mapped Dataset - Procedure ====
 
Using the Fusion Data Mapper:
 
#Choose the dataset.
#Choose the Delete Dataset option from the top button bar.
#A message box will be displayed asking for confirmation of the deletion. Choose ‘Ok’ to delete, or ‘Cancel’ to abort.
#Confirm that the dataset has been removed from the left-hand bar.
 
  
  
=== Add Series to a Dataset ===
The process of adding a series to a dataset results in a mapping rule being added to the dataset’s SDMX Dataflow Map.

The mapping rule is described using an SDMX Value Map which, for each series in the mapped dataset, translates the Series Code of a series in the Time Series Database to a specified value for each of the dimensions in the dataset’s SDMX DSD.

For instance, a Value Map for the DEP_Q_N ‘Depreciation’ series may conceptually look like the following:
 
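{| class="wikitable"
! colspan="2" | Source dataset from Time Series Database
! rowspan="2" |
! colspan="3" | Mapped dataset
|-
! Dimension !! Code !! Dimension !! Code !! Code Name
|-
| rowspan="10" | SERIES_CODE
| rowspan="10" | “DEP_Q_N”
| rowspan="10" | Maps to >
| SERIES_CODE || DEP_Q_N || “Depreciation”
|-
| DATA_TYPE || BAL || “Balance”
|-
| SERIES_TYPE || O || “Original”
|-
| FREQ || Q || “Quarterly”
|-
| UNITS || USD || “US Dollars”
|-
| PRICE_BASE || FIX || “Fixed Prices”
|-
| ECONOMIC_AREA || FA || “Financial Activity”
|-
| SUBJECT || NA || “National Accounts”
|-
| DATA_SOURCE || CB || “Central Bank”
|-
| CALCULATION || colspan="2" | “GDP.Q_N - GDP_PF_TAB_NET.Q_N - DEP.Q_N - TAX_NET_PROD.Q_N - M_TAX_NET.Q_N”
|}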
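Conceptually, applying this Value Map turns each source observation, identified only by SERIES_CODE, TIME_PERIOD and OBS_VALUE, into an observation of a multi-dimensional series. The sketch below is a simplified illustration and not how Fusion Registry implements SDMX structure mapping. It assumes the dimension order shown and treats every coded row of the table as a dimension. It omits the CALCULATION attribute and builds a dot-separated series key in the style of SDMX REST. `VALUE_MAP`, `DSD_DIMENSIONS` and `map_observation` are hypothetical names, and the observation period and value are made up.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of one Value Map rule, keyed by the source SERIES_CODE.
# Values are copied from the DEP_Q_N example; the CALCULATION attribute is omitted.
VALUE_MAP: Dict[str, Dict[str, str]] = {
    "DEP_Q_N": {
        "SERIES_CODE": "DEP_Q_N", "DATA_TYPE": "BAL", "SERIES_TYPE": "O",
        "FREQ": "Q", "UNITS": "USD", "PRICE_BASE": "FIX",
        "ECONOMIC_AREA": "FA", "SUBJECT": "NA", "DATA_SOURCE": "CB",
    },
}

# Assumed dimension order of the mapped dataset's DSD.
DSD_DIMENSIONS = ["SERIES_CODE", "DATA_TYPE", "SERIES_TYPE", "FREQ", "UNITS",
                  "PRICE_BASE", "ECONOMIC_AREA", "SUBJECT", "DATA_SOURCE"]


def map_observation(series_code: str, time_period: str,
                    obs_value: float) -> Tuple[str, str, float]:
    """Turn a (SERIES_CODE, TIME_PERIOD, OBS_VALUE) source observation into a
    dot-separated multi-dimensional series key plus the unchanged period and value."""
    rule = VALUE_MAP[series_code]  # KeyError if the series is not in this dataset
    key = ".".join(rule[dim] for dim in DSD_DIMENSIONS)
    return key, time_period, obs_value


print(map_observation("DEP_Q_N", "2023-Q1", 123.4))
# → ('DEP_Q_N.BAL.O.Q.USD.FIX.FA.NA.CB', '2023-Q1', 123.4)
```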
 
The process of creating and maintaining the SDMX Value Map is managed by the Fusion Data Mapper. If necessary, the SDMX Dataflow Map and its constituent Value Maps can be examined and manipulated using the Fusion Registry Administration Interface or the REST API. These procedures are outside the scope of this document.
 
==== Add Series - Prerequisites ====

# The mapped dataset has been created using the procedure described in Section 4.1.
# The series must have been registered with Fusion Registry. This means that the series code for the series to be added to the dataset must appear in the Codelist for the SERIES_CODE dimension in the DSD of the Mapped Dataset. The process for registering a series is described in Section 4.5.
#:Note that series can be registered and added to a dataset without the observation data existing in the Time Series Database. End users querying for the series will see no results until the observation data is added.
 
==== Add Series - Required Roles and Privileges ====

To add series to a dataset, the user must be a member of the Agency that owns the SDMX Structure Set and Dataflow Map that represents the dataset, or a member of a parent Agency if a hierarchical agency structure is in place.
 
==== Add Series - Procedure ====

There are two different methods for adding series to a dataset:
* Interactively using the web user interface
* Excel
 
===== Interactive Method =====

# Choose the dataset.
# Choose the Add Series button.
# The Add Series window is displayed.

''Figure 3: Add series selection window''
 
The window displays a list of series known to Fusion Registry. Those with check-boxes can be selected and added to the dataset in one operation using the Add button. The default value will be used for each dimension.

Series highlighted in red are in the Time Series Database (Source Dataset) but cannot be mapped into this dataset because they have not been ‘registered’. See Section 4.5 for guidance on registering series.

Series highlighted in green are already mapped into the current dataset.
 
Filter Codes allows a multilingual free-text search of the series codes and series names, either as plain text or as a regular expression. Any valid regex search pattern can be used, and the pattern does not need to begin with a ^ or / control character. Example: MKR.*DEF[ABC]+
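The search behaviour described above can be approximated with Python's `re.search`, which finds a match anywhere in the string rather than only at the start. This sketch rests on several assumptions. `filter_codes` and the example series are hypothetical. Every input is treated as a regular expression, and the search is case-sensitive. The documentation does not say whether Fusion Data Mapper escapes plain text or ignores case.

```python
import re
from typing import Dict, List


def filter_codes(series: Dict[str, str], pattern: str) -> List[str]:
    """Return the codes whose series code or series name contains a match.

    re.search looks for a match anywhere in the string, so the pattern does not
    need a leading ^; plain text with no regex metacharacters matches literally.
    Python's re works on Unicode strings, so non-Latin names can be searched too.
    """
    regex = re.compile(pattern)
    return [code for code, name in series.items()
            if regex.search(code) or regex.search(name)]


# Hypothetical series codes and names.
catalogue = {
    "MKR_X_DEFA": "Market index A",
    "MKR_DEF": "Market deflator",
    "DEP_Q_N": "Depreciation",
}
print(filter_codes(catalogue, r"MKR.*DEF[ABC]+"))  # ['MKR_X_DEFA']
print(filter_codes(catalogue, "Depreciation"))     # ['DEP_Q_N']
```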
 
 
Three different views are available:

{| class="wikitable"
! View !! Series shown
|-
| All || All registered and un-registered series, including those that have already been mapped into this dataset. The All view can be used to identify whether a series is known to the system, and what its status is. Series listed in this view cannot be mapped if they are unregistered or have already been mapped into this dataset.
|-
| Not mapped in any dataset || Registered and un-registered series that have not been mapped into any dataset. This is the default view.
|-
| Not mapped in this dataset || Registered and un-registered series that have not been mapped in this dataset. Series can be mapped into multiple datasets, so series that have already been mapped in other datasets will appear in this view.
|}
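The three views and the red/green highlighting can be expressed as simple filters over each series' registration and mapping status. The sketch assumes that a series gets a check-box exactly when it is registered and not already mapped into the current dataset. That rule is inferred from the description above rather than stated in it. `SeriesStatus`, `view` and `highlight` are hypothetical names, and the series and dataset IDs in the example are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SeriesStatus:
    code: str
    registered: bool                                   # code is in the SERIES_CODE Codelist
    mapped_in: Set[str] = field(default_factory=set)   # IDs of datasets that map this series


def view(series: List[SeriesStatus], name: str, current: str) -> List[str]:
    """Return the series codes shown by one of the three Add Series views."""
    if name == "All":
        rows = series
    elif name == "Not mapped in any dataset":
        rows = [s for s in series if not s.mapped_in]
    elif name == "Not mapped in this dataset":
        rows = [s for s in series if current not in s.mapped_in]
    else:
        raise ValueError(f"unknown view: {name}")
    return [s.code for s in rows]


def highlight(s: SeriesStatus, current: str) -> Optional[str]:
    """Red: not registered, so cannot be mapped. Green: already in this dataset."""
    if not s.registered:
        return "red"
    if current in s.mapped_in:
        return "green"
    return None  # assumed selectable with a check-box


series = [
    SeriesStatus("DEP_Q_N", True, {"NATIONAL_ACCOUNTS"}),
    SeriesStatus("GDP_Q_N", True, {"EMPLOYMENT"}),
    SeriesStatus("NEW_SERIES", False),
]
print(view(series, "Not mapped in any dataset", "NATIONAL_ACCOUNTS"))   # ['NEW_SERIES']
print(view(series, "Not mapped in this dataset", "NATIONAL_ACCOUNTS"))  # ['GDP_Q_N', 'NEW_SERIES']
```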
 
===== Excel Method =====

Series can be added in bulk to a Mapped Dataset by importing from an Excel spreadsheet. Note that the spreadsheet must follow a specific layout, as illustrated in Figure 4.

''Figure 4: Example spreadsheet for adding series''
 
The following rules must be followed when building Excel workbooks for importing series mappings:
* The workbook must contain a single worksheet. The name of the worksheet is not significant.
* The first row (Row 1) must contain a header with the IDs of the Dimensions and Attributes exactly as defined by the dataset’s DSD.
* The second and subsequent rows must contain the series to add to the dataset, one series per row. Blank rows are not allowed.
* Each series in the sheet must have a valid series code in the SERIES_CODE dimension. To be valid, the series must have been registered (see Section 4.5 for guidance on registering a series). However, the observation data for a series does not need to be in the Time Series Database.
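A workbook can be checked against these rules before it is imported. The sketch below works on rows already read from the workbook, for example with openpyxl's `Worksheet.iter_rows(values_only=True)`, so it needs no Excel library itself. `check_series_sheet` is a hypothetical helper, and the example rows and series codes are illustrative. It only checks that each header is a component ID of the DSD. It does not validate the codes in other columns.

```python
from typing import Any, List, Optional, Sequence, Set

Row = List[Optional[Any]]


def check_series_sheet(sheets: Sequence[List[Row]],
                       dsd_component_ids: List[str],
                       registered_codes: Set[str]) -> List[str]:
    """Return import-rule violations for a workbook given as a list of worksheets,
    each worksheet being a list of rows of cell values (None for an empty cell)."""
    if len(sheets) != 1:
        return ["workbook must contain a single worksheet"]
    rows = sheets[0]
    if not rows:
        return ["worksheet is empty"]
    errors = []
    header = ["" if h is None else str(h) for h in rows[0]]
    unknown = [h for h in header if h not in dsd_component_ids]
    if unknown:
        errors.append(f"header IDs not defined in the DSD: {unknown}")
    if "SERIES_CODE" not in header:
        errors.append("header must include SERIES_CODE")
        return errors
    series_col = header.index("SERIES_CODE")
    # Row 1 is the header, so data rows are numbered from 2, as in Excel.
    for number, row in enumerate(rows[1:], start=2):
        if all(cell in (None, "") for cell in row):
            errors.append(f"row {number}: blank rows are not allowed")
            continue
        code = row[series_col] if series_col < len(row) else None
        if code not in registered_codes:
            errors.append(f"row {number}: series code {code!r} is not registered")
    return errors


rows = [
    ["SERIES_CODE", "FREQ", "UNITS"],
    ["DEP_Q_N", "Q", None],   # blank UNITS cell: allowed, default used
    [None, None, None],       # blank row: not allowed
    ["XYZ", "Q", "USD"],      # hypothetical unregistered code
]
for problem in check_series_sheet([rows], ["SERIES_CODE", "FREQ", "UNITS"], {"DEP_Q_N"}):
    print(problem)
```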
 
====== Blank Cells and Default Value Behaviour ======

The default value will be used for coded dimensions where the cell is left blank (see Section 4.9 for more information on default values). In the example shown in Figure 4, leaving the UNITS cell blank for a series would result in the default ‘NIS’ value.

For un-coded attributes (CALCULATION_FORMULA in the example), no default is used: an empty cell results in a blank value in the mapped series output.

The default value behaviour allows a list of series to be added to the dataset just by specifying the series codes in the spreadsheet. Once added, the metadata values can be set interactively for each series using the web interface. Figure 5 illustrates an Excel spreadsheet to add 10 series, all with default values.
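The blank-cell behaviour can be summarised as follows: a coded component takes its default value, and an un-coded attribute stays blank. Below is a minimal sketch, assuming the defaults are available as a dictionary (see Section 4.9 for how defaults are actually defined). `apply_defaults` is a hypothetical name, and the ‘NIS’ default mirrors the Figure 4 example.

```python
from typing import Dict, List, Optional


def apply_defaults(row: Dict[str, Optional[str]],
                   components: List[str],
                   coded_defaults: Dict[str, str]) -> Dict[str, str]:
    """Fill the blank cells of one imported series.

    Coded components with a blank (or missing) cell take their default value;
    components without a default, such as un-coded attributes, stay blank.
    """
    result = {}
    for component in components:
        value = row.get(component)
        if value not in (None, ""):
            result[component] = value
        else:
            result[component] = coded_defaults.get(component, "")
    return result


components = ["SERIES_CODE", "UNITS", "CALCULATION_FORMULA"]
# A row giving only the series code, as when adding a list of series by code alone.
print(apply_defaults({"SERIES_CODE": "DEP_Q_N"}, components, {"UNITS": "NIS"}))
# → {'SERIES_CODE': 'DEP_Q_N', 'UNITS': 'NIS', 'CALCULATION_FORMULA': ''}
```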
 

Latest revision as of 05:41, 11 September 2023

Overview – Fusion Data Mapper

This document provides guidance and operating procedures for creating and managing mapped datasets using Fusion Registry 10 and the Fusion Data Mapper.

Use Case

The primary use case is transforming single dimensional datasets to SDMX multi-dimensional structures.

Single dimensional datasets are those with a single unique identifier for each series (e.g. Series Code) such as created by FAME or similar time-series production systems.

One-to-one transformations only are supported by this version of the Fusion Data Mapper.

The transformation is performed by Fusion Registry using SDMX Structure Mapping. Fusion Data Mapper provides an easy-to-use user interface for defining and managing the mapping rules.

Audience

  • Metadata Managers – those responsible for managing the metadata mappings on the Bank’s catalogue of time series on a day-to-day basis.
  • Metadata Superusers – those responsible for managing the core structural metadata including Agencies, Concepts, Data Structure Definitions and Codelists.
  • System Administrators – those responsible for administering Fusion Registry as part of the integrated statistical data and metadata system, and managing the Time Series Database as the source of observation data.

Prerequisites

Readers are assumed to have an understanding of basic SDMX principles and the purpose of the main SDMX structural metadata artefacts including Concepts, Codes and Codelists, Categories, Data Structure Definitions (DSDs), Dataflows, Provision Agreements, Structure Sets and Dataflow Maps.

Terminology

Dataset   Dataset refers to a named collection of series that typically all fall under a specific topic, for instance ‘National Accounts’. In Fusion Registry, an SDMX Dataflow represents a dataset.


Mapped Dataset   A Mapped Dataset is an SDMX Dataflow where data is taken from a ‘source’ Dataflow and transformed to different dimensionality using defined mapping rules. The Fusion Data Mapper manages these mapping rules.

 In this document, the source Dataflow is assumed to be observation data from the Time Series Database which is described by a Data Structure Definition having only SERIES_CODE, TIME_PERIOD and OBS_VALUE dimensions.


Time Series Database   The source of time series observation data without metadata that Fusion Registry maps to Mapped Datasets using the defined mapping rules.

Related Pages

For further guidance on Fusion Data Mapper:

Add a Mapped Dataset – Fusion Data Mapper

Add Series to a Dataset – Fusion Data Mapper

Browse Privileges – Fusion Data Mapper

Bulk Maintenance of Metadata Values using Excel Import / Export – Fusion Data Mapper

Changing the Dimensionality of a Dataset – Fusion Data Mapper

Clone a Dataset – Fusion Data Mapper

Codelists - Adding and Removing Codes – Fusion Data Mapper

Codelists – Adding and Changing Multilingual Code Names with Impact Analysis - Fusion Data Mapper

Content Security Caveats – Fusion Data Mapper

Content Security Metadata Management Use Cases – Fusion Data Mapper

Default Code Values – Fusion Data Mapper

Maintaining Metadata Values on Series Interactively using the Web Interface – Fusion Data Mapper

Maintenance Privileges – Fusion Data Mapper

Registering a Series – Fusion Data Mapper

Remove a Mapped Dataset – Fusion Data Mapper

Removing Series from a Dataset – Fusion Data Mapper

The Fusion Data Mapper User Interface