Difference between revisions of "Fusion Data Mapper"

From Fusion Registry Wiki
(Clone Dataset – Procedure)
 
(161 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Overview ==
+
[[Category:Fusion Data Mapper]]
 +
== Overview – Fusion Data Mapper==
 
This document provides guidance and operating procedures for creating and managing mapped
 
This document provides guidance and operating procedures for creating and managing mapped
datasets using Fusion Registry 9 and the Fusion Data Mapper.
+
datasets using Fusion Registry 10 and the Fusion Data Mapper.
 +
 
 +
'''Use Case'''
  
==== Use Case ====
 
 
The primary use case is transforming single dimensional datasets to SDMX multi-dimensional
 
The primary use case is transforming single dimensional datasets to SDMX multi-dimensional
 
structures.
 
structures.
Line 15: Line 17:
 
Mapper provides an easy-to-use user interface for defining and managing the mapping rules.
 
Mapper provides an easy-to-use user interface for defining and managing the mapping rules.
  
==== Audience ====
+
'''Audience'''
 
* Metadata Managers – those responsible for managing the metadata mappings on the Bank’s catalogue of time series on a day to day basis.
 
* Metadata Managers – those responsible for managing the metadata mappings on the Bank’s catalogue of time series on a day to day basis.
 
* Metadata Superusers – those responsible for managing the core structural metadata including Agencies, Concepts, Data Structure Definitions and Codelists.
 
* Metadata Superusers – those responsible for managing the core structural metadata including Agencies, Concepts, Data Structure Definitions and Codelists.
 
* System Administrators – those responsible for administering Fusion Registry 9 as part of the integrated statistical data and metadata system, and managing the Time Series Database as the source of observation data.
 
* System Administrators – those responsible for administering Fusion Registry 9 as part of the integrated statistical data and metadata system, and managing the Time Series Database as the source of observation data.
==== Prerequisites ====
+
 
 +
'''Prerequisites'''
 +
 
 
Readers are assumed to have an understanding of basic SDMX principles and the purpose of the
 
Readers are assumed to have an understanding of basic SDMX principles and the purpose of the
 
main SDMX structural metadata artefacts including Concepts, Codes and Codelists, Categories, Data
 
main SDMX structural metadata artefacts including Concepts, Codes and Codelists, Categories, Data
 
Structure Definitions (DSDs), Dataflows, Provision Agreements, Structure Sets and Dataflow Maps.
 
Structure Definitions (DSDs), Dataflows, Provision Agreements, Structure Sets and Dataflow Maps.
  
==== Terminology ====
+
'''Terminology'''
{|
|-style="vertical-align: top;"
| Dataset || Dataset refers to a named collection of series that typically all fall under a specific topic, for instance ‘National Accounts’. In Fusion Registry, an SDMX Dataflow represents a dataset.
|-style="vertical-align: top;"
| Mapped Dataset || A Mapped Dataset is an SDMX Dataflow where data is taken from a ‘source’ Dataflow and transformed to different dimensionality using defined mapping rules. The Fusion Data Mapper manages these mapping rules.
 In this document, the source Dataflow is assumed to be observation data from the Time Series Database which is described by a Data Structure Definition having only SERIES_CODE, TIME_PERIOD and OBS_VALUE dimensions.
|-style="vertical-align: top;"
| Time Series Database || The source of time series observation data without metadata that Fusion Registry maps to Mapped Datasets using the defined mapping rules.
|}
 
 
 
== The Fusion Data Mapper User Interface ==
 
The Fusion Data Mapper is a web user interface providing the following main functions:
 
 
 
'''Authenticated users with sufficient structural metadata maintenance privileges'''
 
 
 
* Add and remove mapped datasets
 
* Add and remove series on mapped datasets
 
* Interactively set and change the metadata values on a series by series basis
 
* Export metadata values for selected series to Excel
 
* Import metadata values for defined series from Excel
 
* Change code names with impact analysis
 
 
 
'''Anonymous or authenticated users with sufficient privileges to view but not change the structural metadata'''
 
 
 
* Browse the catalogue of mapped datasets
 
* Examine the ‘definition’ of a dataset – its dimensionality and list of possible codes for each Dimension or Attribute
 
* Browse the series in each dataset
 
 
 
== The Fusion Registry Administration Interface ==
 
 
 
The Administration Interface is Fusion Registry’s main web user interface.
 
  
For the purposes of managing the metadata on mapped datasets, it provides the following functions:
+
{|
 +
|-style="vertical-align: top;"
 +
| Dataset ||  Dataset refers to a named collection of series that typically all fall under a specific topic, for instance ‘National Accounts’. In Fusion Registry, an SDMX Dataflow represents a dataset.
  
'''Authenticated users with sufficient structural metadata management privileges'''
 
  
* Create and modify SDMX Data Structure Definitions (DSDs)
 
* Create and modify SDMX Concepts
 
* Create and modify SDMX Codelists
 
* Add and remove codes from SDMX Codelists
 
* Register a series (a series must be ‘registered’ before it can be mapped in a dataset; this is done by adding the Series Code and Series Name to the relevant SERIES_CODE Codelist)
 
  
Refer to the ''Fusion Registry Structural Metadata Management Guide'' for general information on using the Fusion Registry Administration Interface for creating and maintaining core SDMX structural
+
|-style="vertical-align: top;"
metadata artefacts including DSDs, Dataflows, Concepts, Categories and Codelists.
+
| Mapped Dataset||   A Mapped Dataset is an SDMX Dataflow where data is taken from a ‘source’ Dataflow and transformed to different dimensionality using defined mapping rules. The Fusion Data Mapper manages these mapping rules.
 +
 In this document, the source Dataflow is assumed to be observation data from the Time Series Database which is described by a Data Structure Definition having only SERIES_CODE, TIME_PERIOD and OBS_VALUE dimensions.
  
== Operating Procedures ==
 
  
=== Add a Mapped Dataset ===
+
|-style="vertical-align: top;"
 +
| Time Series Database||   The source of time series observation data without metadata that Fusion Registry maps to Mapped Datasets using the defined mapping rules.
 +
|}
  
A mapped dataset is an SDMX Dataflow and an associated SDMX Dataflow Map that describes:
+
==Related Pages==
:::(a) The dataset’s dimensionality using an SDMX Data Structure Definition (DSD)
+
For further guidance on Fusion Data Mapper:
:::(b) The list of series in the dataset
 
:::(c) The metadata values for each series
 
  
Use cases:
+
[[Add a Mapped Dataset – Fusion Data Mapper]]
* Creating a new dataset
 
* Creating an alternative version of an existing dataset, perhaps with a different complement of series and/or dimensionality
 
* Creating an alternative version of a dataset with simplified dimensionality for public dissemination
 
  
The Fusion Data Mapper provides a convenient way to interactively manage the process. However, it
+
[[Add Series to a Dataset – Fusion Data Mapper]]
is important to note that creating, modifying and examining the underlying SDMX artefacts can also
 
be done using the Fusion Registry Administration Interface or the REST API which may be useful for
 
debugging purposes. Discussion of these topics is outside of the scope of this document.
 
  
==== Add a Mapped Dataset - Prerequisites ====
+
[[Browse Privileges – Fusion Data Mapper]]
  
# The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
+
[[Bulk Maintenance of Metadata Values using Excel Import / Export – Fusion Data Mapper]]
# The Source Dataset that contains the unmapped time series observations must already exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.
 
  
==== Add a Mapped Dataset - Required Roles and Privileges ====
+
[[Changing the Dimensionality of a Dataset – Fusion Data Mapper]]
  
To add a mapped dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.
+
[[Clone a Dataset – Fusion Data Mapper]]
  
Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are a member of that Agency. Changes include:
+
[[Codelists - Adding and Removing Codes – Fusion Data Mapper]]
  
* Removing the dataset
+
[[Codelists – Adding and Changing Multilingual Code Names with Impact Analysis - Fusion Data Mapper]]
* Adding and removing series
 
* Maintaining the metadata values on series
 
  
==== Add a Mapped Dataset - Procedure ====
+
[[Content Security Caveats – Fusion Data Mapper]]
  
Using the Fusion Data Mapper:
+
[[Content Security Metadata Management Use Cases – Fusion Data Mapper]]
  
# Choose the Add Dataset function from the left-hand menu bar.
+
[[Default Code Values – Fusion Data Mapper]]
# Choose a Source Dataset from those available. All Dataflows in the Fusion Registry with a single dimension are shown in this list. However, the single dimension of the chosen source dataset must be the Series Code. If multiple Source Datasets are shown in the list, take care to choose the correct one; otherwise it will be impossible to create the metadata mappings.
 
# Choose the Dataset Definition for the new dataset. A list of available Data Structure Definitions (DSDs) is shown to choose from.
 
#:::The DSD chosen for the new dataset must follow these rules:
 
#:::* the DSD must include a SERIES_CODE dimension
 
#:::* the SERIES_CODE dimension must be coded (conventionally, the Codelist is named CL_SERIES_CODE)
 
#:::* the codes of series in the Time Series Database to be included in the dataset must be ‘registered’ by adding them to the SERIES_CODE Codelist (refer to 4.5 for more on registering series)
 
#:::If an invalid DSD is chosen, the dataset will be created but it will be impossible to add series to it.
 
# Set the name for the new Dataset in the chosen language. This is the descriptive name of the mapped dataset’s Dataflow, for instance ‘Employment’, ‘National Accounts’ or ‘Financial Activity’. After the Dataset has been created, changes to the name, including adding alternative names in different languages, can be made using the Fusion Registry Administration Interface – Dataflow maintenance. In the example shown in Figure 1, the name has been set in Hebrew.
 
# Set the SDMX ID for the new dataset. The ID is the unique reference for the dataset’s SDMX Dataflow. You must follow these rules when choosing the ID:
 
#:::*The ID must be unique
 
#:::*The ID must use Latin characters and can contain letters, numbers and ‘_’ characters.
 
#::::It cannot contain dots (‘.’) or other special characters such as ‘@’ or ‘$’.
 
#::::The following are valid:
 
#::::EMPLOYMENT
 
#::::FINANCIAL_ACTIVITY2
 
#::::NATIONAL_ACCOUNTS
 
#:::*By convention, IDs are in upper case. For example, use ‘NATIONAL_ACCOUNTS’ rather than ‘National_Accounts’
 
#Set the Version for the dataset. This will be used to set the version of the dataset’s SDMX Dataflow. Version numbers are of the form <major_number>.<minor_number>. The following are valid:
 
#:1.0
 
#:1.1
 
#:2.1
 
#:By convention, the first version is 1.0.
 
#:
 
#:Create new versions of a dataset when you need to change the dimensionality – refer to Section 4.12 Changing the Dimensionality of a Dataset.
 
#Choose ‘Add’ to create the new dataset, which should then appear in the left-hand menu bar.
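
The ID, Version and DSD rules in the procedure above can be checked before the form is submitted. The following is a minimal Python sketch of such checks, written only from the rules stated in this section. It is not part of the Fusion Data Mapper, and all function names, the dictionary layout used to describe a DSD, and the error messages are assumptions made for illustration.

```python
import re

# Patterns derived from the rules in the procedure above (illustrative only).
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def is_valid_dataset_id(dataset_id, existing_ids=()):
    """True if the ID uses only Latin letters, digits and '_' and is unused."""
    return bool(ID_PATTERN.fullmatch(dataset_id)) and dataset_id not in existing_ids


def follows_id_convention(dataset_id):
    """True if the ID is upper case (a convention, not a hard rule)."""
    return dataset_id == dataset_id.upper()


def is_valid_version(version):
    """True if the version has the form <major_number>.<minor_number>."""
    return bool(VERSION_PATTERN.fullmatch(version))


def dsd_problems(dimensions, registered_series_codes, wanted_series_codes):
    """Return a list of reasons why a DSD cannot be used for a mapped dataset.

    `dimensions` is a hypothetical layout: dimension ID -> Codelist ID,
    or None when the dimension is not coded.
    """
    problems = []
    if "SERIES_CODE" not in dimensions:
        problems.append("DSD has no SERIES_CODE dimension")
    elif dimensions["SERIES_CODE"] is None:
        problems.append("SERIES_CODE dimension is not coded")
    # Series must be registered in the SERIES_CODE Codelist before mapping.
    missing = sorted(set(wanted_series_codes) - set(registered_series_codes))
    if missing:
        problems.append("unregistered series: " + ", ".join(missing))
    return problems
```

Under these assumptions, <code>is_valid_dataset_id("NATIONAL_ACCOUNTS")</code> passes, <code>is_valid_dataset_id("NATIONAL.ACCOUNTS")</code> fails because of the dot, and <code>dsd_problems</code> returns an empty list only when the DSD has a coded SERIES_CODE dimension and every series to be included is registered.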
 
=== Clone a Dataset ===
 
Clone a dataset to create a copy of an existing dataset.
 
  
Use cases:
+
[[Maintaining Metadata Values on Series Interactively using the Web Interface – Fusion Data Mapper]]
* Creating a copy of a dataset with the same dimensionality for experimentation or other purposes
 
* Creating a copy of a dataset with completely different dimensionality
 
* Adding or removing selected dimensions from a dataset
 
  
All of the series in the existing dataset are copied to the clone.
+
[[Maintenance Privileges – Fusion Data Mapper]]
  
Where a dimension or attribute appears in both the original and clone datasets, the metadata values are copied. However, default values are used where a new dimension or mandatory attribute appears only in the clone dataset. Section 4.9 explains how to define and manage default values.
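
The copy rule described above can be sketched in a few lines of Python. This is an illustration of the stated behaviour only, not the Fusion Data Mapper's implementation. The function name, the use of plain dictionaries, and the choice to raise an error when no default exists are all assumptions.

```python
def clone_series_metadata(original_values, clone_components, defaults):
    """Build the metadata values for one series in the clone dataset.

    original_values:  component ID -> value on the series in the original dataset
    clone_components: dimensions and mandatory attributes of the clone's DSD
    defaults:         component ID -> default value (hypothetical structure)
    """
    cloned = {}
    for component in clone_components:
        if component in original_values:
            # Component exists in both datasets: copy the value across.
            cloned[component] = original_values[component]
        elif component in defaults:
            # Component is new in the clone: fall back to its default value.
            cloned[component] = defaults[component]
        else:
            # Assumption: a missing default is treated as an error here.
            raise ValueError("no default value defined for " + component)
    return cloned
```

For example, cloning a series with values <code>{"SERIES_CODE": "S1", "FREQ": "M"}</code> into a dataset whose components are <code>SERIES_CODE</code>, <code>FREQ</code> and a new <code>REF_AREA</code> copies the first two values and takes <code>REF_AREA</code> from the defaults. Components that exist only in the original dataset are simply not carried over.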
+
[[Registering a Series – Fusion Data Mapper]]
===== Clone Dataset - Prerequisites =====
 
#The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
 
#The Source Dataset that contains the unmapped time series observations must already exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.
 
  
===== Clone Dataset - Required Roles and Privileges =====
+
[[Remove a Mapped Dataset – Fusion Data Mapper]]
To clone a dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.
 
  
Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are a member of that Agency. Changes include:
+
[[Removing Series from a Dataset – Fusion Data Mapper]]
* Removing the dataset
 
* Adding and removing series
 
* Maintaining the metadata values on series
 
  
===== Clone Dataset – Procedure =====
+
[[The Fusion Data Mapper User Interface]]
The procedure for cloning a dataset is the same as that explained in Section 4.1 on how to add a mapped dataset, with the following exceptions:
 

Latest revision as of 06:41, 11 September 2023

Overview – Fusion Data Mapper

This document provides guidance and operating procedures for creating and managing mapped datasets using Fusion Registry 10 and the Fusion Data Mapper.

Use Case

The primary use case is transforming single dimensional datasets to SDMX multi-dimensional structures.

Single dimensional datasets are those with a single unique identifier for each series (e.g. Series Code) such as created by FAME or similar time-series production systems.

One-to-one transformations only are supported by this version of the Fusion Data Mapper.

The transformation is performed by Fusion Registry using SDMX Structure Mapping. Fusion Data Mapper provides an easy-to-use user interface for defining and managing the mapping rules.
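
As a rough illustration of a one-to-one transformation, the Python sketch below attaches a fixed set of dimension values to each source series. It is a conceptual model only and does not reflect how Fusion Registry implements SDMX Structure Mapping. The series codes, the dimension names FREQ, REF_AREA and INDICATOR, and the function name are hypothetical, and keeping SERIES_CODE in the output is an assumption.

```python
# Hypothetical mapping rules: each series code maps to exactly one set of
# dimension values in the target (multi-dimensional) structure.
MAPPING_RULES = {
    "EMP001": {"FREQ": "M", "REF_AREA": "IL", "INDICATOR": "EMP"},
    "EMP002": {"FREQ": "Q", "REF_AREA": "IL", "INDICATOR": "UNEMP"},
}


def map_observation(observation, rules):
    """Map one single-dimensional observation to the target structure.

    Returns None when the series has no mapping rule, i.e. it is not
    part of the mapped dataset.
    """
    dimension_values = rules.get(observation["SERIES_CODE"])
    if dimension_values is None:
        return None
    # Assumption: SERIES_CODE is kept alongside the added dimensions.
    mapped = {"SERIES_CODE": observation["SERIES_CODE"]}
    mapped.update(dimension_values)
    # Time period and observation value pass through unchanged.
    mapped["TIME_PERIOD"] = observation["TIME_PERIOD"]
    mapped["OBS_VALUE"] = observation["OBS_VALUE"]
    return mapped


source = {"SERIES_CODE": "EMP001", "TIME_PERIOD": "2023-01", "OBS_VALUE": 4.2}
mapped = map_observation(source, MAPPING_RULES)
```

Each source series yields at most one mapped series, which is what makes the transformation one-to-one. The observation values themselves are not altered, only the metadata describing the series.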

Audience

  • Metadata Managers – those responsible for managing the metadata mappings on the Bank’s catalogue of time series on a day to day basis.
  • Metadata Superusers – those responsible for managing the core structural metadata including Agencies, Concepts, Data Structure Definitions and Codelists.
  • System Administrators – those responsible for administering Fusion Registry 9 as part of the integrated statistical data and metadata system, and managing the Time Series Database as the source of observation data.

Prerequisites

Readers are assumed to have an understanding of basic SDMX principles and the purpose of the main SDMX structural metadata artefacts including Concepts, Codes and Codelists, Categories, Data Structure Definitions (DSDs), Dataflows, Provision Agreements, Structure Sets and Dataflow Maps.

Terminology

Dataset   Dataset refers to a named collection of series that typically all fall under a specific topic, for instance ‘National Accounts’. In Fusion Registry, an SDMX Dataflow represents a dataset.


Mapped Dataset   A Mapped Dataset is an SDMX Dataflow where data is taken from a ‘source’ Dataflow and transformed to different dimensionality using defined mapping rules. The Fusion Data Mapper manages these mapping rules.

 In this document, the source Dataflow is assumed to be observation data from the Time Series Database which is described by a Data Structure Definition having only SERIES_CODE, TIME_PERIOD and OBS_VALUE dimensions.


Time Series Database   The source of time series observation data without metadata that Fusion Registry maps to Mapped Datasets using the defined mapping rules.

Related Pages

For further guidance on Fusion Data Mapper:

Add a Mapped Dataset – Fusion Data Mapper

Add Series to a Dataset – Fusion Data Mapper

Browse Privileges – Fusion Data Mapper

Bulk Maintenance of Metadata Values using Excel Import / Export – Fusion Data Mapper

Changing the Dimensionality of a Dataset – Fusion Data Mapper

Clone a Dataset – Fusion Data Mapper

Codelists - Adding and Removing Codes – Fusion Data Mapper

Codelists – Adding and Changing Multilingual Code Names with Impact Analysis - Fusion Data Mapper

Content Security Caveats – Fusion Data Mapper

Content Security Metadata Management Use Cases – Fusion Data Mapper

Default Code Values – Fusion Data Mapper

Maintaining Metadata Values on Series Interactively using the Web Interface – Fusion Data Mapper

Maintenance Privileges – Fusion Data Mapper

Registering a Series – Fusion Data Mapper

Remove a Mapped Dataset – Fusion Data Mapper

Removing Series from a Dataset – Fusion Data Mapper

The Fusion Data Mapper User Interface