Fusion Data Mapper

From Fusion Registry Wiki
Revision as of 02:15, 3 September 2019 by Mbaird (talk | contribs) (Clone a Dataset)

Overview

This document provides guidance and operating procedures for creating and managing mapped datasets using Fusion Registry 9 and the Fusion Data Mapper.

Use Case

The primary use case is transforming single dimensional datasets to SDMX multi-dimensional structures.

Single dimensional datasets are those with a single unique identifier for each series (e.g. Series Code), such as those created by FAME or similar time-series production systems.

Only one-to-one transformations are supported by this version of the Fusion Data Mapper.

The transformation is performed by Fusion Registry using SDMX Structure Mapping. Fusion Data Mapper provides an easy-to-use user interface for defining and managing the mapping rules.

Audience

  • Metadata Managers – those responsible for managing the metadata mappings on the Bank’s catalogue of time series on a day-to-day basis.
  • Metadata Superusers – those responsible for managing the core structural metadata including Agencies, Concepts, Data Structure Definitions and Codelists.
  • System Administrators – those responsible for administering Fusion Registry 9 as part of the integrated statistical data and metadata system, and managing the Time Series Database as the source of observation data.

Prerequisites

Readers are assumed to have an understanding of basic SDMX principles and the purpose of the main SDMX structural metadata artefacts including Concepts, Codes and Codelists, Categories, Data Structure Definitions (DSDs), Dataflows, Provision Agreements, Structure Sets and Dataflow Maps.

Terminology

  • Dataset – a named collection of series that typically all fall under a specific topic, for instance ‘National Accounts’. In Fusion Registry, an SDMX Dataflow represents a dataset.
  • Mapped Dataset – an SDMX Dataflow where data is taken from a ‘source’ Dataflow and transformed to a different dimensionality using defined mapping rules. The Fusion Data Mapper manages these mapping rules. In this document, the source Dataflow is assumed to be observation data from the Time Series Database, described by a Data Structure Definition having only SERIES_CODE, TIME_PERIOD and OBS_VALUE components.
  • Time Series Database – the source of time series observation data without metadata, which Fusion Registry maps to Mapped Datasets using the defined mapping rules.
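The idea of a one-to-one mapping from a series code to a multi-dimensional key can be sketched as follows. This is only an illustration of the concept, not how Fusion Registry implements SDMX Structure Mapping. The series codes, the target dimensions (FREQ, INDICATOR, ADJUSTMENT) and the function name are hypothetical.

```python
from typing import Dict, Union

# Hypothetical mapping rules: exactly one set of dimension values per
# series code (one-to-one). Codes and dimension names are illustrative.
MAPPING_RULES: Dict[str, Dict[str, str]] = {
    "GDP_Q_SA": {"FREQ": "Q", "INDICATOR": "GDP", "ADJUSTMENT": "SA"},
    "CPI_M_NSA": {"FREQ": "M", "INDICATOR": "CPI", "ADJUSTMENT": "NSA"},
}


def map_observation(
    series_code: str,
    time_period: str,
    obs_value: float,
    rules: Dict[str, Dict[str, str]] = MAPPING_RULES,
) -> Dict[str, Union[str, float]]:
    """Re-key one source observation using the rule for its series code.

    Raises KeyError if no rule is defined for the series code.
    """
    dimension_values = rules[series_code]
    return {
        "SERIES_CODE": series_code,  # the mapped DSD keeps SERIES_CODE
        **dimension_values,
        "TIME_PERIOD": time_period,
        "OBS_VALUE": obs_value,
    }


mapped = map_observation("GDP_Q_SA", "2019-Q1", 101.5)
```

Each source observation keeps its time period and value; only the key that identifies the series changes.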

The Fusion Data Mapper User Interface

The Fusion Data Mapper is a web user interface providing the following main functions:

Authenticated users with sufficient structural metadata maintenance privileges

  • Add and remove mapped datasets
  • Add and remove series on mapped datasets
  • Interactively set and change the metadata values on a series by series basis
  • Export metadata values for selected series to Excel
  • Import metadata values for defined series from Excel
  • Change code names with impact analysis

Anonymous or authenticated users with sufficient privileges to view but not change the structural metadata

  • Browse the catalogue of mapped datasets
  • Examine the ‘definition’ of a dataset – its dimensionality and list of possible codes for each Dimension or Attribute
  • Browse the series in each dataset

The Fusion Registry Administration Interface

The Administration Interface is Fusion Registry’s main web user interface.

For the purposes of managing the metadata on mapped datasets, it provides the following functions:

Authenticated users with sufficient structural metadata management privileges

  • Create and modify SDMX Data Structure Definitions (DSDs)
  • Create and modify SDMX Concepts
  • Create and modify SDMX Codelists
  • Add and remove codes from SDMX Codelists
  • Register a series by adding its Series Code and Series Name to the relevant SERIES_CODE Codelist (series must be ‘registered’ before they can be mapped in a dataset)
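Registering a series amounts to adding a code (the Series Code) with a name (the Series Name) to the SERIES_CODE Codelist. The sketch below models the Codelist as a plain dictionary to show the idea. It does not call Fusion Registry, and the function names and sample codes are hypothetical.

```python
from typing import Dict, Iterable, List


def register_series(codelist: Dict[str, str], series_code: str, series_name: str) -> None:
    """Add a series to an in-memory model of the SERIES_CODE Codelist.

    Raises ValueError if the code is already registered under a different name.
    """
    existing_name = codelist.get(series_code)
    if existing_name is not None and existing_name != series_name:
        raise ValueError(f"{series_code} is already registered as {existing_name!r}")
    codelist[series_code] = series_name


def unregistered_series(codelist: Dict[str, str], series_codes: Iterable[str]) -> List[str]:
    """Return the series codes that cannot yet be mapped in a dataset."""
    return [code for code in series_codes if code not in codelist]


cl_series_code: Dict[str, str] = {}
register_series(cl_series_code, "GDP_Q_SA", "GDP, quarterly, seasonally adjusted")
missing = unregistered_series(cl_series_code, ["GDP_Q_SA", "CPI_M_NSA"])
```

A check like `unregistered_series` is a convenient way to spot series that still need registering before they are added to a dataset.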

Refer to the Fusion Registry Structural Metadata Management Guide for general information on using the Fusion Registry Administration Interface for creating and maintaining core SDMX structure metadata artefacts including DSDs, Dataflows, Concepts, Categories and Codelists.

Operating Procedures

Add a Mapped Dataset

A mapped dataset is an SDMX Dataflow and an associated SDMX Dataflow Map that describes:

(a) The dataset’s dimensionality using an SDMX Data Structure Definition (DSD)
(b) The list of series in the dataset
(c) The metadata values for each series

Use cases:

  • Creating a new dataset
  • Creating an alternative version of an existing dataset, perhaps with a different complement of series and / or dimensionality
  • Creating an alternative version of a dataset with simplified dimensionality for public dissemination

The Fusion Data Mapper provides a convenient way to interactively manage the process. However, it is important to note that creating, modifying and examining the underlying SDMX artefacts can also be done using the Fusion Registry Administration Interface or the REST API which may be useful for debugging purposes. Discussion of these topics is outside of the scope of this document.

Add a Mapped Dataset - Prerequisites

  1. The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
  2. The Source Dataset that contains the unmapped time series observations must already exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.

Add a Mapped Dataset - Required Roles and Privileges

To add a mapped dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.

Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are a member of that Agency. Changes include:

  • Removing the dataset
  • Adding and removing series
  • Maintaining the metadata values on series
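The ownership rule above can be sketched as a simple membership check. The sketch assumes that a hierarchical agency structure is expressed with dot-separated Agency IDs (for example, a hypothetical BANK.STATS being a child of BANK). The function names and agency IDs are illustrative, and the actual checks are performed by Fusion Registry.

```python
from typing import List, Set


def agency_and_parents(agency_id: str) -> List[str]:
    """Return the agency followed by its parents, nearest first.

    Assumes parent/child relationships are encoded as dot-separated IDs.
    """
    parts = agency_id.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def may_manage_dataset(user_agencies: Set[str], owner_agency: str) -> bool:
    """True if the user belongs to the owning Agency or one of its parents."""
    return any(agency in user_agencies for agency in agency_and_parents(owner_agency))


allowed = may_manage_dataset({"BANK"}, "BANK.STATS")      # member of a parent Agency
denied = may_manage_dataset({"BANK.OTHER"}, "BANK.STATS")  # member of a sibling Agency
```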

Add a Mapped Dataset - Procedure

Using the Fusion Data Mapper:

  1. Choose the Add Dataset function from the left-hand menu bar.
  2. Choose a Source Dataset from those available. All Dataflows in the Fusion Registry with a single dimension are shown in this list. However, the single dimension of the chosen source dataset must be the Series Code. If multiple Source Datasets are shown in the list, take care to choose the correct one, otherwise it will be impossible to create the metadata mappings.
  3. Choose the Dataset Definition for the new dataset. A list of available Data Structure Definitions (DSDs) is shown to choose from.
    The DSD chosen for the new dataset must follow these rules:
    • the DSD must include a SERIES_CODE dimension
    • the SERIES_CODE dimension must be coded (conventionally, the Codelist is named CL_SERIES_CODE)
    • the codes of series in the Time Series Database to be included in the dataset must be ‘registered’ by adding them to the SERIES_CODE Codelist (refer to Section 4.5 for more on registering series)
    If an invalid DSD is chosen, the dataset will be created but it will be impossible to add series to it.
  4. Set the name for the new Dataset in the chosen language. This is the descriptive name of the mapped dataset’s Dataflow, for instance ‘Employment’, ‘National Accounts’ or ‘Financial Activity’. After the Dataset has been created, changes to the name, including adding alternative names in different languages, can be made using the Fusion Registry Administration Interface – Dataflow maintenance. In the example shown in Figure 1, the name has been set in Hebrew.
  5. Set the SDMX ID for the new dataset. The ID is the unique reference for the dataset’s SDMX Dataflow. You must follow these rules when choosing the ID:
    • The ID must be unique
    • The ID must use Latin characters and can contain letters, numbers and ‘_’ characters.
    It cannot contain dots (‘.’) or other special characters such as ‘@’ or ‘$’.
    The following are valid:
    EMPLOYMENT
    FINANCIAL_ACTIVITY2
    NATIONAL_ACCOUNTS
    • By convention, IDs are in upper case. For example, use ‘NATIONAL_ACCOUNTS’ rather than ‘National_Accounts’
  6. Set the Version for the dataset. This will be used to set the version of the dataset’s SDMX Dataflow. Version numbers are of the form <major_number>.<minor_number>. The following are valid:
    1.0
    1.1
    2.1
    By convention, the first version is 1.0.
    Create new versions of a dataset when you need to change the dimensionality – refer to Section 4.12 Changing the Dimensionality of a Dataset.
  7. Choose ‘Add’ to create the new dataset, which should then appear in the left-hand menu bar.
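The ID and Version rules from steps 5 and 6 can be checked before submitting the form. The following is a minimal sketch of those rules as stated in this document. The function names are hypothetical, and the uniqueness rule is not covered because it needs a lookup against the Registry.

```python
import re

# Latin letters, digits and '_' only (no dots or other special characters).
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")
# <major_number>.<minor_number>, for example 1.0 or 2.1.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def is_valid_id(dataset_id: str) -> bool:
    """True if the ID uses only the characters allowed in step 5."""
    return ID_PATTERN.fullmatch(dataset_id) is not None


def follows_case_convention(dataset_id: str) -> bool:
    """True if the ID is upper case (a convention, not a hard rule)."""
    return dataset_id == dataset_id.upper()


def is_valid_version(version: str) -> bool:
    """True if the version has the form <major_number>.<minor_number>."""
    return VERSION_PATTERN.fullmatch(version) is not None


ok = is_valid_id("FINANCIAL_ACTIVITY2") and is_valid_version("1.0")
```

The character classes are written out explicitly because Python's `\w` and `\d` also match non-Latin letters and digits.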

Clone a Dataset

Clone a dataset to create a copy of an existing dataset.

Use cases:

  • Creating a copy of a dataset with the same dimensionality for experimentation or other purposes
  • Creating a copy of a dataset with completely different dimensionality
  • Adding or removing selected dimensions from a dataset

All of the series in the existing dataset are copied to the clone.

Where a dimension or attribute appears in both the original and clone datasets, the metadata values are copied. However, default values are used where a new dimension or mandatory attribute appears only in the clone dataset. Section 4.9 explains how to define and manage default values.
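The copy rule for metadata values can be sketched as follows: keep the original value where a component exists in both datasets, otherwise fall back to the default. This is an illustration only. The function name and sample components are hypothetical, `clone_components` is assumed to list only the dimensions and mandatory attributes of the clone's DSD, and raising an error for a missing default is an assumption of this sketch.

```python
from typing import Dict, Iterable


def clone_series_metadata(
    original_values: Dict[str, str],
    clone_components: Iterable[str],
    defaults: Dict[str, str],
) -> Dict[str, str]:
    """Build the metadata values for one series in the clone dataset.

    Raises KeyError if a component is new in the clone and has no default.
    """
    cloned: Dict[str, str] = {}
    for component in clone_components:
        if component in original_values:
            cloned[component] = original_values[component]  # shared: copy
        else:
            cloned[component] = defaults[component]  # new in clone: default
    return cloned


original = {"FREQ": "Q", "INDICATOR": "GDP", "ADJUSTMENT": "SA"}
cloned = clone_series_metadata(
    original,
    clone_components=["FREQ", "INDICATOR", "UNIT"],  # ADJUSTMENT dropped, UNIT added
    defaults={"UNIT": "_Z"},
)
```

Components that exist only in the original dataset, such as ADJUSTMENT here, are simply not carried over to the clone.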

Clone Dataset - Prerequisites

  1. The DSD that you plan to use for the dataset must already exist. DSDs and their associated structures can be created and managed using the Fusion Registry Administration User Interface.
  2. The Source Dataset that contains the unmapped time series observations must already exist. The Source Dataset is an SDMX Dataflow created by a System Administrator that provides access to the Time Series Database observation data.

Clone Dataset - Required Roles and Privileges

To clone a dataset, the user must be a member of the Agency that owns the SDMX Structure Set, or a member of a parent Agency if a hierarchical agency structure is in place.

Once created, the SDMX Dataflow Map which represents the dataset will be owned by the same Agency as the SDMX Structure Set to which it belongs. Any subsequent changes to the dataset can only be performed by users who are a member of that Agency. Changes include:

  • Removing the dataset
  • Adding and removing series
  • Maintaining the metadata values on series

Clone Dataset - Procedure

The procedure for cloning a dataset is the same as that explained in Section 4.1 on how to add a mapped dataset, with the following exceptions: