Data Structure Definition (DSD) - Structural Metadata Management

From Fusion Registry Wiki

Overview

A Data Structure Definition (DSD) defines a dataset in terms of its dimensionality and allowable content. All reported datasets must conform to the structure defined by the DSD.

A DSD consists of Dimensions, Attributes, and Measures; collectively these are termed Components. Each Component of a DSD references a Concept to provide a semantic meaning, and optionally a Codelist to provide an enumerated list of allowable content for reported data.
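The relationship between a DSD, its Components, Concepts, and Codelists can be pictured with a minimal Python sketch. The class names, field names, and example identifiers (DSD_EXAMPLE, CL_FREQ, UNIT_MEASURE and so on) are assumptions made for illustration only; they are not the Fusion Registry data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Concept:
    id: str
    name: str


@dataclass(frozen=True)
class Codelist:
    id: str
    codes: frozenset  # ids of the allowable codes


@dataclass(frozen=True)
class Component:
    role: str                            # "Dimension", "Attribute" or "Measure"
    concept: Concept                     # provides the semantic meaning
    codelist: Optional[Codelist] = None  # optional enumerated content


@dataclass
class DataStructure:
    id: str
    components: list = field(default_factory=list)

    def by_role(self, role):
        """Return the Concept ids of all Components with the given role."""
        return [c.concept.id for c in self.components if c.role == role]


# Hypothetical example content
freq = Component("Dimension", Concept("FREQ", "Frequency"),
                 Codelist("CL_FREQ", frozenset({"A", "Q", "M"})))
obs_value = Component("Measure", Concept("OBS_VALUE", "Observation Value"))
unit = Component("Attribute", Concept("UNIT_MEASURE", "Unit of Measure"))

dsd = DataStructure("DSD_EXAMPLE", [freq, obs_value, unit])
```

Here every Component carries a Concept, while the Codelist is optional, mirroring the description above.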

A DSD can include a special type of Dimension known as a Measure Dimension. The Measure Dimension supports the use case of multiple measures. Unlike a ‘normal’ Dimension, the Measure Dimension cannot reference a Codelist; instead, it must reference a Concept Scheme, which lists the allowable measures. Each Concept in the Concept Scheme may provide its own representation, which can be enumerated (Codelist) or non-enumerated. When data is reported for the Measure Dimension, the allowable values depend on which Concept in the linked Concept Scheme is being reported.
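As a rough illustration of how the allowable values depend on the reported measure, the sketch below maps each Concept of a hypothetical Concept Scheme to its own representation. The measure ids, codes, and the function name are assumptions for illustration, not content from the Fusion Registry.

```python
# Hypothetical Concept Scheme: measure Concept id -> enumerated codes,
# or None when the Concept's representation is non-enumerated.
MEASURES = {
    "INCOME_GROUP": {"LOW", "MIDDLE", "HIGH"},
    "POPULATION": None,
}


def check_measure(measure_id, value):
    """Return True if value is acceptable for the reported measure Concept."""
    if measure_id not in MEASURES:
        return False   # not a Concept in the Concept Scheme
    codes = MEASURES[measure_id]
    if codes is None:
        return True    # non-enumerated: format checks are omitted in this sketch
    return value in codes
```

The same reported value can therefore be valid for one measure and invalid for another.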




When viewing a DSD, each Component is displayed with the Name of its Concept as the Component label, together with its allowable content, which may be the referenced Codelist, as shown below.


DSD Fig 1.jpg
Figure 1 showing a DSD for World Development Indicators


Data Structure Wizard

The first step of the Data Structure Wizard is the generic step that captures information about the DSD. This step also asks two further questions. First, does the DSD describe Time Series data? If so, a Time Dimension is automatically created and added, and if there is a TIME_PERIOD Concept in the Fusion Registry, it is used to provide the semantic for this Dimension. Second, which Concept is used for the Primary Measure? If there is an OBS_VALUE Concept in the Fusion Registry, it is used by default. To change the default Concept chosen by the wizard for either the Time Dimension or the Primary Measure, click on the text field to open a list of all available Concepts.

The second step of the Wizard allows the user to define all the other Concepts which will be used by the DSD. Each Concept can be assigned a role of Dimension, Measure Dimension, or Attribute.


DSD Fig 2.jpg
Figure 2 showing the second step of the DSD Wizard

The third step of the wizard allows the user to define the allowable content for each Component. If the Concept has a default Representation, it is selected automatically, but it can be changed if required.

DSD Fig 3.jpg
Figure 3 showing the third step of the DSD Wizard

The allowable content can be enumerated (a reference to a Codelist) or non-enumerated (for example Text, Integer, Double, or Boolean). The allowable content defines what data a user can report when supplying a dataset.
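The difference between the two kinds of allowable content can be sketched as a small validation routine. This is an illustrative assumption about how such a check might look, not the validation logic of the Fusion Registry; the function names and the accepted Boolean spellings ("true", "false") are hypothetical.

```python
def _is_integer(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def _is_double(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


# Checks for a few non-enumerated formats (an illustrative subset)
FORMAT_CHECKS = {
    "Text": lambda text: True,
    "Integer": _is_integer,
    "Double": _is_double,
    "Boolean": lambda text: text in ("true", "false"),
}


def is_allowed(value, codelist=None, text_format="Text"):
    """Check a reported string against an enumerated or non-enumerated representation."""
    if codelist is not None:
        return value in codelist            # enumerated: must be a code in the Codelist
    return FORMAT_CHECKS[text_format](value)  # non-enumerated: must match the format
```

For example, `is_allowed("Q", codelist={"A", "Q"})` checks against a Codelist, while `is_allowed("1.5", text_format="Integer")` checks against a format and is rejected.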

The final step of the Wizard is to define the assignment status (required or optional) and attachment level (dataset, series, dimension group, observation, group) of Attributes as shown in the screenshot below.

DSD Fig 4.jpg
Figure 4 showing the attachment level for a number of Attributes in a DSD


The purpose of each attachment level is described below.

Dataset Attachment

When an Attribute attaches to the dataset, a single value is reported for the whole dataset. For example, the Unit of Measure Attribute could be attached to the dataset if all of the observations are expected to be measured using the same unit.

Series Attachment

If the Attribute attaches to a Series, then the Attribute will attach itself to every Dimension in the Data Structure Definition, except the Time Dimension. This attachment is used to define that the value for the Attribute can vary for each Series in the dataset. For example, the Series Title Attribute could attach to a Series, as each series will have a different title.

It is possible to modify this attachment to specify a subset of the Dimensions, if the value for the attribute only relates to a subset.
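The rule for Series attachment, including the optional subset, can be sketched as follows. The Dimension ids and the function name are assumptions for illustration only.

```python
DIMENSIONS = ["FREQ", "REF_AREA", "INDICATOR", "TIME_PERIOD"]  # hypothetical DSD
TIME_DIMENSION = "TIME_PERIOD"


def series_attachment(dimensions, time_dimension, subset=None):
    """Return the Dimensions an Attribute attaches to at Series level."""
    # By default: every Dimension except the Time Dimension
    series_dims = [d for d in dimensions if d != time_dimension]
    if subset is None:
        return series_dims
    # A modified attachment may only name Dimensions from the series key
    unknown = set(subset) - set(series_dims)
    if unknown:
        raise ValueError(f"not series dimensions: {sorted(unknown)}")
    return [d for d in series_dims if d in subset]
```

Calling `series_attachment(DIMENSIONS, TIME_DIMENSION)` gives the default attachment, while passing `subset=["REF_AREA"]` narrows it to a single Dimension.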

Observation Attachment

If a value for an Attribute can change from Observation to Observation, then the attachment level should be set to Observation. An example of an Observation attachment is Observation Confidentiality.
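To make the dataset, series, and observation levels concrete, the sketch below stores Attribute values at each level of a small, made-up dataset and collects the values that apply to one observation. The nested dictionary layout, the Attribute ids (UNIT_MEASURE, TITLE, CONF_STATUS), and the values are assumptions for illustration; this is not a Fusion Registry or SDMX data format.

```python
dataset = {
    "attributes": {"UNIT_MEASURE": "USD"},                # dataset level: one value
    "series": [
        {
            "key": {"FREQ": "A", "REF_AREA": "GB"},
            "attributes": {"TITLE": "UK annual series"},  # series level: one per series
            "observations": [
                {"TIME_PERIOD": "2020", "OBS_VALUE": 1.0,
                 "attributes": {"CONF_STATUS": "F"}},     # observation level
                {"TIME_PERIOD": "2021", "OBS_VALUE": 2.0,
                 "attributes": {"CONF_STATUS": "C"}},
            ],
        }
    ],
}


def attributes_for(dataset, series_index, obs_index):
    """Collect the Attribute values that apply to a single observation."""
    series = dataset["series"][series_index]
    obs = series["observations"][obs_index]
    merged = {}
    merged.update(dataset["attributes"])
    merged.update(series["attributes"])
    merged.update(obs["attributes"])
    return merged
```

Both observations share the dataset-level and series-level values, and only the observation-level Attribute differs between them.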

Group Attachment

The final attachment is to a pre-defined Group of Dimensions. This is similar to Series Attachment in that the Attribute attaches to a subset of Dimensions; however, the subset is defined by the referenced Group.
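A Group attachment can be pictured as a lookup keyed by the Group's Dimensions only. In the sketch below, the Group id SIBLING, the Dimension ids, and the reported value are all hypothetical and chosen for illustration.

```python
# Hypothetical Group: identifies data by area and indicator, ignoring frequency
GROUPS = {"SIBLING": ["REF_AREA", "INDICATOR"]}

# Attribute values reported once per group key
group_values = {
    ("SIBLING", ("GB", "GDP")): "Gross domestic product, United Kingdom",
}


def group_attribute_for(series_key, group_id):
    """Look up the group-level Attribute value that applies to a series."""
    dims = GROUPS[group_id]
    key = tuple(series_key[d] for d in dims)  # project the series key onto the Group
    return group_values.get((group_id, key))


annual = {"FREQ": "A", "REF_AREA": "GB", "INDICATOR": "GDP"}
quarterly = {"FREQ": "Q", "REF_AREA": "GB", "INDICATOR": "GDP"}
```

Because FREQ is not part of the Group, the annual and quarterly series resolve to the same Attribute value.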

This option is only available in the list if a Group has been defined. To define, modify, or delete a Group, click on the ‘Manage Groups’ button, which will open a window, as shown below.