Difference between revisions of "Data Structure Definition"

From Fusion Registry Wiki

Revision as of 05:29, 20 December 2019

Overview

An SDMX Data Structure Definition (DSD) describes the structure and dimensionality of a dataset in terms of its dimensions, attributes and measures.

Structure Properties

Structure Type: Standard SDMX Structural Metadata Artefact
Maintainable: Yes
Identifiable: Yes
Item Scheme: No
SDMX Information Model Versions: 1.0, 2.0, 2.1
Concept ID: DSD

Context within the SDMX 2.1 Information Model

Figure (SDMX Information Model - Core Artefacts - DSD.png): The schematic illustrates the Data Structure Definition artefact within the SDMX 2.1 Information Model.

Usage

Data Structure Definitions (DSDs) describe the structure of datasets by specifying their constituent Components (Dimensions, Attributes and Measures) and, optionally, the Representation for each Component.

Each Dataflow references a single DSD which describes the structure of the dataset that the Dataflow represents.

Data Structure Components

The Role of Concepts In Defining a DSD's Components
Every Dimension, Attribute and Measure is described by a predefined Concept. Each Concept has its own default Representation, which can be overridden by defining a Local Representation for the Component in the DSD. This is particularly helpful with standard Concepts such as the SDMX Cross Domain Concepts, where the default Representation is 'String' but the Component needs to be Enumerated or needs some use-case-specific restriction on the allowable values.
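
The override rule can be sketched in a few lines of Python. This is only an illustration of the principle, not Fusion Registry code: the class names, the `effective_representation` method and the `CL_AREA` codelist are hypothetical, and Representations are simplified to plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A Concept with its default Representation (simplified to a string)."""
    concept_id: str
    default_representation: str = "String"


@dataclass(frozen=True)
class Component:
    """A Dimension, Attribute or Measure that references a Concept."""
    component_id: str
    concept: Concept
    # None means "inherit the Representation from the Concept".
    local_representation: Optional[str] = None

    def effective_representation(self) -> str:
        # A Local Representation on the Component overrides the Concept's default.
        if self.local_representation is not None:
            return self.local_representation
        return self.concept.default_representation


# Hypothetical Concept whose default Representation is 'String'.
ref_area = Concept("REF_AREA")

inherited = Component("REF_AREA", ref_area)
overridden = Component("REF_AREA", ref_area, local_representation="Codelist CL_AREA")

print(inherited.effective_representation())   # String
print(overridden.effective_representation())  # Codelist CL_AREA
```

The Concept itself is never modified; the override lives on the Component, so other DSDs that reuse the same Concept are unaffected.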

Dimensions
A DSD's Dimensions are the minimal set of statistical concepts capable of uniquely identifying a specific series and, in combination with the Time Dimension, uniquely identifying an Observation.
In this sense, the Dimensions of a dataset together form its primary key.
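
To make the primary-key analogy concrete, the following Python sketch checks a small set of observations for duplicate keys. It borrows the component IDs from the example DSD in this article; the observation values, the `value` field and the function name are made up for illustration.

```python
from collections import Counter

# Component IDs taken from this article's example DSD.
DIMENSIONS = ("INDICATOR", "REF_AREA", "FREQUENCY")
TIME_DIMENSION = "TIME_PERIOD"


def duplicate_observation_keys(observations):
    """Return each (dimension values..., time period) key that occurs more than once."""
    counts = Counter(
        tuple(obs[d] for d in DIMENSIONS) + (obs[TIME_DIMENSION],)
        for obs in observations
    )
    return [key for key, n in counts.items() if n > 1]


# Hypothetical data: the last two rows share the same Dimensions and time period.
observations = [
    {"INDICATOR": "GDP", "REF_AREA": "UK", "FREQUENCY": "A", "TIME_PERIOD": "2018", "value": 1.0},
    {"INDICATOR": "GDP", "REF_AREA": "UK", "FREQUENCY": "A", "TIME_PERIOD": "2019", "value": 1.1},
    {"INDICATOR": "GDP", "REF_AREA": "UK", "FREQUENCY": "A", "TIME_PERIOD": "2019", "value": 1.2},
]

duplicates = duplicate_observation_keys(observations)
print(duplicates)  # [('GDP', 'UK', 'A', '2019')]
```

A dataset that conforms to its DSD would yield an empty list here, just as a database table rejects two rows with the same primary key.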

Ordering of Dimensions in a DSD
The Dimensions in a DSD have a defined order and together form the dataset's Series Key.
To illustrate the principle, below is a simple example DSD:

Position | Component Type  | Component ID       | Description
1        | Dimension       | INDICATOR          | Indicator
2        | Dimension       | REF_AREA           | Reference Area
3        | Dimension       | FREQUENCY          | Data Frequency
n/a      | Time Dimension  | TIME_PERIOD        | Observation Time
n/a      | Attribute       | UNIT_MULT          | Unit Multiplier e.g. tens, thousands, millions
n/a      | Attribute       | Observation Status | Observation Status e.g. Estimated, Final
n/a      | Primary Measure | Observation Value  | The observation value

The Series Key is the concatenation of the Dimensions in the order specified in the DSD. So for this example, the Series Key is:
INDICATOR.REF_AREA.FREQUENCY
Attributes do not form part of the Series Key so have no explicit or implied ordering.
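
As a rough sketch of how a Series Key could be assembled, the Python below sorts the Dimensions by their position in the DSD and joins the values with dots. The `Dimension` class, the function names and the sample values (`GDP`, `UK`, `A`, `6`) are assumptions made for illustration; only the component IDs and positions come from the example DSD above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    position: int
    component_id: str


# Deliberately listed out of order to show that the position decides the key order.
DSD_DIMENSIONS = [
    Dimension(3, "FREQUENCY"),
    Dimension(1, "INDICATOR"),
    Dimension(2, "REF_AREA"),
]


def series_key_template(dimensions):
    """Dot-separated Dimension IDs in DSD order."""
    ordered = sorted(dimensions, key=lambda d: d.position)
    return ".".join(d.component_id for d in ordered)


def series_key(dimensions, values):
    """Dot-separated Dimension values in DSD order; non-Dimension entries are ignored."""
    ordered = sorted(dimensions, key=lambda d: d.position)
    missing = [d.component_id for d in ordered if d.component_id not in values]
    if missing:
        raise ValueError(f"missing values for Dimensions: {missing}")
    return ".".join(values[d.component_id] for d in ordered)


print(series_key_template(DSD_DIMENSIONS))  # INDICATOR.REF_AREA.FREQUENCY

# UNIT_MULT is an Attribute, so it does not contribute to the key.
series_values = {"REF_AREA": "UK", "FREQUENCY": "A", "INDICATOR": "GDP", "UNIT_MULT": "6"}
print(series_key(DSD_DIMENSIONS, series_values))  # GDP.UK.A
```

Because the key depends only on the declared positions, the order in which values are supplied does not matter, and Attribute values can travel alongside the series without changing its identity.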

Primary Measure
All DSDs must have a Primary Measure Component, which holds the observation value of the main variable being measured. Like all Components, the Primary Measure must reference a Concept. For many series the measure is numeric, but it does not need to be.

Time Dimension
A Time Dimension is required for DSDs representing Time Series datasets. Like the Primary Measure, the Time Dimension must reference a Concept, which should have an appropriate time representation, typically Observational Time Period.