Codelist

From Fusion Registry Wiki

Latest revision as of 04:21, 18 March 2020

Overview

An SDMX Codelist is a managed list of classification codes.

Structure Properties

Structure Type: Standard SDMX Structural Metadata Artefact
Maintainable: Yes
Identifiable: Yes
Item Scheme: Yes
SDMX Information Model Versions: 1.0, 2.0, 2.1
Concept ID: CODELIST

Context within the SDMX 2.1 Information Model

[Figure: SDMX Information Model - Core Artefacts - Codelist]

The schematic illustrates the core artefacts of the SDMX 2.1 Information Model, and how Codelists and Codes fit in.

Codelists can be referenced directly by Data Structure Definitions (DSDs), or indirectly through Concepts, to explicitly define the set of legal values for enumerated Dimensions or Attributes.
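As an illustration of how a Codelist supplies the legal values for an enumerated Dimension, the sketch below checks each part of a series key against the Codelist that its Dimension references. It is a simplified model, not the Fusion Registry API: the `Dimension` class, the `CODELISTS` and `DSD_DIMENSIONS` tables and the dot-separated key format are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A hypothetical, simplified DSD Dimension enumerated by a Codelist."""
    id: str
    codelist_id: str


# Hypothetical Codelists, reduced to the set of their Code IDs.
CODELISTS: dict[str, set[str]] = {
    "CL_FREQ": {"A", "S", "Q", "M", "W", "D", "B", "N"},
    "CL_REF_AREA": {"EUR", "DE", "FR", "IT"},
}

# A hypothetical DSD with two enumerated Dimensions, in series-key order.
DSD_DIMENSIONS = [Dimension("FREQ", "CL_FREQ"), Dimension("REF_AREA", "CL_REF_AREA")]


def invalid_key_parts(series_key: str) -> list[str]:
    """Return one message per part of a dot-separated series key that is not a legal code."""
    parts = series_key.split(".")
    if len(parts) != len(DSD_DIMENSIONS):
        return [f"expected {len(DSD_DIMENSIONS)} key parts, got {len(parts)}"]
    errors = []
    for dim, value in zip(DSD_DIMENSIONS, parts):
        # The Codelist referenced by the Dimension defines the allowed values.
        if value not in CODELISTS[dim.codelist_id]:
            errors.append(f"{dim.id}: '{value}' is not in {dim.codelist_id}")
    return errors


print(invalid_key_parts("A.DE"))  # []
print(invalid_key_parts("X.DE"))  # ["FREQ: 'X' is not in CL_FREQ"]
```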

Usage

SDMX Codelists are lists of classification codes used principally to define the set of allowed values for enumerated Components in Data Structure Definitions (DSDs) or in Metadata Structure Definitions (MSDs), which describe Reference Metadata.

Each Code is a separate Item, so it must have an ID and a Name, and can optionally have a Description. Code IDs must be unique within a Codelist, but the same Code ID may safely be used in other Codelists. For instance, the code 'A' may be used in a Frequency Codelist to represent 'Annual', and also appear in an Industry Codelist to represent 'Agriculture'.
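A minimal sketch of these rules, assuming a hypothetical in-memory model (`Code`, `Codelist` and the `CL_INDUSTRY` ID are illustrative names, not part of any SDMX library): ID and Name are required, Description is optional, and uniqueness is enforced per Codelist only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Code:
    id: str                            # required, unique within its Codelist
    name: str                          # required
    description: Optional[str] = None  # optional


@dataclass
class Codelist:
    id: str
    codes: dict[str, Code] = field(default_factory=dict)

    def add(self, code: Code) -> None:
        # Code IDs only have to be unique within this Codelist.
        if code.id in self.codes:
            raise ValueError(f"duplicate Code ID '{code.id}' in {self.id}")
        self.codes[code.id] = code


freq = Codelist("CL_FREQ")
freq.add(Code("A", "Annual"))

industry = Codelist("CL_INDUSTRY")
industry.add(Code("A", "Agriculture"))  # same ID in a different Codelist: allowed

print(freq.codes["A"].name, "/", industry.codes["A"].name)  # Annual / Agriculture

try:
    freq.add(Code("A", "Another annual"))
except ValueError as err:
    print(err)  # duplicate Code ID 'A' in CL_FREQ
```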

Conventions

'CL_' Codelist ID Prefix

Codelist IDs are given a 'CL_' prefix to distinguish them from other structures. For instance: CL_FREQ, CL_REF_AREA, CL_AGE.

Code ID Conventions

Codes can take any legal SDMX ID, but several conventions should be taken into account when choosing Code IDs:

Uppercase: By convention, Code IDs are in uppercase, such as 'ABC'. Lowercase Codes (for example 'abc') are valid, but should be used with care to avoid confusion.
'_Z' Code: The '_Z' code is conventionally used for the Undefined and Unknown classification.
'TOTAL' Code: The 'TOTAL' code represents the total (sum) across the dimension. For a 'country' dimension, a series with TOTAL would hold the sum of observation values for all countries.
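The conventions above can be checked mechanically. This sketch reports illegal Code IDs and deviations from the conventions; the ID pattern is a simplified assumption rather than the exact pattern from the SDMX schema, so check the specification for the version you target. Lowercase IDs and a missing 'CL_' prefix are reported even though they are legal.

```python
import re

# Assumption: a simplified SDMX ID pattern (letters, digits, '_', '@', '$' and '-').
# Check the exact pattern in the SDMX schema for the version you target.
SDMX_ID = re.compile(r"[A-Za-z0-9_@$\-]+")


def convention_warnings(codelist_id: str, code_ids: list[str]) -> list[str]:
    """Report illegal Code IDs and deviations from the Codelist naming conventions."""
    problems = []
    if not codelist_id.startswith("CL_"):
        problems.append(f"Codelist ID '{codelist_id}' has no 'CL_' prefix")
    for code_id in code_ids:
        if not SDMX_ID.fullmatch(code_id):
            problems.append(f"'{code_id}' is not a legal SDMX ID")
        elif code_id != code_id.upper():
            # Legal, but against the uppercase convention.
            problems.append(f"'{code_id}' is not uppercase")
    return problems


# '_Z' and 'TOTAL' are conventional special codes and pass both checks.
for problem in convention_warnings("FREQ", ["A", "abc", "_Z", "TOTAL", "A B"]):
    print(problem)
```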

Flat Codelists

Flat Codelists are simple lists of codes with no explicit or implied relationships or hierarchies.

Example: CL_FREQUENCY Codelist

Code ID | Code Name
A | Annual
S | Half-yearly, semester
Q | Quarterly
M | Monthly
W | Weekly
D | Daily
B | Daily - Business Week
N | Minutely

Example: CL_ADJUSTMENT Codelist with descriptions

Code ID | Code Name | Code Description
K | Calendar component | Synonyms: Calendar effects; calendar factors
X | Seasonal component | Synonyms: Seasonal effects; seasonal factors
M | Seasonal and calendar components | Synonyms: Seasonal and calendar effects; seasonal and calendar factors
I | Irregular component | Synonym: Irregular effects
N | Neither seasonally adjusted nor calendar adjusted data | Synonyms: Raw data; unadjusted data

Example: CL_ADJUSTMENT Codelist with multi-lingual names and descriptions

Code ID | Locale | Code Name | Code Description
K | en | Calendar component | Synonyms: Calendar effects; calendar factors
K | fr | Composant de calendrier | Synonymes: effets de calendrier; facteurs de calendrier
X | en | Seasonal component | Synonyms: Seasonal effects; seasonal factors
X | fr | Composant saisonnier | Synonymes: effets saisonniers; facteurs saisonniers
M | en | Seasonal and calendar components | Synonyms: Seasonal and calendar effects; seasonal and calendar factors
M | fr | Composants saisonniers et calendaires | Synonymes: effets saisonniers et calendaires; facteurs saisonniers et calendaires
I | en | Irregular component | Synonym: Irregular effects
I | fr | Composant irrégulier | Synonyme: effets irréguliers
N | en | Neither seasonally adjusted nor calendar adjusted data | Synonyms: Raw data; unadjusted data
N | fr | Données ni désaisonnalisées ni corrigées des effets de calendrier | Synonymes: données brutes; données non ajustées
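One way to use multi-lingual names is to look up a Code's name for the reader's locale and fall back to a default locale when no translation exists. The dictionary layout and the fall-back-to-English rule are assumptions of this sketch, not behaviour defined by the Codelist itself.

```python
# Hypothetical in-memory form of part of the CL_ADJUSTMENT example:
# Code ID -> {locale: Code Name}.
NAMES = {
    "K": {"en": "Calendar component", "fr": "Composant de calendrier"},
    "X": {"en": "Seasonal component", "fr": "Composant saisonnier"},
    "I": {"en": "Irregular component"},  # French name left out to show the fallback
}


def code_name(code_id: str, locale: str, fallback: str = "en") -> str:
    """Return the Code Name for the locale, or the fallback locale's name if none exists."""
    names = NAMES[code_id]
    if locale in names:
        return names[locale]
    return names[fallback]


print(code_name("K", "fr"))  # Composant de calendrier
print(code_name("I", "fr"))  # Irregular component (no French name stored)
```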

Codelists with Simple Hierarchies

SDMX allows simple hierarchies to be defined within flat Codelists by making a code the parent of codes that logically sit under it in the hierarchy.

Imagine a CL_REF_AREA Codelist containing individual codes for each European country, and a code (EUR) for Europe as a whole. A simple hierarchy for Europe can be created by setting EUR as the parent for each of the countries:

Code ID | Code Name | Parent
EUR | Europe | (none)
DE | Germany | EUR
FR | France | EUR
IT | Italy | EUR
GR | Greece | EUR
SE | Sweden | EUR
AT | Austria | EUR
PL | Poland | EUR

Other than acting as a parent, EUR behaves as a normal code, so series can be reported for EUR as well as for any of the individual countries.
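The parent links are enough to rebuild the tree for display or navigation. This hypothetical sketch stores one parent per Code, as in the table, uses only a subset of the countries, and inverts the links into a parent-to-children map; the names `PARENTS`, `children_of` and `print_tree` are illustrative.

```python
from collections import defaultdict
from typing import Optional

# Subset of the CL_REF_AREA example: Code ID -> parent Code ID (None for a top-level code).
PARENTS: dict[str, Optional[str]] = {"EUR": None, "DE": "EUR", "FR": "EUR", "IT": "EUR"}


def children_of(parents: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Invert the parent links into a parent -> children map, keeping Codelist order."""
    children = defaultdict(list)
    for code_id, parent in parents.items():
        if parent is not None:
            children[parent].append(code_id)
    return dict(children)


def print_tree(code_id: str, children: dict[str, list[str]], depth: int = 0) -> None:
    """Print a Code and, indented beneath it, all of its descendants."""
    print("  " * depth + code_id)
    for child in children.get(code_id, []):
        print_tree(child, children, depth + 1)


tree = children_of(PARENTS)
for root in (code for code, parent in PARENTS.items() if parent is None):
    print_tree(root, tree)
```

Because each Code carries a single parent reference in this model, a Code can occupy only one position in the tree.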

A Code can have only one parent, so it can appear only once in a simple Codelist hierarchy. This may be restrictive where Codes can logically be organised into several different groupings: in the CL_REF_AREA example, countries could also be grouped into trading blocs as well as into geographical regions.

SDMX Hierarchical Codelists should be used where Codelists have more complex hierarchical structures.

Hierarchical Codelists

SDMX Hierarchical Codelists are separate maintainable artefacts used to describe complex Code hierarchies.
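To show why a separate artefact helps, this simplified sketch keeps two independent groupings over the same country codes, so a Code such as DE can appear under a geographic region and under a trading bloc at the same time. The node IDs EU and EFTA and the dict-of-lists layout are illustrative assumptions; an actual Hierarchical Codelist carries more structure than this.

```python
# Hypothetical groupings over the same CL_REF_AREA Codes: node ID -> child Code IDs.
GEOGRAPHY = {"EUR": ["DE", "FR", "NO"]}
TRADE_BLOCS = {"EU": ["DE", "FR"], "EFTA": ["NO"]}

HIERARCHIES = {"geography": GEOGRAPHY, "trade_blocs": TRADE_BLOCS}


def groups_containing(code_id: str, hierarchies: dict[str, dict[str, list[str]]]) -> list[tuple[str, str]]:
    """Return (hierarchy name, parent node) for every place the Code appears."""
    return [
        (hierarchy_name, parent)
        for hierarchy_name, hierarchy in hierarchies.items()
        for parent, children in hierarchy.items()
        if code_id in children
    ]


print(groups_containing("DE", HIERARCHIES))  # [('geography', 'EUR'), ('trade_blocs', 'EU')]
```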

Validity

Structure-Level Validity

Because a Codelist is a Maintainable structure, the period during which the complete Codelist is considered valid can be defined.
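A minimal sketch of a structure-level validity check, assuming the Codelist's validity is stored as an optional start and end date that are inclusive and open-ended when missing. The `ValidityPeriod` class and the 2019 dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValidityPeriod:
    """Hypothetical validity window. Both ends are inclusive; None means open-ended."""
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None

    def contains(self, day: date) -> bool:
        if self.valid_from is not None and day < self.valid_from:
            return False
        if self.valid_to is not None and day > self.valid_to:
            return False
        return True


# Illustrative only: a Codelist assumed to be valid for the calendar year 2019.
codelist_validity = ValidityPeriod(date(2019, 1, 1), date(2019, 12, 31))
print(codelist_validity.contains(date(2019, 6, 30)))  # True
print(codelist_validity.contains(date(2020, 1, 1)))   # False
```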

Item-Level Validity

Validity periods can also be defined individually for each Code in a Codelist. A common use case involves country codes that are only valid between the dates during which the country existed. Item validity also supports the case where a code changes meaning over time.

Example:
The Gilbert and Ellice Islands had the ISO country code ‘GE’ until 1977, when the territory was split to become the independent countries of Kiribati and Tuvalu. When Georgia gained independence from the former Soviet Union in 1991, the ‘GE’ code was reused. Setting Item Validity on the GE Code allows this change in meaning to be described and stored with the Codelist.

Period up to 1977: GE = Gilbert and Ellice Islands
Period between 1977 and 1991: GE is invalid
Period from 1991 to present: GE = Georgia

In practical terms, Item Validity allows data with dimension codes that vary in meaning over time to be robustly validated and correctly disseminated.

Use Case | Example
Validation | Observation data for country ‘GE’ and time period ‘1970’ must relate to the Gilbert and Ellice Islands, whereas observations for the same ‘GE’ code for the period ‘2005’ must relate to Georgia. Observations for country ‘GE’ for periods between 1978 and 1990 will be rejected as invalid, because the validity rule states that no country with that code existed between those dates.
Dissemination | Visualisations showing observation data for country ‘GE’ and time period ‘1970’ will use the code name ‘Gilbert and Ellice Islands’. When showing similar data for ‘2005’, the Item Validity rules mean that the name ‘Georgia’ is displayed instead.
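Both use cases can be sketched with the ‘GE’ periods given above. Each meaning of the Code carries an inclusive range of years; the `CodeMeaning` class, the year granularity and the exact boundary years are assumptions of this sketch. A year with no valid meaning is rejected (the Validation case); otherwise the returned name is what would be displayed (the Dissemination case).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodeMeaning:
    name: str
    from_year: Optional[int]  # inclusive; None = no lower bound
    to_year: Optional[int]    # inclusive; None = still valid


# Hypothetical Item Validity for the Code 'GE'.
GE_MEANINGS = [
    CodeMeaning("Gilbert and Ellice Islands", None, 1977),
    CodeMeaning("Georgia", 1991, None),
]


def meaning_for(year: int, meanings: list[CodeMeaning]) -> Optional[str]:
    """Return the Code name valid in the given year, or None if the Code was not valid then."""
    for m in meanings:
        if (m.from_year is None or year >= m.from_year) and (m.to_year is None or year <= m.to_year):
            return m.name
    return None


for year in (1970, 1985, 2005):
    name = meaning_for(year, GE_MEANINGS)
    # Validation rejects the observation; dissemination shows the name valid at that time.
    print(year, name if name is not None else "rejected: GE not valid")
# 1970 Gilbert and Ellice Islands
# 1985 rejected: GE not valid
# 2005 Georgia
```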