Representation Map

From Fusion Registry Wiki
Jump to navigation Jump to search

Overview

Structure Properties

Structure Type Standard SDMX Structural Metadata Artefact
Maintainable Yes
Identifiable Yes
Item Scheme No
SDMX Information Model Versions 3.0
Concept ID RepresentationMap

Context within the SDMX 3.0 Information Model

Representation Map

Usage

SDMX Version 3.0 Representation Maps are used to define a mapping rules between a source value (or combination of values) and a target value (or combination of targets). When a souce value is matched, the target value is output. Representation Maps can be used to map from one classification to another (for example from ISO2 to ISO3 character country codes), or can be used to map from non coded to coded values ($ maps to USD), or more complex mapping which require a combination of source values to generate target values. A Representation Map is more than a simple lookup table, as more complex rules can be introduced such as regular expression matches which can include capture groups to transfer patterns from source to target, substring matches, as well as rules which are only applicable to certain periods of time.

Representation Maps can be used by Structure Maps when defining relationships between two Data Structure Definitions (DSDs). The Structure Maps describe the relationships between the Components of the two DSDs, whilst Relationship Maps are used to describe how the values reported for each component map. For example, a Structure Map may say the COUNTRY Dimension in the source DSD maps to the REF_AREA Dimension in the target DSD - it is then the Relationship Map which describes that the value GB maps to GBR.

Representation Map - Model

Source and Target

A Representation Map can describe the allowable inputs by linking to source Codelists, or Valuelists, if the input is free text then the source is simply defined as Free Text. a Representation Map is not restricted to one source, it can define multiple, of mixed types. A combination of source values is a typical use case when a Representation Map is used by a Structure Map which is mapping more than one source component. For example, a Structure Map may state that the combination of REF_AREA and CURRENCY are used to derive an output value. The corresponding Representation Map would then have to include 2 sources, one for REF_AREA and the other for CURRENCY. A rule may state that the UK in combination with USD maps to a certain output value, whilst the UK in combination with GBP maps to another value.

A Representation Map has the same options for the targets, in that a target can link to a Codelist, Valuelist, or simply describe itself as a free text target (any textual content is valid).

An example Representation Map which maps from ISO2 to ISO3 Country Codes may define a single source Codelist of CL_ISO2_COUNTRIES and a target of CL_ISO3_COUNTRIES. Mapped source values will be code Ids in the ISO2 Codelist and the corresponding target will be codes in the ISO3 Codelist. It is perfectly valid to create the same mapping rules between ISO2 codes and ISO3 codes without linking the Representation Map to any source codelist or target codelist (just define both source and target as free text). The mapped values are the same, GB maps to GBR for example. The reason to link to the Country Codelist tells the user of the map, the intent of the mapping (to map between country codelists). It is also helpful to discover mappings for codelists (to ask the question which mappings exist for ISO2 Countries), and to ensure referential integrity (a Code can not be deleted from the Codelist if it is used in a Mapping Rule).

Mapped Values

A Representation Map essentially defines a list of mappings, source to target. In its simplest form mapping rules can be thought of as a lookup table

Simple mapping rules as a lookup table
Source Value Target Value
GB GBR
US USA

However, each rule may have multiple source inputs (and could have multiple outputs). Each source input rule may not be as simple as a string compare, it could define a rule to take a substring before applying the compare, or even apply a regular expression to match any input that follows a particular pattern. In this way, a single rule could be thought of as a table of matching rules, each input must match before the rule is deemed to have passed, and the corresponding output generated.

Defining a single mapping rule, consisting of 3 inputs and one output
Source Value Sub-String Start Sub-String End Is RegEx Target Value
GB - - No MY_NEW_CODE_ID
A - 1 No
[xyz] 2 4 Yes

Substring Rules

An input rule can define the start and end index of a string to match on, the index is 0 indexed (the 0 position is the start of the string).

Example
Rule: Match Value = UK, Substring Start = 2, Substring End = 4
Matched Input: __UK__
Matched Input: AAUKBB
No Match Input: AUKBB

Regular Expressions

An input rule can define a pattern match using regular expressions. Regular expressions have been around since the 1950s and as such are in common use, with many online builders available to assist in generating a valid RegEx.

Example - string contains "W2" anywhere in the input
Rule: Match Value = ^.*W2.*$
Matched Input: W2
Matched Input: ABC-W2
Matched Input: abcdefW2ghijkl


Example (zip code with or without dashes)
Rule: Match Value = (^\d{5}$)|(^\d{9}$)|(^\d{5}-\d{4}$)
Matched Input: 32225-1234
Matched Input: 322251234
Matched Input: 32225
Matched Input: 3222

Note: A RexEx rule can be used in combination with substring. The substring will first be taken on the input value, before applying the RegEx

Capture Groups

An extension to the Regular Expression rules, the captured information from a RegEx match can be used to further match an input, or an output. This is achieved using a Regular Expression capture group. When part of the RegEx is in parenthesis it forms a capture group, and expression may contain 0 to many capture groups, each group can be referred to by its index.


Example RegEx with 2 Capture Groups
RegEx: ([ABC])([DEF]) - match the character A,B, or C followed by D, E, or F
Match: AD
Capture Group 1: 1
Capture Group 2: 2

The above rule defines 2 capture groups, when an input value is matched, parts of the value which were matched by a capture group can be referred to by index (starting at 1), by using the \ prefix. For example, \1 would output A and \2 would output D from the above example.

A mapping rule can use capture groups by copying matched information from one source value, into either another source value, or a target value (of both!).


Example using Capture groups in target
Input Rule: ([ABC])([DEF])
Output Rule: \2&\1
Matched Input: AD
Output: D&A


Example using Capture groups with multiple source inputs

Source Rule 1 Source Rule 2 Input Value 1 Input Value 2 Is Match
([ABC])([DEF]) \2\1 AD DA Yes
([ABC])([DEF]) \2\1 AD AD No

Example Use Cases

Simple Lookup Table

To create a simple lookup table, for example ISO2 to ISO3 country codes, a simple Representation Map is required with a single source Codelist (ISO2 Countries) and a single target Codelist (ISO3 Countries)

ISO2 Countries ISO3 Countries
AF AFG
AL ALB
DZ DZA


Default if not specified

This use case is supported by creating a final mapping which uses the Regular Expression for match all. This mapping will only be used if no other mappings match

ISO2 Countries Is Reg Ex ISO3 Countries
AF false AFG
AL false ALB
DZ false DZA
.* true _Z

In the above example if no country codes match, then _Z will be output.

Copy Value Unless

This use case is when the user wants to copy the input to the output (verbatim), with the exception of a few conditions. This could be used when the source and target Codelists are largely the same, with a few exceptions. This use case is satisfied by using explicit mapping, and a catch all regular expression which is used to perform a copy from source to target by using the regex capture group.

Country Is Reg Ex REF_AREA
RW false ROW
IO false INO
(.*) true \1

In the above example if the source country is RW it is mapped to ROW, IO is mapped to INO, for anything else, the value is simply copied as is.

If-Then-Else

This use case is supported by creating a number of RegEx expressions, which are executed in order. The first one to match wins.

Source Is Reg Ex Target
S1 false S1A Exact Match, S1 maps to S1A
S2 false S2A Exact Match, S1 maps to S2A
{2}[A-Z]1 true _X Map to _X if the source starts with two upper case A to Z characters followed by the number 1
.* true _Z anything maps to _Z

In this example, the mapping process first checks the exact matches, if none match, then each reg-ex pattern is tested in turn, in the above example the final reg ex just matches anything to output _Z.

Collapsing Dimensions

This use case is to facilitate the collection of data conforming to one DSD, whist the internal usage (or future dissemination) of the data may conform to another DSD. Consider the following DSDs which are made available by the IMF, both are Balance of Payments, however one is named as Global Use, and the other is For Collection. It is possible that the data collected against one DSD can be translated to the other DSD, however the Global Use DSD has many more Dimensions, these appear to have been replaced an INDICATOR Dimension in the For Collection DSD:

BOP_GBPM6 Balance of Payments _Global use_BOP_BPM6 BOP6 Balance of Payments (BPM6) for Collection
  • [FREQ] Frequency
  • [ADJUSTMENT] Adjustment indicator
  • [REF_AREA] Reference country or area
  • [COUNTERPART_AREA] Counterpart Area
  • [REF_SECTOR] Reference sector
  • [COUNTERPART_SECTOR] Counterpart sector
  • [FLOW_STOCK_ENTRY] Flows and stocks indicator
  • [ACCOUNTING_ENTRY] Accounting entries
  • [INT_ACC_ITEM] International accounts item
  • [FUNCTIONAL_CAT] Functional category
  • [INSTR_ASSET] Instrument and assets classification
  • [MATURITY] Maturity
  • [UNIT_MEASURE] Unit of measure
  • [CURRENCY_DENOM] Currency of denomination
  • [VALUATION] Valuation
  • [COMP_METHOD] Compilation methodology
  • [TIME_PERIOD] Time Period
   [DATA_DOMAIN] Data Domain
   [REF_AREA] Reference Country or area
   [INDICATOR]  Economic Indicator
   [COUNTERPART_AREA] Counterpart Area
   [FREQ] Frequency
   [TIME_PERIOD] Time period 

In this use case, we will need to create a Representation Map to describe how the combination of Source values are translated into a corresponding value for the INDICATOR Dimension. This Representation Map can then be used by a Structure Map in order to map the Global BOP Dataflow to the Collection BOP Dataflow.
The First step is to create a Representation Map and choose multiple Source Codelists, one for each Dimension highligthed in BOLD. Each of these Dimensions describes which Codelist it uses, the ADJUSTMENT Dimension for example uses IMF:CL_ADJUSTMENT(1.4). The Representation Map will have the following Source Codelists:

  1. IMF:CL_ADJUSTMENT(1.4)
  2. ESTAT:CL_SECTOR(1.9) (REF_SECTOR)
  3. ESTAT:CL_SECTOR(1.9) (COUNTERPART_SECTOR)
  4. IMF:CL_FSENTRY(1.1)
  5. IMF:CL_ACCOUNT_ENTRY(1.2)
  6. IMF:CL_ACCOUNTS_ITEM(1.5)
  7. IMF:CL_FUNCTIONAL_CAT(1.8)
  8. ESTAT:CL_INSTR_ASSET(1.8)
  9. ESTAT:CL_MATURITY(1.6)
  10. IMF:CL_UNIT(1.12) (UNIT_MEASURE)
  11. IMF:CL_UNIT(1.12) (CURRENCY_DENOM)
  12. ESTAT:CL_VALUATION(1.6)
  13. IMF:CL_COMP_METHOD(1.2)

Only One Target Codelist is required, the CL_INDICATOR Codelist.

When defining the mapping, it is important to think about each mapping as a row in a table, where the indicator is only mapped if each of the source values match for all the other Dimensions, for example:

ADJUSTMENT REF_SECTOR COUNTERPART_SECTOR FLOW_STOCK_ENTRY ACCOUNTING_ENTRY INT_ACC_ITEM FUNCTIONAL_CAT INSTR_ASSET MATURITY UNIT_MEASURE CURRENCY_DENOM VALUATION COMP_METHOD INDICATOR
_Z S1 S1 T A CA D F L _Z _Z _Z C IND_ABCD1
_Z S1 S1 T B CA D F L _Z _Z _Z L IND_ABCD2
_Z S1 S1 T C CA D F L _Z _Z _Z C IND_ABCD3