Reference Metadata

From Fusion Registry Wiki
Revision as of 00:20, 21 July 2022 by Mnelson (Metadata Provision Agreement)

Overview

Reference Metadata is a structured document of information which can be attached to any SDMX structure. It should be thought of as metadata that helps the user understand more about the data. The information in a Reference Metadata Report is textual: it is structured under headings and sub headings, and can contain a mixture of data types including HTML, numbers, coded values, booleans, URLs, and more.

Reference Metadata can be used for any purpose. For example, it can document additional information about a Concept, a Code, or a mapping. A common use case is capturing data quality metadata, covering aspects such as how the data was collected, legal information, contract information, and so on. The IMF Data Quality Framework provides a good example of this type of metadata, and the IMF's Dissemination Standards Bulletin Board (DSBB) demonstrates the use of data quality metadata in a collection and dissemination setting.

Designing a Metadata Collection

Metadata Structure Definition

In the same way that SDMX defines the shape and content of your Datasets (in terms of Dimensions and Attributes), there is an equivalent definition for Reference Metadata. The Metadata Structure Definition is the structure used to define the allowable content of a Metadata Report. A good analogy is a user creating a Word document and filling in all the headings and sub headings, but not the body of the document. What they have created is a template which they can distribute to their users to fill in the details.

The Metadata Structure Definition defines a list of Concepts (the analogy is a document heading). The Concepts can be arranged into a hierarchy (headings and sub headings), and each Concept is given an allowable representation: is the content authored under the heading HTML, an Integer, or a Coded value, does it have a maximum length, and so on. The allowable representation can even be set to ‘none’ to indicate that the heading is purely presentational and only groups a set of sub headings. For example, the Concept of Contact may be presentational, with the sub Concepts of Name, Phone Number, and Department as sub headings which do take content. The corresponding Metadata Report would look like this:

 Contact
   Name : Matthew Nelson
   Phone Number : 01234 444555
   Department : IT
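The template idea can be sketched in Python. This is a minimal illustration under stated assumptions, not how the Fusion Registry stores a Metadata Structure Definition: the `Heading` class, the `flatten` and `validate` functions, the path-style keys such as `CONTACT/NAME`, and the representation names in `CHECKS` are all hypothetical. A `representation` of `None` models a presentational heading, so any content reported against it is flagged.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical representation names mapped to the exact Python type a value must have.
CHECKS = {"String": str, "Integer": int}


@dataclass
class Heading:
    """A Concept used as a heading in a hypothetical Metadata Structure Definition."""
    concept_id: str
    representation: Optional[str] = None  # None = presentational, takes no content
    children: list["Heading"] = field(default_factory=list)


def flatten(heading: Heading, prefix: str = "") -> dict:
    """Map every heading path (e.g. 'CONTACT/NAME') to its allowable representation."""
    path = prefix + heading.concept_id
    paths = {path: heading.representation}
    for child in heading.children:
        paths.update(flatten(child, path + "/"))
    return paths


def validate(msd: list, report: dict) -> list:
    """Return the problems found when checking a report (path -> value) against the MSD."""
    allowed = {}
    for heading in msd:
        allowed.update(flatten(heading))
    problems = []
    for path, value in report.items():
        if path not in allowed:
            problems.append(f"{path}: not defined in the MSD")
        elif allowed[path] is None:
            problems.append(f"{path}: presentational heading cannot take content")
        elif type(value) is not CHECKS[allowed[path]]:
            problems.append(f"{path}: expected {allowed[path]}")
    return problems


# The Contact example: a presentational heading with three content-bearing sub headings.
MSD = [
    Heading("CONTACT", None, [
        Heading("NAME", "String"),
        Heading("PHONE_NUMBER", "String"),
        Heading("DEPARTMENT", "String"),
    ]),
]

report = {
    "CONTACT/NAME": "Matthew Nelson",
    "CONTACT/PHONE_NUMBER": "01234 444555",
    "CONTACT/DEPARTMENT": "IT",
}
print(validate(MSD, report))               # []
print(validate(MSD, {"CONTACT": "text"}))  # ['CONTACT: presentational heading cannot take content']
```

Keying the report by heading path keeps the sketch short; a real Metadata Report nests sub headings under their parent heading instead.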

Metadata Flow

The Metadata Structure Definition defines the template of the Metadata Report, but it is the Metadataflow that the Metadata Report is reported against.

The Metadataflow is an SDMX structure which does two things:

  1. It references a Metadata Structure Definition, which defines the structure of the Metadata Reports reported against it.
  2. It defines one or more allowable targets, explained below.

When a Metadata Report is authored, it must be attached to one or more SDMX structures. For example, a report about data collection methodology might be attached to a specific Dataflow, such as National Accounts. The author of the report chooses which structure(s) they attach their report to. However, the collecting Agency, as the owner of the Metadataflow, may want to restrict the options given to the report author. The Metadataflow enables the collecting Agency to define one or more such restrictions.

A restriction (an allowable target) may define a single allowable structure, such as:

Dataflow=IMF:BOP(1.0)

This target requires a Dataflow owned by the IMF with the ID BOP and the version 1.0. It gives the report author no real choice in what they attach their report to: it must be exactly one predefined structure.
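The parts of such a target can be pulled apart with a regular expression. The sketch below assumes the short `Type=Agency:ID(Version)` notation used on this page, with an optional `.Item` suffix for items such as Codes; the `TARGET` pattern, the `Target` class, and the `parse_target` function are hypothetical and are not the Fusion Registry's own parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the short target notation used on this page:
#   Type=Agency:ID(Version)        e.g. Dataflow=IMF:BOP(1.0)
#   Type=Agency:ID(Version).Item   e.g. Code=*:CL_FREQ(*).*
TARGET = re.compile(
    r"(?P<structure_type>[^=]+)="
    r"(?P<agency>[^:]+):"
    r"(?P<id>[^(]+)"
    r"\((?P<version>[^)]+)\)"
    r"(?:\.(?P<item>.+))?"
)


@dataclass(frozen=True)
class Target:
    structure_type: str
    agency: str
    id: str
    version: str
    item: Optional[str] = None  # only set for item targets such as Codes or Concepts


def parse_target(text: str) -> Target:
    """Split a target string into its parts; raise ValueError if it is malformed."""
    match = TARGET.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid target: {text!r}")
    return Target(**match.groupdict())


print(parse_target("Dataflow=IMF:BOP(1.0)"))
# Target(structure_type='Dataflow', agency='IMF', id='BOP', version='1.0', item=None)
```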

A target can be made less restrictive by using wildcard values:

Dataflow=IMF:*(*)

This target requires a Dataflow owned by the IMF, so the report author may attach their report to any IMF Dataflow, at any version.

Any part of the target (including the structure type) can be left open. It is therefore possible to define a target that accepts any structure type, from any agency, with any ID, at any version.
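One way to evaluate a wildcarded target is to turn it into a regular expression. In this sketch each `*` is assumed to match one or more characters within a single part of the reference, never the separators `=`, `:`, `(` or `)`. That is an assumption about the semantics, not a description of the Fusion Registry's matcher, and `target_to_regex` and `matches` are hypothetical names.

```python
import re

# Assumed wildcard rule: '*' matches one or more characters inside a single part
# of a reference, never the separators '=', ':', '(' or ')'.
WILDCARD = r"[^=:()]+"


def target_to_regex(target: str) -> re.Pattern:
    """Escape the literal parts of a target and replace each '*' with WILDCARD."""
    return re.compile(WILDCARD.join(re.escape(part) for part in target.split("*")))


def matches(target: str, reference: str) -> bool:
    """True if a concrete structure reference satisfies a (possibly wildcarded) target."""
    return target_to_regex(target).fullmatch(reference) is not None


print(matches("Dataflow=IMF:BOP(1.0)", "Dataflow=IMF:BOP(1.0)"))  # True
print(matches("Dataflow=IMF:BOP(1.0)", "Dataflow=IMF:BOP(2.0)"))  # False
print(matches("Dataflow=IMF:*(*)", "Dataflow=IMF:NAC(1.0)"))      # True
print(matches("Dataflow=IMF:*(*)", "Dataflow=ECB:EXR(1.0)"))      # False
print(matches("*=*:*(*)", "Codelist=SDMX:CL_FREQ(2.0)"))          # True
```

Because a wildcard cannot cross a separator, the fully open target `*=*:*(*)` accepts any structure but not an item reference such as `Code=SDMX:CL_FREQ(2.0).A`, which carries an extra `.Item` part.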

It is also possible to define multiple targets:

Code=*:CL_FREQ(*).*
Concept=*:*(*).FREQ

The above allows attachment to any Code in a Codelist with the ID CL_FREQ, or to any Concept with the ID FREQ defined in any Concept Scheme (in both cases from any agency and at any version).
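With several targets, a reference is acceptable if it matches any one of them. The hypothetical `Metadataflow` class below holds an MSD reference and a list of targets. The MSD reference `MetadataStructure=EXAMPLE:MSD_FREQ(1.0)` and the concrete Code and Concept references in the usage lines are illustrative, and the wildcard rule (a `*` matches within one part of the reference) is an assumption.

```python
import re
from dataclasses import dataclass


def matches(target: str, reference: str) -> bool:
    # Assumed wildcard rule: '*' matches one or more characters inside a single part.
    pattern = r"[^=:()]+".join(re.escape(part) for part in target.split("*"))
    return re.fullmatch(pattern, reference) is not None


@dataclass
class Metadataflow:
    """Hypothetical model: a reference to an MSD plus a list of allowable targets."""
    msd: str
    targets: list

    def allows(self, reference: str) -> bool:
        """A report may be attached to `reference` if ANY of the allowable targets match."""
        return any(matches(t, reference) for t in self.targets)


FREQ_DOCS = Metadataflow(
    msd="MetadataStructure=EXAMPLE:MSD_FREQ(1.0)",  # hypothetical MSD reference
    targets=["Code=*:CL_FREQ(*).*", "Concept=*:*(*).FREQ"],
)

print(FREQ_DOCS.allows("Code=SDMX:CL_FREQ(2.0).A"))                          # True
print(FREQ_DOCS.allows("Concept=SDMX:CROSS_DOMAIN_CONCEPTS(2.0).FREQ"))       # True
print(FREQ_DOCS.allows("Concept=SDMX:CROSS_DOMAIN_CONCEPTS(2.0).OBS_STATUS")) # False
print(FREQ_DOCS.allows("Codelist=SDMX:CL_FREQ(2.0)"))                        # False
```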

If the collection of Reference Metadata has multiple providers, the Metadata Provision Agreement can be used to provide extra control over the collection process.

Metadata Provision Agreement

The Metadata Provision Agreement is an SDMX structure that contains three pieces of information:

  1. It defines which Metadataflow it is for, and by extension the Metadata Structure Definition.
  2. It defines who the Metadata Provider is (who owns the report).
  3. It can define additional targets which further restrict those on the Metadataflow.

Whilst a Metadata Report can be authored directly against a Metadataflow, in a collection environment with multiple reports from multiple providers the Metadata Provision Agreement is needed to define the ownership of each Metadata Report. It does this by linking a Metadata Provider to a Metadataflow. Because the Agreement can further restrict the target, the Metadataflow itself can be defined with an open target such as ‘any IMF Dataflow’:

Dataflow=IMF:*(*) 

Further restrictions can then be defined for each Metadata Provider on their Metadata Provision Agreement, for example:

Dataflow=IMF:BOP(*)
Dataflow=IMF:NAC(*)
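The layered check can be sketched as follows: a report attached by a provider must satisfy one of the Metadataflow's targets and, if the agreement defines any, one of the agreement's targets as well. The classes, the provider ID `PROVIDER_A`, the MSD reference `MetadataStructure=EXAMPLE:MSD(1.0)`, and the `IMF:CPI` Dataflow used in the usage lines are hypothetical, and the wildcard rule (a `*` matches within one part of the reference) is an assumption.

```python
import re
from dataclasses import dataclass, field


def matches(target: str, reference: str) -> bool:
    # Assumed wildcard rule: '*' matches one or more characters inside a single part.
    pattern = r"[^=:()]+".join(re.escape(part) for part in target.split("*"))
    return re.fullmatch(pattern, reference) is not None


@dataclass
class Metadataflow:
    msd: str
    targets: list


@dataclass
class MetadataProvisionAgreement:
    """Hypothetical model linking one Metadata Provider to one Metadataflow."""
    provider: str
    flow: Metadataflow
    targets: list = field(default_factory=list)  # empty list = no extra restriction

    def allows(self, reference: str) -> bool:
        """Reference must match a Metadataflow target AND, if any exist, an agreement target."""
        on_flow = any(matches(t, reference) for t in self.flow.targets)
        on_agreement = not self.targets or any(matches(t, reference) for t in self.targets)
        return on_flow and on_agreement


IMF_FLOW = Metadataflow("MetadataStructure=EXAMPLE:MSD(1.0)", ["Dataflow=IMF:*(*)"])
AGREEMENT_A = MetadataProvisionAgreement(
    "PROVIDER_A", IMF_FLOW, ["Dataflow=IMF:BOP(*)", "Dataflow=IMF:NAC(*)"]
)

print(AGREEMENT_A.allows("Dataflow=IMF:BOP(1.0)"))  # True
print(AGREEMENT_A.allows("Dataflow=IMF:CPI(1.0)"))  # False: allowed by the flow, not the agreement
print(AGREEMENT_A.allows("Dataflow=ECB:EXR(1.0)"))  # False: outside the flow's target
```

Requiring both checks to pass is a design choice of this sketch: it means an agreement can only narrow what the Metadataflow allows, never widen it.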