Reference Metadata

From Fusion Registry Wiki
Revision as of 03:09, 21 July 2022 by Mnelson (talk | contribs) (Metadata Flow)

Overview

Reference Metadata is a structured document which can be attached to any SDMX structure. Whilst Reference Metadata can be used to capture information that drives business processes, the typical use case is to capture textual information which helps human users better understand the information they are looking at. This could be more detailed information about a dataset, contact details, legal text, or any other type of information.

Reference Metadata Reports are structured under headings and sub headings. The reported content can be a mixture of data types including HTML, numbers, coded values, booleans, URLs, and more. One way to think about a Reference Metadata Report is as a Microsoft Word document with headings and content, where the Word document is linked to some other part of the SDMX information model (a specific Concept or Code for example, or a collection of Dataflows).

Common use cases for Reference Metadata are to capture data quality metadata including aspects such as how the data was collected, legal information, contract information, and so on. The IMF Data Quality Framework provides a good example of this type of metadata, with their Dissemination Standards Bulletin Board (DSBB) demonstrating the use of Data Quality Metadata in a collection and dissemination space.

Designing a Metadata Collection

Metadata Structure Definition

In the same way that SDMX defines the shape and content of Datasets in terms of Dimensions and Attributes, it defines the shape and content of Reference Metadata with the Metadata Structure Definition.

The Metadata Structure Definition (MSD) is the structure used to define the allowable content of a Metadata Report. A good analogy for an MSD is a Word document with all the Headings and Sub Headings completed, but no content under any of the headings. This Word document can be thought of as a template which could be distributed to users to complete. In the same way, the MSD is a template which lays out the structure of the document's Headings and Sub Headings. The analogy is not perfect, because the MSD also describes cardinality (how many times each heading can repeat, and whether the heading is mandatory or optional) and defines the allowable content. It is not possible in a Word document to create a rule saying that the content under a particular heading is restricted to 400 characters, can be given in multiple languages, or must be a value from a corresponding Codelist; with an MSD it is possible. So whilst the Microsoft Word analogy helps visualise the role of the MSD, the MSD provides much more control over what can be reported.

In terms of the SDMX Information Model, the Metadata Structure Definition defines a list of Concepts (the analogy is a document heading). Each Concept can be put into a hierarchy (headings and sub headings). Concepts are given an allowable representation (is the authored content under the heading HTML, Integer, or Coded, does it have a maximum length, etc.). The allowable representation can even be set to ‘none’ to indicate that the heading is purely presentational and groups a set of sub headings.

The example below shows the Concept of Contact set to be presentational and allowing more than one occurrence. The sub Concepts of Name, Phone Number, Department, and Website are sub headings. Name would have a maximum length, Phone Number could be set to numerical or given a pattern which includes spaces, Department could come from a Codelist or be simple text, and Website would take valid URLs.

 Contact
   Name : Matthew Nelson
   Phone Number : 01234 444555
   Department : IT
   Website : https://metadatatechnology.com/
 Contact
   Name : Glenn Tice
   Phone Number : 01234 444666
   Department : Management
   Website : https://metadatatechnology.com/
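
The way an MSD constrains a report can be sketched in a few lines of Python. This is an illustrative model only, not the SDMX or Fusion Registry data model: the `Concept` class, the `validate` function, the 50 character limit on Name, and the `DEPARTMENTS` set standing in for a Codelist are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Concept:
    """One heading in a simplified, hypothetical model of an MSD."""
    id: str
    # Representation check; None marks a presentational heading with no value.
    check: Optional[Callable[[str], bool]] = None
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None means the heading may repeat freely
    children: List["Concept"] = field(default_factory=list)


def validate(concept: Concept, entries: List[dict]) -> List[str]:
    """Return error messages for the entries reported under one concept.

    Each entry is a dict with an optional "value" and an optional
    "children" mapping of concept ID to a list of entries.
    """
    errors = []
    count = len(entries)
    if count < concept.min_occurs:
        errors.append(f"{concept.id}: expected at least {concept.min_occurs}, got {count}")
    if concept.max_occurs is not None and count > concept.max_occurs:
        errors.append(f"{concept.id}: expected at most {concept.max_occurs}, got {count}")
    for entry in entries:
        value = entry.get("value")
        if concept.check is None:
            if value is not None:
                errors.append(f"{concept.id}: presentational heading cannot hold a value")
        elif value is None or not concept.check(value):
            errors.append(f"{concept.id}: invalid value {value!r}")
        reported_children = entry.get("children", {})
        for child in concept.children:
            errors.extend(validate(child, reported_children.get(child.id, [])))
    return errors


DEPARTMENTS = {"IT", "Management"}  # stand-in for a Codelist (assumption)

CONTACT = Concept(
    "Contact",
    max_occurs=None,  # Contact is presentational and may repeat
    children=[
        Concept("Name", check=lambda v: len(v) <= 50),  # assumed maximum length
        Concept("Phone Number", check=lambda v: re.fullmatch(r"[0-9 ]+", v) is not None),
        Concept("Department", check=lambda v: v in DEPARTMENTS),
        Concept("Website", check=lambda v: v.startswith(("http://", "https://"))),
    ],
)


def contact(name: str, phone: str, department: str, website: str) -> dict:
    """Build one reported Contact entry in the shape validate() expects."""
    return {"children": {
        "Name": [{"value": name}],
        "Phone Number": [{"value": phone}],
        "Department": [{"value": department}],
        "Website": [{"value": website}],
    }}


reported = [
    contact("Matthew Nelson", "01234 444555", "IT", "https://metadatatechnology.com/"),
    contact("Glenn Tice", "01234 444666", "Management", "https://metadatatechnology.com/"),
]
print(validate(CONTACT, reported))  # prints [] because both contacts conform
```

Modelling the representation as a plain predicate keeps the sketch short; a real MSD expresses the same rules declaratively (text format, maximum length, Codelist reference) rather than as code.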

Metadata Flow

The Metadata Structure Definition defines the template of the Metadata Report, but it is the Metadataflow which the Metadata Report is reported against.

The Metadataflow is an SDMX structure which does two things:

  1. It references a Metadata Structure Definition, which provides the rules to which the Metadata Report must conform
  2. It defines one or more allowable targets, explained below

When a Metadata Report is authored, it must be attached to one or more SDMX structures. For example, a report about data collection methodology might be attached to a specific Dataflow such as National Accounts. The author of the report chooses which structure(s) to attach the report to. However, the collecting Agency (the owner of the Metadataflow) may want to restrict the options given to the report author. It is the Metadataflow which enables the collecting Agency to provide one or more restrictions.

The restriction (an allowable target) may define a single allowable structure, such as

Dataflow=IMF:BOP(1.0)

The target must be a Dataflow owned by the IMF with the ID of BOP and the version of 1.0. This target gives the report author no real choice in what they attach their report to: it must be the one structure which has been predefined.

The target could be less restrictive by allowing wildcard values:

Dataflow=IMF:*(*)

The target must be a Dataflow owned by the IMF. The report author may now attach their report to any IMF Dataflow at any version.

Any part of the target (including the structure type) can be left open. Therefore it is possible to define a target for any structure, any agency, any id, at any version.

It is also possible to define multiple targets:

Code=*:CL_FREQ(*).*
Concept=*:*(*).FREQ

The above allows attachment to any Code in a Codelist with the ID CL_FREQ, or to any Concept with the ID FREQ defined in any Concept Scheme.
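
The wildcard rules above can be illustrated with a small Python matcher. It is a sketch under stated assumptions: it treats the short `Type=Agency:ID(Version).Item` notation used on this page as the input format, and the `parse`, `is_allowed`, and `any_allowed` names are invented for the example rather than taken from Fusion Registry.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Short reference notation used on this page: Type=Agency:ID(Version)[.Item]
_REF = re.compile(
    r"^(?P<type>[^=]+)=(?P<agency>[^:]+):(?P<id>[^(]+)"
    r"\((?P<version>[^)]+)\)(?:\.(?P<item>.+))?$"
)


class Ref(NamedTuple):
    type: str
    agency: str
    id: str
    version: str
    item: Optional[str]  # only present for item references such as Codes


def parse(text: str) -> Ref:
    """Split a target or structure reference into its parts."""
    match = _REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid reference: {text!r}")
    return Ref(**match.groupdict())


def is_allowed(target: str, structure: str) -> bool:
    """True if the structure reference satisfies the target.

    A '*' in any part of the target matches any value in that part.
    """
    t, s = parse(target), parse(structure)
    if (t.item is None) != (s.item is None):
        return False  # an item target only matches an item reference
    return all(p == "*" or p == v for p, v in zip(t, s) if p is not None)


def any_allowed(targets: Iterable[str], structure: str) -> bool:
    """With multiple targets, matching any one of them is enough."""
    return any(is_allowed(target, structure) for target in targets)


targets = ["Code=*:CL_FREQ(*).*", "Concept=*:*(*).FREQ"]
print(is_allowed("Dataflow=IMF:*(*)", "Dataflow=IMF:BOP(1.0)"))  # True
print(any_allowed(targets, "Code=SDMX:CL_FREQ(2.0).A"))          # True
print(any_allowed(targets, "Dataflow=IMF:BOP(1.0)"))             # False
```

The structure references passed to the functions (for example `Code=SDMX:CL_FREQ(2.0).A`) are made-up values chosen only to exercise the wildcards.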

If the collection of Reference Metadata has multiple providers, then the Metadata Provision Agreement can be used to provide an extra control over the collection process.

Metadata Provision Agreement

Whilst a Metadata Report can be authored directly against a Metadataflow, in a collection environment where there may be multiple reports from multiple providers, the Metadata Provision Agreement is needed to define the ownership of each Metadata Report.

The Metadata Provision Agreement is an SDMX structure that has 3 pieces of information:

  1. It defines what Metadataflow it is for, which by extension defines the Metadata Structure Definition
  2. It defines who the Metadata Provider is (who owns the report)
  3. It can define additional targets which further restrict those on the Metadataflow

Linking a Metadata Provider to a Metadataflow enables further restrictions to be applied to the allowable targets, giving finer control over the Reference Metadata collection.

For example, a Metadataflow can be defined with a generic target such as ‘any IMF Dataflow’:

Dataflow=IMF:*(*) 

If Metadata Reports are collected directly against this Metadataflow, the report author could attach the report to any or all IMF Dataflows. In reality, the provider may only report data against a subset of these Dataflows, and it makes no sense for them to report data collection methodologies against Dataflows that they do not report. The Metadata Provision Agreement set up for the Metadata Provider would further restrict the targets to the Dataflows they do report data against, for example:

Dataflow=IMF:BOP(*)
Dataflow=IMF:NAC(*)
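
The combined effect of the two levels of targets can be sketched as follows. The sketch assumes that a structure must satisfy the Metadataflow targets and, where the Metadata Provision Agreement defines its own, those as well; targets are written as already-parsed tuples to keep the example short. The function names and the GFS Dataflow ID are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (structure type, agency, id, version); "*" in a target is a wildcard
Target = Tuple[str, str, str, str]


def matches(target: Target, structure: Target) -> bool:
    """True if every part of the target is '*' or equals the structure's part."""
    return all(t == "*" or t == s for t, s in zip(target, structure))


def may_attach(
    structure: Target,
    flow_targets: Iterable[Target],
    agreement_targets: Optional[Iterable[Target]] = None,
) -> bool:
    """Check a structure against the Metadataflow and, if given, the agreement.

    Assumption: agreement targets narrow the Metadataflow targets, so the
    structure has to satisfy both levels.
    """
    if not any(matches(t, structure) for t in flow_targets):
        return False
    if agreement_targets is None:
        return True  # reporting directly against the Metadataflow
    return any(matches(t, structure) for t in agreement_targets)


flow = [("Dataflow", "IMF", "*", "*")]
agreement = [("Dataflow", "IMF", "BOP", "*"), ("Dataflow", "IMF", "NAC", "*")]

bop = ("Dataflow", "IMF", "BOP", "1.0")
gfs = ("Dataflow", "IMF", "GFS", "1.0")  # hypothetical Dataflow not in the agreement

print(may_attach(bop, flow, agreement))  # True
print(may_attach(gfs, flow, agreement))  # False: the agreement excludes it
print(may_attach(gfs, flow))             # True: the Metadataflow alone allows it
```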

Report Ownership

Reference Metadata Reports are maintainable artefacts, and as such are uniquely identified with 3 properties

  1. The owner (agency)
  2. The identity (ID)
  3. The version

These 3 properties form a composite key, so it is possible to have multiple reports with the same ID as long as the owner or the version is different.

Unlike other SDMX structures, a Metadata Report does not have to be owned by an SDMX Agency. When a Metadata Report is authored against a Metadata Provision Agreement, it is the Metadata Provider (defined by the Metadata Provision Agreement) who owns the report.

It is important to remember that the Metadata Provider itself is defined, and therefore owned, by an Agency. If the Agency IMF wants to collect Metadata Reports from multiple Metadata Providers, it will first define who the Metadata Providers are in a Metadata Provider Scheme which is owned by the IMF. For example:

IMF Metadata Providers
UK1 - Bank of England
FR1 - Banque de France
ES1 - Instituto Nacional de Estadistica

When a Metadata Provider authors a report, the owner is identified by the ID of the Agency who owns the Metadata Provider Scheme (IMF for example) concatenated with the ID of the Metadata Provider (UK1 for example). The order is [agency id].[provider id], for example:

IMF.UK1

This structure enables the Metadata Provider to take ownership of their reports, and as such gives them permission to add/edit/delete their reports.

When a Metadata Report is authored directly against a Metadataflow, the Agency who owns the Report is the Agency that authored the Report. It is therefore possible for the BIS to author a Metadata Report against a Metadataflow owned by the IMF.
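
The identification and ownership rules in this section can be sketched in Python as follows. The `ReportKey` class and `owner_id` function are hypothetical names used for illustration; only the `[agency id].[provider id]` ordering and the three-part key come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReportKey:
    """Composite key of a Metadata Report: owner, ID, and version."""
    owner: str
    id: str
    version: str


def owner_id(agency_id: str, provider_id: Optional[str] = None) -> str:
    """Build the owner of a report.

    With a provider (report authored against a Metadata Provision Agreement)
    the owner is [agency id].[provider id]; without one (report authored
    directly against a Metadataflow) the owner is the authoring Agency.
    """
    return agency_id if provider_id is None else f"{agency_id}.{provider_id}"


# A frozen dataclass is hashable, so it can act as the composite key.
reports: Dict[ReportKey, str] = {}
reports[ReportKey(owner_id("IMF", "UK1"), "EXAMPLE", "1.0.0")] = "Bank of England report"
reports[ReportKey(owner_id("IMF", "FR1"), "EXAMPLE", "1.0.0")] = "Banque de France report"
reports[ReportKey(owner_id("BIS"), "EXAMPLE", "1.0.0")] = "report authored by the BIS"

print(owner_id("IMF", "UK1"))  # IMF.UK1
print(len(reports))            # 3: same ID and version, three different owners
```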

Security rules can be defined in Fusion Registry to restrict read/write access to Metadata Reports.

Reporting Reference Metadata

Reference Metadata must conform to the Metadata Structure Definition. It is authored in SDMX format; currently only SDMX-JSON is supported.

The Report has the following pieces of information:

  1. The identity. The owner, id, and version of the report, used for unique identification.
  2. The name/description.
  3. The target(s). This defines what structure or structures the report is for.
  4. The content. The content of the report is defined in Metadata Attributes. Each Metadata Attribute corresponds to a Concept in the Metadata Structure Definition.

An SDMX-JSON example follows:

{
 "meta": {
   "id": "IREF920760",
   "test": false,
   "schema": "https://raw.githubusercontent.com/sdmx-twg/sdmx-json/master/metadata-message/tools/schemas/2.0.0/sdmx-json-metadata-schema.json",
   "prepared": "2022-07-21T08:27:55Z",
   "contentLanguages": ["en"],
   "sender": {"id": "FusionRegistry"}
 },
 "data": {
   "metadataSets": [
     {
       "id": "EXAMPLE",
       "names": {
         "en": "Example Report"
       },
       "version": "1.0.0",
       "agencyID": "IMF.UK1",
       "metadataflow": "urn:sdmx:org.sdmx.infomodel.registry.MetadataProvisionAgreement=IMF:MDF_UK1_DQAF(1.0.0)",
       "targets": [
         "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=IMF:BOP(*)"
       ],
       "attributes": [
         {
           "id": "QUALITY",
           "attributes": [
             {
               "id": "LEGAL",
               "value": {
                 "en": "The responsibility for collecting, processing, and...."
               }
             },
             {
               "id": "RESOURCE",
               "value": {
                 "en": "Staff, facilities, computing resources, and financing"
               }
             },
             {
               "id": "RELEVANCE",
               "value": {
                 "en": "The relevance and practical utility of existing statistics"
               }
             }
           ]
         },
         {
           "id": "INTEGRITY",
           "attributes": [
             {
               "id": "INST",
               "value": {
                 "en": "Statistics are produced on an impartial basis"
               }
             },
             {
               "id": "TRANSPARENCY",
               "value": {
                 "en": "The terms and conditions under which statistics are collected, processed, and disseminated are available to the public."
               }
             },
             {
               "id": "ETHICAL",
               "value": {
                 "en": "Guidelines for staff behavior are in place and are well known to the staff"
               }
             }
           ]
         },
         {
           "id": "METHODOLOGY",
           "attributes": [
             {
               "id": "SCOPE",
               "value": {
                 "en": "The scope is broadly consistent with internationally accepted standards, guidelines or good practices"
               }
             },
             {
               "id": "CLASSIFICATION",
               "value": {
                 "en": "systems used are broadly consistent with internationally accepted standard"
               }
             },
             {
               "id": "BASIS",
               "value": {
                 "en": "Market prices are used to value flows and stocks."
               }
             },
             {
               "id": "SOURCE",
               "value": "IMF"
             }
           ]
         }
       ]
     }
   ]
 }
}

This example shows a Report owned by the IMF.UK1 Metadata Provider. It conforms to the rules of the Metadata Provision Agreement owned by the IMF with the ID MDF_UK1_DQAF at version 1.0.0. The report is against the IMF Dataflow BOP at any version (it attaches to all Dataflows which match).

The reported content is given under the Metadata Attributes, where each attribute has an ID (which relates to the Metadata Attribute with the same ID defined in the Metadata Structure Definition). Some attributes are presentational only, such as Quality (QUALITY), Integrity (INTEGRITY), and Collection Methodology (METHODOLOGY). Other attributes have content. All of these support multilingual text with the exception of SOURCE (the last attribute), whose content is simple text (not partitioned by locale).

The report can be presented in a dissemination environment as a readable document, in the same way that metadata is presented in the IMF DSBB.
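
As a rough illustration of turning the nested attributes into a readable outline, the Python sketch below walks an `attributes` list shaped like the one in the example report. The `render` function is hypothetical, and the embedded data is a cut-down copy of the METHODOLOGY section only.

```python
from typing import Any, Dict, List


def render(attributes: List[Dict[str, Any]], lang: str = "en", depth: int = 0) -> List[str]:
    """Flatten nested report attributes into indented outline lines."""
    lines = []
    for attr in attributes:
        indent = "  " * depth
        value = attr.get("value")
        if isinstance(value, dict):
            # Multilingual text: prefer the requested locale, else any available one.
            text = value.get(lang) or next(iter(value.values()), "")
        else:
            text = value  # simple text, or None for a presentational attribute
        if text is None:
            lines.append(f"{indent}{attr['id']}")
        else:
            lines.append(f"{indent}{attr['id']}: {text}")
        # Child attributes become sub headings one level deeper.
        lines.extend(render(attr.get("attributes", []), lang, depth + 1))
    return lines


# Cut-down copy of one section of the example report.
attributes = [
    {
        "id": "METHODOLOGY",
        "attributes": [
            {"id": "BASIS", "value": {"en": "Market prices are used to value flows and stocks."}},
            {"id": "SOURCE", "value": "IMF"},
        ],
    }
]

for line in render(attributes):
    print(line)
# METHODOLOGY
#   BASIS: Market prices are used to value flows and stocks.
#   SOURCE: IMF
```

A dissemination front end would normally show the Concept names from the Metadata Structure Definition rather than the raw attribute IDs printed here.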


Finding Reference Metadata

When a Metadata Report is saved to the Fusion Registry, its presence is immediately reflected in the structure or structures to which it relates. Any structure which has associated reference metadata will provide links back to each Metadata Report (SDMX 3.0 formats only). Therefore a report against a specific Dataflow will result in that Dataflow linking back to the Metadata Report. The link is generated dynamically by the Fusion Registry, and as such the owner of the Dataflow does not need to take any action to maintain this link. If the Metadata Report is deleted, the link is removed. If the Metadata Report is against multiple structures, then multiple links are created, one for each structure.

In addition to the linking mechanism, it is possible to find reference metadata by:

  1. Metadataflow (and optionally Metadata Provider)
  2. Unique Identifiers of the Report
  3. Structure

The SDMX REST Specification for metadata provides full details on the web services available for discovering reference metadata.
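
The three lookups can be sketched as URL builders in Python. The path layouts below reflect one reading of the SDMX 3.0 REST metadata queries and should be treated as assumptions to verify against the specification and the Fusion Registry documentation; the base URL and function names are hypothetical.

```python
from urllib.parse import quote

BASE = "https://registry.example.org/ws"  # hypothetical web service entry point


def _path(*parts: str) -> str:
    """Join URL path segments, leaving the '*' wildcard unescaped."""
    return "/".join(quote(part, safe="*") for part in parts)


def by_metadataflow(agency: str, flow_id: str, version: str = "*", provider: str = "*") -> str:
    """Reports for a Metadataflow, optionally narrowed to one Metadata Provider."""
    return f"{BASE}/metadata/metadataflow/" + _path(agency, flow_id, version, provider)


def by_report(owner: str, report_id: str, version: str) -> str:
    """A single report addressed by its unique identifiers."""
    return f"{BASE}/metadata/metadataset/" + _path(owner, report_id, version)


def by_structure(structure_type: str, agency: str, structure_id: str, version: str = "*") -> str:
    """Reports attached to a given structure, for example a Dataflow."""
    return f"{BASE}/metadata/structure/" + _path(structure_type, agency, structure_id, version)


print(by_report("IMF.UK1", "EXAMPLE", "1.0.0"))
print(by_structure("dataflow", "IMF", "BOP"))
```

The Metadataflow ID passed to `by_metadataflow` is whatever ID the collecting Agency assigned; none is given on this page, so the example calls use only identifiers that appear in the report above.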

Maintaining Reference Metadata

The Fusion Registry provides web services to manage, find, and remove reference metadata.