Difference between revisions of "Reference Metadata"

From Fusion Registry Wiki

Revision as of 01:46, 21 July 2022

Overview

Reference Metadata is a structured document of information which can be attached to any SDMX structure. It should be thought of as metadata that helps the user understand more about the data. The information in a Reference Metadata report is textual and structured under headings and sub headings, and it can be a mixture of data types including HTML, numbers, coded values, booleans, URLs, and more.

Reference Metadata can be used for any purpose: for example, to document more information about a Concept, a Code, or a mapping. A common use case is capturing data quality metadata, including aspects such as how the data was collected, legal information, contract information, and so on. The IMF Data Quality Framework provides a good example of this type of metadata, with its Dissemination Standards Bulletin Board (DSBB) demonstrating the use of Data Quality Metadata in a collection and dissemination space.

Designing a Metadata Collection

Metadata Structure Definition

In the same way that SDMX defines the shape and content of your Datasets in terms of Dimensions and Attributes, there is an equivalent definition for Reference Metadata. The Metadata Structure Definition is the structure used to define the allowable content of a Metadata Report. A good analogy is a user creating a Word document and filling in all the Headings and Sub Headings, but not completing the body of the document. What they have created is a template which they can distribute to their users to fill in the details.

The Metadata Structure Definition defines a list of Concepts (the analogy is a document heading). The Concepts can be arranged into a hierarchy (headings and sub headings), and each Concept is given an allowable representation (is the authored content under the heading HTML, Integer, or Coded, does it have a maximum length, etc.). The allowable representation can even be set to ‘none’ to indicate that the heading is purely presentational and only groups a set of sub headings. For example, the Concept of Contact may be set to be presentational, with the sub Concepts of Name, Phone Number, and Department as sub headings which do take content. The corresponding Metadata Report would look like this:

 Contact
 Name : Matthew Nelson
 Phone Number : 01234 444555
 Department : IT
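
The heading analogy can be sketched in a few lines of Python. This is only an illustration: the `Concept` class and `render` function are hypothetical names, not part of SDMX or Fusion Registry, and a representation of `None` stands in for the ‘none’ (presentational) setting described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A heading in the report template (hypothetical, for illustration only)."""
    name: str
    representation: Optional[str] = "String"  # None marks a presentational heading
    children: list["Concept"] = field(default_factory=list)


def render(concept: Concept, values: dict[str, str], depth: int = 0) -> list[str]:
    """Return the report lines for a concept followed by its sub concepts."""
    indent = " " * depth
    if concept.representation is None:
        # Presentational heading: it groups sub headings and takes no content.
        lines = [f"{indent}{concept.name}"]
    else:
        lines = [f"{indent}{concept.name} : {values.get(concept.name, '')}"]
    for child in concept.children:
        lines.extend(render(child, values, depth + 1))
    return lines


contact = Concept("Contact", representation=None, children=[
    Concept("Name"), Concept("Phone Number"), Concept("Department"),
])
report_lines = render(contact, {
    "Name": "Matthew Nelson",
    "Phone Number": "01234 444555",
    "Department": "IT",
})
print("\n".join(report_lines))
```

The template (the `Concept` tree) and the authored content (the `values` mapping) are kept separate, mirroring the split between the Metadata Structure Definition and the Metadata Report.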

Metadata Flow

The Metadata Structure Definition defines the template of the Metadata Report, but it is the Metadataflow which the Metadata Report is reported against.

The Metadataflow is an SDMX structure which does two things:

  1. It references a Metadata Structure Definition, which defines the structure of the Metadata Reports reported against it.
  2. It defines one or more allowable targets, explained below.

When a Metadata Report is authored, it must be attached to one or more SDMX structures. For example, a report about data collection methodology might be attached to a specific Dataflow such as National Accounts. The author of the report chooses which structure(s) to attach the report to. However, the collecting Agency (the owner of the Metadataflow) may want to restrict the options given to the report owner. It is the Metadataflow which enables the collecting Agency to provide one or more restrictions.

The restriction (an allowable target) may define a single allowable structure, such as:

Dataflow=IMF:BOP(1.0)

Here the target must be a Dataflow owned by the IMF with the ID of BOP and the version of 1.0. This target gives the report author no real choice in what they attach their report to: it must be the one structure which has been predefined.

The target could be less restrictive by allowing wildcard values:

Dataflow=IMF:*(*)

The target must be a Dataflow owned by the IMF. The report author may now attach their report to any IMF Dataflow at any version.

Any part of the target (including the structure type) can be left open. Therefore it is possible to define a target for any structure, any agency, any id, at any version.

It is also possible to define multiple targets:

Code=*:CL_FREQ(*).*
Concept=*:*(*).FREQ

The above allows attachment to any Code in a Codelist with the ID CL_FREQ, or to any Concept with the ID FREQ defined in any Concept Scheme.
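
The wildcard behaviour described above can be approximated with a small matcher. This is a sketch under assumptions: it parses the short `Type=Agency:Id(Version).Item` notation used on this page rather than full URNs, the names `parse_target` and `is_allowed` are hypothetical, the structure references in the usage lines (such as `SDMX:CROSS_DOMAIN`) are made up for illustration, and Fusion Registry's own matching rules may differ.

```python
import re
from typing import NamedTuple, Optional

TARGET_RE = re.compile(
    r"(?P<type>[^=]+)="        # structure type, e.g. Dataflow
    r"(?P<agency>[^:]+):"      # owning agency
    r"(?P<id>[^(]+)"           # structure ID
    r"\((?P<version>[^)]+)\)"  # version in brackets
    r"(?:\.(?P<item>.+))?"     # optional item ID, e.g. a Code or Concept
)


class Target(NamedTuple):
    type: str
    agency: str
    id: str
    version: str
    item: Optional[str]


def parse_target(text: str) -> Target:
    match = TARGET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a target reference: {text!r}")
    return Target(**match.groupdict())


def part_matches(pattern_part: Optional[str], actual_part: Optional[str]) -> bool:
    """A '*' accepts any value that is present; otherwise the parts must be equal."""
    if pattern_part == "*":
        return actual_part is not None
    return pattern_part == actual_part


def is_allowed(structure: str, allowed_targets: list[str]) -> bool:
    """True if the concrete structure reference matches any allowable target."""
    actual = parse_target(structure)
    return any(
        all(part_matches(p, a) for p, a in zip(parse_target(allowed), actual))
        for allowed in allowed_targets
    )


freq_targets = ["Code=*:CL_FREQ(*).*", "Concept=*:*(*).FREQ"]
print(is_allowed("Dataflow=IMF:BOP(1.0)", ["Dataflow=IMF:*(*)"]))
print(is_allowed("Concept=SDMX:CROSS_DOMAIN(1.0).FREQ", freq_targets))
print(is_allowed("Code=SDMX:CL_UNIT(1.0).A", freq_targets))
```

With several allowable targets, a structure is accepted as soon as it matches any one of them.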

If the collection of Reference Metadata has multiple providers, then the Metadata Provision Agreement can be used to provide an extra control over the collection process.

Metadata Provision Agreement

Whilst a Metadata Report can be authored directly against a Metadataflow, in a collection environment where there may be multiple reports from multiple providers, the Metadata Provision Agreement is needed to define the ownership of each Metadata Report.

The Metadata Provision Agreement is an SDMX structure that has 3 pieces of information:

  1. It defines what Metadataflow it is for, which by extension defines the Metadata Structure Definition
  2. It defines who the Metadata Provider is (who owns the report)
  3. It can define additional targets which further restrict those on the Metadataflow

Linking a Metadata Provider to a Metadataflow enables further restrictions to be applied to the allowable targets, giving finer control over the Reference Metadata collection.

For example, a Metadataflow can be defined with a generic target such as ‘any IMF Dataflow’:

Dataflow=IMF:*(*) 

If Metadata Reports are collected directly against this Metadataflow, the report author could attach the report to any or all Dataflows. In reality, the provider may only report data against a subset of these Dataflows, so it makes no sense for them to report data collection methodologies against Dataflows that they do not report. The Metadata Provision Agreement set up for the Metadata Provider would further restrict the targets to the Dataflows they do report data against, for example:

Dataflow=IMF:BOP(*)
Dataflow=IMF:NAC(*)
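
One way to picture "further restrict" is as a check that every target on the Metadata Provision Agreement is permitted by at least one target on the Metadataflow. The sketch below assumes that interpretation; `split_target`, `narrows`, and `uncovered_targets` are hypothetical names, only the `Type=Agency:Id(Version)` form is handled, and whether Fusion Registry performs this validation in the same way is not stated here.

```python
import re


def split_target(text: str) -> tuple[str, ...]:
    """Split 'Type=Agency:Id(Version)' into its four parts."""
    match = re.fullmatch(r"([^=]+)=([^:]+):([^(]+)\(([^)]+)\)", text)
    if match is None:
        raise ValueError(f"unsupported target: {text!r}")
    return match.groups()


def narrows(provider_target: str, flow_target: str) -> bool:
    """True if every part of provider_target is permitted by flow_target."""
    return all(
        flow_part == "*" or flow_part == provider_part
        for provider_part, flow_part in zip(
            split_target(provider_target), split_target(flow_target)
        )
    )


def uncovered_targets(provider_targets: list[str], flow_targets: list[str]) -> list[str]:
    """Return the provider targets not permitted by any Metadataflow target."""
    return [
        target for target in provider_targets
        if not any(narrows(target, flow) for flow in flow_targets)
    ]


flow = ["Dataflow=IMF:*(*)"]
print(uncovered_targets(["Dataflow=IMF:BOP(*)", "Dataflow=IMF:NAC(*)"], flow))
print(uncovered_targets(["Dataflow=BIS:BOP(*)"], flow))
```

An empty result means the agreement only narrows what the Metadataflow already allows.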

Report Ownership

Reference Metadata Reports are maintainable artefacts, and as such are uniquely identified by 3 properties:

  1. The owner (agency)
  2. The identity (ID)
  3. The version

These 3 properties form a composite key, so it is possible to have multiple reports with the same ID as long as the owner, the version, or both are different.

Unlike any other SDMX structure, the owner of a Metadata Report does not have to be an SDMX Agency. When a Metadata Report is authored against a Metadata Provision Agreement, it is the Metadata Provider (defined by the Metadata Provision Agreement) who owns the report.
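
As a minimal illustration of the composite key, the sketch below stores reports in a dictionary keyed on all three properties. `ReportKey`, `registry`, and `add_report` are hypothetical names used only for this example.

```python
from typing import NamedTuple


class ReportKey(NamedTuple):
    owner: str
    id: str
    version: str


registry: dict[ReportKey, str] = {}


def add_report(key: ReportKey, name: str) -> None:
    """Store a report, rejecting an exact duplicate of owner + ID + version."""
    if key in registry:
        raise ValueError(f"report already exists: {key}")
    registry[key] = name


add_report(ReportKey("IMF", "EXAMPLE", "1.0.0"), "IMF report")
add_report(ReportKey("BIS", "EXAMPLE", "1.0.0"), "BIS report")     # same ID, different owner
add_report(ReportKey("IMF", "EXAMPLE", "2.0.0"), "IMF report v2")  # same ID, different version
print(len(registry))
```

All three calls succeed because no two keys agree on owner, ID, and version at once.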

It is important to remember that the Metadata Provider itself is defined, and therefore owned, by an Agency. If the Agency IMF wants to collect Metadata Reports from multiple Metadata Providers, it will first define who the Metadata Providers are in a Metadata Provider Scheme which is owned by the IMF. For example:

IMF Metadata Providers
UK1 - Bank of England
FR1 - Banque de France
ES1 - Instituto Nacional de Estadistica

When a Metadata Provider authors a report, the owner is the ID of the Agency which owns the Metadata Provider Scheme (IMF for example) concatenated with the ID of the Metadata Provider (UK1 for example). The order is [agency id].[provider id], for example:

IMF.UK1

This structure enables the Metadata Provider to take ownership of their reports, and as such gives them permission to add/edit/delete their reports.
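
The owner format and the ownership rule can be sketched as follows. Both function names are hypothetical, and `may_modify` is only an illustration of the idea that a provider manages its own reports, not a description of how Fusion Registry implements permissions.

```python
def provider_owner_id(agency_id: str, provider_id: str) -> str:
    """Build the report owner in [agency id].[provider id] order."""
    return f"{agency_id}.{provider_id}"


def may_modify(report_owner: str, agency_id: str, provider_id: str) -> bool:
    """Hypothetical rule: a provider may add/edit/delete only reports it owns."""
    return report_owner == provider_owner_id(agency_id, provider_id)


owner = provider_owner_id("IMF", "UK1")
print(owner)
print(may_modify(owner, "IMF", "UK1"))
print(may_modify(owner, "IMF", "FR1"))
```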

When a Metadata Report is authored directly against a Metadataflow, the Agency who owns the Report is the Agency that authored the Report. It is therefore possible for the BIS to author a Metadata Report against a Metadataflow owned by the IMF.

Security rules can be defined in Fusion Registry to restrict read/write access to Metadata Reports.

Reporting Reference Metadata

Reference Metadata must conform to the Metadata Structure Definition and is authored in SDMX format. Currently only SDMX-JSON (https://github.com/sdmx-twg/sdmx-json/tree/master/metadata-message) is supported.

The Report has the following pieces of information:

  1. The identity. The owner, id, and version of the report, used for unique identification.
  2. The name/description.
  3. The target(s). This defines what structure or structures the report is for.
  4. The content. The content of the report is defined in Metadata Attributes. Each Metadata Attribute corresponds to a Concept in the Metadata Structure Definition.
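
As a rough sketch, the four pieces can be pulled out of a parsed metadata set with a few dictionary lookups. The field names (`agencyID`, `id`, `version`, `names`, `targets`, `attributes`) are those used in the SDMX-JSON example in this section; the `summarise` function and the shortened input are hypothetical and only for illustration.

```python
def leaf_ids(attributes: list[dict], prefix: str = "") -> list[str]:
    """Collect dotted paths of the Metadata Attributes that have no children."""
    paths = []
    for attribute in attributes:
        path = f"{prefix}{attribute['id']}"
        children = attribute.get("attributes", [])
        if children:
            paths.extend(leaf_ids(children, path + "."))
        else:
            paths.append(path)
    return paths


def summarise(metadata_set: dict) -> dict:
    """Return the identity, name, targets, and content outline of one report."""
    return {
        "identity": (metadata_set["agencyID"], metadata_set["id"], metadata_set["version"]),
        "name": metadata_set.get("names", {}).get("en"),
        "targets": metadata_set.get("targets", []),
        "content": leaf_ids(metadata_set.get("attributes", [])),
    }


# Shortened, hand-written input that mirrors the shape of a metadata set.
example = {
    "id": "EXAMPLE",
    "names": {"en": "Example Report"},
    "version": "1.0.0",
    "agencyID": "IMF.UK1",
    "targets": ["urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=IMF:BOP(*)"],
    "attributes": [
        {"id": "QUALITY", "attributes": [
            {"id": "LEGAL", "value": {"en": "Legal text"}},
            {"id": "RESOURCE", "value": {"en": "Resource text"}},
        ]},
    ],
}
summary = summarise(example)
print(summary["identity"], summary["content"])
```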

An SDMX-JSON example follows:

{
 "meta": {
   "id": "IREF920760",
   "test": false,
   "schema": "https://raw.githubusercontent.com/sdmx-twg/sdmx-json/master/metadata-message/tools/schemas/2.0.0/sdmx-json-metadata-schema.json",
   "prepared": "2022-07-21T08:27:55Z",
   "contentLanguages": ["en"],
   "sender": {"id": "FusionRegistry"}
 },
 "data": {
   "metadataSets": [
     {
       "id": "EXAMPLE",
       "names": {
         "en": "Example Report"
       },
       "version": "1.0.0",
       "agencyID": "IMF.UK1",
       "metadataflow": "urn:sdmx:org.sdmx.infomodel.metadatastructure.Metadataflow=IMF:MDF_DQAF(1.0.0)",
       "targets": [
         "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=IMF:BOP(*)"
       ],
       "attributes": [
         {
           "id": "QUALITY",
           "attributes": [
             {
               "id": "LEGAL",
               "value": {
                 "en": "The responsibility for collecting, processing, and...."
               }
             },
             {
               "id": "RESOURCE",
               "value": {
                 "en": "Staff, facilities, computing resources, and financing"
               }
             },
             {
               "id": "RELEVANCE",
               "value": {
                 "en": "The relevance and practical utility of existing statistics"
               }
             }
           ]
         },
         {
           "id": "INTEGRITY",
           "attributes": [
             {
               "id": "INST",
               "value": {
                 "en": "Statistics are produced on an impartial basis"
               }
             },
             {
               "id": "TRANSPARENCY",
               "value": {
                 "en": "The terms and conditions under which statistics are collected, processed, and disseminated are available to the public."
               }
             },
             {
               "id": "ETHICAL",
               "value": {
                 "en": "Guidelines for staff behavior are in place and are well known to the staff"
               }
             }
           ]
         },