Difference between revisions of "Validate data"

From Fusion Registry Wiki

Revision as of 10:06, 3 November 2020

Overview

To Validate Data you need to have the following structures in place.

Preparation

PA.PNG

Data Provider

A Data Provider is an Organisation Type. A Dataflow and a Data Provider must both be present before a Provision Agreement can be created. An example Data Provider is shown below.


Val1.PNG

Provision Agreement

A Provision Agreement is the union of a Dataflow with a Data Provider. A Provision Agreement (PA) is a definition that the Data Provider is allowed to provide data for the Dataflow. Data is always reported by a Data Provider against the PA. You can read more about Provision Agreements in this article. An example Provision Agreement is shown below.


Val2.PNG

Dataflow

A Dataflow is a structure on which data is collected and disseminated. A Dataflow references a Data Structure Definition (DSD) which is used as the underlying template to which the data must conform. You can read more about Dataflows in this article. An example Dataflow is shown below.


Val3.PNG

Val4.PNG

Load Data

Once all the elements described above are in place, the next step is to load the data, which is done via the Convert option on the Data Menu.

Data can be loaded from a file or via a URL (for example from Metadata Technology's Fusion Registry Demo site).

To validate successfully, the data must adhere to the SDMX standard in terms of format and must also conform to what has been defined in the Data Structure.

You can read more about Data Structures in this article.

You can read more about how to create a simple Data Structure in this article.

Supported formats are:

  • SDMX_2.1-Generic
  • SDMX-V2.0-Compact
  • SDMX-EDI
  • SDMX-JSON
  • SDMX-V2.0-Generic

Val5.PNG

To see this process in action you can watch this video




Validate Data

Click Load Data to start the validation process as explained in the image below.


Val6.PNG

Validation Scheme

What is a Validation Scheme?

Validation Schemes define one or more validation rules which can be executed against a Dataflow at the data validation stage of a data load. Each validation rule consists of a mathematical expression or a link to an aggregation hierarchy which is used to create an expression. This validation goes beyond syntactical and semantical validation of the dataset and is instead checking that the values supplied in the dataset conform to specific business rules. Examples of this could be that a particular field must have a value less than 100, or that the total value reported must be the same as the total of a set of other observation values.


Valid1.PNG

A Validation Scheme must be assigned against a single Dataflow and may consist of one or many validation rules. A single Validation Rule consists of:

  • an ID and name
  • an optional description
  • a type: either a custom expression or a hierarchic expression
  • a result type (either numerical or code type) and a value (e.g. 100 or GDP)
  • a comparison operator (one of the following mathematical operators: =, <>, <, <=, >, >=)
  • an expression (e.g. [EUR]+[FR] )

The Validation Scheme rules will be applicable to all datasets submitted against the Dataflow the Validation Scheme is linked to.
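
The parts of a Validation Rule listed above can be pictured as a small record. The following is only an illustrative sketch in Python, not the Registry's own data model: the class name, the field names and the `holds` helper are hypothetical, and it assumes that the result sits on the left of the operator and the evaluated expression on the right, as in `[T] = [M] + [F]`.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the rule's comparison operators to Python functions.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class ValidationRule:
    """Illustrative container for the parts of a rule (names are assumptions)."""
    rule_id: str
    name: str
    rule_type: str      # "custom" or "hierarchic"
    result_type: str    # "numerical" or "code"
    result_value: str   # e.g. "100" or "GDP"
    comparison: str     # one of the keys of OPERATORS
    expression: str     # e.g. "[EUR]+[FR]"
    description: Optional[str] = None

    def holds(self, result: float, expression_value: float) -> bool:
        # Assumed reading: <result> <comparison> <value of expression>.
        return OPERATORS[self.comparison](result, expression_value)


rule = ValidationRule(
    rule_id="R1",
    name="Total equals males plus females",
    rule_type="custom",
    result_type="code",
    result_value="T",
    comparison="=",
    expression="[M]+[F]",
)
passes = rule.holds(100, 60 + 40)
fails = rule.holds(100, 60 + 30)
```

Here `passes` is `True` and `fails` is `False`, since only the first pair of values satisfies the `=` comparison.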

How are Rules Applied

A validation rule operates on a single dimension. For example, a rule to calculate Total from the inputs Males and Females would look like the following:

[T] = [M] + [F]

Note: the syntax used in a validation scheme puts code Ids into square brackets.

This rule is applied wherever all other parts of the series key match, so for the following series there would be two matches to this rule: one for employment and one for unemployment.


Valid2.PNG

For a validation rule to be executed there must be data reported for the output and for at least one of the inputs. If data are missing in the inputs, they are treated as zero values. In the following example, only one rule is matched, and there is only one input (Male).


Valid3.PNG
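
The matching behaviour described in this section can be sketched in a few lines of Python. This is a simplified illustration and not the Registry's implementation: the function name `check_rule`, the dimension names `SEX` and `STATUS` and the sample values are hypothetical, only addition rules with the `=` operator are handled, and the time dimension is ignored for brevity.

```python
import re
from collections import defaultdict


def check_rule(rule, dimension, observations):
    """Apply a rule such as "[T] = [M] + [F]" on one dimension.

    observations is a list of (series_key, value) pairs, where series_key
    is a dict of dimension id -> code id. Returns a dict that maps the rest
    of the series key to True (rule holds) or False (rule fails).
    """
    output_part, input_part = rule.split("=", 1)
    output_code = re.findall(r"\[([^\]]+)\]", output_part)[0]
    input_codes = re.findall(r"\[([^\]]+)\]", input_part)

    # Group values by every part of the series key except the rule's dimension.
    groups = defaultdict(dict)
    for key, value in observations:
        rest = tuple(sorted((k, v) for k, v in key.items() if k != dimension))
        groups[rest][key[dimension]] = value

    results = {}
    for rest, values in groups.items():
        has_input = any(code in values for code in input_codes)
        if output_code not in values or not has_input:
            continue  # rule is not executed for this group
        # Missing inputs are treated as zero.
        total = sum(values.get(code, 0) for code in input_codes)
        results[rest] = values[output_code] == total
    return results


observations = [
    ({"SEX": "T", "STATUS": "EMP"}, 100),
    ({"SEX": "M", "STATUS": "EMP"}, 60),
    ({"SEX": "F", "STATUS": "EMP"}, 40),
    ({"SEX": "T", "STATUS": "UNE"}, 30),
    ({"SEX": "M", "STATUS": "UNE"}, 25),   # no F reported: counted as 0
    ({"SEX": "M", "STATUS": "INA"}, 5),    # no T reported: rule not executed
]
results = check_rule("[T] = [M] + [F]", "SEX", observations)
```

With this sample data the rule is executed for `EMP` and `UNE` only. It holds for `EMP` (100 = 60 + 40) and fails for `UNE` (30 is compared with 25 + 0), while `INA` is skipped because no output value is reported.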

There are two types of validation rule. The first uses a custom written expression, as described above. The second references a Hierarchy in the Registry, and the Hierarchy is used as the basis for an aggregation expression. For example, the following image shows a hierarchy of countries against theoretical reported values. This is an example of a hierarchy being used to validate a dataset. A hierarchy can be applied to any dimension that uses the same Codelist as the Hierarchy. When values are read from the data file, the values at each level of the hierarchy are summed to ensure they are consistent with the parent value. Any missing values are treated as having a value of zero.


Valid4.PNG
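
The hierarchy check can be sketched in the same way. The Python below is a minimal illustration under stated assumptions, not the Registry's algorithm: the function name `check_hierarchy`, the country codes and the values are hypothetical, and it assumes a parent is only checked when the file reports a value for it.

```python
def check_hierarchy(children, values):
    """Compare each reported parent value with the sum of its children.

    children maps a parent code to the list of its child codes.
    values maps a code to the value reported in the data file.
    Returns a list of (parent, reported, expected) tuples for mismatches.
    """
    failures = []
    for parent, kids in children.items():
        if parent not in values:
            continue  # assumption: nothing to check without a parent value
        # Missing child values are treated as zero.
        expected = sum(values.get(kid, 0) for kid in kids)
        if values[parent] != expected:
            failures.append((parent, values[parent], expected))
    return failures


children = {
    "WORLD": ["EUROPE", "ASIA"],
    "EUROPE": ["FR", "DE", "UK"],
}
values = {"WORLD": 100, "EUROPE": 60, "ASIA": 40, "FR": 30, "DE": 20}
failures = check_hierarchy(children, values)
```

In this sample `WORLD` is consistent (60 + 40 = 100), while `EUROPE` is flagged because `UK` is missing and counts as zero, so the children sum to 50 rather than the reported 60.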

Note: the Registry only checks the data in the submitted file and does not cross-check against any persisted data when validating. For example, if the totals are already stored in a Registry database and you submit a Dataset containing only the values that make up those totals, the Registry will not validate the values in the file against the stored totals.

Validation Scheme Tutorial

This tutorial describes the manual steps in the process to create a Validation Scheme. It is required that your Registry be populated with structures that support this process (such as Data Structure Definitions and Dataflows).

Overview

This section explains how to create a simple Validation Scheme and demonstrates that, when data is loaded into the Registry, the rules within the scheme are used for validation.

Creating the Validation Scheme from the User Interface

A Validation Scheme is created or maintained by using the authoring Wizard. The cogs icon, used to open the wizard, is only available to authenticated Agency or Admin users.