Revision as of 10:06, 3 November 2020

Overview

To validate data, you need to have the following structures in place.

Preparation

Data Provider

A Data Provider is an Organisation Type. When a Provision Agreement is created, a Dataflow and a Data Provider must be present. An example Data Provider is shown below.

Provision Agreement

A Provision Agreement is the union of a Dataflow with a Data Provider. A Provision Agreement (PA) is a definition that the Data Provider is allowed to provide data for the Dataflow. Data is always reported by a Data Provider against the PA. You can read more about Provision Agreements in this article. An example Provision Agreement is shown below.

Dataflow

A Dataflow is a structure on which data is collected and disseminated. A Dataflow references a Data Structure Definition (DSD) which is used as the underlying template to which the data must conform. You can read more about Dataflows in this article. An example Dataflow is shown below.

Load Data

Once all the elements described above are in place, the next step is to load the data, which is done via the Convert option on the Data menu.

Data can be loaded from a file or via a URL (for example, from Metadata Technology's Fusion Registry Demo site).

To validate successfully, the data must adhere to the SDMX standard in terms of format and must conform to what has been defined in the Data Structure.

Supported formats are:

SDMX_2.1-Generic
SDMX-V2.0-Compact
SDMX-EDI
SDMX-JSON
SDMX-V2.0-Generic

To see this process in action, you can watch this video.

Validate Data

Click Load Data to start the validation process as explained in the image below.

Validation Scheme

What is a Validation Scheme?

Validation Schemes define one or more validation rules which can be executed against a Dataflow at the data validation stage of a data load. Each validation rule consists of a mathematical expression or a link to an aggregation hierarchy which is used to create an expression. This validation goes beyond syntactic and semantic validation of the dataset: it checks that the values supplied in the dataset conform to specific business rules. For example, a rule could require that a particular field has a value less than 100, or that the total value reported is the same as the sum of a set of other observation values.

A Validation Scheme must be assigned against a single Dataflow and may consist of one or many validation rules. A single Validation Rule consists of:

an ID and name
an optional description
a type: either a custom expression or a hierarchic expression
a result type (either numerical or code type) and a value (e.g. 100 or GDP)
an equality operator (one of the following mathematical operators: =, <>, <, <=, >, >=)
an expression (e.g. [EUR]+[FR])

The Validation Scheme rules will be applicable to all datasets submitted against the Dataflow the Validation Scheme is linked to.
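
The parts of a Validation Rule listed above can be pictured as a small record. The following is a minimal Python sketch, not the Registry's own data model: the class name `ValidationRule`, its field names, and the `as_text` helper are hypothetical, and it assumes a rule reads as "result, operator, expression".

```python
import operator
from dataclasses import dataclass
from typing import Optional

# The six operators listed in the text, mapped to Python comparison functions.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class ValidationRule:
    """Hypothetical in-memory model of one rule; all names are illustrative."""
    rule_id: str
    name: str
    rule_type: str      # "custom" or "hierarchic"
    result_type: str    # "numerical" or "code"
    result_value: str   # e.g. "100" or "GDP"
    op: str             # one of the keys of OPERATORS
    expression: str     # e.g. "[EUR]+[FR]"
    description: Optional[str] = None

    def __post_init__(self):
        # Reject values outside the options described in the text.
        if self.rule_type not in ("custom", "hierarchic"):
            raise ValueError(f"unknown rule type: {self.rule_type}")
        if self.result_type not in ("numerical", "code"):
            raise ValueError(f"unknown result type: {self.result_type}")
        if self.op not in OPERATORS:
            raise ValueError(f"unknown operator: {self.op}")

    def as_text(self) -> str:
        # Assumption: a code-type result is written in square brackets,
        # a numerical result is written as a plain number.
        left = f"[{self.result_value}]" if self.result_type == "code" else self.result_value
        return f"{left} {self.op} {self.expression}"


rule = ValidationRule("R1", "Total check", "custom", "code", "T", "=", "[M] + [F]")
print(rule.as_text())
```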

How are Rules Applied

A validation rule operates on a single dimension. For example, a rule to calculate Total from the inputs Males and Females would look like the following:

[T] = [M] + [F]

Note: the syntax used in a validation scheme puts code IDs in square brackets.
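
As an illustration of the square-bracket syntax, the code IDs referenced by an expression can be pulled out with a regular expression. This is a small Python sketch; the function name `referenced_codes` is hypothetical and the pattern assumes code IDs never contain square brackets themselves.

```python
import re
from typing import List

# Matches anything between one pair of square brackets, e.g. "[EUR]".
CODE_REF = re.compile(r"\[([^\[\]]+)\]")


def referenced_codes(expression: str) -> List[str]:
    """Return the code IDs in the order they appear in the expression."""
    return CODE_REF.findall(expression)


codes = referenced_codes("[T] = [M] + [F]")
print(codes)
```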

This rule would be applied to every series where all other parts of the series key match, so for the following series there would be two matches to this rule: one for employment and one for unemployment.

For a validation rule to be executed, there must be data reported for the output and for at least one of the inputs. If data are missing in the inputs, they are treated as zero values. In the following example, only one rule is matched, and there is only one input (Male).
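
The matching behaviour described above can be sketched in Python. This is an illustration under stated assumptions, not the Registry's implementation: series keys are modelled as tuples of codes, each series carries a single value, and the function name `check_sum_rule`, the indicator codes `EMP`, `UNE` and `INACT`, and the sample values are all made up for the example.

```python
from collections import defaultdict


def check_sum_rule(observations, dim_index, output_code, input_codes):
    """Check output == sum(inputs) on one dimension of the series key.

    observations: dict mapping a series key (tuple of codes) to a value.
    dim_index:    position in the key of the dimension the rule operates on.
    Returns a dict mapping the rest of the key to True (pass) or False (fail)
    for every group of series in which the rule was executed.
    """
    # Group series whose key matches in every position except dim_index.
    groups = defaultdict(dict)
    for key, value in observations.items():
        rest = key[:dim_index] + key[dim_index + 1:]
        groups[rest][key[dim_index]] = value

    results = {}
    for rest, by_code in groups.items():
        has_output = output_code in by_code
        has_input = any(code in by_code for code in input_codes)
        if not (has_output and has_input):
            continue  # the rule is not executed for this group
        # Missing inputs are treated as zero.
        total = sum(by_code.get(code, 0) for code in input_codes)
        results[rest] = by_code[output_code] == total
    return results


observations = {
    ("EMP", "T"): 100, ("EMP", "M"): 60, ("EMP", "F"): 40,  # complete group
    ("UNE", "T"): 10, ("UNE", "M"): 7,                      # F missing, counted as 0
    ("INACT", "M"): 5, ("INACT", "F"): 6,                   # no output, rule skipped
}
results = check_sum_rule(observations, dim_index=1, output_code="T", input_codes=["M", "F"])
print(results)
```

In this sample the employment group passes, the unemployment group fails because 10 is compared with 7 + 0, and the third group is skipped because no total was reported.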

There are two types of validation rules. The first uses a custom written expression, as described above. The second references a Hierarchy in the Registry, and the Hierarchy is used as the basis for an aggregation expression. For example, the following image shows a hierarchy of countries against theoretical reported values, as an example of a hierarchy being used to validate a dataset. A hierarchy can be applied to any dimension that uses the same Codelist as the Hierarchy. When values are read from the data file, the values within each sub-hierarchy are summed to ensure they are consistent with the parent value. Any missing values are treated as zero.
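
A hierarchy-based check can be sketched in the same spirit. The Python below is a simplified illustration, not the Registry's algorithm: the hierarchy is modelled as a plain dict from parent code to child codes, the function name `check_hierarchy`, the codes, and the values are invented, and it assumes a parent is only checked when it and at least one of its children are reported.

```python
def check_hierarchy(hierarchy, values):
    """Compare each reported parent value with the sum of its children.

    hierarchy: dict mapping a parent code to a list of child codes.
    values:    dict mapping a code to its reported value.
    Returns a list of (parent, reported, computed) tuples for mismatches.
    """
    mismatches = []
    for parent, children in hierarchy.items():
        if parent not in values:
            continue  # assumption: nothing to compare against
        if not any(child in values for child in children):
            continue  # assumption: no inputs reported, check skipped
        # Missing children are treated as zero.
        computed = sum(values.get(child, 0) for child in children)
        if values[parent] != computed:
            mismatches.append((parent, values[parent], computed))
    return mismatches


hierarchy = {
    "WORLD": ["EUROPE", "ASIA"],
    "EUROPE": ["FR", "DE"],
}
values = {"WORLD": 100, "EUROPE": 60, "ASIA": 40, "FR": 35, "DE": 20}
mismatches = check_hierarchy(hierarchy, values)
print(mismatches)
```

Here the top level is consistent (60 + 40 = 100), while the reported value for `EUROPE` (60) does not match the sum of its children (35 + 20 = 55).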

Note: the Registry only checks data in the submitted file and does not cross-check against any persisted data when validating. For example, if the totals are already stored in a Registry database and you submit a dataset containing only the values that make up those totals, the Registry will not validate the submitted values against the stored totals.

Validation Scheme Tutorial

This tutorial describes the manual steps in the process to create a Validation Scheme. It is required that your Registry be populated with structures that support this process (such as Data Structure Definitions and Dataflows).

Overview

This section explains how to create a simple Validation Scheme and demonstrates that, when data is loaded into the Registry, the rules within the scheme are used for validation.

Creating the Validation Scheme from the User Interface

A Validation Scheme is created or maintained using the authoring wizard. The cogs icon, used to open the wizard, is only available to authenticated Agency or Admin users.


Difference between revisions of "Validate data"
