Fusion Data Mapper - Maintaining Mapping Rules

From Fusion Registry Wiki
Jump to navigation Jump to search

Overview

The Fusion Data Mapper displays the Structure Mapping rules which are ultimately maintained and retrieved from the Fusion Registry. The Data Mapper provides the ability to load in rules from Excel or CSV files, edit or delete the rules locally and save them to the Fusion Registry server.

Importing mappings from Excel

Mapping rules can be expressed in an excel file by creating a column which has the same Id as the ID of the Source Dimension, and subsequent columns with Ids that match those of the Target Dimensions or Series Attributes.

Mappingrules2.png

The order of the columns is not important, as long as the column headings match the Ids of the Dimensions. If the target Dataflow wants to include the Series Code, perhaps as a Series Attribute, then ensure the Series Attribute Id in the target is the same as the Id in the source, and no further mapping is required, the Series Code will be exposed via the target Dataflow. The Source Dataflow can contain other Series Attributes for example LAST_UPDATED to indicate when the series was last updated. If the Target Dataflow has the same Series Attributes the data will be automatically mapped across without changing the reported value.

The quickest way to create an Excel file is to export the empty table in Excel format from the Data Mapper - this provides a blank template for creating new rules.

Import Action: Upsert

Importing mapping rules against the upsert action results in new mappings being created, where they did not previously exist, and existing mappings being overwritten. As the Source Dimension can only map a unique identification ONCE to a set of target Dimensions, the same unique identifier mapped to other Dimension values will result in an overwrite behaviour.

Upsert will not result in any mappings being removed, only addition of new rules or modification of existing.

Import Action: Full Replace

Full Replace will replace all the rules that exist against the Target - Source Dataflow with the ones in the loaded file.

Note: Fusion Registry preserve copies of old rules should rollback be required. Rollback is not supported in the Data Mapper, so must be performed via the Fusion Registry User Interface.

Manual Editing via the UI

Mapping Rules which exist in the Fusion Registry are viewed as a table in the Fusion Data Mapper, in the same style as Excel (column, row). Columns can be sorted ascending or descending by clicking on the column name. Filters can be applied to the columns.

Single Value Changes

To manually Edit one or more values for a mapping, double click on a code id, to load a drop down list of valid values. If the target Dimension is not backed by a Codelist, then a free text input box will be displayed to manually enter a value. When a cell is edited, the row will be highlighted in Orange to notify the user that an unsaved change was made to that row. To save all changes back to the Fusion Registry server, click on the Save button, this will submit all changes to the Fusion Registry for validation and update.

Edit mapping rule.png

Local changes that have not been saved to the Fusion Registry server can be undone by clicking on the Undo button.

Filters

The series filter input box can be used to perform a filter on all series shown in the table, if any part of the series identifier contains the text in the filter it will be a match, the search is case insensitive. For example the series identifier XULADS will match the filter ad.

Column filters are applied in addition to any other filters.

All filters are cleared using the clear filter control.


Bulk Changes

Bulk Changes can be made by selecting one or more rows using the checkbox control on the row. When one or more rows are selected, click on the bulk change button next to the Column label for the column you want to change the value of. This will open a control allowing the selection of a single value which will be applied to all selected rows.

Bulk Deletes can be achieved using the same row selection checkboxes and clicking on the button to delete the selected series. This will update the mapping locally to remove the mapped series. A further Save is required to push these deletes to the server. Deleted rows and modified rows are be pushed to the Fusion Registry server in one save command.

The select all checkbox will select all the checkbox’s that match the current query filter, even if the rows are not all displayed on one page due to pagination.

A selected row remains selected until it is deselected. The selection will remain even if the row is no longer displayed due to the application of a filter or viewing a different page. Deselection can be achieved by clicking the same select checkbox, or via a select all followed by deselect all, or by navigating to a different Dataflow mapping.


Enumerated Source Dimension

If the Source Dimension uses a Codelist to enforce its allowable content, this is used by the Data Mapper to allow series identifiers to be quickly added into the mapping.

Addseries.png

The add series button appears as a Plus Symbol [+] in the button bar if the Source Dimension is coded. This is true even if the Dimension was no Coded at the time the mapping was made (i.e if the Source Dimension was Free text but was subsequently modified to be Coded, the button will appear). On clicking this button a form will open displaying all the codes in this codelist, which can be filtered using the text entry field. The filter will search for all Code Ids and Lables that include the text as a case insensitive search. Regular Expression searches are supported.

Selecting one or more codes followed by add, will add a mapping into the table, but will not include any mapped values, these require manual editing, or an export to excel to complete the form followed by an import.

Partially Mapped Series and Duplicate Mappings

A series identifier does not have to map to a value for each Dimension in the target when the mapping is being created. However this is strongly recommended in a production system to provide a mapped value for each Target Dimension. If a Mapped value does not exist the series will not map to the target dataset and will not be retrieved via a data query.

If the same combination of target Dimension values map to more then one series identifier, then there will create a duplicate mapping, this should be avoided as it will cause ambiguity in the mapping. For example if series A1 and B1 both map to FREQ=A, REF_AREA=UK, INDICATOR=EMP then a query for the series A:UK:EMP will result in data from two different time series being written against the same mutlidimensional key for the same time period. The result may be that only one series is written to the client, and there is no gurantee which one will be written. Therefore duplicate mappings should be avoided.

It is however possible to map the same series identifier against different Dataflows.

Modification Result

The result of modifying the mapping is the Fusion Registry server being updated with the new metadata. This may result in a re-index of the Source dataset if the Target Dataflow has been linked to the Source Dataflow for data retrieval. This will mean any data navigation or retrieval will reflect the changes of the mapping with immediate effect.

Deleting Mapping Rules

Mapping rules can be deleted by selecting and deleting rows from the Mapping table (followed by a Save to the server). Rules can also be deleted when an excel file is loaded with the action Full Replace. The entire mapping can also be deleted from the Fusion Registry. If a mapping is using the data query forwarding feature, then the deletion of the mapping will result in data queries returning no data for the Target Dataflow. The source dataset will remain unaffected.

Changing the Source or Target Data Structure Definition

The decoupled nature of the target dataset from the source data makes the task of 'playing with Dimensions and Attributes' in the very easy with low impact.

For example if a Target Data Structure Definition is created with 4 Dimensions and mapped to the Source for the purpose of data retrieval, but at a later date it is decided that one of the Dimensions should become an Attribute then this does not require a great deal of work to update all the mapped dataset (if data query forwarding is enabled).

The process of updating the data is:

  • Data Mapper: Export Mapping Rule as Excel
  • Data Mapper: Delete Mapping Rule for Dataflow (deletes mapping and data indexes for target Dataflow)
  • Fusion Registry: Make modifications to the Dimensions or Attributes of the Source or Target Data Structure
  • Data Mapper: Create Mapping Rule for Dataflow (recreates a new mapping)
  • Data Mapper: Import Mapping Rules from Excel (recreates new data index)

The last step may require new columns to be added to Excel if there were new Dimensions or Attributes or if the Ids of the Dimension/Attribute changed then this should be reflected in the column header.