Fusion Data Mapper - Query Forwarding

From Fusion Registry Wiki
Jump to navigation Jump to search

Overview

Query Forwarding Example

Data Query forwarding is the process of Fusion Registry using mapping rules to rewrite a Data Query against the target Dataflow to conform the the structure of the Source Dataflow. For example a query for data where REF_AREA=UK may result in a query for 15 different series by unique identifiers UK1, UK2, UK3, and so on. The Fusion Registry is able to rewrite any multidimensional query to a list of series that correspond to the data filters, it then forward this query onto the data store (or data stores if there are multiple) that contain the data. When the Fusion Registry recieves the response, it is able rewrite the dataset into a multidimensional dataset, in the format requested by the user. For example extracting the time series UK1, UK2, UK3 from an Oracle database may result in series for FREQ=A, REF_AREA=UK, INDICATOR=EMP (along with other series)being written in CSV format to the client. When the source data is updated the client immediately sees the updated data, as it is always retrieved from the source data.

Source Data Store

In order to use query forwarding, the source Dataflow must be linked to a data store which contains the data for the source Dataflow. The data store can be any that are supported by Fusion Registy, including:

  • Registry Managed MySQL or SQL Server Database
  • Externally Managed MySQL, SQL Server or Oracle Database
  • SDMX Compliant REST web service
  • SDMX File

The Fusion Data Mapper will link the Provision Agreement for the target Dataflow to the Mapping as it's source of Data. This can be performed at any time manually under the Registry Data Sources manager page.

Mappedstore.png

Single Data Source serving data for Multiple Dataflows

If the Fusion Registry is linked to the data store for the source Dataflow (single Dimension Dataflow) then it is able to forward queries from the mapped Dataflow to the data store and map the response back out. It is possible to map a subset of the source Data to one Target Dataflow and another subset to a different Dataflow. In this way, it is possible to map many multidimensional Dataflows to the same source of data, where each Dataflow is only representing a small fraction of the total dataset.

For this use case caution must be taken against querying for all data against a Target Dataflow, as the mapped query is a query for all data for the Source Dataflow, resulting in full data extraction from the Source. The correct subset of data will be written out, because the series that have not been mapped will be discarded, however a query for all data may not be desirable if the Source Dataflow has a lot of data loaded against it. In order to protect against this behaviour the Fusion Registry provides a setting under the administration configurations for Mapping, the setting is called Explicit Data Query Conversion. Explicit Data Query Conversion is defaulted to off, this means a query for all data against the target Dataflow is sent to the source Dataflow as a query for all data. When this setting is set to On a query for all data will be explicitly mapped into a list of series identifiers that match this query, for example if 500 series are mapped to the Target then a query for all data results in a list of 500 series identifiers being queried. Protection is in place in the Fusion Registry to split large queries up into smaller batches as some databases restrict the number of parameters passed to the IN statement - so one web query may result in more then one SQL statement being executed to fulfil the query.

Datastore Caching

The Fusion Registry can make use of a mapped data store, but instead of mapping the query and response for each request, the contents of the data store can be read into the in-memory Fusion Data Store when the mapping is created, or updated. This means the data query will no longer be executed against the source database, but instead the in-memory store. The data store can be re-indexed when required through the User Interface, or re-index web service to refresh the cache. The cache is also refreshed when the mapping is updated (i.e new mappings added, or existing mappings modified). The re-index process queries for the contents of the source database, maps the data, and writes the data to memory. This is a setting in the Fusion Registry Mapping configuration pages which only the FusioN Registry Administrator has access to.