Pull data from third party REST API

From Fusion Registry Wiki

Overview

If an organisation hosts an SDMX REST API, structural metadata and data can be pulled from that service into your local Fusion Registry instance through a number of web service calls. There are two ways to achieve this. The simplest is to use the Fusion Registry Data Portal feature (in Beta). The alternative is to use the Fusion Registry load structures/data pages and insert the appropriate REST URL. The benefit of the Data Portal is that the Fusion Registry does all of the work for you: it handles errors and the merging of structures, and it will even split a data query into multiple parts if the end service does not support large queries. This how-to guide discusses both approaches.

Example REST API

For the purpose of this 'how to', the web service of the European Central Bank will be used. The API can be found by Googling 'ECB SDMX REST API', which at the time of writing links to this page:

The important thing to look for in a REST API is the entry point. A query URL is then constructed by taking this base URL and appending the query to the end: for data queries this starts with the word 'data', and for structure queries it starts with the structure type being queried, for example 'codelist'. API documentation pages provide example queries, which look like this (the example below is from the Eurostat service):

http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/nama_10_gdp/.CLV10_MEUR.B1GQ.BE/?startperiod=2005&endPeriod=2011

The important part of the above URL is everything before the path /data/. This is the entry point to the web service, which is as follows:

http://ec.europa.eu/eurostat/SDMX/diss-web/rest/

We can use this URL to construct queries for both data and structures, for example:

http://ec.europa.eu/eurostat/SDMX/diss-web/rest/codelist = query for all codelists
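The URL handling described above can be sketched in a few lines of Python. This is only an illustration of the string manipulation involved, not part of the Fusion Registry: the function names `entry_point_from`, `structure_query` and `data_query` are hypothetical helpers, and the query parameters are copied as they appear in the example URL above.

```python
from urllib.parse import urlencode

# Path segment that marks the start of a data query in the example URL.
DATA_MARKER = "/data/"


def entry_point_from(example_url: str) -> str:
    """Return everything before the '/data/' segment, with a trailing slash."""
    head, marker, _ = example_url.partition(DATA_MARKER)
    if not marker:
        raise ValueError("URL does not contain a /data/ segment")
    return head + "/"


def structure_query(entry_point: str, structure_type: str) -> str:
    """Build a structure query, e.g. for all codelists."""
    return entry_point.rstrip("/") + "/" + structure_type


def data_query(entry_point: str, flow: str, key: str = "", params: dict = None) -> str:
    """Build a data query: entry point + 'data' + dataflow + optional key and parameters."""
    url = entry_point.rstrip("/") + "/data/" + flow
    if key:
        url += "/" + key
    if params:
        url += "?" + urlencode(params)
    return url


example = (
    "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/"
    "nama_10_gdp/.CLV10_MEUR.B1GQ.BE/?startperiod=2005&endPeriod=2011"
)
entry = entry_point_from(example)
print(entry)
print(structure_query(entry, "codelist"))
print(data_query(entry, "nama_10_gdp", ".CLV10_MEUR.B1GQ.BE",
                 {"startperiod": 2005, "endPeriod": 2011}))
```

Only the entry point printed on the first line is what the Fusion Registry needs; the two query URLs are shown to make clear how the same base serves both data and structure requests.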

Pull Data using Data Portal Feature

The Fusion Registry has a Beta feature called the Data Portal. To access this feature, log in as an Admin user and navigate to the Fusion Data Portal which is available at the following URL:

https://your.server/FusionRegistry/admin/portal.html

Click the 'Add Service' button and complete the form, which must include the REST entry point, the service name, and, if appropriate, the username and password for the external service. A username and password should only be provided if the external service requires authentication and supports HTTP Basic Authentication. If this is not the case, leave these fields blank.
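To check beforehand whether an external service accepts HTTP Basic Authentication, a request can be prepared by hand. The sketch below shows how the standard `Authorization` header is formed from a username and password. It does not describe how the Fusion Registry sends credentials internally; `build_request` and the credentials used are hypothetical.

```python
import base64
from typing import Optional
from urllib.request import Request


def build_request(url: str,
                  username: Optional[str] = None,
                  password: Optional[str] = None) -> Request:
    """Prepare a request, adding a Basic Authentication header only if a username is given."""
    request = Request(url)
    if username:
        # Basic Authentication: base64("username:password") after the word "Basic".
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical service URL and credentials, for illustration only.
anonymous = build_request("https://example.org/rest/codelist")
authenticated = build_request("https://example.org/rest/codelist", "user", "secret")
print(anonymous.has_header("Authorization"))
print(authenticated.get_header("Authorization"))
```

Passing either request to `urllib.request.urlopen` would perform the call. Leaving the username empty mirrors leaving the form fields blank for a public service.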

The portal can be set up to synchronise all datasets, specific datasets, or structures only. If the specific datasets option is selected, the datasets are chosen after saving the configuration.

A cron expression can be used to automatically run a synchronisation task on a repeating schedule. There are many online cron builders that can help build a valid expression, for example Free Formatter, which can be used to both build and check an expression.
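As an illustration of what such an expression contains, the sketch below labels the fields of a cron expression. It assumes the Quartz-style layout with a leading seconds field and an optional trailing year. Whether the Fusion Registry expects exactly this layout is an assumption and should be confirmed against the product documentation. The helper `label_cron_fields` is hypothetical.

```python
# Field order of a Quartz-style cron expression (assumed layout, see the text above).
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month",
                 "month", "day_of_week", "year"]


def label_cron_fields(expression: str) -> dict:
    """Split a cron expression on whitespace and pair each part with its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 fields, or 7 when a year is given")
    return dict(zip(QUARTZ_FIELDS, parts))


# Under the assumed layout this reads: second 0, minute 0, hour 2, every day.
nightly = "0 0 2 * * ?"
for field, value in label_cron_fields(nightly).items():
    print(f"{field:>12}: {value}")
```

This only checks the number of fields; an online builder remains the easier way to validate the values themselves.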

External service form

The list of where to store data has a minimum of two options:

  1. Local - Fusion Store. This is the purpose-built in-memory store for SDMX data.
  2. External REST Web Service. This does not store the data locally; instead, the data is fetched each time a user queries the Fusion Registry for the data.

If any Fusion Registry managed data stores have been added, then these will also appear in the list.

Note: as this is currently in Beta, the Fusion Store is the best choice. However, please note the memory requirements of using this store. The following video demonstrates the memory required for the Fusion Store. Generally, it is recommended to pull only the datasets that are required, as the Fusion Registry requires memory to index each dataset, even if it is not held completely in memory.

Once the portal service has been added, click the 'Select Datasets' button (available if the specific datasets option was selected during setup). This queries the service for the available Dataflows, allowing any number to be selected.
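To preview which Dataflows a service exposes before adding it, the structure query for the `dataflow` resource can be issued manually and the response inspected. The sketch below parses a small hand-written SDMX-ML 2.1 sample rather than a live response. The sample content, the identifiers in it, and the helper `list_dataflows` are all made up for illustration, and the Fusion Registry may obtain the list differently.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an SDMX-ML 2.1 structure message (illustrative only).
SAMPLE_RESPONSE = """<mes:Structure
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <mes:Structures>
    <str:Dataflows>
      <str:Dataflow id="EXAMPLE_FLOW_A" agencyID="EXAMPLE" version="1.0">
        <com:Name xml:lang="en">Example flow A</com:Name>
      </str:Dataflow>
      <str:Dataflow id="EXAMPLE_FLOW_B" agencyID="EXAMPLE" version="1.0">
        <com:Name xml:lang="en">Example flow B</com:Name>
      </str:Dataflow>
    </str:Dataflows>
  </mes:Structures>
</mes:Structure>"""


def list_dataflows(sdmx_ml: str) -> list:
    """Return (agencyID, id, version) for every Dataflow element in the message."""
    root = ET.fromstring(sdmx_ml)
    flows = []
    for element in root.iter():
        # Match on the local name so the code does not depend on a namespace prefix.
        if element.tag.endswith("}Dataflow"):
            flows.append((element.get("agencyID"),
                          element.get("id"),
                          element.get("version")))
    return flows


for agency, flow_id, version in list_dataflows(SAMPLE_RESPONSE):
    print(agency, flow_id, version)
```

With a real service, the text passed to `list_dataflows` would be the body returned by the entry point followed by `dataflow`.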

Finally, to run the synchronisation, select 'Sync Service'. The synchronisation will run in the background, updating this page as each dataset is imported. The first sync may take some time, as all of the required structural metadata is pulled from the external service and stored locally.