Difference between revisions of "Anatomy of the Data Browser"

From Fusion Registry Wiki
Jump to navigation Jump to search
(Series List)
(Series List From Series Basket)
Line 83: Line 83:
  
 
== Series List From Series Basket ==
 
== Series List From Series Basket ==
 +
 +
The Series Basket enables a user to hand pick one or more series, to save these as a basket which can be reloaded at a later date.  Furthermore, the basket can contain series from different Dataflows. 
 +
 +
Whilst some series baskets can be resolved using the REST URL to GET the data, this is not useful in all situations. Instead a JSON query can be POSTED to the web service.  The POST contains the details about the Dataflow, Series List, and other parameters are included such as lastNObservations.
 +
 +
{
 +
  "queries": {
 +
    "dataflow": {
 +
      "id": "GCI",
 +
      "agencyId": "WB",
 +
      "version": "1.0"
 +
    },
 +
  "lastNObservations": 1
 +
  },
 +
  "series": ["GHA:EOSQ071:RANK:A","GHA:EOSQ071:ONE_SEVEN_BEST:A","AUS:EOSQ071:RANK:A"]
 +
}
  
 
= Quick Filter =
 
= Quick Filter =

Revision as of 11:16, 3 May 2020

Overview

The Fusion Data Browser is simply a User Interface on top of Fusion Registry / Fusion Edge Server web services. When a user clicks on a link, it calls a web service and renders the response. It makes use of a great third party charting libaray, AmCharts to render the charts, all other code is built in house written in written in TypeScript) which transpiles into JavaScript.

This page takes your through the process and web service calls which are used to build the Fusion Data Browser, if you want to build your own Data Browser you can use the knowledge of what is possible to help.


Dataset Navigation Menu

Databrowser datasets navbar.png

The dataset navigation menu, shown on the left pane of the data browser, was arguably the most complex part of the data browser to build. The Dataset Navigation menu provides a breakdown of datasets by Dimension Id, for example Datasets by COUNTRY. In order to answer this question we need knowledge of:

  • What Data Structure Definitions (DSDs) support a COUNTRY Dimension
  • What Codelists are used for each of the above DSDs as they may not be the same
  • What Codes in the Codelist have data, and for which DSD

The second level of complexity is that the Dataset Navigation component supports a hierarchy of breakdowns, so I can have COUNTRY followed by AGE – the sidebar will then show all the countries that have data, when a user expands a country it will show all the age ranges that have data for the selected country, when an age range is expanded, the user is shown all the Datasets that have data for the selected Country and Age range.

To solve this problem, the sidebar needs access to the following structural metadata:

  • All Dataflows
  • Data Structures for the Dataflows
  • Concepts for the Datastructures
  • Codelists

However, in order to minimise response side, we only want Dataflows that have Data, and we only want Codelists with the codes in that are known to have data. Furthermore, we need to know, for each Code, which dataset it has data for.

This information is not possible from the structure API, so we turned to the availableconstraint API https://github.com/sdmx-twg/sdmx-rest/wiki/Data-Availability, whose job it is to provide all the above information for a Single Dataflow. This was a problem as we needed the information across all dataflows, so in this instance we extended the availableconstraint API to support this question for All Dataflows.

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all/all?references=descendants&mode=available

We now have enough information to answer all the questions required to build the first level of the sidebar, for example breakdown by Country.

We do not have enough information to build the second part of the hierarchy if there is a second Dimension chosen in the breakdown, this is because whilst we know that a particular Age Range has data for a particular Dataset, we do not know if it has data in the context of the selected Country. It is not efficient to get this information up front, as the more Dimensions in the hierarchy, the number of permutations become too high to realistically solve efficiently. So instead, the information on what comes next is acquired at the time the information is needed (when the user expands the hierarchy). It does this by issuing the query:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all,all,all/;CURRENCY=ARS/all/FREQ?&mode=exact

Again, this is an extension of the availableconstraint query, this time a question is given across all Dataflows, but the question fixes CURRENCY to ARS using matrix parameters, and we only care about the response in the context of the FREQ Dimension. In addition the mode is set to ‘exact’ what this translates to is:

“Tell me what Code Ids have data for the FREQUENCY Dimension across all Dataflows when CURRENCY is ARS”

As this URL does not request descendants, the response this time only contains Code Ids, no other supporting metadata (Dataflows, DSDs, Codelists). This is enough information for us to build the next level of the hierarchy, as we already have the supporting metadata from the first request, in order to know what the corresponding code labels are.

Free Text Search

Search.png

The Free Text Seach feature is very easy to implement, as it simply makes use of the free text search API. There are 2 aspects to the search

  1. AutoComplete
  2. Search

The Autocomplete is achieved using the datasearch API with query=[term] and setting auto=true

https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?auto=true&query=peso

To execute the query, simply set auto to false, or exclude it from the URL

https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?query=peso

The search API returns the hits broken down by Dataflow, and then if applicable, by the Classifications that matched grouped by the Dimension or Attribute to which they belong. So is very easy to render.

Series List

Databrowser serieslist items.png

The primary goal of the Fusion Data Browser is to get the user to the point where they have a list of Series that matches their search or navigation criteria. This can be very simple to achieve, for example when the user uses the Dataset Navigation sidebar, they are essentially building a SDMX REST query by selecting which filters to apply to a Data Query. The outcome of the Navigation bar, is a single Dataflow is selected with zero or more Dimensions being filtered, each Dimension only has 1 filter applied.

Series List From Basic Query Filters

So for example for Exchange Rate data -> Argentine Pesto -> Annual Frequency, the URL query would be the following:

data/ECB,EXR,1.0/A.ARS....

However, as we only want to show the series information, we do not need every observations for each the series. We do want to display the last observation date and observation value, so we will ask only for the last observation to be included:

data/ECB,EXR,1.0/A.ARS....?lastNObservations=1

This is the simple case. What about the case where the query is far too large, Census hypercubes for example can contain millions of series. When the data is too large, a maximum series limit is requested:

data/ECB,EXR,1.0/all?lastNObservations=1&max=200  

This tells the server to execute the query by only output the first 200 series. There is no way to define how series are ordered, so the first 200 series does not guarentee in which sequence the series will come back.

Whilst the Fusion Data Browser does not include pagination, this could have been achieved using the offset parameter:

data/ECB,EXR,1.0/all?lastNObservations=1&max=200&offset=200

Series List From Search Result

The series list from a search result uses the same technique as the basic query filters, as the search result provides the Dataflow identifier, and if there is a 'hit' for a classification, then the Dimension and Code Id is also provided. This enables the query to be built using exactly the URL with Dataflow and query filter.

Series List From Series Basket

The Series Basket enables a user to hand pick one or more series, to save these as a basket which can be reloaded at a later date. Furthermore, the basket can contain series from different Dataflows.

Whilst some series baskets can be resolved using the REST URL to GET the data, this is not useful in all situations. Instead a JSON query can be POSTED to the web service. The POST contains the details about the Dataflow, Series List, and other parameters are included such as lastNObservations.

{
 "queries": {
   "dataflow": {
     "id": "GCI",
     "agencyId": "WB",
     "version": "1.0"
   },
 "lastNObservations": 1
 },
 "series": ["GHA:EOSQ071:RANK:A","GHA:EOSQ071:ONE_SEVEN_BEST:A","AUS:EOSQ071:RANK:A"]
}

Quick Filter

Advanced Filter

Series Details

Series Sparkline

Series Basket

Export Data

Export/Chart Selected Series

Pivot Table

Charts

Slice and Dice Chart Data

Embed URL

Save Series

Save Chart