Anatomy of the Data Browser

From Fusion Registry Wiki

Revision as of 02:45, 3 May 2020

Overview

The Fusion Data Browser is simply a user interface on top of the Fusion Registry / Fusion Edge Server web services. When a user clicks on a link, it calls a web service and renders the response. It uses a third-party charting library, AmCharts, to render the charts; all other code is built in house and written in TypeScript, which transpiles into JavaScript.

This page takes you through the process and the web service calls used to build the Fusion Data Browser. If you want to build your own Data Browser, you can use this knowledge of what is possible to help.


Dataset Navigation Menu

Databrowser datasets navbar.png

The Dataset Navigation menu, shown in the left pane of the data browser, was arguably the most complex part of the data browser to build. It provides a breakdown of datasets by Dimension Id, for example Datasets by COUNTRY. To build such a breakdown we need to know:

  • What Data Structure Definitions (DSDs) support a COUNTRY Dimension
  • What Codelists are used for each of the above DSDs as they may not be the same
  • What Codes in the Codelist have data, and for which DSD

The second level of complexity is that the Dataset Navigation component supports a hierarchy of breakdowns, for example COUNTRY followed by AGE. The sidebar then shows all the countries that have data. When a user expands a country, it shows all the age ranges that have data for the selected country. When an age range is expanded, the user is shown all the Datasets that have data for the selected Country and Age range.

To solve this problem, the sidebar needs access to the following structural metadata:

  • All Dataflows
  • Data Structures for the Dataflows
  • Concepts for the Datastructures
  • Codelists

However, in order to minimise the response size, we only want Dataflows that have data, and we only want Codelists containing the Codes that are known to have data. Furthermore, we need to know, for each Code, which datasets it has data for.

This information is not available from the structure API, so we turned to the availableconstraint API (https://github.com/sdmx-twg/sdmx-rest/wiki/Data-Availability), whose job is to provide all of the above information for a single Dataflow. This was a problem because we needed the information across all Dataflows, so in this instance we extended the availableconstraint API to answer the question for all Dataflows:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all/all?references=descendants&mode=available

We now have enough information to answer all the questions required to build the first level of the sidebar, for example breakdown by Country.
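As a minimal sketch, the first-level request could be assembled as follows. The function name allDataflowsAvailabilityUrl and the constant REST_BASE are hypothetical and used only for illustration; the path and query parameters are taken from the URL above.

```typescript
// Hypothetical helper (not part of Fusion Registry): builds the availability
// query across all Dataflows that feeds the first level of the sidebar.
function allDataflowsAvailabilityUrl(restBase: string): string {
  // Query parameters copied from the example URL in the text.
  const params: Array<[string, string]> = [
    ["references", "descendants"], // also return the supporting structures
    ["mode", "available"],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  // Strip any trailing slash so the path is joined with exactly one "/".
  const base = restBase.replace(/\/+$/, "");
  return `${base}/availableconstraint/all/all?${query}`;
}

// Assumed base address: the demo server used throughout this page.
const REST_BASE =
  "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest";
const firstLevelUrl = allDataflowsAvailabilityUrl(REST_BASE);
```

The response to this single request would be loaded once and kept in memory, since the later drill-down queries rely on its Codelists to resolve code labels.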

We do not yet have enough information to build the second level of the hierarchy if a second Dimension is chosen in the breakdown. Whilst we know that a particular Age Range has data for a particular Dataset, we do not know whether it has data in the context of the selected Country. It is not efficient to get this information up front: the more Dimensions there are in the hierarchy, the more permutations there are, and the number quickly becomes too high to resolve realistically. Instead, the information about what comes next is acquired at the time it is needed (when the user expands the hierarchy). The sidebar does this by issuing the query:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all,all,all/;CURRENCY=ARS/all/FREQ?&mode=exact

Again, this is an extension of the availableconstraint query. This time the question is asked across all Dataflows, but it fixes CURRENCY to ARS using matrix parameters, and we only care about the response in the context of the FREQ Dimension. In addition, the mode is set to ‘exact’. What this translates to is:

“Tell me what Code Ids have data for the FREQUENCY Dimension across all Dataflows when CURRENCY is ARS”

As this URL does not request descendants, the response this time contains only Code Ids and no other supporting metadata (Dataflows, DSDs, Codelists). This is enough to build the next level of the hierarchy, because we already have the supporting metadata from the first request and can use it to look up the corresponding code labels.
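The drill-down request can be sketched as a small URL builder. The function name drillDownUrl is hypothetical. Chaining several fixed Dimensions as consecutive matrix parameters is an assumption, since the text only shows a single one, and the sketch omits the stray "&" that appears before mode in the example URL, on the assumption that it carries no meaning.

```typescript
// Hypothetical helper: asks which Codes of `targetDimension` have data across
// all Dataflows, given the Dimension values the user has already expanded.
function drillDownUrl(
  restBase: string,
  fixed: Record<string, string>,
  targetDimension: string
): string {
  // Each fixed Dimension becomes a matrix parameter, e.g. ";CURRENCY=ARS".
  // ASSUMPTION: several fixed Dimensions are simply chained one after another.
  const matrix = Object.keys(fixed)
    .map((dim) => `;${encodeURIComponent(dim)}=${encodeURIComponent(fixed[dim])}`)
    .join("");
  const base = restBase.replace(/\/+$/, "");
  return (
    `${base}/availableconstraint/all,all,all/${matrix}/all/` +
    `${encodeURIComponent(targetDimension)}?mode=exact`
  );
}

// Assumed base address: the demo server used throughout this page.
const REST_BASE =
  "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest";
// User expanded CURRENCY=ARS; the next level of the breakdown is FREQ.
const nextLevelUrl = drillDownUrl(REST_BASE, { CURRENCY: "ARS" }, "FREQ");
```

Because the request is only issued when a node is expanded, a browser built this way might also cache responses by URL so that collapsing and re-expanding a node does not trigger a second call.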

Free Text Search

Search.png

The Free Text Search feature is very easy to implement, as it simply makes use of the free text search API. There are two aspects to the search:

  1. AutoComplete
  2. Search

Autocomplete is achieved using the datasearch API with query=[term] and auto=true:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?auto=true&query=peso

To execute the search itself, set auto to false or omit it from the URL:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?query=peso

The search API returns the hits broken down by Dataflow and then, if applicable, by the Classifications that matched, grouped by the Dimension or Attribute to which they belong. This makes the result very easy to render.
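Both calls differ only in the auto parameter, so a single helper can cover them. The following is a minimal sketch; the function name dataSearchUrl and the constant PUBLIC_BASE are hypothetical, while the endpoint and parameter names come from the URLs above.

```typescript
// Hypothetical helper: builds either the autocomplete or the full search URL.
function dataSearchUrl(
  publicBase: string,
  term: string,
  autocomplete: boolean = false
): string {
  const params: string[] = [];
  // auto=true switches the endpoint to autocomplete; omitting it runs the search.
  if (autocomplete) {
    params.push("auto=true");
  }
  // Encode the user's input so spaces and special characters survive the URL.
  params.push(`query=${encodeURIComponent(term)}`);
  const base = publicBase.replace(/\/+$/, "");
  return `${base}/datasearch?${params.join("&")}`;
}

// Assumed base address: the demo server used throughout this page.
const PUBLIC_BASE = "https://demo.metadatatechnology.com/FusionRegistry/ws/public";
const suggestUrl = dataSearchUrl(PUBLIC_BASE, "peso", true);
const searchUrl = dataSearchUrl(PUBLIC_BASE, "peso");
```

In a search box, the first URL would typically be requested as the user types and the second when the search is submitted.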

Series List

Databrowser serieslist items.png

The primary goal of the Fusion Data Browser is to get the user to the point where they have a list of Series that matches their search or navigation criteria. This can be very simple to achieve. For example, when the user uses the Dataset Navigation sidebar, they are essentially building an SDMX REST query by selecting which filters to apply to a Data Query. The outcome of the Navigation bar is that a single Dataflow is selected with zero or more Dimensions filtered, and each filtered Dimension has only one filter applied.

So, for example, for Exchange Rates, Annual Data, for the Argentine Peso, the query is the following:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/A.ARS....

However, we only want to show the series information, not the full set of observations for each series, so we can impose further restrictions. We do display the last observation date and value, so the query becomes:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/A.ARS....?lastNObservations=1
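The step from sidebar selections to a data query can be sketched as follows. The names buildSeriesKey and seriesListUrl are hypothetical. The six key positions are inferred from the example key A.ARS...., and every Dimension Id other than FREQ and CURRENCY is a placeholder; in practice the order would come from the Data Structure Definition.

```typescript
// Hypothetical helper: turns one-filter-per-Dimension selections into an SDMX
// series key. Unfiltered Dimensions are left empty, which acts as a wildcard.
function buildSeriesKey(
  dimensionOrder: string[],
  filters: Record<string, string>
): string {
  return dimensionOrder
    .map((dim) => (filters[dim] !== undefined ? filters[dim] : ""))
    .join(".");
}

// Hypothetical helper: data query returning only the latest observation(s).
function seriesListUrl(
  restBase: string,
  dataflowRef: string,
  seriesKey: string,
  lastN: number = 1
): string {
  const base = restBase.replace(/\/+$/, "");
  return `${base}/data/${dataflowRef}/${seriesKey}?lastNObservations=${lastN}`;
}

// Assumed base address: the demo server used throughout this page.
const REST_BASE =
  "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest";
// ASSUMPTION: DIM3..DIM6 are placeholders, not the real Dimension Ids.
const EXR_DIMENSIONS = ["FREQ", "CURRENCY", "DIM3", "DIM4", "DIM5", "DIM6"];

const key = buildSeriesKey(EXR_DIMENSIONS, { FREQ: "A", CURRENCY: "ARS" });
const listUrl = seriesListUrl(REST_BASE, "ECB,EXR,1.0", key);
```

With the selections FREQ=A and CURRENCY=ARS, this yields the key A.ARS.... and the same URL as shown above.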

This is the simple case. What about the case where the query result is far too large? Census hypercubes, for example, can contain millions of series. When the data is too large, a maximum series limit is requested:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/all?lastNObservations=1&max=200  

This tells the server to execute the query but output only the first 200 series. There is no way to define how series are ordered, so there is no guarantee of which series are returned or in what sequence. If we wanted to include pagination, an offset could be provided:

https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/all?lastNObservations=1&max=200&offset=200
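A paged request could be derived from a page index as in the sketch below. The function name seriesPageUrl is hypothetical, and the parameter names max and offset are taken from the URLs above. Since the series order cannot be specified, this assumes the server returns series in the same order on every request; otherwise pages could overlap or skip series.

```typescript
// Hypothetical helper: builds the URL for one page of the series list.
function seriesPageUrl(
  restBase: string,
  dataflowRef: string,
  seriesKey: string,
  pageIndex: number, // zero-based page number
  pageSize: number
): string {
  const params: string[] = ["lastNObservations=1", `max=${pageSize}`];
  // The first page needs no offset; later pages skip the series already shown.
  const offset = pageIndex * pageSize;
  if (offset > 0) {
    params.push(`offset=${offset}`);
  }
  const base = restBase.replace(/\/+$/, "");
  return `${base}/data/${dataflowRef}/${seriesKey}?${params.join("&")}`;
}

// Assumed base address: the demo server used throughout this page.
const REST_BASE =
  "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest";
const firstPage = seriesPageUrl(REST_BASE, "ECB,EXR,1.0", "all", 0, 200);
const secondPage = seriesPageUrl(REST_BASE, "ECB,EXR,1.0", "all", 1, 200);
```

Here firstPage and secondPage reproduce the two URLs shown above, without and with offset=200.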

Quick Filter

Advanced Filter

Series Details

Series Sparkline

Series Basket

Export Data

Export/Chart Selected Series

Pivot Table

Charts

Slice and Dice Chart Data

Embed URL

Save Series

Save Chart