Anatomy of the Data Browser
Revision as of 10:01, 3 May 2020
Overview
The Fusion Data Browser is simply a User Interface on top of the Fusion Registry / Fusion Edge Server web services. When a user clicks on a link, it calls a web service and renders the response. It uses a third-party charting library, AmCharts, to render the charts; all other code is built in house, written in TypeScript, which transpiles into JavaScript.
This page takes you through the process and the web service calls used to build the Fusion Data Browser. If you want to build your own Data Browser, you can use this knowledge of what is possible to help.
The dataset navigation menu, shown on the left pane of the data browser, was arguably the most complex part of the data browser to build. The Dataset Navigation menu provides a breakdown of datasets by Dimension Id, for example Datasets by COUNTRY. In order to answer this question we need knowledge of:
- Which Data Structure Definitions (DSDs) have a COUNTRY Dimension
- Which Codelist each of those DSDs uses, as they may not be the same
- Which Codes in each Codelist have data, and for which DSD
The second level of complexity is that the Dataset Navigation component supports a hierarchy of breakdowns, for example COUNTRY followed by AGE. The sidebar then shows all the countries that have data. When a user expands a country, it shows all the age ranges that have data for the selected country. When an age range is expanded, the user is shown all the Datasets that have data for the selected Country and Age range.
To solve this problem, the sidebar needs access to the following structural metadata:
- All Dataflows
- Data Structures for the Dataflows
- Concepts for the Data Structures
- Codelists
However, in order to minimise response size, we only want Dataflows that have data, and we only want Codelists containing the Codes that are known to have data. Furthermore, we need to know, for each Code, which datasets it has data for.
This information is not available from the structure API, so we turned to the availableconstraint API (https://github.com/sdmx-twg/sdmx-rest/wiki/Data-Availability), whose job is to provide all the above information for a single Dataflow. This was a problem, as we needed the information across all Dataflows, so we extended the availableconstraint API to answer this question for all Dataflows:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all/all?references=descendants&mode=available
We now have enough information to answer all the questions required to build the first level of the sidebar, for example breakdown by Country.
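As a minimal sketch of how a client might assemble this first request, the following builds the all-Dataflows availability URL shown above. The function name and its parameters are illustrative and not part of any Fusion API; only the URL pattern comes from this page.

```python
from urllib.parse import urlencode

# Base of the demo server's SDMX REST endpoint, as used throughout this page.
BASE = "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest"


def availability_url(dataflow: str = "all", key: str = "all", base: str = BASE) -> str:
    """Build an availableconstraint URL that also asks for the referenced structures.

    Hypothetical helper: 'all' for the dataflow relies on the Fusion extension
    described above, not on the standard SDMX availableconstraint API.
    """
    query = urlencode({"references": "descendants", "mode": "available"})
    return f"{base}/availableconstraint/{dataflow}/{key}?{query}"


print(availability_url())
```

Called with its defaults, the helper reproduces the URL above, so a single request should be enough to populate the first level of the sidebar.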
We do not have enough information to build the second part of the hierarchy if a second Dimension is chosen in the breakdown. This is because whilst we know that a particular Age Range has data for a particular Dataset, we do not know if it has data in the context of the selected Country. It is not efficient to get this information up front: the more Dimensions in the hierarchy, the higher the number of permutations, which quickly becomes too high to solve efficiently. So instead, the information on what comes next is acquired at the time it is needed (when the user expands the hierarchy). The sidebar does this by issuing the query:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/availableconstraint/all,all,all/;CURRENCY=ARS/all/FREQ?&mode=exact
Again, this is an extension of the availableconstraint query. This time the question is asked across all Dataflows, but it fixes CURRENCY to ARS using matrix parameters, and we only care about the response in the context of the FREQ Dimension. In addition, the mode is set to ‘exact’. This translates to:
“Tell me what Code Ids have data for the FREQUENCY Dimension across all Dataflows when CURRENCY is ARS”
As this URL does not request descendants, the response this time contains only Code Ids, with no other supporting metadata (Dataflows, DSDs, Codelists). This is enough information to build the next level of the hierarchy, as we already have the supporting metadata from the first request and so know the corresponding code labels.
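The on-demand query described above could be assembled along these lines when a user expands a node. This is a sketch only: the helper name is hypothetical, and concatenating several matrix parameters for deeper hierarchies is an assumption, since this page shows only a single fixed Dimension. The generated URL also omits the stray `&` that follows `?` in the example, which does not change the query string.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

BASE = "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest"


def drilldown_url(fixed: Mapping[str, str], target_dimension: str, base: str = BASE) -> str:
    """Ask which Codes of target_dimension have data, given already-selected Codes.

    fixed maps Dimension Id -> selected Code Id for every expanded level.
    Assumption: multiple fixed Dimensions are sent as consecutive matrix
    parameters (;DIM1=A;DIM2=B).
    """
    matrix = "".join(f";{quote(dim)}={quote(code)}" for dim, code in fixed.items())
    query = urlencode({"mode": "exact"})
    return (
        f"{base}/availableconstraint/all,all,all/"
        f"{matrix}/all/{quote(target_dimension)}?{query}"
    )


# The user has expanded the ARS currency and the next level is FREQ.
print(drilldown_url({"CURRENCY": "ARS"}, "FREQ"))
```

Because the response carries only Code Ids, a client would typically look up the labels in the Codelists it cached from the first availability request.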
Free Text Search
The Free Text Search feature is very easy to implement, as it simply makes use of the free text search API. There are 2 aspects to the search:
- AutoComplete
- Search
Autocomplete is achieved using the datasearch API with query=[term] and auto=true:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?auto=true&query=peso
To execute the query, simply set auto to false, or exclude it from the URL
https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch?query=peso
The search API returns the hits broken down by Dataflow and then, if applicable, by the Classifications that matched, grouped by the Dimension or Attribute to which they belong. The response is therefore very easy to render.
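The two calls above differ only in the auto flag, so one small helper can serve both the autocomplete box and the search itself. The sketch below shows one way to build the URLs; the function name is illustrative, and only the query and auto parameters are taken from this page.

```python
from urllib.parse import urlencode

# datasearch endpoint of the demo server, as used in the examples above.
SEARCH = "https://demo.metadatatechnology.com/FusionRegistry/ws/public/datasearch"


def search_url(term: str, autocomplete: bool = False, endpoint: str = SEARCH) -> str:
    """Build a datasearch URL; auto=true is added only for autocomplete requests."""
    params = {}
    if autocomplete:
        params["auto"] = "true"
    # urlencode escapes spaces and other reserved characters in the search term.
    params["query"] = term
    return f"{endpoint}?{urlencode(params)}"


print(search_url("peso", autocomplete=True))  # suggestions while the user types
print(search_url("peso"))                     # the search itself
```

Leaving auto out of the full search, rather than sending auto=false, follows the second example URL above.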
Series List
The primary goal of the Fusion Data Browser is to get the user to the point where they have a list of Series that matches their search or navigation criteria. This can be very simple to achieve. For example, when the user uses the Dataset Navigation sidebar, they are essentially building an SDMX REST query by selecting which filters to apply to a Data Query. The outcome of the Navigation bar is that a single Dataflow is selected with zero or more Dimensions filtered, each Dimension having only 1 filter applied.
So, for example, for Exchange Rate data -> Argentine Peso -> Annual Frequency, the URL query would be the following:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/A.ARS....
However, as we only want to show the series information, we do not need every observation for each series. We do want to display the last observation date and observation value, so we ask for only the last observation to be included:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/A.ARS....?lastNObservations=1
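To illustrate how the sidebar selections turn into the series key in this URL, the sketch below joins one Code per Dimension with dots and leaves unfiltered positions empty. The helper names are hypothetical, and the Dimension order is a placeholder: in practice it comes from the Dataflow's DSD, and DIM_3 to DIM_6 simply stand in for the unfiltered Dimensions.

```python
from typing import Mapping, Sequence
from urllib.parse import urlencode

BASE = "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest"


def series_key(dimension_order: Sequence[str], filters: Mapping[str, str]) -> str:
    """Join the selected Codes in DSD order; an empty position means 'no filter'."""
    return ".".join(filters.get(dim, "") for dim in dimension_order)


def series_list_url(dataflow: str, key: str, last_n: int = 1, base: str = BASE) -> str:
    """Build a data query that returns only the last N observations per series."""
    query = urlencode({"lastNObservations": last_n})
    return f"{base}/data/{dataflow}/{key}?{query}"


# Placeholder Dimension order (assumption): only FREQ and CURRENCY are named on this page.
order = ["FREQ", "CURRENCY", "DIM_3", "DIM_4", "DIM_5", "DIM_6"]
key = series_key(order, {"CURRENCY": "ARS", "FREQ": "A"})
print(key)
print(series_list_url("ECB,EXR,1.0", key))
```

Since each Dimension carries at most one filter, a plain mapping from Dimension Id to Code Id is enough to represent the navigation state.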
This is the simple case. What about the case where the query is far too large? Census hypercubes, for example, can contain millions of series. When the data is too large, a maximum series limit is requested:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/all?lastNObservations=1&max=200
This tells the server to execute the query but only output the first 200 series. There is no way to define how series are ordered, so "the first 200 series" carries no guarantee about the sequence in which the series come back.
Whilst the Fusion Data Browser does not include pagination, this could be achieved using the offset parameter:
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/ECB,EXR,1.0/all?lastNObservations=1&max=200&offset=200
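A paginated client could derive the offset from a page number, as in the following sketch. The function name and the zero-based page convention are illustrative choices. It also assumes the server returns series in the same order on every request, which this page does not state, so treat the paging as best effort.

```python
from urllib.parse import urlencode

BASE = "https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest"


def page_url(dataflow: str, page: int, page_size: int = 200,
             key: str = "all", base: str = BASE) -> str:
    """Build the data query for a zero-based page of series.

    Assumption: series order is stable between requests; otherwise pages
    may overlap or skip series.
    """
    params = {"lastNObservations": 1, "max": page_size}
    if page > 0:
        # Skip the series already shown on earlier pages.
        params["offset"] = page * page_size
    return f"{base}/data/{dataflow}/{key}?{urlencode(params)}"


for page in range(3):
    print(page_url("ECB,EXR,1.0", page))
```

Pages 0 and 1 reproduce the two URLs above; page 2 requests the same limit with offset=400.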