Anatomy of the Data Browser

From Fusion Registry Wiki

Revision as of 06:58, 16 August 2023

Overview

This page documents the techniques and solutions used to solve common data discovery requirements encountered when building the Fusion Data Browser.

The public Demo Data Browser can be used when following this document. The Demo Data Browser uses the web services of the Demo Fusion Registry server to obtain structural metadata and data. The same functionality would be achieved if it were using the web services of the Fusion Edge Server.

Fusion Data Browser Technology

Fusion Data Browser is simply a User Interface built on top of web services exposed by Fusion Registry or Fusion Edge Server. These web services are primarily SDMX web services, in accordance with the specification. The SDMX web services used are Data Availability, Data, and to a lesser degree Structural Metadata.

Fusion Data Browser is made up entirely of client-side technology (JavaScript, HTML, CSS, and images); there is no backend server and no special services are required. All the functionality and dynamic nature of the Fusion Data Browser is served by the web services it uses. User interactions, such as exploring the series in a Dataset, or building a time series chart or pivot table, are satisfied either by a request to a web service URL or by using information cached from a previous request.

Fusion Data Browser is written in TypeScript using two core libraries developed by Metadata Technology: IMJS and SDMXUI. IMJS (Information Model JavaScript) provides a client-side object model of the SDMX Information Model, with APIs used to fetch metadata and data from SDMX services. SDMXUI provides User Interface widgets on top of the information model, such as series lists, navigation controls, and charting widgets. Fusion Data Browser also makes use of other front-end frameworks, specifically jQuery and Bootstrap, and a third-party charting library called amCharts. The TypeScript of Fusion Data Browser, IMJS, and SDMXUI is transpiled into JavaScript when we run a distribution build.

This page is not a guide to TypeScript or to IMJS/SDMX UI; however, some SDMX UI examples are given to show how the functionality can be achieved using this framework. Web service links are provided which connect to our live Demo Server; the default representation is SDMX-ML (XML format). The Fusion Data Browser primarily requests information in fusion-json format.

Multilingual Support

The Fusion Data Browser offers multilingual support in both the browser labels and the metadata labels which are obtained from the server. The client-side labels (such as the Login text on the Login button) are simply obtained from a properties file. The metadata labels are obtained from the SDMX metadata retrieved from the web services. This is achieved using the HTTP Accept-Language header. As most web browsers include this header on a request, the user should by default get the information back in the locale set on their web browser, provided the corresponding metadata contains the requested language. If it does not, English is used as the default, and if there is no English metadata another locale is selected (first come, first served).

The languages offered by the Fusion Data Browser are set in a properties file; however, the following web service call can be used to request which languages are available in the metadata:

ws/fusion/info/lang
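The fallback order described in this section (requested locale, then English, then whichever locale comes first) is resolved by the server. The sketch below only illustrates that order on the client side; the LocalisedLabels shape and both function names are assumptions made for the example and are not part of IMJS or SDMXUI.

```typescript
// Hypothetical shape: one label per lower-case locale tag,
// e.g. { en: "Country", fr: "Pays" }. Not an IMJS/SDMXUI type.
type LocalisedLabels = Record<string, string>;

// Parse an Accept-Language header value into locale tags ordered by preference (q value).
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(",")
    .map((part) => {
      const pieces = part.split(";").map((p) => p.trim());
      const tag = (pieces[0] ?? "").toLowerCase();
      const qParam = pieces.slice(1).find((p) => p.startsWith("q="));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { tag, q: isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.tag !== "" && entry.tag !== "*")
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Pick a label: requested locales first, then English, then the first locale present.
function pickLabel(labels: LocalisedLabels, acceptLanguage: string): string | undefined {
  for (const locale of parseAcceptLanguage(acceptLanguage)) {
    // Try the full tag ("fr-ca") before the primary language ("fr").
    const primary = locale.split("-")[0] ?? locale;
    for (const candidate of [locale, primary]) {
      const label = labels[candidate];
      if (label !== undefined) return label;
    }
  }
  const english = labels["en"];
  if (english !== undefined) return english;
  // No English either: first come, first served.
  const firstLocale = Object.keys(labels)[0];
  return firstLocale === undefined ? undefined : labels[firstLocale];
}
```

For example, pickLabel({ en: "Country", fr: "Pays" }, "fr-CA,fr;q=0.9,en;q=0.8") falls back from fr-CA to fr and returns the French label.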

Dataset Navigation Menu

Databrowser datasets navbar.png

Behaviour

The dataset navigation menu, shown on the left pane of the data browser, was arguably the most complex part of the data browser to build. The Dataset Navigation menu provides a breakdown of datasets by Dimension Id, for example Datasets by COUNTRY. In order to answer this question we need knowledge of:

  • What Data Structure Definitions (DSDs) support a COUNTRY Dimension
  • What Codelists are used for each of the above DSDs as they may not be the same
  • What Codes in the Codelist have data, and for which DSD

The second level of complexity is that the Dataset Navigation component supports a hierarchy of breakdowns, for example COUNTRY followed by AGE. In this case the sidebar shows all the countries that have data. When a user expands a country, it shows all the age ranges that have data for the selected country. When an age range is expanded, the user is shown all the Datasets that have data for the selected Country and Age range.

To solve this problem, the sidebar needs access to the following structural metadata:

  • All Dataflows
  • Data Structures for the Dataflows
  • Concepts for the Datastructures
  • Codelists

However, in order to minimise the response size, we only want Dataflows that have data, and we only want Codelists containing the Codes that are known to have data. Furthermore, we need to know, for each Code, which datasets it has data for.

Web Service

The required information cannot be obtained from the structure API, so instead we turn to the availableconstraint API, whose job is to provide all the above information for a single Dataflow. This was a problem, as we needed the information across all Dataflows, so in this instance we extended the availableconstraint API to support this question for all Dataflows.

availableconstraint/all/all?references=descendants&mode=available

The availableconstraint API returns a Content Constraint which tells us which Code Ids are valid for each Dimension. As there is a Content Constraint per Dataflow, we now know, for each Dataflow, every Code that has data and which Dimension the data is relevant for. This is useful as some Dataflows use the same Codelist for different Dimensions. Reporting Country and Counterpart Country may both use the same Country Codelist, for example, but this does not mean the same Code Ids have data for both Dimensions.

The references=descendants query parameter is used as it provides the Data Browser with all the supporting metadata used to organise and provide human readable labels to the information from the Content Constraints. The supporting metadata includes the Dataflows, DSDs, Concepts, and Codelists.

We now have enough information to answer all the questions required to build the first level of the sidebar, for example breakdown by Country.
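As an illustration of how the per-Dataflow constraints answer "which Datasets have data for this Code", the sketch below inverts a list of constraints into a Code Id to Dataflow lookup for one Dimension. The AvailableConstraint shape is a deliberately simplified, hypothetical stand-in for the real Content Constraint response, and dataflowsByCode is not an IMJS or SDMXUI function.

```typescript
// Hypothetical, simplified view of one Content Constraint:
// the Dataflow it belongs to and, per Dimension Id, the Code Ids that have data.
interface AvailableConstraint {
  dataflowId: string;
  availableCodes: Record<string, string[]>;
}

// For a single Dimension, map each Code Id to the Dataflows that have data for it.
function dataflowsByCode(
  constraints: AvailableConstraint[],
  dimensionId: string
): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const constraint of constraints) {
    // A Dataflow whose DSD lacks this Dimension simply contributes nothing.
    for (const codeId of constraint.availableCodes[dimensionId] ?? []) {
      const flows = index.get(codeId) ?? [];
      flows.push(constraint.dataflowId);
      index.set(codeId, flows);
    }
  }
  return index;
}
```

The keys of the returned map are the entries for the first level of the sidebar, and each value lists the Datasets to show beneath that entry when no further breakdown is configured.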

We do not yet have enough information to build the second level of the hierarchy if a second Dimension is chosen in the breakdown. This is because, whilst we know that a particular Age Range has data for a particular Dataset, we do not know if it has data in the context of the selected Country. It is not efficient to get this information up front: the more Dimensions there are in the hierarchy, the higher the number of permutations, which quickly becomes too high to solve efficiently. So instead, the information on what comes next is acquired at the time it is needed (when the user expands the hierarchy). The sidebar does this by issuing the query:

availableconstraint/all,all,all/;CURRENCY=ARS/all/FREQ?&mode=exact

Again, this is an extension of the availableconstraint query. This time the question is asked across all Dataflows, but it fixes CURRENCY to ARS using matrix parameters, and we only care about the response in the context of the FREQ Dimension. In addition, the mode is set to 'exact'. What this translates to is:

"Tell me what Code Ids have data for the FREQUENCY Dimension across all Dataflows when CURRENCY is ARS"

As this URL does not request descendants, the response this time only contains Code Ids, no other supporting metadata (Dataflows, DSDs, Codelists). This is enough information for us to build the next level of the hierarchy, as we already have the supporting metadata from the first request, in order to know what the corresponding code labels are.
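A minimal sketch of how the drill-down URL could be assembled from the codes the user has already expanded. The function name is hypothetical, chaining several matrix parameters for more than one fixed Dimension is an assumption, and the sketch assumes the empty parameter in "?&mode" carries no meaning and so emits "?mode=exact".

```typescript
// Build the "exact" availability query across all Dataflows.
// fixedCodes: Dimension Id -> the single Code Id chosen higher up the hierarchy.
// targetDimension: the Dimension whose Code Ids we want back.
function buildExactAvailabilityUrl(
  fixedCodes: Record<string, string>,
  targetDimension: string
): string {
  // Each fixed Dimension becomes a matrix parameter, e.g. ";CURRENCY=ARS".
  const matrixParams = Object.keys(fixedCodes)
    .map((dimensionId) => {
      const codeId = fixedCodes[dimensionId] ?? "";
      return `;${encodeURIComponent(dimensionId)}=${encodeURIComponent(codeId)}`;
    })
    .join("");
  return `availableconstraint/all,all,all/${matrixParams}/all/${encodeURIComponent(targetDimension)}?mode=exact`;
}
```

Calling buildExactAvailabilityUrl({ CURRENCY: "ARS" }, "FREQ") yields availableconstraint/all,all,all/;CURRENCY=ARS/all/FREQ?mode=exact.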

SDMXUI TypeScript

SDMX UI supports the above use case with the DataflowContext object. On initialisation it queries for all the required metadata (Dataflows, DSDs, Codelists), which is cached and used for the remainder of the user's data browsing experience.

DataflowContext.loadContext().then(async function (ctx: SDMXUI.DataflowContext): Promise<void> {
  //Context is loaded
  //Get folder structure used to build sidebar
  let folder: ISeriesCatalog = ctx.getSeriesCatalog(["COUNTRY", "AGE"]);
});

Free Text Search

Search.png

Search By Dataflow and Code

The Free Text Search feature is very easy to implement, as it simply makes use of the free text search API. There are two aspects to the search:

  1. AutoComplete
  2. Search

The Autocomplete is achieved using the datasearch API with query=[term] and setting auto=true

public/datasearch?auto=true&query=peso

To execute the query, simply set auto to false, or exclude it from the URL

public/datasearch?query=peso

The search API returns the hits broken down by Dataflow and then, if applicable, by the Classifications that matched, grouped by the Dimension or Attribute to which they belong. This makes the result very easy to render.

Search By Series Code

There is a secondary feature, with its own supporting API, which is searching for series by series code. Each series has a unique identifier, which is made up of the Code Ids for each Dimension of the Data Structure it belongs to. For example, Frequency=Annual, Currency=Chinese yuan renminbi, and so on, start to build up a series code of A.CNY.[and so on]. The Fusion Data Browser shows each series code when a series' details are expanded in the Series List view. An example of a series code is A.CNY.EUR.SP00.E and this can be used in the following query:

ws/public/seriessearch?query=A.CNY.EUR.SP00.E

The series API allows keys to be wildcarded, as so:

ws/public/seriessearch?query=A..EUR.SP00.E

The returned list of series ids can be coupled with the information from the metadata already obtained to build the Dataset Navigation in order to render the results.
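The wildcard matching itself is performed by the server. The sketch below only illustrates the semantics as they appear in the example above, under the assumption that an empty position in the key matches any Code Id in that position. The function name is hypothetical; something similar could be used to filter series that are already cached on the client.

```typescript
// Returns true when seriesCode matches pattern, where an empty position
// in the pattern (as in "A..EUR.SP00.E") is treated as a wildcard.
function matchesSeriesKey(pattern: string, seriesCode: string): boolean {
  const wanted = pattern.split(".");
  const actual = seriesCode.split(".");
  // Both keys must describe the same number of Dimensions.
  if (wanted.length !== actual.length) return false;
  return wanted.every((part, i) => part === "" || part === actual[i]);
}
```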

SDMX UI TypeScript

The Typescript makes use of the DataSearch widget with the configuration passing in the callback functions which are used to build the table of results.

let config: SearchOptions = {
    id: "#searchInput",  //Id of the input element to make the search control
    dataSearchApi: Context.INSTANCE.apiList.DataSearchAPI,
    seriesSearchApi: Context.INSTANCE.apiList.SeriesSearchAPI,
    dataSearchResult: function (searchTerm: string, searchResult: DataSearchResults): void {
        //Build the table of results for a free text search
    },
    seriesSearchResult: async function (searchTerm: string, searchResult: SeriesSearchResults): Promise<void> {
        //Build the table of results for a series code search
    },
    type: DataSearch,
    width: 300,
    autocompleteTimeout: 5000,
    placeholder: "Search"
};
UIWidgets.INSTANCE.dataSearch = UiWidget.build(config) as DataSearch;

To render the results we can use the DataflowContext to look up the metadata:

let ctx: IDataflowContext = DataflowContext.INSTANCE;
let results = searchResult.results;
for (let currentResult of results) {
    let flow = ctx.getDataflow(currentResult.Dataflow);
    ...
}

Series List

Databrowser serieslist items.png

The primary goal of the Fusion Data Browser is to get the user to the point where they have a list of Series that matches their search or navigation criteria. This can be very simple to achieve. For example, when the user uses the Dataset Navigation sidebar, they are essentially building an SDMX REST query by selecting which filters to apply to a Data Query. The outcome of the navigation bar is that a single Dataflow is selected with zero or more Dimensions filtered, and each filtered Dimension has only one filter applied.

Series List From Basic Query Filters

So, for example, for Exchange Rate data -> Argentine Peso -> Annual Frequency, the URL query would be the following:

data/ECB,EXR,1.0/A.ARS....

However, as we only want to show the series information, we do not need every observation for each series. We do want to display the last observation date and observation value, so we ask for only the last observation to be included:

data/ECB,EXR,1.0/A.ARS....?lastNObservations=1

This is the simple case. What about the case where the query is far too large? Census hypercubes, for example, can contain millions of series. When the data is too large, a maximum series limit is requested:

data/ECB,EXR,1.0/all?lastNObservations=1&max=200  

This tells the server to execute the query but only output the first 200 series. There is no way to define how series are ordered, so requesting the first 200 series does not guarantee the sequence in which the series will come back.

Whilst the Fusion Data Browser does not include pagination, this could have been achieved using the offset parameter:

data/ECB,EXR,1.0/all?lastNObservations=1&max=200&offset=200
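A minimal sketch of how paged URLs could be generated from the max and offset parameters. The function name and its arguments are hypothetical, and paging in this way assumes the server returns series in the same order on every request, which this page does not confirm.

```typescript
// Build the data URL for one page of series, requesting only the last observation.
// pageIndex is zero-based; pageSize maps to the "max" parameter.
function pagedDataUrl(
  dataflow: string,
  key: string,
  pageSize: number,
  pageIndex: number
): string {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  const params = ["lastNObservations=1", `max=${pageSize}`];
  const offset = pageSize * pageIndex;
  // The first page needs no offset, matching the unpaged query.
  if (offset > 0) params.push(`offset=${offset}`);
  return `data/${dataflow}/${key}?${params.join("&")}`;
}
```

pagedDataUrl("ECB,EXR,1.0", "all", 200, 1) reproduces the offset query shown above.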

Series List From Search Result

The series list from a search result uses the same technique as the basic query filters, as the search result provides the Dataflow identifier, and if there is a 'hit' for a classification, then the Dimension and Code Id is also provided. This enables the query to be built using exactly the URL with Dataflow and query filter.

Series List From Series Basket

The Series Basket enables a user to hand pick one or more series, to save these as a basket which can be reloaded at a later date. Furthermore, the basket can contain series from different Dataflows.

Whilst some series baskets can be resolved using the REST URL to GET the data, this is not useful in all situations. Instead, a JSON query can be POSTed to the web service. The POST body contains the details of the Dataflow and the list of series, along with other parameters such as lastNObservations.

{
 "queries": [
   {
     "dataflow": {
       "id": "GCI",
       "agencyId": "WB",
       "version": "1.0"
     },
     "lastNObservations": 1,
     "series": ["GHA:EOSQ071:RANK:A","GHA:EOSQ071:ONE_SEVEN_BEST:A","AUS:EOSQ071:RANK:A"]
   }
  ]
}

As this syntax can contain multiple queries, it supports series baskets from mixed Dataflows.
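The sketch below shows one way a basket of hand-picked series could be grouped into such a POST body, one query per Dataflow. It assumes each entry in "queries" is an object holding a dataflow, lastNObservations and series; the BasketItem type and the basketToQuery function are hypothetical and not part of IMJS or SDMXUI.

```typescript
interface DataflowRef { id: string; agencyId: string; version: string; }

// Hypothetical basket entry: one series key and the Dataflow it came from.
interface BasketItem { dataflow: DataflowRef; seriesKey: string; }

interface BasketQuery { dataflow: DataflowRef; lastNObservations: number; series: string[]; }

// Group basket items by Dataflow so that mixed baskets become multiple queries.
function basketToQuery(items: BasketItem[], lastNObservations = 1): { queries: BasketQuery[] } {
  const byFlow = new Map<string, BasketQuery>();
  for (const item of items) {
    const { agencyId, id, version } = item.dataflow;
    const flowKey = `${agencyId},${id},${version}`;
    let query = byFlow.get(flowKey);
    if (!query) {
      query = { dataflow: { id, agencyId, version }, lastNObservations, series: [] };
      byFlow.set(flowKey, query);
    }
    // Skip series the user has added to the basket more than once.
    if (query.series.indexOf(item.seriesKey) === -1) query.series.push(item.seriesKey);
  }
  return { queries: Array.from(byFlow.values()) };
}
```

JSON.stringify(basketToQuery(items)) then gives the body to POST.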

SDMXUI TypeScript

The SeriesList component is used to build the series list. The dataset parameter can take a JSON Data Query or a REST URL (amongst other things).

let seriesListConfig: SeriesListOptions = {
   seriesTitle: "${CURRENCY} to ${CURRENCY_DENOM}",
   seriesSubtitle: "${EXR_TYPE}",
   seriesFooter: "Time Series Range:${T}  Frequency: ${FREQ}",
   limit: 200,
   id: "#series-list-div",
   height: 900,
   type: SeriesList,
   dataset: dataset,
   resolveLabel : key => { return LocaleUtil.getLabel(key);},
   onBuild: async function (widget: UiWidget): Promise<void> {
       //Series List has been built 'widget' is the built SeriesList
   },
   showPch:false,
   seriesSelect: (chart: ASeriesChart, series: Series) => {
       //User has Selected a series
   },
   seriesDeselect: (chart: ASeriesChart, series: Series) => {
       //User has deselected the series
   }
};
UiWidget.build(seriesListConfig);

Quick Filter

Web Service

Databrowser quick filter.png

The quick filter is a very easy problem to solve using the SDMX availableconstraint query. This query, which is also used for the Dataset Navigation solution, is used to request a list of Code Ids which remain valid based on the current query state. As we already have the supporting metadata, all we need are the code ids, not the code labels.

availableconstraint/ECB,EXR,1.0/A.ARS+CNY....?mode=available

There is one important thing to note with this query, and that is mode=available. When expanding the hierarchy in the navigation menu this was set to 'exact'. Mode exact requests the list of Code Ids that would come back in the response if we were to execute the query now. Mode available requests the Code Ids which remain valid selections even if they are not in the response.

The SDMX wiki provides full details on this important distinction. In this use case we want to present the user with a list of codes that remain available selections, and disable the Code Ids that no longer remain valid based on the current query state.

The response to this query provides all the information required to update the quick filters accordingly.
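A minimal sketch of how the response could drive the drop-down state, assuming the valid Code Ids for one Dimension have already been extracted from the response. The FilterOption shape and the function name are hypothetical, and keeping an already selected code enabled (so that it can be deselected) is a design assumption rather than documented behaviour.

```typescript
// Hypothetical view model for one entry in a quick filter drop-down.
interface FilterOption { codeId: string; selected: boolean; enabled: boolean; }

// allCodes: every Code Id in the cached Codelist for this Dimension.
// validCodes: Code Ids returned by the availableconstraint query.
// selectedCodes: Code Ids the user has currently ticked.
function quickFilterOptions(
  allCodes: string[],
  validCodes: string[],
  selectedCodes: string[]
): FilterOption[] {
  const valid = new Set(validCodes);
  const selected = new Set(selectedCodes);
  return allCodes.map((codeId) => ({
    codeId,
    selected: selected.has(codeId),
    // Disable codes that are no longer valid, but never lock in a selection.
    enabled: valid.has(codeId) || selected.has(codeId),
  }));
}
```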

SDMX UI TypeScript

The Series List Component provides the functions to ask what filters are available for each Dimension, so the Quick Filters are lazy loaded (filters are obtained when the user expands the drop down).

 let availableSelections:IAvailableData = await  seriesList.getUserFilteredAvailableData();
 let selectedCodes = await seriesList.userFilteredDataQuery.getCodeSelections(dimId);
 let validCodes = availableSelections.validcodes[dimId];

Advanced Filter

Description

Databrowser advancedfilter.png

The advanced filter is an extension of the quick filter, and makes use of the same availableconstraint API to get the information back. However, other than aspects which are solved in the user interface, such as search and select all, there is one more feature which requires more information from the metadata. The extra feature is displaying the Codelist as a hierarchy. This metadata already exists from the original query to build the Dataset Navigation menu: the information is in the Codelist response, as each Code can reference a single parent by Id.

If we wanted to provide more complex hierarchies or different hierarchical views, we could have queried for Hierarchical Codelists which use the Codelist we are displaying.

On each code selection, the information on the number of series matches, and on which codes remain available, is provided by the availableconstraint API.
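Because each Code references at most one parent by Id, the hierarchy can be rebuilt from the flat Codelist in two passes. The sketch below assumes a simplified, hypothetical Code shape with unique ids and no parent cycles; it is not the IMJS object model.

```typescript
// Hypothetical flat code as it might be read from a cached Codelist.
interface Code { id: string; name: string; parentId?: string; }

interface CodeNode { code: Code; children: CodeNode[]; }

// Turn a flat list of codes into a forest using each code's single parent reference.
function buildCodeTree(codes: Code[]): CodeNode[] {
  const nodes = new Map<string, CodeNode>();
  // First pass: create a node for every code so parents can be found in any order.
  for (const code of codes) {
    nodes.set(code.id, { code, children: [] });
  }
  const roots: CodeNode[] = [];
  // Second pass: attach each node to its parent, or treat it as a root.
  for (const code of codes) {
    const node = nodes.get(code.id);
    if (!node) continue;
    const parent = code.parentId !== undefined ? nodes.get(code.parentId) : undefined;
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      // No parent, or the parent is missing from the list: show at the top level.
      roots.push(node);
    }
  }
  return roots;
}
```

Codes whose parent is absent from the list are promoted to the top level so that they remain selectable.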

SDMX UI Typescript

The CodePicker widget is used to build the advanced filter. The input is the current DataQuery as an Object, a URL, or simply the Dataflow with no query state.

let config: CodePickerOptions = {
               id: "#codepicker",
               height: 950,
               input: query,
               type: CodePicker,
               displayOutputOptions: false,
               displayTimePicker: false,
               displayAggregationOption : false,
               dataCallback: (dataSet: DataSet): void => {
                   //do something
               }
           };
let codePicker: CodePicker = UiWidget.build(config) as CodePicker;

To execute the query:

codePicker.executeQuery()

Series Details

Web Service

Databrowser serieslist item expanded.png

The Series Details simply shows the user all the metadata related to the series when expanded. This metadata includes the Dataflow name, Series short Code, Dimensions and Series Attributes with corresponding values, for example Frequency = Annual, Multiplier = Millions.

All of this metadata exists from the very first query we used to build the Dataset Navigation.

The additional query which is run when a series is expanded is for the full list of observations for the series. When the series list was first created we only queried for the last 2 observations. The reason we ask for 2 and not 1 is so we know if there are at least 2 observations for the series. If there are 2, then we know we can build a small Time Series chart, and if we can, we ask for the full series by using the standard REST query:

data/ECB,EXR,1.0/A.E5.PLN.ERD0.A

The query result contains all the observations for the series. If there is a danger that this may be too many, we can still use a restriction on observations by setting lastNObservations to a reasonably high number such as the last 1000.
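The decision described in this section can be sketched as a small helper that returns the full-series URL only when a chart is worth drawing. The function name and parameters are hypothetical; the optional cap maps to lastNObservations as suggested above.

```typescript
// previewObservations: how many observations the series list query returned for this series.
// maxObservations: optional cap on the number of observations to fetch.
// Returns undefined when there are too few observations to draw a time series chart.
function fullSeriesUrl(
  dataflow: string,
  seriesKey: string,
  previewObservations: number,
  maxObservations?: number
): string | undefined {
  // A chart needs at least two points.
  if (previewObservations < 2) return undefined;
  const url = `data/${dataflow}/${seriesKey}`;
  return maxObservations === undefined ? url : `${url}?lastNObservations=${maxObservations}`;
}
```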

SDMX UI TypeScript

The expand feature and sparkline time series chart are handled by the Series List component, which has already been covered. The Series List makes use of the SparkLine widget as follows:

let config:SeriesChartConfig = {
               id: "#" + chartUid,
               height: 85,
               dataset: `data/ECB,EXR,1.0/A.E5.PLN.ERD0.A`,
               type: SparkLine
           }
UiWidget.build(config);

Export Data

There is nothing special about exporting data: it simply uses the Data Query REST web service which has already been covered when building the series list. Both the URL and JSON POST methods are used, depending on the query. The only additional function is the use of the saveas query parameter.

data/ECB,EXR,1.0/A.E5.PLN.ERD0.A?saveas=myexport

The HTTP Accept header can be used to change the response format, or the format parameter can be used to achieve the same thing.

data/ECB,EXR,1.0/A.E5.PLN.ERD0.A?saveas=myexport&format=csv

The only additional feature is the use of Reporting Templates as an export format, this is where there is metadata to describe a custom layout for an Excel workbook. In order to provide a Reporting Template as an option, we need to know that a Reporting Template exists. In order to achieve this we can either issue a query for all Reporting Templates up front when we load the browser:

/reportingtemplate

Or we can query for all Reporting Templates for the selected Dataflow when the information is required:

dataflow?references=reportingtemplate&format=fusion-json

Note: format=fusion-json has been explicitly set in this example as ReportingTemplates are only possible to output in this format.

User Interface Widgets

This completes the main API calls for the Fusion Data Browser. The remainder of this document is to do with how the information is used locally to visualise and manipulate the data.

Pivot Table

Cross Sectional Charts

Time Series Charts

Embed URL