Publication Table

From Fusion Registry Wiki
Revision as of 10:05, 29 March 2022 by Mnelson (talk | contribs) (Dimension Variables)
Jump to navigation Jump to search

Publication Tables

Overview

Using SDMX concepts of Agency, Id, Name, Description
Table Layout and Content Design

A Publication Table is a curated table of data, where the values in the table rows and columns can be individually specified. This enables tables of data to pull data from multiple sources and piece the data together in a way which is not possible when dealing with standard cubes of data, whose table model is typically based on a Pivot model, with columns and rows built around the Dimension values to which the column or row is assigned. A Publication Table breaks out of this restrictive model by allowing rows and columns of data to reference observations which do not need to share any relationship number of Dimensions or reported Values of the Dimensions.

Data in a Publication Table may be sourced from a single Dataflow or it may be sourced from multiple Dataflows enabling cross comparison of data.

Publication Tables can be static (every cell is a defined Observation) or dynamic (a cell defines the Observation but includes variables values for one or more Dimensions, which can be provided at table build time.

Time arithmetic and frequency conversion is supported, enabling a single table to pull data from different time periods and frequencies which are all derived from the base period on which the table is built.

Publication Tables supports simple calculations, where each calculation can take input from one or more Observations over one or more Dataflows. Variable values also play a big role in Publication Tables, and includes standard variable placeholders – as well as advanced conditional variables, whose list of variable values change based on the value of another Dimension for which the table is built.

Metadata vs Materialised

A Publication Table is stored in Fusion Registry the same way any other piece of structural metadata is stored. In this way it has an owner (Agency), and identity (ID) and a version. The change history is tracked using the same services, and the table definition can be queried using the same SDMX web services for Structural Metadata.

The Publication Table metadata contains information about which Dataflows are used by the Table, which additional variables are being imported, and a definition of the table structure, and which Observations to import by referencing them by their unique Observation Key.

A Publication Table definition can be materialised into a table of Data by from the Publication Table Web Service, which combines the information of the Table Identity, and Variable Values to rewrite the Observation Key into actual observation values. If building a web User Interface, then response from this service can be a packet of JSON, which describes the layout and content of the final Table in a way which is quick and simple to render for the User to see. The process of materialising the table is known as ‘build time’.

Additional web services are provided to enable clients to ask questions about the table such as ‘what are the earliest and latest time periods for which there is data for this table’ or ‘what possible variable values are there for this table’. This enables user interfaces to provide a user with the option to change the variables ‘on the fly’ ensuing they are able to build a table which is known to have data.

Publication Table Cell Content

Overview

Publication Table Layout

A Publication Table is made up of a Table Head Rows and Table Body Rows.

The Table Head is made up of 1 or more table rows. Each Table Head cell may span 1 or more columns and 1 or more rows. The Table Head content is made up of free text and / or variable placeholders whose values are resolved when the table is built. Table Head Cells can be left empty.

The Table Body is made up of 1 or more Table Rows. The Table Rows provide the Publication Table with its data, by referencing Observations from the Dataflow.

At table build time all variables are resolved, and the corresponding data are retrieved from the data stores which hold the data.


Table Rows

Overview

A Table Row consist of a Table Row heading, this can be a mix of free text and variable text. The Table Row Heading is always the first Table Row Cell, and can not span multiple columns or rows. A Table Row can not have more then one heading. The remaining Table Row Cells are the Observation Cells, which may contain free text and variable text but will typically contain a reference to an Observation by an Obsevation Keys or a Calculation which can combine multiple Obsevation Keys.

A Table Row which results in no Observation data will not be included in materialised table at build time, unless the Table Row is a grouping row (see Row Levels for details on grouping rows).

Row Levels

Showing Levels in the Table Rows

The Table Row Heading can be used to define a row level. The level indicates to the client responsible for rendering the table the row's position in a hierarchy. Row levels are depicted using one or more leading hyphens ‘-‘ before the text. The number of hyphens indicates the level, with one leading hypen indicating level 1, two indicating level 2, and so on. It is possible to skip levels if required, i.e. Row 1 is level 1 with Row 2 hainv level 3, skipping out level 2 entirely.

An example of row headings using levels is as follows:

Countries
 - Europe
 -- France
 -- Germany
 - Asia
 -- China

Levels can be an important way to group data. A Table Row which acts as a parent row in a hierarchy, can be used as grouping mechanism for the rows that follow. This enables Parent Rows to be included in the output table, even if the Table Row contains no Observation data. The default behaviour for a Table Row with no data is to exclude it from the built table.

Observation Key

Overview

The purpose of the Observation Key is to reference a Observation whose value will be resolved at table build time. The Observation Key can explicitly resolve to 1 observation value, if all parts of the Key are provided. However, it is possible to make parts of the key varaibles, allowing the values to be set at table build time.

An Observation Key is made up of 4 parts:

  • Dataflow Reference
  • Series Reference
  • Time Period Reference
  • Measure Reference

An example Observation Key with all of these parameters in place is:

DF_REG.M:UK:SUR:2008:BIRTHS

The key is broken down into the following parts:

Observation Key Part Name Description Required
DF_REG Dataflow Alias This identifies the Dataflow which has the data. The identification uses the alias, which is assiged to the Dataflow in the Publication Table definition. The Dataflow Alias is only required if the Publication Table references more then one Dataflow. If the Publication Table references more then one Dataflow, it can assign a Dataflow to a Default Alias, which means that it is required as part of the Observation Key. No (if only 1 Dataflow or a default alias is set)
M:UK:SUR Series Key This identifies the Series of Data for the Dataflow, by providing a single value for each of the Dimension Members of the Dataflow, separated by a ‘:’ character. In this example the Dataflow has 3 Dimensions (excluding time), the value for the first is 'M', the second 'UK' and the last 'SUR'. As the Dataflow in this example has a Time Dimension this Series Key will uniquely identify a Series of Observations where each Observation refers to a unique Time Period Yes (but can omit values)
2008 Time Period This identifies the Time Period in the Series of Data identified by the previous section. The Combination of the Series and Time Period for a Time Series Dataflow will uniquely identify the Observation No. Omit to make it variable for Time Series Data (defaults to last period). Not required if there is no Time Dimension for the DSD.
BIRTHS Measure Dimension If the Dataflow uses a Data Structure with multiple Measures, then the Measure ID must be included as the last part of the Observation Key. In this example the Dataflow has multiple measures (BIRTHS/DEATHS/MARRIAGES) so the Observation Key needs to include this value. When there is only 1 measure it is not required, when there are multiple measures, it is required.

Variable Dimension Members

It is possible to omit Dimension Values from the Observation Key, when this is the case that Dimension will become a variable – whose value can be set at table build time. For example, the second Dimension value of ‘UK’, from the previous example, can be omitted like this:

M::SUR:2008

The Publication Table can now be built for any Country by passing in the variable at table build time - i.e. UK, FR, DE. By passing in different variable values, the Observation Key will change, and as such it will reference a different Series in the Dataset.

When an Observation Key has a variable Dimension member, the ID of the Dimension can be referenced in text cells to create variable headings. For example, if the ID of the 2nd Dimension in the DSD is REF_AREA, the Table Column could be given a title of $(REF_AREA) which would resolve to 'United Kingdom', 'France', or 'Germany' at build time depending on which varaible value is used to build the table. #See Dimension Variables.

Variable Time Period

For Time Series dataset, the Time Dimension can form part of the Observation Key, for example M:UK:SUR:2008. The 2008 part of the key refers to the year 2008.

It is possible to omit the value for Time Period in the Observation Key. Omitting Time Period will make the Time value a variable which can be provided at table build time, defaulting to the last period for which data exists if no value is supplied.

Extending the above example to omit the Time value results in an Observation Key whose Country and Time values can be provided to the web service at table build time.

M::SUR

Note: The above key is exactly the same as using the variable Time Period value 'P' in the Time Dimension, as follows:

M::SUR:P

Additional functions can be performed on the Time Dimension, such as moving forwards or backwards from the table base period, #see Time Variables for more details.

Free Text Observation Keys

Whilst typically a cell in the body of the Publication Table would resolve to an Observation value, it is possible to use it for free text. To include free text cells, start the cell value with a leading apostrophe character, for example:

'Cell Text

Note: An Observation Key can not contain a mix of free text as well as an Observation value

Calculated Observation Cells

Publication Table with Calucated Cells and Variable Dimensions

An Observation Cell may contain calculated values, which can consist of a mix of simple algebra (+, -, *, /) and a reference to one or more Observations which may come from multiple Dataflows.

To define a Calculated Observation the Cell the following rules apply:

  • The Observation Key must start with :=
  • Any Observation Keys must appear within enclosing sequare brackets '[' and ']'
  • An Observation key may embed calculations on a single Dimension by using parenthesis '(' and ')' in the key part
  • The same rules for variable Dimensions and Time Period apply when resolving the Obseration Keys used in the calculation

Examples are shown in the table below

Example Syntax Description
:=[A:UK:EMP]+10 Add 10 to the observation value retrieved from the key A:UK:EMP
:=[A:(UK+FR+DE):EMP] Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP
:=[A:UK:EMP]+[A:FR:EMP]+[A:DE:EMP] Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP
:=[A:(UK+FR):EMP]/[A:(UK+FR):UMP] Adds the series A:UK:EMP + A:FR:EMP and divides by the result of A:UK:UMP + A:FR:UMP
:=((100/[A:UK:EMP:(M+F):P-1])*[A:UK:EMP:(M+F)])-100 Demonstrates the ability to use parenthesis outside of the Observation Keys to control the order of mathematical operations
:=[POP.M:UK:(M+F)] / [LAB.M:UK:EMP] Demonstrates the ability to calculate data from two Dataflows (defined by alias POP and LAB)
:=[M:UK:EMP:M(P)] + [A:UK:EMP:A(P)] Demonstrates the ability to calculate data where the period is converted from the base period

Footnotes

TODO

Table Variables

Overview

Variables are a very important part of Publication Tables. They can be used to make the content of the table dynamic, setting and changing the value at run time to change the table content. Variables can be used in Observation Keys, but the values of variables can also be used anywhere where there is text, for example the table heading, subheading, column or row headings, footnotes, and Observation Cells with free text.

This section describes the various variables that exist, along with the syntax of how to use them.

Dataflow Variables

Publication Tables define data cells across one or more Dataflows. Each ‘imported’ Dataflow is given an alias, which can be used to refer to the Dataflow in the Observation Key.

It is possible to refer to the Dataflow's Name, Description, and Id using the following variable syntax.

Dataflow Description (dataflow alias=EXR)

$(EXR).desc 

Dataflow Name (dataflow alias=EXR)

$(EXR).name 

Dataflow ID (dataflow alias=EXR)

$(EXR).id

Dimension Variables

When an Observation Key has variable Dimension Values, the table will be built by replacing this variable with a value to complete the Observation Key. This value can be passed explicitly to the table (for example ‘build me a table for UK’) or if not defined, it will be defaulted. If the Dimension uses a Codelist or Valuelist, then the value of the variable referrers to the Code/Value Identifier. The name of the Code can be resolved at runtime by using the following syntax $([dimension id]), for example:

$(INDICATOR)  

This would resolve to the name of the Indicator value used in the table, for example Employment

To Output the name of the Concept i.e. the label 'Indicator', then the following syntax can be used $([component id].name), for example:

$(INDICATOR.name)

Component Values

If a fixed Dimension value is used in the table, or if there is a Series/Observation Attribute whose value is required in the table, then these can be explicitly referenced using the following syntax $([component id].[component value]), for example:

$(OBS_STATUS.M)

This would resolve to the name of the OBS_STATUS Attribute, when the value is M, for example Missing

The Component Name can be referenced using the following syntax $([component id].name), for example:

$(OBS_STATUS.name)

Coded Variables

It is possible to import additional Codelists and/or Valuelists into the Publication Table, for the purpose of using the Names or enumerated items as variables. When a Codelist/Valuelist is imported, it is given an alias, and the combination of the Alias and Item ID resolves to the Name of the Item. The syntax is $([codelist alias].[item id]) for example:

$(CURRENCY.$)

Resolving to the text Dollar in English and דוֹלָר if the Publication Table were built in Hebrew (and a Hebrew name for the corresponding Value List item).

Conditional Variables

A Conditional Variable is one whose value is conditional on the value of a Dimension Variable.

An example use case is Banknotes, where the values for NOTE1, NOTE2, or NOTE3 is dependant on the Country Dimension. A Conditional Variable could be used to set a values for each variable (NOTE1, NOTE2, NOTE3) based on the value used for the current Country Dimension i.e. US NOTE1=$5, UK NOTE1=£5.

The syntax of referring to a Conditional Variable is $([conditional variable id][variable id]), for example

$(BANKNOTE[NOTE1])

Time Variables

When the data is time series (i.e. the DSD has a Time Dimension), the Observation Key can include the time period, for example:

A:UK:EMP:2008

Where 2008 is the Observation Time Period to use. The format of the Time Period must always be the same format as the frequency for which the data are reported. For example if the data is annual, and an observation is reported for 2008, then the observation key must use 2008, not another frequency such as 2008-12-31. It is typical in a DSD for the Frequency of the Series to be the first Dimension, in all the examples on this page this convention is applied, and as such the above key A:UK:EMP refers to Annual data, depicted by the initial dimension value 'A'.

If the key omits the time period, then the time period becomes a variable which can be set at table build time, if not set it defaults to the last period for which there is data.

A:UK:EMP

When a Publication Table has a variable time period, the resolved variable time value for this varaible is known as the base period. There can only be 1 base period for a Publication Table, however other Observation Keys (and corresponding headings) can perform time arithmetic from the base period for which the table is built.

The supported time functions include walking forwards and backwards in increments of the given frequency, converting to another frequency of data, and a combination of the two (convert and move, move and convert, move convert and move again). Ths supported functions are shown in the table below.

Syntax Description Example Observation Key Example Resolved Key
P The base period (default if not provided) A:UK:EMP:P A:UK:EMP:2008 (base period=2008)
P-n Base Period minus 'n' Periods where n is a positive Integer A:UK:EMP:P-1 A:UK:EMP:2007 (base period=2008)
P+n Base Period plus 'n' Periods where n is a positive Integer A:UK:EMP:P+1 A:UK:EMP:2008-Q2 (base period=2008-Q1)
A(P) Base Period converted to Annual A:UK:EMP:A(P) A:UK:EMP:2008 (base period=2008-Q2)
S(P) Base Period converted to Bi-Annual (Semester) S:UK:EMP:S(P) S:UK:EMP:2008-S2 (base period=2008-Q3)
T(P) Base Period converted to Tri-Annual T:UK:EMP:T(P) T:UK:EMP:2008-T3 (base period=2008)
Q(P) Base Period converted to Quarterly Q:UK:EMP:Q(P) Q:UK:EMP:2008-Q4 (base period=2008)
M(P) Base Period converted to Monthly M:UK:EMP:M(P) M:UK:EMP:2008-03 (base period=2008-Q1)
W(P) Base Period converted to Weekly W:UK:EMP:W(P) W:UK:EMP:2008-04 (base period=2008-01-01)
D(P) Base Period converted to Daily D:UK:EMP:D(P) D:UK:EMP:2008-12-31 (base period=2008)
H(P) Base Period converted to Hourly H:UK:EMP:D(P) H:UK:EMP:2008-12-31T12 (base period=2008-02-02T12:00:03)
A(P-n) Base Period minus (or plus) 'n' periods, converted to another frequency 'A' A:UK:EMP:A(P-4) A:UK:EMP:2007 (base period=2008-Q4)
A(P)-n Base Period, converted to another frequency 'A' minus (or plus) n' periods A:UK:EMP:A(P)-4 A:UK:EMP:2004 (base period=2008-Q4)
A(P-n1)-n2 Base Period, minus (or plus) 'n1' periods, converted to another frequency 'A' minus (or plus) 'n2' periods A:UK:EMP:A(P-4)-4 A:UK:EMP:2003 (base period=2008-Q4)

Notes on Time Period
When converting from a lower frequency to a higher frequency (Weekly to Daily for example), the cast date always resolves to the end of the Period.

It is also important to note that as the Observation Keys in these examples all have Frequency as the first Dimension, when converting to another frequency (Quarterly to Annual for example) the Observation Key changes the value for the Frequency Dimension, ensuring the correct series of data is retrieved.

Time Period variable in free text
When refering to the Time Period as a variable in free text, for example in the table heading or a column heading, the variable is placed inside the variable syntax $(). For example $(P) would resolve to base period, and $(A(P)) would resolve to 'base period converted to annual frequency'. An example usage is shown below:

Table 1A showing data for $(P) to $(P+3)

Variables Summary

Variable Type Description Syntax Example Example Output
Dataflow Name Displays the name of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built $([dataflow alias]) $(EDU) World Bank, Education
Dataflow Description Displays the description of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built $([dataflow alias].desc) $(EDU.desc) World Bank, Education
Dataflow Id Displays the ID of a Dataflow referenced by the Publication Table $([dataflow alias].id) $(EDU.id) DF_EDU
Variable Dimension Value Displays the name of the variable used when an Observation key has a wildcard member $([dimension id]) $(REF_AREA) United Kingdom
Specific Component Value Displays the name of the Component Value $([component id].[component value]) $(OBS_STATUS.C) Confidential
Component Name Displays the name of the DSD Component (this is the name of the Concept used by the Component) in the locale in which the Publication Table is built $([component id].name) $(OBS_STATUS) Confidentiality Status
Enumeration Value Displays the name of a Code in a Codelist or Value in a Valuelist $([list alias].[list item ID] $(SEX.M) Male
Conditional Variable Displays the variable value, which is conditional on the value of a variable Dimension $([conditional variable id][variable id]) $(BANKNOTES[NOTE1]) £5
Time Displays the base time period for which the table is built, or an offset from this time period / cast to a different frequency see Time Functions $(P) 2008