Difference between revisions of "Publication Table"

From Fusion Registry Wiki
Jump to navigation Jump to search
(Overview)
(Overview)
Line 1: Line 1:
 
= Publication Tables =
 
= Publication Tables =
 
== Overview ==
 
== Overview ==
A Publication Table is a curated table of data, where the values in the table rows and columns can be individually specified.  This enables tables of data to pull data from multiple sources and piece the data together in a way which is not possible when dealing with standard cubes of data, whose table model is typically based on a Pivot model with columns and rows based on the Dimension.  A Publication Table breaks this model by allowing rows and columns of data to reference observations which do not need to share any relationship in the Dimensionality or the values of the Dimensions.  Data may come from a single [[Dataflow_-_Structural_Metadata_Management|Dataflow]] or it may be sourced from multiple Dataflows, enabling cross comparison of data.   
+
A Publication Table is a curated table of data, where the values in the table rows and columns can be individually specified.  This enables tables of data to pull data from multiple sources and piece the data together in a way which is not possible when dealing with standard cubes of data, whose table model is typically based on a Pivot model, with columns and rows built around the Dimension values to which the column or row is assigned.  A Publication Table breaks this model by allowing rows and columns of data to reference observations which do not need to share any relationship in the Dimensionality or the values of the Dimensions.  Data may come from a single [[Dataflow_-_Structural_Metadata_Management|Dataflow]] or it may be sourced from multiple Dataflows, enabling cross comparison of data.   
  
 
Publication Tables can be static (every cell is a defined Observation) or dynamic (a cell defines the Observation but includes variables values for one or more Dimensions, which can be provided at table build time.  
 
Publication Tables can be static (every cell is a defined Observation) or dynamic (a cell defines the Observation but includes variables values for one or more Dimensions, which can be provided at table build time.  

Revision as of 09:16, 29 March 2022

Publication Tables

Overview

A Publication Table is a curated table of data, where the values in the table rows and columns can be individually specified. This enables tables of data to pull data from multiple sources and piece the data together in a way which is not possible when dealing with standard cubes of data, whose table model is typically based on a Pivot model, with columns and rows built around the Dimension values to which the column or row is assigned. A Publication Table breaks this model by allowing rows and columns of data to reference observations which do not need to share any relationship in the Dimensionality or the values of the Dimensions. Data may come from a single Dataflow or it may be sourced from multiple Dataflows, enabling cross comparison of data.

Publication Tables can be static (every cell is a defined Observation) or dynamic (a cell defines the Observation but includes variables values for one or more Dimensions, which can be provided at table build time.

Time arithmetic and frequency conversion is supported, enabling a single table to pull data from different time periods and frequencies which are all derived from the base period on which the table is built.

Publication Tables supports simple calculations, where each calculation can take input from one or more Observations over one or more Dataflows. Variable values also play a big role in Publication Tables, and includes standard variable placeholders – as well as advanced conditional variables, whose list of variable values change based on the value of another Dimension for which the table is built.

A Publication Table is stored in Fusion Registry the same way any other piece

The following image shows a very simple Publication Table structure, where the table headings contain static text, and each cell describes an observation to show via the series key. Other parts of this table definition describe the table title, subtitle, and what the source Dataflow is.

Publication Table Definition.png
Publication Table.png

Metadata vs Materialised

A Publication Table is stored in Fusion Registry the same way any other piece of structural metadata is stored. In this way it has an owner (Agency), and identity (ID) and a version. The change history is tracked using the same services, and the table definition can be queried using the same SDMX web services for Structural Metadata.

The Publication Table metadata contains information about which Dataflows are used by the Table, which additional variables are being imported, and a definition of the table structure, and which Observations to import by referencing them by their unique key.

A Publication Table definition can be materialised by requesting the table from the publication table web service. The web service can be given variable values to use when building the table e.g. build Table 1A for Country UK and time period 2008. This is known as ‘table build time’, any missing variables which are required at build time are defaulted by the system.

The built table is simply a JSON representation of a table of data, enabling the table to be rendered easily by the User Interface in a fully customisable way. Additional web services are provided to enable clients to ask questions about the table such as ‘what are the earliest and latest time periods for which there is data for this table’ or ‘what possible variable values are there for this table’. This enables user interfaces to provide a user with the option to change the variables ‘on the fly’ ensuing they are able to build a table which is known to have data.

Table Content

Overview

A Publication Table is made up of a Table Head and Body Rows. Each Body Row has first cell which is the Row Title. The Table Head is made up of 1 or more rows, and each head cell may span 1 or more columns, and 1 or more rows. The Table Head content is made up of free text and / or variable placeholders whose values are resolved when the table is built.

A Body Row is made up of 1 or more rows. The Row Title has a fixed row span and column span of 1. The content is made up of free text and / or variable placeholders whose values are resolved when the table is built. The remainder of the Body Row is made is up Observation cells. An Observation Cell can reference a unique Observation by an Observation Key, the value for this Observation is resolved at table build time. The Observation Key may contain variable placeholders and can include calculation expressions. An Observation Cell may also contain free text.

Table Rows

Overview

A Table Row consist of a row heading, this can be a mix of free text and variable text. A Table Rowcan contain 0 to many Obsevation Keys which will resolve to the corresponding value at table build time. A Table Row which results in no observation data will not be included in the built table, unless it is a grouping row (see Row Levels below).

Row Levels

Showing Levels in the Table Rows

The Table Heading can also be used to define a row level, where the level would indicate to the client responsible for rendering the table, the row's position in the hierarchy. Row levels are depicted using one or more leading hyphens ‘-‘ before the text. The number of hyphens indicates the level, and it is possible to skip levels if required (i.e. level 1 straight to level 3). For example the row headings for the first 6 rows may look like the following:

Countries
 - Europe
 -- France
 -- Germany
 - Asia
 -- China

Levels can be an important way to group data, and allows some rows to be used as simply a grouping mechanism for the rows that follow (with no data on the grouping row itself).

If a row is used as a grouping mechanism for subsequent rows, and those grouped rows have no data at table build time, then the entire group will be omitted from the built table. Using the above example of the Countries hierarchy, if Asia is a text row to group China, and there is no data for the entire row of China – then the row ‘China’ and the enclosing group ‘Asia’ will not be output in the built table.

Observation Key

Overview

The puspose of the Observation Key is to reference a Observation whose value will be resolved at table build time. The Observation Key can explicitly resolve to 1 observation value, if all parts of the Key are provided. However, it is possible to make parts of the key varaibles, allowing the values to be set at table build time.

An Observation Key is made up of 4 parts:

  • Dataflow Reference
  • Series Reference
  • Time Period Reference
  • Measure Reference

An example Observation Key with all of these parameters in place is:

DF_REG.M:UK:SUR:2008:BIRTHS

The key is broken down into the following parts:

Observation Key Part Name Description Required
DF_REG Dataflow Alias This identifies the Dataflow which has the data. The identification uses the alias, which is assiged to the Dataflow in the Publication Table definition. The Dataflow Alias is only required if the Publication Table references more then one Dataflow. If the Publication Table references more then one Dataflow, it can assign a Dataflow to a Default Alias, which means that it is required as part of the Observation Key. No (if only 1 Dataflow or a default alias is set)
M:UK:SUR Series Key This identifies the Series of Data for the Dataflow, by providing a single value for each of the Dimension Members of the Dataflow, separated by a ‘:’ character. In this example the Dataflow has 3 Dimensions (excluding time), the value for the first is 'M', the second 'UK' and the last 'SUR'. As the Dataflow in this example has a Time Dimension this Series Key will uniquely identify a Series of Observations where each Observation refers to a unique Time Period Yes (but can omit values)
2008 Time Period This identifies the Time Period in the Series of Data identified by the previous section. The Combination of the Series and Time Period for a Time Series Dataflow will uniquely identify the Observation No. Omit to make it variable for Time Series Data (defaults to last period). Not required if there is no Time Dimension for the DSD.
BIRTHS Measure Dimension If the Dataflow uses a Data Structure with multiple Measures, then the Measure ID must be included as the last part of the Observation Key. In this example the Dataflow has multiple measures (BIRTHS/DEATHS/MARRIAGES) so the Observation Key needs to include this value. When there is only 1 measure it is not required, when there are multiple measures, it is required.

Taking into account the rules for when a Observation Key part is required, a simple Observation Key could be simply M:UK:SUR, or even smaller if dimensions are to become variables - for example M::SUR

Variable Dimension Members

It is possible to omit Dimension Values from the Observation Key, when this is the case that Dimension will become a variable – whose value can be set at table build time. For example, the second Dimension value of ‘UK’, from the previous example, can be omitted like this:

M::SUR:2008

The Publication Table can now be built for any Country by passing in the variable at table build time - i.e. UK, FR, DE. By passing in different variable values, the Observation Key will change, and as such it will reference a different Series in the Dataset.

When an Observation Key has a variable Dimension member, the ID of the Dimension can be referenced in text cells to create variable headings. For example, if the ID of the 2nd Dimension in the DSD is REF_AREA, the Table Column could be given a title of $(REF_AREA) which would resolve to 'United Kingdom', 'France', or 'Germany' at build time depending on which varaible value is used to build the table. #See Dimension Variables.

Variable Time Period

For Time Series dataset, the Time Dimension can form part of the Observation Key, for example M:UK:SUR:2008. The 2008 part of the key refers to the year 2008.

It is possible to omit the value for Time Period in the Observation Key. Omittig Time Period will make the Time value a variable which can be provided at table build time, defaulting to the last period for which data exists if no value is supplied.

Extending the above example to omit the Time value results in an Observation Key whose Country and Time values can set defined at table build time.

M::SUR

Note: The above key is exactly the same as using the variable Time Period value 'P' in the Time Dimension, as follows:

M::SUR:P

Additional functions can be performed on the Time Dimension, such as moving forwards or backwards from the table base period, #see Time Functions for more details.

Free Text Observation Keys

Whilst typically a cell in the body of the Publication Table would resolve to an Observation value, it is possible to use it for free text. To include free text cells, start the cell value with a leading apostrophe character, for example:

'Cell Text

Note: An Observation Key can not contain a mix of free text as well as an Observation value

Calculated Observation Cells

An Observation Cell may contain calculated values, which can consist of a mix of simple algebra (+, -, *, /) and a reference to one or more Observations which may come from multiple Dataflows.

To define a Calculated Observation the Cell the following rules apply:

  • The Observation Key must start with :=
  • Any Observation Keys must appear within enclosing sequare brackets '[' and ']'
  • An Observation key may embed calculations on a single Dimension by using parenthesis '(' and ')' in the key part
  • The same rules for variable Dimensions and Time Period apply when resolving the Obseration Keys used in the calculation

Examples are shown in the table below

Example Syntax Description
:=[A:UK:EMP]+10 Add 10 to the observation value retrieved from the key A:UK:EMP
:=[A:(UK+FR+DE):EMP] Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP
:=[A:UK:EMP]+[A:FR:EMP]+[A:DE:EMP] Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP
:=[A:(UK+FR):EMP]/[A:(UK+FR):UMP] Adds the series A:UK:EMP + A:FR:EMP and divides by the result of A:UK:UMP + A:FR:UMP
:=((100/[A:UK:EMP:(M+F):P-1])*[A:UK:EMP:(M+F)])-100 Demonstrates the ability to use parenthesis outside of the Observation Keys to control the order of mathematical operations
:=[POP.M:UK:(M+F)] / [LAB.M:UK:EMP] Demonstrates the ability to calculate data from two Dataflows (defined by alias POP and LAB)
:=[M:UK:EMP:M(P)] + [A:UK:EMP:A(P)] Demonstrates the ability to calculate data where the period is converted from the base period

Footnotes

Table Variables

Overview

Variables are a very important part of Publication Tables. They can be used to make the content of the table dynamic, setting and changing the value at run time to change the table content. Variables can be used in Observation Keys, but the values of variables can also be used anywhere where there is text, for example the table heading, subheading, column or row headings, or even free text in an Observation Key.

This section describes the various variables that exist, along with the syntax of how to use them.

Dataflow Variables

Publication Tables define data cells across one or more Dataflows. Each ‘imported’ Dataflow is given an alias, which is used to refer to the Dataflow in the table headings, and table body. It is possible to refer to the Dataflow name, description, and Id using the following variable syntax. The following examples are based on a Dataflow which has been given the alias of 'EXR'

Dataflow Description

$(EXR).desc 

Dataflow Name

$(EXR).name 

Dataflow ID

$(EXR).id

Dimension Variables

When an Observation Key has variable Dimension Values, the table will be built by replacing this variable with a value to complete the Observation Key. This value can be passed explicitly to the table (for example ‘build me a table for UK’) or if not defined, it will be defaulted. If the Dimension uses a Codelist or Valuelist, then the value of the variable referrers to the Code/Value Identifier. The name of the Code can be resolved at runtime by using the following syntax $([dimension id]), for example:

$(INDICATOR)  

This would resolve to the name of the Indicator value used in the table, for example Employment

To Output the name of the Concept i.e. the label 'Indicator', then the following syntax can be used $([component id].name), for example:

$(INDICATOR.name)  

Component Values

If a fixed Dimension value is used in the table, or if there is a Series/Observation Attribute whose value is required in the table, then these can be explicitly referenced using the following syntax $([component id].[component value]), for example:

$(OBS_STATUS.M)

This would resolve to the name of the OBS_STATUS Attribute, when the value is M, for example Missing

The Component Name can be referenced using the following syntax $([component id].name), for example:

$(OBS_STATUS.name)

Coded Variables

It is possible to import additional Codelists and/or Valuelists into the Publication Table, for the purpose of using the Names or enumerated items as variables. When a Codelist/Valuelist is imported, it is given an alias, and the combination of the Alias and Item ID resolves to the Name of the Item. The syntax is $([codelist alias].[item id]) for example:

$(CURRENCY.$)

Resolving to the text Dollar in English and דוֹלָר if the Publication Table were built in Hebrew (and a Hebrew name for the corresponding Value List item).

Conditional Variables

A Conditional Variable is one whose value is conditional on the value of a Dimension Variable.

An example use case is Banknotes, where the values for NOTE1, NOTE2, or NOTE3 is dependant on the Country Dimension. A Conditional Variable could be used to set a values for each variable (NOTE1, NOTE2, NOTE3) based on the value used for the current Country Dimension i.e. US NOTE1=$5, UK NOTE1=£5.

The syntax of referring to a Conditional Variable is $([conditional variable id][variable id]), for example

$(BANKNOTE[NOTE1])

Time Variables

When the data is time series (i.e. the DSD has a Time Dimension), the Observation Key can include the time period, for example:

A:UK:EMP:2008

Where 2008 is the Observation Time Period to use. The format of the Time Period must always be the same format as the frequency for which the data are reported. For example if the data is annual, and an observation is reported for 2008, then the observation key must use 2008, not another frequency such as 2008-12-31. It is typical in a DSD for the Frequency of the Series to be the first Dimension, in all the examples on this page this convention is applied, and as such the above key A:UK:EMP refers to Annual data, depicted by the initial dimension value 'A'.

If the key omits the time period, then the time period becomes a variable which can be set at table build time, if not set it defaults to the last period for which there is data.

A:UK:EMP

When a Publication Table has a variable time period, the resolved variable time value for this varaible is known as the base period. There can only be 1 base period for a Publication Table, however other Observation Keys (and corresponding headings) can perform time arithmetic from the base period for which the table is built.

The supported time functions include walking forwards and backwards in increments of the given frequency, converting to another frequency of data, and a combination of the two (convert and move, move and convert, move convert and move again). Ths supported functions are shown in the table below.

Syntax Description Example Observation Key Example Resolved Key
P The base period (default if not provided) A:UK:EMP:P A:UK:EMP:2008 (base period=2008)
P-n Base Period minus 'n' Periods where n is a positive Integer A:UK:EMP:P-1 A:UK:EMP:2007 (base period=2008)
P+n Base Period plus 'n' Periods where n is a positive Integer A:UK:EMP:P+1 A:UK:EMP:2008-Q2 (base period=2008-Q1)
A(P) Base Period converted to Annual A:UK:EMP:A(P) A:UK:EMP:2008 (base period=2008-Q2)
S(P) Base Period converted to Bi-Annual (Semester) S:UK:EMP:S(P) S:UK:EMP:2008-S2 (base period=2008-Q3)
T(P) Base Period converted to Tri-Annual T:UK:EMP:T(P) T:UK:EMP:2008-T3 (base period=2008)
Q(P) Base Period converted to Quarterly Q:UK:EMP:Q(P) Q:UK:EMP:2008-Q4 (base period=2008)
M(P) Base Period converted to Monthly M:UK:EMP:M(P) M:UK:EMP:2008-03 (base period=2008-Q1)
W(P) Base Period converted to Weekly W:UK:EMP:W(P) W:UK:EMP:2008-04 (base period=2008-01-01)
D(P) Base Period converted to Daily D:UK:EMP:D(P) D:UK:EMP:2008-12-31 (base period=2008)
H(P) Base Period converted to Hourly H:UK:EMP:D(P) H:UK:EMP:2008-12-31T12 (base period=2008-02-02T12:00:03)
A(P-n) Base Period minus (or plus) 'n' periods, converted to another frequency 'A' A:UK:EMP:A(P-4) A:UK:EMP:2007 (base period=2008-Q4)
A(P)-n Base Period, converted to another frequency 'A' minus (or plus) n' periods A:UK:EMP:A(P)-4 A:UK:EMP:2004 (base period=2008-Q4)
A(P-n1)-n2 Base Period, minus (or plus) 'n1' periods, converted to another frequency 'A' minus (or plus) 'n2' periods A:UK:EMP:A(P-4)-4 A:UK:EMP:2003 (base period=2008-Q4)

Notes on Time Period
When converting from a lower frequency to a higher frequency (Weekly to Daily for example), the cast date always resolves to the end of the Period.

It is also important to note that as the Observation Keys in these examples all have Frequency as the first Dimension, when converting to another frequency (Quarterly to Annual for example) the Observation Key changes the value for the Frequency Dimension, ensuring the correct series of data is retrieved.

Time Period variable in free text
When refering to the Time Period as a variable in free text, for example in the table heading or a column heading, the variable is placed inside the variable syntax $(). For example $(P) would resolve to base period, and $(A(P)) would resolve to 'base period converted to annual frequency'. An example usage is shown below:

Table 1A showing data for $(P) to $(P+3)

Variables Summary

Variable Type Description Syntax Example Example Output
Dataflow Name Displays the name of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built $([dataflow alias]) $(EDU) World Bank, Education
Dataflow Description Displays the description of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built $([dataflow alias].desc) $(EDU.desc) World Bank, Education
Dataflow Id Displays the ID of a Dataflow referenced by the Publication Table $([dataflow alias].id) $(EDU.id) DF_EDU
Variable Dimension Value Displays the name of the variable used when an Observation key has a wildcard member $([dimension id]) $(REF_AREA) United Kingdom
Specific Component Value Displays the name of the Component Value $([component id].[component value]) $(OBS_STATUS.C) Confidential
Component Name Displays the name of the DSD Component (this is the name of the Concept used by the Component) in the locale in which the Publication Table is built $([component id].name) $(OBS_STATUS) Confidentiality Status
Enumeration Value Displays the name of a Code in a Codelist or Value in a Valuelist $([list alias].[list item ID] $(SEX.M) Male
Conditional Variable Displays the variable value, which is conditional on the value of a variable Dimension $([conditional variable id][variable id]) $(BANKNOTES[NOTE1]) £5
Time Displays the base time period for which the table is built, or an offset from this time period / cast to a different frequency see Time Functions $(P) 2008