Difference between revisions of "Publication Table"
(→Formatting Numerical Observation Values) |
(→Formatting Numerical Observation Values) |
||
Line 464: | Line 464: | ||
Numerical observation values can be formatted in the following way | Numerical observation values can be formatted in the following way | ||
− | # Number of Decimal Places or Significant Figures can be defined | + | # '''Number of Decimal Places''' or '''Significant Figures''' can be defined |
# For Decimal places this can be set to a maximum rule or a fixed rule, i.e the value 1.3 to 2 decimal places is 1.3 using the maximum rule and 1.30 using the fixed rule | # For Decimal places this can be set to a maximum rule or a fixed rule, i.e the value 1.3 to 2 decimal places is 1.3 using the maximum rule and 1.30 using the fixed rule | ||
− | # Rounding mode can be specified | + | # '''Rounding mode''' can be specified |
− | # Scientific notation can be requested on output, i.e 123456 to 2 significant figures is 120000 in standard form or 1.2E4 in scientific form | + | # '''Scientific notation''' can be requested on output, i.e 123456 to 2 significant figures is 120000 in standard form or 1.2E4 in scientific form |
− | # Number formatting rules can be applied specific to a single Language (i.e. French Formatting) or Dynamic formatting, which uses the Accept-Langauge locale/language sent be the web browser (this can be overridden using the locale=[loc] query parameter. The formatting rules influence which characters to use as the thousands and decimals separator, and for some languages it will change the numerical characters that are shown | + | # '''Number formatting''' rules can be applied specific to a single Language (i.e. French Formatting) or Dynamic formatting, which uses the Accept-Langauge locale/language sent be the web browser (this can be overridden using the locale=[loc] query parameter. The formatting rules influence which characters to use as the thousands and decimals separator, and for some languages it will change the numerical characters that are shown |
# Rules can be applied to all series, or specific series | # Rules can be applied to all series, or specific series | ||
Revision as of 03:49, 7 November 2022
@See Publication Table Web Services
Contents
Publication Tables
Introduced to Fusion Registry Enterprise v11.0.0.17
Overview
A Publication Table is a curated table of data, where the values in the table rows, columns and table body cells can be individually specified. This enables a single table to pull data from multiple sources and piece the data together in a way which is not possible when dealing with standard cubes of data, whose table model is typically based on a Pivot model, with columns and rows built around the Dimension values to which the column or row is assigned. A Publication Table breaks out of this restrictive model by allowing cells of data to reference observations which do not need to share any relationship, in terms of dimensionality or even data source, with adjacent cells,.
Data in a Publication Table may be sourced from a single Dataflow or it may be sourced from multiple Dataflows enabling cross comparison of data.
Publication Tables can be static (every cell is a defined Observation) or dynamic, where a cell includes variables Dimensions in the Observation Key, which can be provided at table build time.
Time arithmetic and frequency conversion is supported, enabling a single table to pull data from different time periods and frequencies which are all derived from the base period on which the table is built.
Publication Tables support calculations, where each calculation can take input from one or more Observations over one or more Dataflows.
Variable values also play a big role in Publication Tables, and includes standard variable placeholders – as well as advanced conditional variables, whose list of variable values change based on the value of another Dimension for which the table is built.
Metadata vs Materialised
A Publication Table is stored in Fusion Registry the same way any other piece of structural metadata is stored. In this way it has an owner (Agency), and identity (ID) and a version. The change history is tracked using the same services, and the table definition can be queried using the same SDMX web services for Structural Metadata.
The Publication Table metadata contains information about which Dataflows are used by the Table, which additional variables are being imported, and a definition of the table structure, and which Observations to import by referencing them by their unique Observation Key.
A Publication Table definition can be materialised into a table of Data by from the Publication Table Web Service, which combines the information of the Table Identity, and Variable Values to rewrite the Observation Key into actual observation values. If building a web User Interface, then response from this service can be a packet of JSON, which describes the layout and content of the final Table in a way which is quick and simple to render for the User to see. The process of materialising the table is known as ‘build time’.
Additional web services are provided to enable clients to ask questions about the table such as ‘what are the earliest and latest time periods for which there is data for this table’ or ‘what possible variable values are there for this table’. This enables user interfaces to provide a user with the option to change the variables ‘on the fly’ ensuing they are able to build a table which is known to have data.
Publication Table Cell Content
Overview
A Publication Table is made up of a Table Head Rows and Table Body Rows.
The Table Head is made up of 1 or more table rows. Each Table Head cell may span 1 or more columns and 1 or more rows. The Table Head content is made up of free text and / or variable placeholders whose values are resolved when the table is built. Table Head Cells can be left empty.
The Table Body is made up of 1 or more Table Rows. The Table Rows provide the Publication Table with its data, by referencing Observations from the Dataflow.
At table build time all variables are resolved, and the corresponding data are retrieved from the data stores which hold the data.
Table Rows
Overview
A Table Row consist of a Table Row heading, this can be a mix of free text and variable text. The Table Row Heading is always the first Table Row Cell, and can not span multiple columns or rows. A Table Row can not have more then one heading. The remaining Table Row Cells are the Observation Cells, which may contain free text and variable text but will typically contain a reference to an Observation by an Obsevation Keys or a Calculation which can combine multiple Obsevation Keys.
A Table Row which results in no Observation data will not be included in materialised table at build time, unless the Table Row is a grouping row (see Row Levels for details on grouping rows).
Row Levels
The Table Row Heading can be used to define a row level. The level indicates to the client responsible for rendering the table the row's position in a hierarchy. Row levels are depicted using one or more leading chevrons ‘>‘ before the text. The number of hyphens indicates the level, with one leading hypen indicating level 1, two indicating level 2, and so on. It is possible to skip levels if required, i.e. Row 1 is level 1 with Row 2 assigned to level 3, skipping out level 2 entirely.
An example of row headings using levels is as follows:
Countries > Europe >> France >> Germany > Asia >> China
Levels can be an important way to group data. A Table Row which acts as a parent row in a hierarchy, can be used as grouping mechanism for the rows that follow. This enables Parent Rows to be included in the output table, even if the Table Row contains no Observation data. The default behaviour for a Table Row with no data is to exclude it from the built table.
Looping Rows
Instead of explicitly defining a row for each Dimension member, it is possible to define a row as a loop (to iterate over all, or a subset of Dimension members). For example, if a publication table contains a Dimension for countries with an ID of REF_AREA then a loop can be set up to loop through all REF_AREA values, or a subset of the values - for which there is data. If the values are backed by a Codelist which has a hierarchy, the hierarchy can be preserved in the final table. It is also possible to nest a loop within another loop, for example for each REF_AREA output each INDICATOR.
Syntax
The loop is defined by giving the table row a label which starts with a syntax of :=LOOP([args]), like in the following example
:=LOOP([dimension id], [preserve hierarchy], [loop order], [explicit values to include])
Argument Definition dimension id = Required. The ID of the Dimension to loop over. For example CURRENCY would loop over all the Currency values which have data in the given context. The row label is the Currency, for example USD. preserve hierarchy = Optional, default false. If true the hierarchy of the Codes in the underlying Codelist will be built into the row hierarchy loop order = Optional, default 1. 1=Natural Order (as they appear in the Codelist), 2=By Code ID, 3=By Code Label explicit values to include = Optional, default include all values. A var arg list of values to loop over.
Full Example
:=LOOP(REF_AREA, true, 3, EUR,FR,DE,UK,ES)
Simple Loop
The loop is defined by giving the table row a label which starts with a syntax of :=LOOP([args]), like in the following example
:=LOOP(REF_AREA)
This loop definition will loop through all the REF_AREA values, building a row for each value. The Observation keys for the row can refer to the loop variable using a $(DIM_ID) placeholder in the Observation Key, for example:
Employed Student :=LOOP(REF_AREA) A:$(REF_AREA):EMP A:$(REF_AREA):STU
The output table would look like the following:
Employed Student United Kingdom 20 12 France 22 11 Germany 25 15
Nested Loop
If a loop row is followed by a row (or rows) which is an extra level deep, the child row(s) will be included for each loop, for example
Employed Student > :=LOOP(REF_AREA) >> Total A:$(REF_AREA):T:EMP A:$(REF_AREA):T:STU >>> Male A:$(REF_AREA):M:EMP A:$(REF_AREA):M:STU >>> Female A:$(REF_AREA):F:EMP A:$(REF_AREA):F:STU
Because the Total row is a child row of the LOOP (denoted by the number of leading '>') and Male/Female are children of Total - for each REF_AREA there will be a Total, Male and Female, outputting a table like the following:
Employed Student United Kingdom Total Male 5 1 Female 4 2 France Total Male 6 2 Female 7 3
It is also possible to define a subloop as a loop. The previous example can be rewritten as:
Employed Student > :=LOOP(REF_AREA) >> :=LOOP(SEX) A:$(REF_AREA):$(SEX):EMP A:$(REF_AREA):$(SEX):STU
There is no limit to the number of subloops that can be defined.
Loop Hierarchical Values
If the underlying Codelist has a hierarchy, a subsequent argument can be provided to ask for the hierarchy to be preserved
Countries > :=LOOP(REF_AREA, true)
Note: A hierarchy will keep rows with no data if they are acting as a parent node for rows that do have data (e.g. there may not be data for Europe, but the row exists because there are child rows of France and Germany with data).
Nested loops can also preserve hierarchies for example:
:=LOOP(REF_AREA, true) >:=LOOP(SEX, true)
With an example output (assuming no data exists for the hierarical parents (Continent, Europe)
Continent Europe France Total (M+F) Male Female Germany Total (M+F) Male Female
Order Loop Rows
It is possible to order the labels by their natural order (that of the Codelist), by id or by label, using the numbers 1, 2, or 3 respectively. The following outputs the rows ordered by label.
Countries > :=LOOP(REF_AREA, true, 3)
Fixed Loop List
It is also possible to define a subset of the values to loop over. For example, if REF_AREA has data for 100 countries, but only 3 are required in the table. In this scenario, the natural order is that of the provided list, not of the underlying Codelist.
Countries > :=LOOP(REF_AREA, true, 3, UK, FR, DE)
Loop over External Variable
Instead of looping the values of the Dimension's Codelist, it is possible to loop another Codelist - the purpose for this is to support alternative labels (if the labels for the publication table need to be different from those of the Dimension members) - or a subset (as an alternative to passing the subset of values into the LOOP argument), or to define a hierarchy (if the Dimension's Codelist does not have a hierarchy, or an alternative is required).
In order to loop over an external Variable, the Publication Table must be linked to an external Codelist/Valuelist against an Alias. For example EXT=My Country Codelist with Hierarchy. The corresponding loop then loops the Codes in the external list, and stores these the corresponding Dimension ID variable, for example:
Countries > REF_AREA:=LOOP(EXT)
Noting that the loop is over the variable EXT which resolves to an external Codelist. The loop is assigned to the REF_AREA Dimension.
Anchor Rows
The default behaviour is for a row to be omitted if the row has no data and it has no child rows with data. However, if the desired result is to always output the row, then this can be achieved by using the [anchor] tag in the row header label.
Consider the following Publication Table definition:
Male Female Employed A:M:EMP A:F:EMP Student A:M:STU A:F:STU Retired A:M:RET A:F:RET
If there are no data for Student, then the materialised table would look like this
Male Female Employed 12 13 Retired 3 5
Anchor Single Row [anchor]
To anchor a single row, add the [anchor] tag at the start of the row
Male Female Employed A:M:EMP A:F:EMP [anchor]Student A:M:STU A:F:STU Retired A:M:RET A:F:RET
Then the output table would preserve the 'Student' row
Male Female Employed 12 13 Student Retired 3 5
This rule can be coupled with formatting for missing observations, then the table could be enhanced to look something like this
Male Female Employed 12 13 Student - - Retired 3 5
Anchor Child Rows [anchor+]
For a row with a hierarchy, an [anchor+] tag can be placed on a parent node to anchor the parent and all of its children, for example:
Continent [anchor+]>Europe >>France >>>Total (M+F) >>>>Male >>>>Female >>Germany >>>Total (M+F) >>>>Male >>>>Female
The above rule would always output Europe and all of its descendants (France, Total, Male, Female, Germany).
Anchor Siblings and Child Rows [anchor*]
The [anchor*] tag can be used as a way to anchor all following siblings and child nodes, this can be a useful mechanism to anchor all rows for the whole table by placing the anchor[*] on the first row of the table.
Anchor Loops
If a loop is preceeded by an [anchor] tag then the system will have the following behaviour:
- If an external Codelist variable is being looped, all codes will be ouput regardless of what data is stored in the system
- If a Dimension is being looped, all codes in the Codelist used by the Dimension will be output regardless of what data is stored in the system
- If a Dimension is being looped and it is not coded (i.e it is free text or numerical or other) then the [anchor] has no effect, as the looped values are based on what data exists
Observation Key
Overview
The purpose of the Observation Key is to reference a Observation whose value will be resolved at table build time. The Observation Key can explicitly resolve to 1 observation value, if all parts of the Key are provided. However, it is possible to make parts of the key varaibles, allowing the values to be set at table build time.
An Observation Key is made up of 4 parts:
- Dataflow Reference
- Series Reference
- Time Period Reference
- Measure Reference
An example Observation Key with all of these parameters in place is:
DF_REG.M:UK:SUR:2008:BIRTHS
The key is broken down into the following parts:
Observation Key Part | Name | Description | Required |
---|---|---|---|
DF_REG | Dataflow Alias | This identifies the Dataflow which has the data. The identification uses the alias, which is assiged to the Dataflow in the Publication Table definition. The Dataflow Alias is only required if the Publication Table references more then one Dataflow. If the Publication Table references more then one Dataflow, it can assign a Dataflow to a Default Alias, which means that it is required as part of the Observation Key. | No (if only 1 Dataflow or a default alias is set) |
M:UK:SUR | Series Key | This identifies the Series of Data for the Dataflow, by providing a single value for each of the Dimension Members of the Dataflow, separated by a ‘:’ character. In this example the Dataflow has 3 Dimensions (excluding time), the value for the first is 'M', the second 'UK' and the last 'SUR'. As the Dataflow in this example has a Time Dimension this Series Key will uniquely identify a Series of Observations where each Observation refers to a unique Time Period | Yes (but can omit values) |
2008 | Time Period | This identifies the Time Period in the Series of Data identified by the previous section. The Combination of the Series and Time Period for a Time Series Dataflow will uniquely identify the Observation | No. Omit to make it variable for Time Series Data (defaults to last period). Not required if there is no Time Dimension for the DSD. |
BIRTHS | Measure Dimension | If the Dataflow uses a Data Structure with multiple Measures, then the Measure ID must be included as the last part of the Observation Key. In this example the Dataflow has multiple measures (BIRTHS/DEATHS/MARRIAGES) so the Observation Key needs to include this value. | When there is only 1 measure it is not required, when there are multiple measures, it is required. |
Variable Dimension Members
It is possible to omit Dimension Values from the Observation Key, when this is the case that Dimension will become a variable – whose value can be set at table build time. For example, the second Dimension value of ‘UK’, from the previous example, can be omitted like this:
M::SUR:2008
The Publication Table can now be built for any Country by passing in the variable at table build time - i.e. UK, FR, DE. By passing in different variable values, the Observation Key will change, and as such it will reference a different Series in the Dataset.
When an Observation Key has a variable Dimension member, the ID of the Dimension can be referenced in text cells to create variable headings. For example, if the ID of the 2nd Dimension in the DSD is REF_AREA, the Table Column could be given a title of $(REF_AREA) which would resolve to 'United Kingdom', 'France', or 'Germany' at build time depending on which varaible value is used to build the table. #See Dimension Variables.
Variable Time Period
For Time Series dataset, the Time Dimension can form part of the Observation Key, for example M:UK:SUR:2008. The 2008 part of the key refers to the year 2008.
It is possible to omit the value for Time Period in the Observation Key. Omitting Time Period will make the Time value a variable which can be provided at table build time, defaulting to the last period for which data exists if no value is supplied.
Extending the above example to omit the Time value results in an Observation Key whose Country and Time values can be provided to the web service at table build time.
M::SUR
Note: The above key is exactly the same as using the variable Time Period value 'P' in the Time Dimension, as follows:
M::SUR:P
Additional functions can be performed on the Time Dimension, such as moving forwards or backwards from the table base period, #see Time Variables for more details.
Free Text Observation Keys
Whilst typically a cell in the body of the Publication Table would resolve to an Observation value, it is possible to use it for free text. To include free text cells, start the cell value with a leading apostrophe character, for example:
'Cell Text
Note: An Observation Key can not contain a mix of free text as well as an Observation value
Calculated Observation Cells
An Observation Cell may contain calculated values, which can consist of a mix of simple algebra (+, -, *, /) and a reference to one or more Observations which may come from multiple Dataflows.
To define a Calculated Observation the Cell the following rules apply:
- The Observation Key must start with :=
- Any Observation Keys must appear within enclosing sequare brackets '[' and ']'
- An Observation key may embed calculations on a single Dimension by using parenthesis '(' and ')' in the key part
- The same rules for variable Dimensions and Time Period apply when resolving the Obseration Keys used in the calculation
Examples are shown in the table below
Example Syntax | Description |
---|---|
:=[A:UK:EMP]+10 | Add 10 to the observation value retrieved from the key A:UK:EMP |
:=[A:(UK+FR+DE):EMP] | Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP |
:=[A:UK:EMP]+[A:FR:EMP]+[A:DE:EMP] | Add the series A:UK:EMP + A:FR:EMP + A:DE:EMP |
:=[A:(UK+FR):EMP]/[A:(UK+FR):UMP] | Adds the series A:UK:EMP + A:FR:EMP and divides by the result of A:UK:UMP + A:FR:UMP |
:=((100/[A:UK:EMP:(M+F):P-1])*[A:UK:EMP:(M+F)])-100 | Demonstrates the ability to use parenthesis outside of the Observation Keys to control the order of mathematical operations |
:=[POP.M:UK:(M+F)] / [LAB.M:UK:EMP] | Demonstrates the ability to calculate data from two Dataflows (defined by alias POP and LAB) |
:=[M:UK:EMP:M(P)] + [A:UK:EMP:A(P)] | Demonstrates the ability to calculate data where the period is converted from the base period |
Percentage Of
The PCTOF (Percentage Of) function can be used to calculate 'x' as a percentage of 'y' following the syntax:
PCTOF(x,y)
This is reolved to the following mathmatical expression: (100/y) * x
x and y must either be a numerical value, and observation key, or an expression using a combination of the two. For example
:=PCTOF([A:M:UK:P],[A:TOT:UK:P])
The Observation Key follows the same standard rules with respect to varaibles and inline calculations - the following example shows an Observation Key where Male + Female values are added with the result being used in the Percentage Of function
:=PCTOF([A:M:UK],[A:(M+F):UK])
A more complex example shows calculations around the PCTOF function:
:=(10 + PCTOF([A:M:UK:A(P)-1],[A:(M+F):UK:A(P)])) / 2
Footnotes
Overview
Footnotes can be added to any part of the publication table (table column/row header, or observation cells). The output table is generated with a footnote number, with the list of footnotes at the end of the table.
Footnotes need to be thought about in 2 parts: Part 1 are the list of footnotes which are defined at the Publication Table level. Part 2 are the references to the footnotes from within the table. This separation of allows footnotes to be reused within the same table, and when working with placeholders footnote templates can be set up where the text changes based on the placeholder values defined in the table builder.
To add a footnote, add it to the Publication Table, for example:
Footnote 1
Footnote 2
Then in the table builder, add a reference to which footnote to use, $(1) references Footnote 1, and $(2) references Footnote 2.
Note: The order of the list of footnotes in the table definition may not be reflected in the order they are output. As such a footnote referenced by $(1) may be output as the second footnote.
Footnotes with Variable
A footnote can reference variables, using the variable syntax,
A footnote for $(REF_AREA.UK)
A footnote can even output an observation value, observation attribute or series attribute.
Measure Value
A footnote with obs value $(A:UK:EMP:2008:OBS_VALUE)
Observation Attribute
A footnote with obs attribute $(A:UK:EMP:2008:OBS_COMMENT)
Series Attribute
A footnote with obs attribute $(A:UK:EMP:P:SERIES_TITLE)
When combining variable footnotes with placeholder values and loops, it is possible to dynamically create references to observation attributes.
Note: the use of P in the final cell key which is a reference to the time period for which the table is being built.
Footnotes with Placeholders
It is possible to write a generic footnote for the publication table, which includes placeholders. When the footnote is referenced, the placeholder values are provided. The syntax for a placeholder is {n} where n is a positive integer, for example {1}.
Footnote Example
A footnote for $(REF_AREA.{1}) and $(REF_AREA.{2})
In the table builder, the footnote is referenced as usual, but with arguments for the placeholders:
Footnote reference in Table builder
My Table Row $(1,UK,FR)
The footnote reference in the above example is in bold, and consists of which footnote text to use (in this case the first one in the list, denoted by '1'), and 2 arguments are passed for the placeholders, placeholder 1 takes the value of UK and placeholder 2 takes the value of FR. During the materialise process the intermediate text is generated:
A footnote for $(REF_AREA.UK) and $(REF_AREA.FR)
Which then goes through the variable resolve process, in this case both variables, $(REF_AREA.UK) and $(REF_AREA.FR), match the Coded Variable syntax and would resolve to:
Final Output
A footnote for United Kingdom and France
Footnotes in Observation Cell
Any number of footnotes can appear in an Observation cell, simply put the footnote(s) at the end of the Observation Key.
Standard Footnote Reference
DF_REG.M:UK:SUR:2008:BIRTHS $(1) $(2)
Footnote with Placeholder
DF_REG.M:UK:SUR:2008:BIRTHS $(1,UK)
Footnotes in Loops
A row loop contains a locally scoped variable, and as such it can be used when calling a footnote with a placeholder.
Example Footnotes
$(A:T:{1}:EMP:P:SERIES_TITLE) $(A:T:{1}:EMP:P:TITLE_COMPL) $(A:{1}:{2}:EMP:P:OBS_COMMENT)
Example Table Definition
Employed Student > :=LOOP(REF_AREA) $(1,$(REF_AREA[id])) >> Total $(2,$(REF_AREA[id])) A:$(REF_AREA):T:EMP A:$(REF_AREA):T:STU >> :=LOOP(SEX) $(3,$(REF_AREA[id]),$(SEX[id])) A:$(REF_AREA):$(SEX):EMP A:$(REF_AREA):$(SEX):STU
Footnote Intermediate Resolution
Row 1 (REF_AREA=UK)
$(A:T:UK:EMP:P:SERIES_TITLE)
Row 2 (REF_AREA=UK)
$(A:T:UK:EMP:P:TITLE_COMPL)
Row 3 (REF_AREA=UK, SEX=M)
$(A:UK:M:EMP:P:OBS_COMMENT)
Final Footnote output
Row 1 (REF_AREA=UK)
This is a series for the United Kingdom
Row 2 (REF_AREA=UK)
This is a title compliment for the United Kingdom
Row 3 (REF_AREA=UK, SEX=M)
This is an observation comment for Males in the United Kingdom
Formatting Cells
Formatting Numerical Observation Values
Numerical observation values can be formatted in the following way
- Number of Decimal Places or Significant Figures can be defined
- For Decimal places this can be set to a maximum rule or a fixed rule, i.e the value 1.3 to 2 decimal places is 1.3 using the maximum rule and 1.30 using the fixed rule
- Rounding mode can be specified
- Scientific notation can be requested on output, i.e 123456 to 2 significant figures is 120000 in standard form or 1.2E4 in scientific form
- Number formatting rules can be applied specific to a single Language (i.e. French Formatting) or Dynamic formatting, which uses the Accept-Langauge locale/language sent be the web browser (this can be overridden using the locale=[loc] query parameter. The formatting rules influence which characters to use as the thousands and decimals separator, and for some languages it will change the numerical characters that are shown
- Rules can be applied to all series, or specific series
Match Series
When creating a formatting rule, it is possible to define which series the rule applies to. If left blank, the rule will apply to all observation values.
The syntax for defining specific series is shown in the following table:
Rule Type | Syntax | Description |
---|---|---|
Series Match | =A:UK:EMP | Match a specific series from the default Dataflow |
Series Match Multiple Values | =A:UK+FR+DE:EMP | Multiple codes in 2nd Dimension (UK, FR and DE) any can match for the rule to be valid |
Series Match by Dataflow Alias | =FLOW2.A:UK:EMP | Match a specific series for the Dataflow with alias FLOW2 |
Wildcard Series Match | =A::EMP | Match all series with the given dimension values (2nd Dimension is wildcarded) |
Not Rule | !=A:UK:EMP | Match all series which do not match this series (wildcard series also supported) |
Dimension Rule | CURRENCY=GBP | Match all series which have CURRENCY of GBP |
Not Dimension Rule | !CURRENCY=GBP | Match all series do not have a CURRENCY of GBP |
Multiple Rules | CURRENCY=GBP,FREQ=M | Comma is used to include multiple matches (also applies to series match) |
Formatting Missing Observations
Observation Values can be missing for 2 reasons:
- There is no Observation in the database for the given Observation Key
- There is an Observation in the database but the measure value is either: NaN, null, or an empty string
In both cases it is possible to define an output value using formatting rules. For the first case, the observation does not exist, it is possible to define a default output value, for example ‘-‘. For the second case, where the observation does exist, it is possible to define a set of rules based on another Component of the Observation, for example the Observation output may be related to the OBS_STATUS Attribute.
When the formatted value is based on a Component, a set of rules must be defined which state ‘when component=x output y’, where x may be an fixed value or a pattern to match (using a regular expression). The rules for mapping component values are defined in a Representation Map, and the Publication Table simply provides the connection to the Representation Map to use. This enables reuse of the mapping rules across multiple Publication Tables. The Publication Table must define two properties in order to support Component Mapping rules:
- The Component to obtain the value from i.e OBS_STATUS
- A link to the Representation Map, which holds the mapping rules. Note, the Representation Map is expected to have 1 source mapping, this could be of any type (string, codelist, other)
An example Reprsentation Map for OBS_STATUS is shown below:
OBS_STATUS | Is Reg Ex | Missing Value | Description |
---|---|---|---|
M | false | mv | When OBS_STATUS is M output mv |
L | false | np | When OBS_STATUS is L output np |
.* | true | - | When OBS_STATUS is anything else output - |
Frequency Formatting
Time Variables always resolve themselves to an SDMX Date Format when forming the Observation Key, this ensures the correct observation value is returned from the data store. When a time variable is used as a label, in a table heading for example, the same formatting will be applied by default but a custom format can be provided. Custom display formats for a Time Period enable a time period such as 2007-Q1 to be resolved to custom text string, for example '31 Mar 2007'.
Custom Time formats rules are defined by providing an association between an SDMX Frequency ID (#see SDMX Date Format) and the output format template. The output format template is a string that defines which components of time are output an in which order and style. For example to output a date as 'March 2001' the format template would look like this 'MMMM yyyy' to indicate a full month name (MMMM) and a 4 character year (yyyy). format rules make use of the Simple Date Format which is used by Java as such all of the documentation on Simple Date Format applies.
Frequency Format Mapping is not defined as part of a Publication Table, instead the rules are defined in a Representation Map and the Publication Table references the Representation Map as the source of its frequency conversion rules. This enables the same set of rules to be used for multiple Publication Tables. The Representation Map must map 1 source (Codelist or String) to 1 target (String), it can contain as many mapped frequencies as required. If the Representation Map contains a mapping for a value that is not a valid SDMX Frequency, then it will be ignored.
It is important to note that all time periods resolve to a single point in time, which is the end of the period, for example an Annual period of 2008 resolves to 2008-12-31:T23:59:59. The resolution of frequency to a point in time enables the frequency formatting rules to use all aspects of time, even when converting a low frequency, for example 2008 formatted using the pattern dd MMM yy would resolve to '31 Dec 2008'.
Variables Summary
Variable Type | Description | Syntax | Example | Example Output |
---|---|---|---|---|
Dataflow Name | Displays the name of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built | $([dataflow alias]) | $(EDU) | World Bank, Education |
Dataflow Description | Displays the description of a Dataflow referenced by the Publication Table in the locale in which the Publication Table is built | $([dataflow alias].desc) | $(EDU.desc) | World Bank, Education |
Dataflow Id | Displays the ID of a Dataflow referenced by the Publication Table | $([dataflow alias].id) | $(EDU.id) | DF_EDU |
Variable Dimension Value | Displays the name of the variable used when an Observation key has a wildcard member | $([dimension id]) | $(REF_AREA) | United Kingdom |
Specific Component Value | Displays the name of the Component Value | $([component id].[component value]) | $(OBS_STATUS.C) | Confidential |
Component Name | Displays the name of the DSD Component (this is the name of the Concept used by the Component) in the locale in which the Publication Table is built | $([component id].name) | $(OBS_STATUS) | Confidentiality Status |
Enumeration Value | Displays the name of a Code in a Codelist or Value in a Valuelist | $([list alias].[list item ID] | $(SEX.M) | Male |
Conditional Variable | Displays the variable value, which is conditional on the value of a variable Dimension | $([conditional variable id][variable id]) | $(BANKNOTES[NOTE1]) | £5 |
Time | Displays the base time period for which the table is built, or an offset from this time period / cast to a different frequency | see Time Functions | $(P) | 2008 |
Table Variables
Overview
Variables are a very important part of Publication Tables. They can be used to make the content of the table dynamic, setting and changing the value at run time to change the table content. Variables can be used in Observation Keys, but the values of variables can also be used anywhere where there is text, for example the table heading, subheading, column or row headings, footnotes, and Observation Cells with free text.
The general syntax for a variable is
$(variable id.sub part[detail])
Where variable id is always required, the subpart and detail are optional. Subpart is used if the variable resolves to a Component and a coded value is required, the detail array part obtains the id, name, or description of the underlying SDMX type.
This section describes the various variables that exist, along with the syntax of how to use them.
Dataflow Variables
Publication Tables define data cells across one or more Dataflows. Each ‘imported’ Dataflow is given an alias, which can be used to refer to the Dataflow in the Observation Key.
It is possible to refer to the Dataflow's Name, Description, and Id using the following variable syntax.
Default Dataflow
Dataflow Id
$([id])
Dataflow Name
$() or $([name))
Dataflow Description
$([desc])
Dataflow with Alias=EXR
Dataflow Id
$(EXR[id])
Dataflow Name
$(EXR) or $(EXR[name))
Dataflow Description
$(EXR[desc])
Dimension Variables
When an Observation Key has variable Dimension Values, the table will be built by replacing this variable with a value to complete the Observation Key. The variable value is either passed explicitly to the web service, or defaulted by the system if not provided. The Dimension Variable can be referenced in the table using the syntax $([dimension id]), for example:
Dimension Value Name
$(INDICATOR) resolves to 'Employment'
This would resolve to the name of the Indicator value used in the table, for example EMP may resolve to the label Employment. The label for the Dimension Value is obtained from the Codelist or Valuelist which the Dimension uses. If the Dimension does not use a list (i.e. it is free text or numerical) then the Dimension value is simply used in place of the variable placeholder.
To resolve to the id of the value for the Component, for example EMP, use the following syntax
Dimension Value Id
$(INDICATOR[id]) resolves to 'EMP'
Dependent Dimension Variable
It is possible that the value for one Dimension is dependent on the value of another Dimension, for example CURRENCY may be dependent on the COUNTRY and COUNTRY is a variable in the Observation Key - it is possible to specify that the value to use for the CURRENCY Dimension can be derived using a lookup table. The lookup table in this instance is maintained as an SDMX Representation Map which is used to map COUNTRY to CURRENCY. The Publication Table specifies that the two Dimensions are related and provides a reference to the Representation Map to use which has the rules.
When working with a dependent variable, both the dependent variable and the Dimension on which it depends should be omitted from the Observation Key - for example consider a full key of:
A:UK:GBP:2007
Where UK is the COUNTRY and GBP is the Currency, to make the COUNTRY a variable Dimension, it should be omitted from the key, as follows:
A::GBP:2007
To then make the CURRENCY a variable it is also omitted , as follows:
A:::2007
The Publication Table is given the rule that CURRENCY depends on COUNTRY and should use the COUNTRY_2_CURRENCY Representation Map, which in this example looks like the following:
COUNTRY | Is Reg Ex | CURRENCY |
---|---|---|
UK | false | GBP |
US | false | USD |
.* | true | EUR |
When the Publication Table is materialised the web service only requires the COUNTRY information, and it will complete the Observation key using the representation map lookup table.
Component Values
Components of the underlying Data Structure can be explicitly referenced using the following syntax $([component id]), for example:
Component Name
$(OBS_STATUS) or $(OBS_STATUS[name]) resolves to 'Observation Status'
Component Description
$(OBS_STATUS[desc]) resolves to 'Information on the quality of a value or an unusual or missing value.'
Coded Variables
To reference a Code value, the syntax is $([component id / codelist alias].[item id]). The first variable part is the ID of a Component in the underlying Data Structure, or alternativly the alias of an external variable defined on the Publication Table. An external variable is a link to an SDMX Codelists or Valuelist. Example are:
Name of Code
$(CURRENCY.$) resolving to the text 'Dollar
Description of Code
$(DWELLING.SM[desc])
Conditional Variables
A Conditional Variable is one whose value is conditional on the value of a Dimension Variable.
An example use case is Banknotes, where the values for NOTE1, NOTE2, or NOTE3 is dependant on the Country Dimension. A Conditional Variable could be used to set a values for each variable (NOTE1, NOTE2, NOTE3) based on the value used for the current Country Dimension i.e. US NOTE1=$5, UK NOTE1=£5.
The syntax of referring to a Conditional Variable is $([conditional variable id][variable id]), for example
$(BANKNOTE[NOTE1])
Reported Attribute for an Observation or Series
A variable can be a reference to a part of an Observation, for example the Unit Multiplier (UNIT_MULT). A Table Row header, for example, could include variable text such as 'Number of payment accounts (thousands)' where the word thousands is a variable because it depends on other variables which build the key.
To achieve this, use the syntax:
$(A:UK::2008:UNIT_MULT[name])
The observation key in this case is A:UK::2008, where the third Dimension is a variable as it has no value (in the main body of the table this dimension would probably also be a variable, which means the table can be built with different values, each value may change the output of this UNIT_MULT value). The last part of the key is the Attribute ID (UNIT_MULT) - this can be a reference to a series attribute, or an observation attribute. The [name] indicates that it should resolve to the name of the reported attribute value, alternative values are id for the reported value or desc for the description of the reported value.
If time is also a variable, then the key can change accordingly, using the variable syntax for time (P):
$(A:UK::P:UNIT_MULT[name])
Time Variables
When the data is time series (i.e. the DSD has a Time Dimension), the Observation Key can include the time period, for example:
A:UK:EMP:2008
Where 2008 is the Observation Time Period to use. The format of the Time Period must always be the same format as the frequency for which the data are reported. For example if the data is annual, and an observation is reported for 2008, then the observation key must use 2008, not another frequency such as 2008-12-31. It is typical in a DSD for the Frequency of the Series to be the first Dimension, in all the examples on this page this convention is applied, and as such the above key A:UK:EMP refers to Annual data, depicted by the initial dimension value 'A'.
If the key omits the time period, then the time period becomes a variable which can be set at table build time, if not set it defaults to the last period for which there is data.
A:UK:EMP
When a Publication Table has a variable time period, the resolved variable time value for this varaible is known as the base period. There can only be 1 base period for a Publication Table, however other Observation Keys (and corresponding headings) can perform time arithmetic from the base period for which the table is built.
The supported time functions include walking forwards and backwards in increments of the given frequency, converting to another frequency of data, and a combination of the two (convert and move, move and convert, move convert and move again). Ths supported functions are shown in the table below.
Syntax | Description | Example Observation Key | Example Resolved Key |
---|---|---|---|
P | The base period (default if not provided) | A:UK:EMP:P | A:UK:EMP:2008 (base period=2008) |
P-n | Base Period minus 'n' Periods where n is a positive Integer | A:UK:EMP:P-1 | A:UK:EMP:2007 (base period=2008) |
P+n | Base Period plus 'n' Periods where n is a positive Integer | A:UK:EMP:P+1 | A:UK:EMP:2008-Q2 (base period=2008-Q1) |
A(P) | Base Period converted to Annual | A:UK:EMP:A(P) | A:UK:EMP:2008 (base period=2008-Q2) |
S(P) | Base Period converted to Bi-Annual (Semester) | S:UK:EMP:S(P) | S:UK:EMP:2008-S2 (base period=2008-Q3) |
T(P) | Base Period converted to Tri-Annual | T:UK:EMP:T(P) | T:UK:EMP:2008-T3 (base period=2008) |
Q(P) | Base Period converted to Quarterly | Q:UK:EMP:Q(P) | Q:UK:EMP:2008-Q4 (base period=2008) |
M(P) | Base Period converted to Monthly | M:UK:EMP:M(P) | M:UK:EMP:2008-03 (base period=2008-Q1) |
W(P) | Base Period converted to Weekly | W:UK:EMP:W(P) | W:UK:EMP:2008-04 (base period=2008-01-01) |
D(P) | Base Period converted to Daily | D:UK:EMP:D(P) | D:UK:EMP:2008-12-31 (base period=2008) |
H(P) | Base Period converted to Hourly | H:UK:EMP:D(P) | H:UK:EMP:2008-12-31T12 (base period=2008-02-02T12:00:03) |
A(P-n) | Base Period minus (or plus) 'n' periods, converted to another frequency 'A' | A:UK:EMP:A(P-4) | A:UK:EMP:2007 (base period=2008-Q4) |
A(P)-n | Base Period, converted to another frequency 'A' minus (or plus) n' periods | A:UK:EMP:A(P)-4 | A:UK:EMP:2004 (base period=2008-Q4) |
A(P-n1)-n2 | Base Period, minus (or plus) 'n1' periods, converted to another frequency 'A' minus (or plus) 'n2' periods | A:UK:EMP:A(P-4)-4 | A:UK:EMP:2003 (base period=2008-Q4) |
FA(P) | Subtracts 1 Period from the Converted Period IF the resolved Convert From end of period does not match the same point in time as the Convert To period.
Examples:
This works for ALL frequencies where the convert to frequency is lower then the base Period P, for example Quarter converted to Annual. |
A:UK:EMP:FA(P) | A:UK:EMP:2007 (base period=2008-Q3) |
Compound expressions
From Fusion Registry v11.5.1 and later it is possible to cast a period more the once in the same expression, for example
Q(FA(P-1))-1
If the input time period P=2019-Q4, then P-1 = 2019-Q3, the expression is rewritten as Q(FA(2019-Q3))-1. The Full Annual (FA) for 2019-Q3 is 2018, as the end of period 2019 does not match the end of period 2019-Q3, so the annual period moves back an additional period. The expression is rewritten as Q(2018)-1. 2018 to Quarterly moves to 2018-Q4 and then the last part of the expression steps back 1 period, the final period is 2018-Q3.
Notes on Time Period
When converting from a lower frequency to a higher frequency (Weekly to Daily for example), the cast date always resolves to the end of the Period.
It is also important to note that as the Observation Keys in these examples all have Frequency as the first Dimension, when converting to another frequency (Quarterly to Annual for example) the Observation Key changes the value for the Frequency Dimension, ensuring the correct series of data is retrieved.
Time Period variable in free text
When refering to the Time Period as a variable in free text, for example in the table heading or a column heading, the variable is placed inside the variable syntax $(). For example $(P) would resolve to base period, and $(A(P)) would resolve to 'base period converted to annual frequency'. An example usage is shown below:
Table 1A showing data for $(P) to $(P+3)