Difference between revisions of "Fusion-CSV"
(→Overview) |
|||
(19 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | see [https://wiki.sdmxcloud.org/Category:SdmxDataFormat Data Formats] | ||
[[Category:SdmxDataFormat]] | [[Category:SdmxDataFormat]] | ||
− | |||
=Overview= | =Overview= | ||
The Fusion-CSV Data format predated [[SDMX-CSV]] and is not an official SDMX format. It is very similar in that the format is comma separated format where each row describes a single Observation value. The differences are: | The Fusion-CSV Data format predated [[SDMX-CSV]] and is not an official SDMX format. It is very similar in that the format is comma separated format where each row describes a single Observation value. The differences are: | ||
Line 7: | Line 7: | ||
# There is no column for the Dataflow, in SDMX-CSV the first column must be for the Dataflow | # There is no column for the Dataflow, in SDMX-CSV the first column must be for the Dataflow | ||
# The order of the columns is important, unlike [[SDMX-CSV]] where labels are in the same column as the Id concatenated with a colon | # The order of the columns is important, unlike [[SDMX-CSV]] where labels are in the same column as the Id concatenated with a colon | ||
− | # Fusion-CSV outputs | + | # Fusion-CSV outputs labels in their own columns as oppose to [[SDMX-CSV]] which outputs a id:label pair in a single column concatenated with a semicolon (e.g. A:Annual) |
# Fusion-CSV does not output columns for data queries if they are not part of the response based on the [https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_4_data_queries.md detail parameter], for example detail=seriesKeysOnly for a data query will only write columns for the Dimensions of the Data Structure in the output CSV, as oppose to [[SDMX-CSV]] which outputs a column for all the DSD components, regardless of whether they are used. | # Fusion-CSV does not output columns for data queries if they are not part of the response based on the [https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_4_data_queries.md detail parameter], for example detail=seriesKeysOnly for a data query will only write columns for the Dimensions of the Data Structure in the output CSV, as oppose to [[SDMX-CSV]] which outputs a column for all the DSD components, regardless of whether they are used. | ||
Fusion-CSV can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser. | Fusion-CSV can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser. | ||
− | =Example= | + | == Ordering of columns == |
+ | Fusion-CSV does not have the same input and output format (which gives another reason to use [[SDMX-CSV]]). When Fusion-CSV is output, the columns are output in the following order: | ||
+ | |||
+ | * Dimensions | ||
+ | * Dataset Attributes | ||
+ | * Group Attributes | ||
+ | * Series Attributes | ||
+ | * TIME_FORMAT | ||
+ | * OBS_VALUE | ||
+ | * Observation Attributes | ||
+ | |||
+ | When reading a Fusion-CSV file the order of the columns is: | ||
+ | * Dimensions | ||
+ | * TIME_FORMAT | ||
+ | * Dataset Attributes | ||
+ | * Series Attributes | ||
+ | * Observation Attributes | ||
+ | * OBS_VALUE | ||
+ | |||
+ | Group Attributes are not supported when reading Fusion-CSV. | ||
+ | |||
+ | =Formatting Using Query Parameters= | ||
+ | The following URL parameters can be used in a RESTful query for Fusion-CSV data. | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Query Parameter !! Values !! Description | ||
+ | |- | ||
+ | | format || csv || Required to output the dataset in csv-ts format | ||
+ | |- | ||
+ | | delimiter || comma | tab | semicolon | space <br/> default is comma || Defines which delimiter to use | ||
+ | |- | ||
+ | | labels || '''id''' or '''name''' or '''both''' <br/> default id || Defines if labels or ids should be used on output, if both is selected then there are 2 columns per coded component (one for Id one for the label) | ||
+ | |- | ||
+ | | serieskey || '''include ''' or '''exclude''' <br/> default exclude || If true, a series key column will be included in the output | ||
+ | |- | ||
+ | | bom || '''include ''' or '''exclude''' || (Include or Exclude the [https://en.wikipedia.org/wiki/Byte_order_mark '''B'''yte '''O'''rder '''M'''ark] (BOM).<br/> The BOM helps Excel interpret non Latin characters when opening a CSV file) | ||
+ | |} | ||
+ | |||
+ | The output type "all-lang" will output both id and name and will add an element to the start of each row which will be the language. For each series and each language a row will be output where the name in the row is for the specified language. If there is no name value for that language, the name is simply not output. See below for an example output/ | ||
+ | |||
+ | <strong>Example</strong> | ||
+ | <i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both&delimiter=tab</i> | ||
+ | |||
+ | <strong>Note:</strong> The same formatting can be applied using [[Data_Formats|HTTP Accept Headers]] as opposed to query parameters. | ||
+ | |||
+ | =Example Output= | ||
An example query using the format request parameters, [[Data_Formats|HTTP Accept Headers]] can also be used to define the same format.<br/> | An example query using the format request parameters, [[Data_Formats|HTTP Accept Headers]] can also be used to define the same format.<br/> | ||
<i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both</i> | <i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both</i> | ||
Line 29: | Line 75: | ||
</pre> | </pre> | ||
− | The same dataset in Fusion-CSV with | + | The same dataset in Fusion-CSV with labels included. Note: labels columns are only included if the Dimension, Attribute, or Measure is Coded, if it is not, then only one column is output - this can be seen in the table below where both TIME_PERIOD and OBS_VALUE are only single columns. |
<pre> | <pre> | ||
REF_AREA, Reference Area, INDICATOR, Indicator, SUB_INDICATOR, Sub Indicator, FREQ, Frequency, TIME_PERIOD, OBS_VALUE | REF_AREA, Reference Area, INDICATOR, Indicator, SUB_INDICATOR, Sub Indicator, FREQ, Frequency, TIME_PERIOD, OBS_VALUE | ||
Line 39: | Line 85: | ||
GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2013, 114 | GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2013, 114 | ||
GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2014, 111 | GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2014, 111 | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | An example dataset with the returned detail set to series keys only - fewer columns are written to the response.<br/> | ||
+ | <i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&detail=serieskeysonly</i> | ||
+ | <pre> | ||
+ | REF_AREA, INDICATOR, SUB_INDICATOR, FREQ | ||
+ | GHA, GCI, RANK, A | ||
+ | GHA, GCI, VALUE, A | ||
+ | </pre> | ||
+ | |||
+ | The same dataset in Fusion-CSV with the label type specified as all-lang. Note how 6 lines below only represent 1 series (GHA:GCI:RANK:A) and 2 observations 2008 and 2009. This example demonstrates where there are 3 languages in use (en, fr and de), that there is a French name for GCI, but not a German name and that RANK has a name in en, fr and de. | ||
+ | <pre> | ||
+ | LANG, REF_AREA, Reference Area, INDICATOR, Indicator, SUB_INDICATOR, Sub Indicator, FREQ, Frequency, TIME_PERIOD, OBS_VALUE | ||
+ | en, GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2008, 102 | ||
+ | fr, GHA, Ghana, GCI, Indice de compétitivité mondiale, RANK, Rang, A, Annual, 2008, 102 | ||
+ | de, GHA, Ghana, GCI, , RANK, Rang (de), A, Annual, 2008, 102 | ||
+ | en, GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2009, 114 | ||
+ | fr, GHA, Ghana, GCI, Indice de compétitivité mondiale, RANK, Rang, A, Annual, 2009, 114 | ||
+ | de, GHA, Ghana, GCI, , RANK, Rang (de), A, Annual, 2009, 114 | ||
</pre> | </pre> |
Latest revision as of 00:49, 29 September 2022
see Data Formats
Overview
The Fusion-CSV Data format predated SDMX-CSV and is not an official SDMX format. It is very similar in that the format is comma separated format where each row describes a single Observation value. The differences are:
- Whilst there can be a header row it is not mandatory
- There is no column for the Dataflow, in SDMX-CSV the first column must be for the Dataflow
- The order of the columns is important, unlike SDMX-CSV where labels are in the same column as the Id concatenated with a colon
- Fusion-CSV outputs labels in their own columns as oppose to SDMX-CSV which outputs a id:label pair in a single column concatenated with a semicolon (e.g. A:Annual)
- Fusion-CSV does not output columns for data queries if they are not part of the response based on the detail parameter, for example detail=seriesKeysOnly for a data query will only write columns for the Dimensions of the Data Structure in the output CSV, as oppose to SDMX-CSV which outputs a column for all the DSD components, regardless of whether they are used.
Fusion-CSV can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser.
Ordering of columns
Fusion-CSV does not have the same input and output format (which gives another reason to use SDMX-CSV). When Fusion-CSV is output, the columns are output in the following order:
- Dimensions
- Dataset Attributes
- Group Attributes
- Series Attributes
- TIME_FORMAT
- OBS_VALUE
- Observation Attributes
When reading a Fusion-CSV file the order of the columns is:
- Dimensions
- TIME_FORMAT
- Dataset Attributes
- Series Attributes
- Observation Attributes
- OBS_VALUE
Group Attributes are not supported when reading Fusion-CSV.
Formatting Using Query Parameters
The following URL parameters can be used in a RESTful query for Fusion-CSV data.
Query Parameter | Values | Description |
---|---|---|
format | csv | Required to output the dataset in csv-ts format |
delimiter | tab | semicolon | space default is comma |
Defines which delimiter to use |
labels | id or name or both default id |
Defines if labels or ids should be used on output, if both is selected then there are 2 columns per coded component (one for Id one for the label) |
serieskey | include or exclude default exclude |
If true, a series key column will be included in the output |
bom | include or exclude | (Include or Exclude the Byte Order Mark (BOM). The BOM helps Excel interpret non Latin characters when opening a CSV file) |
The output type "all-lang" will output both id and name and will add an element to the start of each row which will be the language. For each series and each language a row will be output where the name in the row is for the specified language. If there is no name value for that language, the name is simply not output. See below for an example output/
Note: The same formatting can be applied using HTTP Accept Headers as opposed to query parameters.
Example Output
An example query using the format request parameters, HTTP Accept Headers can also be used to define the same format.
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both
An example dataset with IDs only, spaces have been added to this example to assist readability.
REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE GHA, GCI, RANK, A, 2008, 102 GHA, GCI, RANK, A, 2009, 114 GHA, GCI, RANK, A, 2010, 114 GHA, GCI, RANK, A, 2011, 114 GHA, GCI, RANK, A, 2012, 103 GHA, GCI, RANK, A, 2013, 114 GHA, GCI, RANK, A, 2014, 111
The same dataset in Fusion-CSV with labels included. Note: labels columns are only included if the Dimension, Attribute, or Measure is Coded, if it is not, then only one column is output - this can be seen in the table below where both TIME_PERIOD and OBS_VALUE are only single columns.
REF_AREA, Reference Area, INDICATOR, Indicator, SUB_INDICATOR, Sub Indicator, FREQ, Frequency, TIME_PERIOD, OBS_VALUE GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2008, 102 GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2009, 114 GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2010, 114 GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2011, 114 GHA, Ghana, GCI, Global Competitiveness Index, RANK: Rank, A, Annual, 2012, 103 GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2013, 114 GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2014, 111
An example dataset with the returned detail set to series keys only - fewer columns are written to the response.
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&detail=serieskeysonly
REF_AREA, INDICATOR, SUB_INDICATOR, FREQ GHA, GCI, RANK, A GHA, GCI, VALUE, A
The same dataset in Fusion-CSV with the label type specified as all-lang. Note how 6 lines below only represent 1 series (GHA:GCI:RANK:A) and 2 observations 2008 and 2009. This example demonstrates where there are 3 languages in use (en, fr and de), that there is a French name for GCI, but not a German name and that RANK has a name in en, fr and de.
LANG, REF_AREA, Reference Area, INDICATOR, Indicator, SUB_INDICATOR, Sub Indicator, FREQ, Frequency, TIME_PERIOD, OBS_VALUE en, GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2008, 102 fr, GHA, Ghana, GCI, Indice de compétitivité mondiale, RANK, Rang, A, Annual, 2008, 102 de, GHA, Ghana, GCI, , RANK, Rang (de), A, Annual, 2008, 102 en, GHA, Ghana, GCI, Global Competitiveness Index, RANK, Rank, A, Annual, 2009, 114 fr, GHA, Ghana, GCI, Indice de compétitivité mondiale, RANK, Rang, A, Annual, 2009, 114 de, GHA, Ghana, GCI, , RANK, Rang (de), A, Annual, 2009, 114