Difference between revisions of "SDMX-CSV"
(→Version 1.0 and Version 2.0) |
|||
(11 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | see [https://wiki.sdmxcloud.org/Category:SdmxDataFormat Data Formats] | ||
[[Category:SdmxDataFormat]] | [[Category:SdmxDataFormat]] | ||
− | |||
=Overview= | =Overview= | ||
− | The SDMX-CSV Data format is an official SDMX format described on the [https://github.com/sdmx-twg/sdmx-csv/tree/master/data-message SDMX GitHub Repository]. | + | The SDMX-CSV Data format is an official SDMX format described on the [https://github.com/sdmx-twg/sdmx-csv/tree/master/data-message SDMX GitHub Repository]. There are currently two major versions and if in doubt it is recommended to use the latest version, version 2. |
It can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser. | It can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser. | ||
− | The format is comma separated format where each row describes a single Observation value. The columns describe each component of the Observation value, reflecting the Components in the [[Data_Structure_Definition#Data_Structure_Components|Data Structure Definition]]. | + | The format is comma-separated format where each row describes a single Observation value. The columns describe each component of the Observation value, reflecting the Components in the [[Data_Structure_Definition#Data_Structure_Components|Data Structure Definition]]. |
+ | |||
+ | The dataset can contain both Code labels, in addition to the Code IDs. The order of the columns is not important, however each column header will contain the ID of the corresponding Component. | ||
+ | |||
+ | == Version 1.0 and Version 2.0 == | ||
+ | There are a number of differences between version 1.0 and version 2.0 of the standard but the most apparent is the identifier against the target Data Structure. | ||
+ | |||
+ | Version 1.0 only supports Datasets against a Dataflow. The first column must be called '''Dataflow''' and each row of the dataset must contain the [[URN]] postfix of the Dataflow that dataset is for. | ||
+ | |||
+ | Version 2.0 has a different header where the first 2 "columns" are "STRUCTURE" and "STRUCTURE_ID". The Structure column can refer to a DSD, Dataflow or Provision Agreement and the STRUCTURE_ID is the [[URN]] postfix of the metadata structure. | ||
+ | |||
+ | This can be demonstrated by example. The following are queries for the same data but with different format versions specified: | ||
+ | |||
+ | * SDMX-CSV V2: https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&format=sdmx-csv-2.0.0 | ||
+ | * SDMX-CSV V1: https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&format=sdmx-csv-1.0.0 | ||
+ | |||
+ | =Formatting Using Query Parameters= | ||
+ | The following URL parameters can be used in a RESTful query for Fusion-CSV data. | ||
+ | * format = sdmx-csv | ||
+ | * labels = id | name | both | (id is default) | ||
+ | * timeFormat = normalized | ||
+ | * bom = include | exclude (Include or Exclude the [https://en.wikipedia.org/wiki/Byte_order_mark '''B'''yte '''O'''rder '''M'''ark] (BOM).<br/> The BOM helps Excel interpret non Latin characters when opening a CSV file) | ||
+ | |||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Query Parameter !! Values !! Description | ||
+ | |- | ||
+ | | format || sdmx-csv || Required to output the dataset in csv-ts format | ||
+ | |- | ||
+ | | labels || '''id''' or '''name''' or '''both''' <br/> default id || Defines if labels or ids should be used on output, if both is selected then there are 2 columns per coded component (one for Id one for the label) | ||
+ | |- | ||
+ | | timeFormat || normalized || outputs the time in the highest-frequency. E.g if annual data is supplied, but the highest frequency is daily, then 2001 would become either 2001-01-01 or 2001-12-31 depending on whether startPeriod or endPeriod is used. | ||
+ | |- | ||
+ | | serieskey || '''include ''' or '''exclude''' <br/> default exclude || If true, a series key column will be included in the output | ||
+ | |- | ||
+ | | bom || '''include ''' or '''exclude''' || (Include or Exclude the [https://en.wikipedia.org/wiki/Byte_order_mark '''B'''yte '''O'''rder '''M'''ark] (BOM).<br/> The BOM helps Excel interpret non Latin characters when opening a CSV file) | ||
+ | |} | ||
+ | |||
+ | |||
+ | The output type "all-lang" will output both id and name and will add an element to the start of each row which will be the language. For each series and each language a row will be output where the name in the row is for the specified language. If there is no name value for that language, the name is simply not output. See below for an example output/ | ||
+ | |||
+ | '''Normalized Time''' | ||
+ | If the parameter value is normalized then the TIME_PERIOD values are converted to the most granular ISO 8601 representation taking into account the highest frequency of the data in the message | ||
+ | |||
+ | <strong>Example</strong> | ||
+ | <i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=csv&labels=both&delimiter=tab</i> | ||
− | The | + | <strong>Note:</strong> The same formatting can be applied using [[Data_Formats|HTTP Accept Headers]] as opposed to query parameters. |
=Example= | =Example= | ||
Line 14: | Line 60: | ||
<i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=sdmx-csv&labels=both</i> | <i>https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=sdmx-csv&labels=both</i> | ||
− | An example dataset with IDs only, spaces have been added to this example to assist readability. | + | An example dataset with IDs only, spaces have been added to this example to assist readability. Note this is version 2.0 of the SDMX CSV format: |
<pre> | <pre> | ||
− | + | STRUCTURE, STRUCTURE_ID, REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE | |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2008, 102 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2008, 102 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2009, 114 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2009, 114 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2010, 114 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2010, 114 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2011, 114 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2011, 114 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2012, 103 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2012, 103 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2013, 114 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2013, 114 |
− | WB:GCI(1.0), GHA, GCI, RANK, A, 2014, 111 | + | dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2014, 111 |
</pre> | </pre> | ||
− | The same dataset in SDMX-CSV with | + | The same dataset in SDMX-CSV with labels included. |
<pre> | <pre> | ||
− | + | STRUCTURE, STRUCTURE_ID, REF_AREA:Reference Area, INDICATOR:Indicator, SUB_INDICATOR:Sub Indicator, FREQ:Frequency, TIME_PERIOD:Time period, OBS_VALUE:Observation | |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2008, 102 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2008, 102 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2009, 114 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2009, 114 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2010, 114 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2010, 114 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2011, 114 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2011, 114 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2012, 103 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2012, 103 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2013, 114 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2013, 114 |
− | WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2014, 111 | + | dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2014, 111 |
</pre> | </pre> |
Latest revision as of 02:11, 10 October 2023
see Data Formats
Overview
The SDMX-CSV Data format is an official SDMX format described on the SDMX GitHub Repository. There are currently two major versions and if in doubt it is recommended to use the latest version, version 2.
It can be used as both an import and export format for the Fusion Registry, and an export format for the Fusion Edge Server and Fusion Data Browser.
The format is comma-separated format where each row describes a single Observation value. The columns describe each component of the Observation value, reflecting the Components in the Data Structure Definition.
The dataset can contain both Code labels, in addition to the Code IDs. The order of the columns is not important, however each column header will contain the ID of the corresponding Component.
Version 1.0 and Version 2.0
There are a number of differences between version 1.0 and version 2.0 of the standard but the most apparent is the identifier against the target Data Structure.
Version 1.0 only supports Datasets against a Dataflow. The first column must be called Dataflow and each row of the dataset must contain the URN postfix of the Dataflow that dataset is for.
Version 2.0 has a different header where the first 2 "columns" are "STRUCTURE" and "STRUCTURE_ID". The Structure column can refer to a DSD, Dataflow or Provision Agreement and the STRUCTURE_ID is the URN postfix of the metadata structure.
This can be demonstrated by example. The following are queries for the same data but with different format versions specified:
- SDMX-CSV V2: https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&format=sdmx-csv-2.0.0
- SDMX-CSV V1: https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2/data/dataflow/WB/GCI/1.0/?c%5BREF_AREA%5D=GHA&c%5BINDICATOR%5D=GCI&format=sdmx-csv-1.0.0
Formatting Using Query Parameters
The following URL parameters can be used in a RESTful query for Fusion-CSV data.
- format = sdmx-csv
- labels = id | name | both | (id is default)
- timeFormat = normalized
- bom = include | exclude (Include or Exclude the Byte Order Mark (BOM).
The BOM helps Excel interpret non Latin characters when opening a CSV file)
Query Parameter | Values | Description |
---|---|---|
format | sdmx-csv | Required to output the dataset in csv-ts format |
labels | id or name or both default id |
Defines if labels or ids should be used on output, if both is selected then there are 2 columns per coded component (one for Id one for the label) |
timeFormat | normalized | outputs the time in the highest-frequency. E.g if annual data is supplied, but the highest frequency is daily, then 2001 would become either 2001-01-01 or 2001-12-31 depending on whether startPeriod or endPeriod is used. |
serieskey | include or exclude default exclude |
If true, a series key column will be included in the output |
bom | include or exclude | (Include or Exclude the Byte Order Mark (BOM). The BOM helps Excel interpret non Latin characters when opening a CSV file) |
The output type "all-lang" will output both id and name and will add an element to the start of each row which will be the language. For each series and each language a row will be output where the name in the row is for the specified language. If there is no name value for that language, the name is simply not output. See below for an example output/
Normalized Time If the parameter value is normalized then the TIME_PERIOD values are converted to the most granular ISO 8601 representation taking into account the highest frequency of the data in the message
Note: The same formatting can be applied using HTTP Accept Headers as opposed to query parameters.
Example
An example query using the format request parameters, HTTP Accept Headers can also be used to define the same format.
https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest/data/WB,GCI,1.0/GHA.GCI..?format=sdmx-csv&labels=both
An example dataset with IDs only, spaces have been added to this example to assist readability. Note this is version 2.0 of the SDMX CSV format:
STRUCTURE, STRUCTURE_ID, REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2008, 102 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2009, 114 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2010, 114 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2011, 114 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2012, 103 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2013, 114 dataflow, WB:GCI(1.0), GHA, GCI, RANK, A, 2014, 111
The same dataset in SDMX-CSV with labels included.
STRUCTURE, STRUCTURE_ID, REF_AREA:Reference Area, INDICATOR:Indicator, SUB_INDICATOR:Sub Indicator, FREQ:Frequency, TIME_PERIOD:Time period, OBS_VALUE:Observation dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2008, 102 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2009, 114 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2010, 114 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2011, 114 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2012, 103 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2013, 114 dataflow, WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2014, 111