Data Formats

From Fusion Registry Wiki
Latest revision as of 01:59, 10 October 2023

Overview

Fusion Registry accepts and outputs datasets in a number of formats. When consuming data, the Fusion Registry analyses the dataset to determine which data format it has received, so that it can direct the dataset to the right reader. Once read, all datasets are handled in exactly the same way, so any data processing performed on a dataset is the same regardless of the input data format.

When querying for data from the Registry web service, or performing a Data Transformation via the web service, the required output data format is specified using the HTTP Accept Header.


HTTP Accept Headers

SDMX Formats

SDMX Formats are supported as described by the [https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_6_conneg.md web services specification].

Accept Headers

{| class="wikitable"
|-
! Accept Header !! Format
|-
| application/vnd.sdmx.data+json;version=1.0.0 || [[SDMX-JSON Data|SDMX JSON]]
|-
| application/vnd.sdmx.data+csv;version=1.0.0 || [[SDMX-CSV|SDMX CSV]]
|-
| application/vnd.sdmx.data+edi || [[SDMX-EDI Data|SDMX EDI]]
|-
| application/vnd.sdmx.structurespecificdata+xml;version=2.1 || [[SDMX-ML Structure Specific Data|Structure Specific]] (2.1)
|-
| application/vnd.sdmx.structurespecificdata+xml;version=2.0 || Compact (2.0/1.0)
|-
| application/vnd.sdmx.genericdata+xml;version=2.1 || [[SDMX-ML Generic Data|Generic]] (2.1/2.0/1.0)
|}
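As a minimal sketch of how a client might select one of these formats, the Python snippet below builds a GET request carrying an Accept Header from the table. The URL is a hypothetical placeholder and is not taken from this page; the request is only constructed, not sent.

```python
import urllib.request

# Hypothetical endpoint used only for illustration; replace it with the
# data query URL of your own Fusion Registry instance.
DATA_URL = "https://registry.example.org/sdmx/data/EXAMPLE_FLOW"


def build_data_request(url: str, accept: str) -> urllib.request.Request:
    """Create (but do not send) a GET request asking for one data format."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


# Ask for SDMX JSON, using the Accept Header value listed in the table.
request = build_data_request(
    DATA_URL, "application/vnd.sdmx.data+json;version=1.0.0"
)
print(request.get_header("Accept"))
# prints: application/vnd.sdmx.data+json;version=1.0.0
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is a placeholder.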

CSV Formats

There are a number of CSV 'flavours' supported by Fusion Software, including the SDMX-CSV format, which is an official SDMX data format. If you cannot decide which format to use, SDMX-CSV version 2 is recommended.

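The following formats and corresponding VND Headers are supported:
* [[SDMX-CSV]]  application/vnd.sdmx.data+csv
* [[Fusion-CSV]] application/vnd.csv
* [[Fusion-CSV-TS]] application/vnd.csv-ts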

Each VND Header can take the additional arguments listed below. Note that SDMX-CSV supports only a subset of these arguments and of their supported values.

{| class="wikitable"
|-
! Argument !! Description !! Supported Values !! Default !! Supported Formats
|-
| version || the version of the format || 1.0.0, 2.0.0 ([[SDMX-CSV]] only) || 1.0.0 || [[SDMX-CSV]], [[Fusion-CSV]], [[Fusion-CSV-TS]]
|-
| timeFormat || values are converted to the most granular ISO 8601 representation <br/>taking into account the highest frequency of the data in the message || original or normalized || original || [[SDMX-CSV]]
|-
| labels || output both code/concept ids and the respective labels <br/> in the specified language || both/id/name || id || [[SDMX-CSV]] (with the exception of labels=name),<br/> [[Fusion-CSV]], [[Fusion-CSV-TS]]
|-
| delimiter || the delimiter to use || comma/tab/semicolon/space || comma || [[Fusion-CSV]], [[Fusion-CSV-TS]]
|-
| serieskey || include the series key as a column<br/> A series key is the concatenation of the dimension values<br/>for example A:UK:EMPLOYMENT || include/exclude || exclude || [[Fusion-CSV]], [[Fusion-CSV-TS]]
|-
| bom <br/> <small>(since 10.3.1)</small> || Include or Exclude the [https://en.wikipedia.org/wiki/Byte_order_mark '''B'''yte '''O'''rder '''M'''ark] (BOM).<br/> The BOM helps Excel interpret non Latin characters when opening a CSV file || include/exclude || exclude || [[SDMX-CSV]], [[Fusion-CSV]], [[Fusion-CSV-TS]]
|}


Note: The labels parameter can be used in conjunction with the [https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4 HTTP Accept-Language] Header to indicate which language to resolve the labels in. If the labels are not available in the requested language, another language will be selected, defaulting to English.
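The pairing of the labels argument with Accept-Language can be sketched in Python as below. The function name `label_headers` is hypothetical and exists only for this illustration; the snippet just assembles the two header values and makes no request.

```python
SDMX_CSV = "application/vnd.sdmx.data+csv"


def label_headers(media_type: str, labels: str, language: str) -> dict:
    """Return the Accept and Accept-Language headers for a labelled CSV query."""
    if labels not in {"both", "id", "name"}:
        raise ValueError(f"unsupported labels value: {labels}")
    # The arguments table excludes labels=name for SDMX-CSV.
    if media_type == SDMX_CSV and labels == "name":
        raise ValueError("labels=name is not supported for SDMX-CSV")
    return {
        "Accept": f"{media_type};labels={labels}",
        "Accept-Language": language,
    }


# Request Fusion-CSV with both ids and labels, preferring French labels.
headers = label_headers("application/vnd.csv", "both", "fr")
print(headers["Accept"])           # prints: application/vnd.csv;labels=both
print(headers["Accept-Language"])  # prints: fr
```

If French labels are not available, the page states that another language is selected, defaulting to English, so a client should not assume the response language matches the request.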

Examples

application/vnd.sdmx.data+csv
application/vnd.sdmx.data+csv;version=1.0.0;
application/vnd.sdmx.data+csv;version=2.0.0;
application/vnd.sdmx.data+csv;version=1.0.0;timeFormat=normalized
application/vnd.sdmx.data+csv;version=1.0.0;timeFormat=normalized;labels=both

application/vnd.csv
application/vnd.csv;delimiter=tab
application/vnd.csv;timeFormat=normalized;serieskey=include
application/vnd.csv;version=1.0.0;labels=both

application/vnd.csv-ts
application/vnd.csv-ts;version=1.0.0;
application/vnd.csv-ts;version=1.0.0;labels=name
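Header strings like the examples above can also be assembled programmatically. The following Python sketch is one hypothetical way to do it: the helper name `csv_accept` is an assumption, and the checks cover only argument names, argument values, and the two SDMX-CSV restrictions stated explicitly in the arguments table. It does not enforce the full "Supported Formats" column.

```python
SDMX_CSV = "application/vnd.sdmx.data+csv"
FUSION_CSV = "application/vnd.csv"
FUSION_CSV_TS = "application/vnd.csv-ts"

# Argument names and values transcribed from the arguments table.
ALLOWED_VALUES = {
    "version": {"1.0.0", "2.0.0"},
    "timeFormat": {"original", "normalized"},
    "labels": {"both", "id", "name"},
    "delimiter": {"comma", "tab", "semicolon", "space"},
    "serieskey": {"include", "exclude"},
    "bom": {"include", "exclude"},
}


def csv_accept(media_type: str, **arguments: str) -> str:
    """Build a CSV Accept Header, rejecting unknown arguments or values."""
    if media_type not in (SDMX_CSV, FUSION_CSV, FUSION_CSV_TS):
        raise ValueError(f"unknown CSV media type: {media_type}")
    for name, value in arguments.items():
        if name not in ALLOWED_VALUES:
            raise ValueError(f"unknown argument: {name}")
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(f"unsupported value for {name}: {value}")
    # Restrictions stated explicitly in the table.
    if arguments.get("version") == "2.0.0" and media_type != SDMX_CSV:
        raise ValueError("version 2.0.0 is available for SDMX-CSV only")
    if arguments.get("labels") == "name" and media_type == SDMX_CSV:
        raise ValueError("labels=name is not supported for SDMX-CSV")
    # Keyword arguments keep their call order, so the output is predictable.
    parts = [media_type] + [f"{k}={v}" for k, v in arguments.items()]
    return ";".join(parts)


print(csv_accept(FUSION_CSV, delimiter="tab"))
# prints: application/vnd.csv;delimiter=tab
print(csv_accept(SDMX_CSV, version="1.0.0", timeFormat="normalized", labels="both"))
# prints: application/vnd.sdmx.data+csv;version=1.0.0;timeFormat=normalized;labels=both
```

Validating on the client side is a design choice rather than a requirement; it simply surfaces a mistyped argument before the request reaches the Registry.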

Data Reporting Template

There are two types of output when converting data to a Data Reporting Template format. The first is where the Data Reporting Template is constructed in the usual way, with the Universe of Data being derived from the Dataflow and related Content Constraints. The second is where the Universe of Data is derived from the dataset being written into the Excel workbook. By default, the Report Template Universe is based on the constraints. To change this behaviour, use the +partial indicator in the VND Header.

{| class="wikitable"
|-
! Accept Header !! Description
|-
| application/vnd.reporttemplate || Excel Report Template pre-populated with the data from a dataset.<br/><br/> The dataset should contain the Provision Agreement reference, to enable the Fusion Registry to determine the Data Provider.<br/> The Excel file will be the same as a Report Template generated via the Reporting Template Web Service, but it will be pre-populated with observation and attribute values from the dataset.
|-
| application/vnd.reporttemplate.ACY:BANKING(1.0) || This is an extension of application/vnd.reporttemplate. It tells the Fusion Registry which Reporting Template to use. Only required if there is more than one Reporting Template for the Dataflow(s) being written.
|-
| application/vnd.reporttemplate;DATA_PROVIDER=ONS || This is an extension of application/vnd.reporttemplate. It tells the Fusion Registry who the Data Provider is.<br/><br/> The Data Provider's Agency defaults to SDMX. If this is not true, use the syntax DATA_PROVIDER=ACY_ID.ONS<br/><br/> Can be used in conjunction with other VND arguments such as the Report Template identifier.
|-
| application/vnd.reporttemplate+partial || This outputs the dataset conforming to the layout of the Report Template, but includes only the worksheets and observation cells for which there is data in the dataset. There is no main worksheet.<br/><br/> A Data Provider reference is not necessary; however, information about which Report Template to use can be provided using the .ACY:TEMPLATE_ID(1.0) syntax.
|}
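The header variants described above can be composed with a small helper, sketched here in Python. The function name and parameters are hypothetical. The page does not state where +partial sits relative to the .ACY:TEMPLATE_ID(1.0) suffix when both are used, so the ordering chosen below is an assumption to verify against your Registry.

```python
from typing import Optional

BASE = "application/vnd.reporttemplate"


def report_template_accept(
    template: Optional[str] = None,        # e.g. "ACY:BANKING(1.0)"
    data_provider: Optional[str] = None,   # e.g. "ONS"
    provider_agency: Optional[str] = None, # only needed if the agency is not SDMX
    partial: bool = False,
) -> str:
    """Compose a Data Reporting Template Accept Header from its optional parts."""
    header = BASE
    if template:
        header += "." + template
    if partial:
        # ASSUMPTION: +partial is appended after any template identifier;
        # the source text does not specify the order.
        header += "+partial"
    if data_provider:
        provider = data_provider
        if provider_agency:
            # Agency-qualified form, following DATA_PROVIDER=ACY_ID.ONS.
            provider = f"{provider_agency}.{data_provider}"
        header += f";DATA_PROVIDER={provider}"
    return header


print(report_template_accept(template="ACY:BANKING(1.0)"))
# prints: application/vnd.reporttemplate.ACY:BANKING(1.0)
print(report_template_accept(data_provider="ONS", provider_agency="ACY_ID"))
# prints: application/vnd.reporttemplate;DATA_PROVIDER=ACY_ID.ONS
```

For the single-variant cases the output matches the Accept Headers listed in the table; combined forms should be treated as untested until confirmed.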