Fusion Transformer
Revision as of 10:30, 12 May 2022
Overview
The Fusion Transformer is a command line application providing transformations between supported data files.
The following structure file formats are supported:
• SDMX-ML Structure 1.0
• SDMX-ML Structure 2.0
• SDMX-ML Structure 2.1
• EDI
• SDMX-JSON
The following data file formats are supported:
• SDMX- Generic 1.0
• SDMX- Generic 2.0
• SDMX- Generic 2.1
• SDMX- Compact 1.0
• SDMX- Compact 2.0
• SDMX- StructureSpecific 2.1
• EDI
• SDMX-JSON
• SDMX-CSV
Structure Transformation
The Structure Transformer can be run by executing the command:
java -cp FusionTransformer.jar io.sdmx.fusion.dataparser.StructureParseMain
Note: on releases prior to version 2.2, the command is:
java -cp FusionTransformer.jar org.bis.fusion.dataparser.StructureParseMain
For convenience, a structureTransform.bat file is provided that Windows users can use to launch the main class.
Example usage:
structureTransform.bat -o ediStructureOut21.edi -s StructureOut21.xml -v edi
For UNIX users there is an equivalent file: structureTransform.sh.
The following additional arguments are available:
Argument | Mandatory | Description | Allowed Arguments |
---|---|---|---|
-edi_lenient | False | Puts the Transformer into EDI Lenient mode. | -No Arguments- |
-o <arg> | True | Output file. If this option is not specified or the argument is - then output is sent to Standard Output. | |
-pretty_print | False | Outputs the SDMX-ML structures in a more readable format. | -No Arguments- |
-s <arg> | True | URI of structure file to transform. If this option is not specified or the argument is - then input is taken from Standard Input. | |
-ug | False | If present will ‘upgrade group’ attributes in a SDMX v2.0 DSD to become a Dimension Group Attribute in v2.1. The Group will still be present in the DSD so the v2.1 Schema is backwards compatible with v2.0 (allowing data to be submitted as either a Group or Series level attribute). | -No Arguments- |
-v <arg> | True | The output version. | edi / edi-lenient / 1.0 / 2.0 / 2.1 |
There are several options for the output version. Supplying the argument 1.0, 2.0 or 2.1 will result in the creation of an SDMX-ML file in the specified version. Supplying the argument edi will create an EDI file if possible.
The output version argument “edi-lenient” will put Fusion Transformer into EDI Lenient mode.
Please refer to the Section EDI Leniency for more information regarding this feature.
Data Transformation
The Data Transformer can be run by executing the command:
java -cp FusionTransformer.jar io.sdmx.fusion.dataparser.DataParserMain
Note: on releases prior to version 2.2, the command is:
java -cp FusionTransformer.jar org.bis.fusion.dataparser.DataParserMain
For convenience there is a dataTransform.bat file provided that Windows users can use to launch the main class.
Example usage:
dataTransform.bat -d TestData/inputData.ges -s TestData/inputDSD.ges -o output.xml -f compact
For UNIX users there is an equivalent file: dataTransform.sh.
The following additional arguments are available:
Argument | Mandatory | Description | Allowed Arguments |
---|---|---|---|
-d <arg> | True | URI of data file to transform. If this option is not specified or the argument is - then input is taken from Standard Input. | |
-f <arg> | True | Output data format. | compact / generic / edi |
-o <arg> | True | Output file. If this option is not specified or the argument is - then output is sent to Standard Output. | |
-s <arg> | True | URI of structure file with DSD. | |
-v <arg> | True | Output SDMX Version if appropriate. | 1.0 / 2.0 / 2.1 |
-split | False | Output SDMX Version if appropriate. | -No Arguments- |
-edi_group_identifier <arg> | False | Allows setting the group id when reading an EDI data file. | Either * or the group name |
-edi_lenient | False | Puts the Transformer into EDI Lenient mode. | -No Arguments- |
-edi_tf_attr | False | Creates Time Format Series Attribute from edi input. | -No Arguments- |
-pretty_print | False | Outputs the SDMX-ML structures in a clearer fashion. | -No Arguments- |
-unify_monthly | False | Outputs SDMX-ML monthly data in the format YYYY-Mmm. | -No Arguments- |
-character_mapping | False | Enables character mapping mode. | -No Arguments- |
File Formats and Character Encoding
The Fusion Transformer expects all files supplied to it to be encoded using the charset UTF-8, except for EDI files, which are expected to be encoded using the charset ISO-8859-1. If you supply a file that does not explicitly declare its charset, the Fusion Transformer assumes it is encoded in the expected charset and attempts to read it using that charset.
All files generated by the Fusion Transformer are encoded using the charset UTF-8, except for EDI files, which are encoded using the charset ISO-8859-1.
If you experience the Fusion Transformer behaving strangely with certain characters, please check the encoding of your input files using an appropriate software tool.
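One way to check an input file's encoding is sketched below in Python. This is only an illustration and not part of the Fusion Transformer; the function names are hypothetical. It tests whether a byte sequence is valid UTF-8 and converts UTF-8 bytes to ISO-8859-1 (Python codec name iso-8859-1), which may help when preparing an EDI file. Note that every byte sequence decodes successfully as ISO-8859-1, so only the UTF-8 check can detect a wrongly encoded file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf8_to_iso_8859_1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Raises UnicodeEncodeError if a character has no ISO-8859-1 equivalent.
    """
    return data.decode("utf-8").encode("iso-8859-1")


# The letter e-acute is two bytes in UTF-8 and one byte in ISO-8859-1.
utf8_bytes = "\u00e9".encode("utf-8")
latin1_bytes = utf8_to_iso_8859_1(utf8_bytes)

print(is_valid_utf8(utf8_bytes))    # the UTF-8 form passes the check
print(is_valid_utf8(latin1_bytes))  # the ISO-8859-1 form does not
```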
Character Mapping
When the Fusion Transformer works on a data file, it is possible to map individual characters to alternative characters or to a string. This may be useful for producing simplified output (for example, converting the character ‘ to the character ') or for preventing XML characters from being encoded (for example, replacing the character " with the character ' prevents the quote character from being output as &quot;).
To enable character mapping the following conditions must be met:
• The data transform operation must be supplied with the argument -character_mapping
• A file called “character_mapping.properties” must exist in the directory from which Fusion Transformer is run.
• This file must conform to the standard Java property file format. Each entry in the file has a key and a value separated by the equals sign. Each key may only be a single character long, whereas the value may be any length (including zero length to exclude certain characters from the output).
Character Mapping only affects attribute fields. Applying it to other fields could change the semantic meaning of the file.
An example of a legal character mapping file:
# Example of a Fusion Transformer Character Mapping File
# The effect of this file is that
# ; characters will be removed from the output
# the " character will be replaced with the ' character
# the @ character will be replaced by the phrase "at symbol"
;=
"='
@=at symbol
# EOF
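The effect of such a file can be sketched in Python as follows. This is a simplified illustration, not the Fusion Transformer's implementation: the function names are hypothetical, and the parser handles only `key=value` lines and `#` comments, whereas the full Java property file format also supports escapes, alternative separators and line continuations. The two error cases correspond to the empty-key and over-long-key conditions described above.

```python
def parse_character_mapping(text):
    """Parse a simplified subset of the Java properties format into a dict."""
    mapping = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = stripped.partition("=")
        if len(key) == 0:
            raise ValueError("empty key is not permitted")
        if len(key) > 1:
            raise ValueError("key longer than a single character: " + key)
        mapping[key] = value
    return mapping


def apply_character_mapping(value, mapping):
    """Replace each mapped character in an attribute value."""
    return "".join(mapping.get(ch, ch) for ch in value)


EXAMPLE_FILE = """\
# Example of a Fusion Transformer Character Mapping File
;=
"='
@=at symbol
# EOF
"""

mapping = parse_character_mapping(EXAMPLE_FILE)
print(apply_character_mapping('a;b "c" @ d', mapping))
# prints: ab 'c' at symbol d
```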
Time Format Series Attribute Creation
By default, reading an EDI data file does not convert the TIME_FORMAT into an attribute in the output data file. Fusion Transformer provides an option to do so: the flag -edi_tf_attr must be passed to the Fusion Transformer, and the Data Structure must have a series attribute with the id “TIME_FORMAT”.
If both of these conditions are met, then the EDI values are converted in the following manner:
Time Format | EDI Value | ISO Format |
---|---|---|
Yearly | 602 / 702 | P1Y |
Half Yearly | 604 / 704 | P6M |
Quarterly | 608 / 708 | P3M |
Monthly | 610 / 710 | P1M |
Weekly | 616 / 716 | P1W |
Daily | 101 / 102 | P1D |
Minutely | 201 / 203 | PT1M |
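The table above amounts to a simple lookup. A minimal Python sketch of it follows; the dictionary values are copied from the table, while the names `EDI_TIME_FORMAT_TO_ISO` and `edi_time_format_to_iso` are hypothetical and chosen only for this illustration.

```python
# EDI time format codes from the table above, mapped to ISO 8601 durations.
EDI_TIME_FORMAT_TO_ISO = {
    "602": "P1Y", "702": "P1Y",    # Yearly
    "604": "P6M", "704": "P6M",    # Half Yearly
    "608": "P3M", "708": "P3M",    # Quarterly
    "610": "P1M", "710": "P1M",    # Monthly
    "616": "P1W", "716": "P1W",    # Weekly
    "101": "P1D", "102": "P1D",    # Daily
    "201": "PT1M", "203": "PT1M",  # Minutely
}


def edi_time_format_to_iso(edi_code):
    """Return the ISO duration for an EDI time format code (str or int)."""
    try:
        return EDI_TIME_FORMAT_TO_ISO[str(edi_code)]
    except KeyError:
        raise ValueError("unknown EDI time format code: %s" % edi_code)


print(edi_time_format_to_iso("608"))  # prints: P3M
print(edi_time_format_to_iso(710))    # prints: P1M
```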
EDI Leniency
Fusion Transformer may be run in “EDI Leniency” mode. When running in this mode certain aspects of EDI which are enforced by the EDI standard are ignored.
Structures
EDI does not permit a structure to reference another structure belonging to a different Agency. When running in EDI Leniency mode, this restriction is disabled and the structures will all be stated as belonging to the same agency. For example, a Data Structure Definition in Agency 1 that refers to a Codelist in Agency 2 cannot strictly be converted into EDI, since EDI cannot express that the DSD and Codelist are in different agencies. In EDI Leniency mode, the output EDI file would state that both are owned by Agency 1.
The EDI standard does not support uncoded dimensions in a Data Structure. All dimensions must be explicitly coded. When running in EDI Leniency mode, Data Structures with uncoded dimensions may be read or written without an exception being thrown.
In EDI, a Codelist can only have an ID and code IDs of up to 18 characters in length. A concept scheme may only have concept IDs up to 18 characters in length. EDI Leniency mode disregards these restrictions.
Data
EDI enforces the following limits on a data file:
• The Observation Value is limited to a maximum of 15 characters
• The Observation Attribute “OBS_STATUS” is limited to a maximum of 35 characters
• The Observation Attribute “OBS_CONF” is limited to a maximum of 35 characters
• The Observation Attribute “OBS_PRE_BREAK” is limited to a maximum of 15 characters
When running in EDI Leniency mode, none of the above restrictions apply when reading or writing a data file.
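The limits above can be expressed as a small length check. The Python sketch below is an illustration only and does not reflect how the Fusion Transformer validates data. The function name is hypothetical, and the key `OBS_VALUE` for the Observation Value is an assumption made for this example.

```python
# Maximum lengths from the list above.
# Assumption: the Observation Value is keyed as "OBS_VALUE" in this sketch.
EDI_MAX_LENGTHS = {
    "OBS_VALUE": 15,
    "OBS_STATUS": 35,
    "OBS_CONF": 35,
    "OBS_PRE_BREAK": 15,
}


def find_edi_limit_violations(observation, lenient=False):
    """Return (field, actual_length, limit) for each value that is too long.

    In lenient mode no limits apply, so the result is always empty.
    """
    if lenient:
        return []
    violations = []
    for field, limit in EDI_MAX_LENGTHS.items():
        value = observation.get(field)
        if value is not None and len(str(value)) > limit:
            violations.append((field, len(str(value)), limit))
    return violations


observation = {"OBS_VALUE": "1234567890.123456", "OBS_STATUS": "A"}
print(find_edi_limit_violations(observation))
# prints: [('OBS_VALUE', 17, 15)]
print(find_edi_limit_violations(observation, lenient=True))
# prints: []
```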
Enabling EDI Leniency Mode
When running the structure transformer, either supply the output version (the -v flag) as “edi_lenient” (“edi-lenient” is also supported for backwards compatibility) or supply the parameter -edi_lenient.
Example
structureTransform.bat -o ediStructureOut21.edi -s StructureOut21.xml -v edi_lenient
structureTransform.bat -o ediStructureOut21.edi -s StructureOut21.xml -v edi -edi_lenient
When running the data transformer, supply the parameter -edi_lenient.
Example
dataTransform.bat -d ediData.ges -s Structures.xml -f generic -edi_lenient -o genericData.xml
EDI Group Identifier
The EDI standard states that groups in an EDI file should conform to the Group Name of "Sibling" in a DSD. Transforming data with groups will therefore result in the output stating that the group name is "Sibling".
To override this behaviour, use the argument -edi_group_identifier. This requires a value, which can be either an asterisk (*) or a specific group name. Specifying an asterisk instructs the Transformer to use the group name as defined in the DSD.
Example - instructs the Transformer to output the group name as specified by the DSD
dataTransform.bat -d ediData.ges -s Structures.xml -f generic -edi_group_identifier "*"
Example - instructs the Transformer to output the group name as supplied. In this case "NewGroupName"
dataTransform.bat -d ediData.ges -s Structures.xml -f generic -edi_group_identifier NewGroupName
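The three cases described in this section (no argument, an asterisk, or an explicit name) can be summarised in a short Python sketch. It restates the documented behaviour only; the function and parameter names are hypothetical and do not come from the Fusion Transformer.

```python
DEFAULT_EDI_GROUP_NAME = "Sibling"


def resolve_group_name(dsd_group_name, edi_group_identifier=None):
    """Pick the group name written to the output.

    dsd_group_name: the group name defined in the DSD.
    edi_group_identifier: the value given to -edi_group_identifier,
        or None if the argument was not supplied.
    """
    if edi_group_identifier is None:
        return DEFAULT_EDI_GROUP_NAME  # default behaviour
    if edi_group_identifier == "*":
        return dsd_group_name          # use the name from the DSD
    return edi_group_identifier        # use the supplied name


print(resolve_group_name("DsdGroup"))                  # prints: Sibling
print(resolve_group_name("DsdGroup", "*"))             # prints: DsdGroup
print(resolve_group_name("DsdGroup", "NewGroupName"))  # prints: NewGroupName
```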
Return Codes
The applications return the following codes.
Structure Transformer
Return Code | Meaning |
---|---|
0 | Application completed successfully. |
1 | No arguments specified. |
2 | Illegal argument specified. |
4 | Structure file could not be found. |
6 | Illegal output data format specified. |
7 | Cannot write to output file. |
10 | Error during transformation. |
Data Transformer
Return Code | Meaning |
---|---|
0 | Application completed successfully. |
1 | No arguments specified. |
2 | Illegal argument specified. |
3 | Data File could not be found. |
4 | Structure File could not be found. |
5 | Specified schema version incorrect. |
6 | Illegal output data format specified. |
7 | Cannot write to output file. |
10 | Error during transformation. |
11 | Character Mapping option specified but file could not be found. |
12 | Error reading character mapping file. |
13 | Character Mapping file has empty key – this is not permitted. |
14 | Character Mapping file has key longer than a single character – this is not permitted. |
In the Windows command prompt you can use the command
echo %ERRORLEVEL%
This will display the Return Code.
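When the Transformer is launched from a script rather than interactively, the return code can be read programmatically. The Python sketch below is an illustration under stated assumptions: the meanings are copied from the Data Transformer table above, the function names are hypothetical, and the example run uses a stand-in child process that exits with code 3 rather than a real Transformer invocation.

```python
import subprocess
import sys

# Return codes of the Data Transformer, copied from the table above.
DATA_TRANSFORMER_RETURN_CODES = {
    0: "Application completed successfully.",
    1: "No arguments specified.",
    2: "Illegal argument specified.",
    3: "Data File could not be found.",
    4: "Structure File could not be found.",
    5: "Specified schema version incorrect.",
    6: "Illegal output data format specified.",
    7: "Cannot write to output file.",
    10: "Error during transformation.",
    11: "Character Mapping option specified but file could not be found.",
    12: "Error reading character mapping file.",
    13: "Character Mapping file has empty key.",
    14: "Character Mapping file has key longer than a single character.",
}


def describe_return_code(code):
    """Translate a Data Transformer return code into its meaning."""
    return DATA_TRANSFORMER_RETURN_CODES.get(code, "Unknown return code %d." % code)


def run_and_describe(command):
    """Run a command (a list of arguments) and describe its return code."""
    completed = subprocess.run(command)
    return completed.returncode, describe_return_code(completed.returncode)


if __name__ == "__main__":
    # Stand-in for a real launch: a child process that exits with code 3.
    stand_in = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_and_describe(stand_in))
```

In a UNIX shell, the equivalent of the Windows command above is echo $?.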