Fusion Edge Compiler

From Fusion Registry Wiki

Revision as of 06:06, 23 January 2023

Overview

The Fusion Edge Compiler is a command line client written in Java that runs on Windows or UNIX operating systems. Its responsibility is to compile SDMX data, structure, and metadata files for dissemination by the Fusion Edge Server. The Fusion Edge Compiler provides the following functions:

  1. To pull content from SDMX web services (for example, Fusion Registry web services) in order to populate a local file system with content to publish
  2. To compile content in the local file system to create a new Environment which can be consumed by the Fusion Edge Server
  3. To publish the Environment to an Amazon S3 bucket from which distributed Fusion Edge Servers can take their content, if configured to do so
  4. To print content, writing the contents of an Environment to a CSV file as a report of what the compiled Environment contains

The compile function is the main function of the compiler as it is the only function which must be performed by the Fusion Edge Compiler in order to have a deployable Environment for the Fusion Edge Server. Organising SDMX files in a file system, and moving an Environment to the Edge Server can both be performed manually or via a custom automated process.

Common Arguments

The following table contains arguments which are used for more than one Compile function.

Help

Every compile function outputs a list of its available arguments when -h is passed to it.

Example

buildFileSystem.sh -h

Properties File

Every compile function can read its argument values from a JSON properties file. This can be used instead of, or in addition to, command line arguments.

Each command line argument has a corresponding JSON property.

Argument | Example | Description
prop | -prop "/home/props.json" | Optional. A reference to one or more properties files (separated by a space)

Example

buildFileSystem.sh -prop "MyConfig.json"

Example MyConfig.json

{
  "SdmxAPI" : "https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2",
  "TgtDir"  : "DemoServerFiles",
  "AllData" : true
}

Web Service Arguments

The web service arguments are used by the buildFileSystem and refreshContent functions.

Argument | Example | Description
api | -api "https://stats.bis.org/api/v1" | Required. The URL of the web service to pull the content from
apiv (since 4.4.0) | -apiv "1.5.0" | Optional. The version of the SDMX API to query. It defaults to version 2.0.0 (the entry point in Fusion Registry is /sdmx/v2); the other option is 1.5.0 (the Fusion Registry entry point is /ws/public/sdmxapi/rest)
apict | -apict 60 | API Connect Timeout (seconds)
apirt | -apirt 600 | API Read Timeout (seconds)
apiua | -apiua EdgeServer | API User-Agent
apiformat | -apiformat sdmx-json | Format in which structures are returned from the API. The default is fusion-json (compatible with Fusion APIs and includes non-SDMX structure types); the alternatives are sdmx-json, sdmx-ml-21 and sdmx-ml-3
usr | -usr "myusername" | Username to authenticate with the REST API. If using the Fusion Registry it should correspond to a user account in the Fusion Registry
pwd | -pwd "mypassword" | Password to authenticate with the REST API

Current Environment

The compileFileSystem and refreshContent functions can reference an existing Environment in order to merge new content into that Environment.

Argument | Example | Description
prop | -prop "/home/props.json" | Optional. A reference to one or more properties files (separated by a space)
tgt | -tgt "/home/compiler/compiled" | Required. The target directory to write the new Environment to
lgr | -lgr "/home/compiler/live_environment" | The location of the current Environment
sgn | -sgn "my_signature" | Required. A secret signature to sign the new Environment files with. This signature must match the one used to sign the current Environment.
s3rgn | -s3rgn "us-east-1" | Amazon S3 region. Required if the Ledger is hosted on Amazon S3
s3sec | -s3sec "azxzcvbnm" | Amazon S3 Secret. Required if the Ledger is hosted on Amazon S3
s3acc | -s3acc "azxzcvbnm" | Amazon S3 Access Key. Required if the Ledger is hosted on Amazon S3

Pull Content

buildFileSystem.sh (UNIX) or buildFileSystem.bat (Windows)

Overview

The Fusion Edge Compiler queries an SDMX web service for structural metadata, data, and reference metadata content based on what it has been requested to pull. It can work against a Fusion Registry web service as well as any other SDMX web service that complies with the SDMX specification.

The Fusion Edge Compiler pulls the content to build a target directory of files in the correct structure for the compile process to operate. The Fusion Edge Compiler command line arguments focus on which Datasets to pull from the target web service. When a dataset is pulled, the corresponding metadata (Dataflow, DSD, Concepts, Codelists) will also be pulled.

It is possible to pull only updates to datasets into an existing file system by using the -lgr or -upd argument.

Command Line Arguments

The minimum arguments are the api to pull the data from and the target folder to write the files to, in addition to an instruction on which datasets or structures to pull.

Example (minimum arguments)

buildFileSystem.sh -api "https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2" -df "all" -tgt "SourceFiles"

Arguments

Additional Arguments

Argument | Example | Description
audit (since 4.2.0) | -audit | If present, will output an Audit folder with a JSON file containing information about the REST API request and corresponding response.
exempty (since 4.2.1) | -exempty | Optional. If present, Dataflows from the -df parameter will be excluded from the output if they have no data pulled from the server
datastruct (since 4.2.1) | -datastruct | Optional. If present, each Dataflow in the -df argument will be processed to include the following related structures in the output: Pre-defined Queries, Publication Tables, and Category Schemes + Categorisations
df | -df "ECB:CPI(1.0)" "ECB:EXR(1.0)" | Optional. A reference to one or more Dataflows to pull data for (separated by a space). The keyword all can be used to pull data for all Dataflows. A '*' can be used to mean all versions, e.g. ECB:EXR(*)
lenient | -lenient | If present, the pull process will skip failed data queries (they will not prevent the rest of the pull from completing). The failure will be noted in the report for the Dataset.
metadata (since 4.5.0) | -metadata incref | Issues a request for all metadata from the web service; this also includes the Metadataflows and Metadata Structures which the Metadata Sets conform to. The argument incref ensures that all target structures (the structures the metadata is authored against) and their descendants are also included in the output; excref will not include the target structures. Note: if the target structures are not present in the file system by some other means then the reference metadata will be excluded during the Compile process.
replace | -replace | If present, all the files in the target directory will be deleted before the pull content is run
report | -report | If present, the report will be written to a file; if not present, the report will be written to System.out
str | -str "codelist=SDMX" | Structures to query (in addition to dataset related metadata). Syntax: [structure type]=[agencyid],[id],[version]. The only required part is the structure type; the other parts default to 'all'
tgt | -tgt "/home/compiler/target" | Required. The target directory to write the files and folders to
upd | Example | This applies an updatedAfter query parameter against the target web service when querying for data (only data updated after a point in time is retrieved). This is an alternative to using the lgr argument, which dynamically determines the updatedAfter parameter based on the time that Environment was built.
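
The Dataflow reference syntax accepted by -df can be illustrated with a small parser. This is only a sketch of the documented syntax (agency, id, and a version in parentheses, with '*' meaning all versions and the keyword all meaning every Dataflow). The names DataflowRef and parse_df are hypothetical and are not part of the Fusion Edge Compiler.

```python
import re
from typing import NamedTuple, Optional


class DataflowRef(NamedTuple):
    agency: str
    flow_id: str
    version: str  # "*" stands for all versions


# Assumed pattern for references such as ECB:EXR(1.0) or ECB:EXR(*)
_REF = re.compile(r"^([^:()]+):([^:()]+)\(([^()]+)\)$")


def parse_df(value: str) -> Optional[DataflowRef]:
    """Return None for the keyword 'all', otherwise the parsed reference."""
    if value == "all":
        return None
    match = _REF.match(value)
    if match is None:
        raise ValueError(f"not a Dataflow reference: {value!r}")
    return DataflowRef(*match.groups())
```

For example, parse_df("ECB:EXR(*)") yields a reference whose version is "*", while parse_df("all") returns None to signal that no filter applies.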


Pulling Data
When building an Edge Server for data dissemination, the compiler must be told which Dataflows to pull data for. This is achieved using the -df argument, for example:

buildFileSystem.sh -api "https://myregistry/ws/public/sdmxapi/rest" -df "all" -tgt "/home/user/EdgeServer/RawFiles" 

In this example all Dataflows will be pulled from the web service, and the data for each Dataflow will be pulled.

It may be the case that the api contains Dataflows which have no data. These Dataflows will be pulled into the Edge Server file system, meaning they will be available via the Edge Server structure web service even though the Edge Server has no data available (in the same way that the Dataflow was pulled from the source api with no corresponding dataset). This behaviour can be changed so that a Dataflow is excluded from the structure file if it has no data. This is achieved using the -exempty argument:

 buildFileSystem.sh -api "https://myregistry/ws/public/sdmxapi/rest" -df "all"  -tgt "/home/user/EdgeServer/RawFiles" -exempty

The file system will now only contain the Dataflows where there is a corresponding dataset for the Dataflow.

Whilst the -str argument can be used to list additional structures to pull from the api, in the case of data dissemination, it is likely that a core set of additional 'data related' structures are also required. These structures are the Predefined Query, Publication Table, Categorisations and related target Structure. By adding the -datastruct argument the compiler will include, for each Dataflow, these related structures.

  buildFileSystem.sh -api "https://myregistry/ws/public/sdmxapi/rest" -df "all"  -tgt "/home/user/EdgeServer/RawFiles" -exempty -datastruct

The above example will pull all Dataflows that have a corresponding dataset, and for each Dataflow it will obtain the related structures.

Properties File

It is possible to provide all the arguments to the Fusion Edge Compiler via a properties file, referenced by the -prop argument.

An example Properties file is given below:

{
  "Ledger" : 	"s3:mybucket",
  "TgtDir" :  	"/home/compiler/target",
  "SdmxAPI" : 	"https://demo.metadatatechnology.com/FusionRegistry/ws/public/sdmxapi/rest",
  "UpdatedAfter" :	"2010",
  "Username" : 	"myuser",
  "Password" : 	"pwd",
  "AllData" : 	true,
  "FullReplace" : 	true,
  "Zip" : 		true,
  "Metadata" : 	true,
  "S3Region":	"us-east-1",
  "S3SecretKey":	"azxasdasfcvbn",
  "S3AccessKey":	"sxcvbnmu",
  "SubCubes":{
     "ECB:EXR(1.0)" : {
        "SubCube1" : {
           "Include" : {
              "FREQ":["A","M"],
              "REF_AREA":["UK"]
           }
        }
     },
     "WB:POVERTY(1.0)":{ }
  },
  "Structures":{
     "Codelist": ["ECB,EXR,1.0"],
     "HierarchicalCodelist": ["ECB", "BIS"],
     "all": ["SDMX"]
  }
}


Structures
The Structures section of the properties file defines which structural metadata should be included in the outputs.

Note: when outputting data for a Dataflow, the Dataflow and all its descendants (DSD, Codelist, Concept Scheme, Agency Scheme) will be automatically included in the structure metadata that is generated and do not need to be explicitly specified. This is also true for the Structures section: specifying a structure such as a HierarchicalCodelist will automatically include its descendant structures.

The arguments are:

  1. The structure type. This is the same as the path parameter on the REST API, e.g. Codelist.
  2. An array of structure filters in the format AgencyId,Id,Version. Each part of a filter is optional; an absent part means all. The keyword all can be used as a structure type to indicate all structures, which can also take the filters for agency, id and version. Example ["ECB,EXR,1.0", "SDMX", "BIS,all,2.1"]
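
The filter format above can be illustrated with a short sketch that expands a filter string into its three parts, defaulting any absent part to all. The function name parse_structure_filter is hypothetical and the compiler's own parsing may differ; this only demonstrates the documented defaulting rule.

```python
from typing import Tuple


def parse_structure_filter(value: str) -> Tuple[str, str, str]:
    """Split 'AgencyId,Id,Version'; absent or empty parts default to 'all'."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) > 3:
        raise ValueError(f"too many parts in structure filter: {value!r}")
    # Pad the missing trailing parts so the result always has three entries.
    parts += ["all"] * (3 - len(parts))
    agency, struct_id, version = (part or "all" for part in parts)
    return agency, struct_id, version
```

Under this sketch the filter "SDMX" expands to agency SDMX with id and version both all, which matches the description of the example array above.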

Compile Content

compileFileSystem.sh (UNIX) or compileFileSystem.bat (Windows)

Overview

The compile script reads the files in from the source directory and compiles them into the target directory. The result of the compile process is an Environment that can be published to the Fusion Edge Server.

If the target directory does not exist it will be created. If it does exist, its contents will be overwritten.

Signing Content

All compiled Environments are digitally signed using a secret key provided with the -sgn command line argument. This signature ensures that the Environment files are not tampered with after they have been generated. It also ensures that the Fusion Edge Server knows that the Environment was generated by a trusted source, as the Fusion Edge Server must also know what the secret key is in order to verify the source.

Embargo

The generated Environment contains the timestamp of compilation. It also contains a timestamp for release which, unless otherwise specified, is the same as the timestamp of compilation. The timestamp for release can be modified, using the -liv argument, to be a point in time in the future. When this argument is provided the Environment will not be made live by the Fusion Edge Server until this time has passed. It is possible to configure the Fusion Edge Server in the properties file to pre-load the Environment into memory so it is ready for release.
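
The embargo rule can be sketched as a simple time comparison. The sketch assumes the release timestamp uses the yyyy-mm-ddThh:mm:ss format in GMT that the -liv argument description gives, and that an Environment is released as soon as the current time reaches the release time. The function names are hypothetical and this is not the Fusion Edge Server's implementation.

```python
from datetime import datetime, timezone

# Assumed to correspond to the documented yyyy-mm-ddThh:mm:ss format (GMT).
LIVE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_live_time(text: str) -> datetime:
    """Parse a go-live timestamp and mark it as GMT/UTC."""
    return datetime.strptime(text, LIVE_FORMAT).replace(tzinfo=timezone.utc)


def is_released(live_time: datetime, now: datetime) -> bool:
    """An Environment stays embargoed until its release time has been reached."""
    return now >= live_time
```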

Static Mode Compile

When running in Static mode the Fusion Edge Server does not make use of the ledger file, it always reads in the Environment zip file regardless of what version is in the ledger.json file. In this way a static mode Compile does not need to use the -lgr argument, as long as the source file system contains the full Environment source files for dissemination.

The compile process is:

  1. The Fusion Edge Compiler reads all files from the -src directory
  2. The Fusion Edge Compiler compiles all files
  3. The Fusion Edge Compiler writes the compiled files to the -tgt directory

It is possible to make use of the -lgr argument if the intention is to merge new information into a previous compile. Details of merging Environments are provided in the next section.

Dynamic Mode Compile

When running in Dynamic mode it is important to include the location of the last (current) Environment using the -lgr argument. The ledger of the current Environment will be appended to in the new Environment, creating a new version that describes the new Environment. This is critical, as the Fusion Edge Server will only update its content if it detects a change in the version of the ledger file.

By default, when using the -lgr argument, the compile process will append and/or modify information in an existing Environment. Information will not be deleted (although there are ways to force a delete, covered later).

When providing a -lgr argument, the compile process is:

  1. The Fusion Edge Compiler reads the Environment referenced by the -lgr argument to determine the time at which it was built
  2. The Fusion Edge Compiler reads files from the source file system whose timestamps are later than the build time of the Environment (to get the deltas)
  3. The Fusion Edge Compiler copies all previously built data and structure stores into the new Environment, only making modifications if it found data/metadata files whose timestamps were later than the compiled timestamp of the referenced Environment.

In essence, the default behaviour is to copy the previous Environment and to merge in deltas. However, this can produce undesirable results when the intention is to delete information such as datasets, series, observations, codelists, etc. If structural changes are made to a Data Structure which make previously compiled datasets invalid, the merge process will result in an unstable system. Therefore the following arguments are provided to enable a finer level of control over what is read from the file system and what is copied from the old Environment.
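
The delta selection in the compile process can be sketched as a timestamp filter. The sketch models the source file system as a mapping of file name to modification time. The function name select_deltas and the force flag (standing in for the -f argument) are illustrative assumptions, not the compiler's code.

```python
from typing import Dict, List


def select_deltas(mtimes: Dict[str, float], built_at: float,
                  force: bool = False) -> List[str]:
    """Return the source files the compile would read.

    mtimes maps a file name to its modification time (epoch seconds) and
    built_at is the build time of the referenced Environment. With force
    set, every file is read regardless of its timestamp.
    """
    if force:
        return sorted(mtimes)
    return sorted(name for name, mtime in mtimes.items() if mtime > built_at)
```

Only files modified after the referenced Environment was built are selected, which is why deleting a file from the source file system does not by itself remove the corresponding content from the new Environment.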

Ignore Timestamp on File

The -f argument forces all source files in the file system to be read regardless of their timestamps.

Replace Dataset

The -rd argument (replace dataset) instructs the compile process to replace the datasets in the current Environment with those found in the file system. This argument has two possible actions: remove or keep.

The keep action instructs the compiler to keep any datasets that are in the Environment and not in the file system. For example, if the Environment contains datasets for Exchange Rates, Interest Rates and Employment, and the local file system has a dataset for Exchange Rates, using the option -rd keep will output a new Environment with the Exchange Rates dataset from the file system, and the Interest Rates and Employment datasets from the previous Environment.

The remove action instructs the compiler to remove any datasets that are in the Environment and not in the file system. For example, if the Environment contains datasets for Exchange Rates, Interest Rates and Employment, and the local file system has a dataset for Exchange Rates, using the option -rd remove will output a new Environment with only the Exchange Rates dataset from the file system. The other two datasets will no longer exist in the new Environment.

Note: prior to version 4.4.0 the -rd parameter took no arguments; the behaviour was always remove.

Note: the -f argument should still be used if the intention is to read all data files in the file system, otherwise the default behaviour will be to read only the data files that have changed since the last compile.
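
The difference between the two actions can be sketched as a dictionary merge. Datasets are modelled as a mapping of dataset name to content. The function name replace_datasets is hypothetical and the sketch only mirrors the behaviour described above.

```python
from typing import Dict


def replace_datasets(current: Dict[str, str], file_system: Dict[str, str],
                     action: str) -> Dict[str, str]:
    """Combine datasets from the current Environment and the file system."""
    if action == "remove":
        # Only datasets present in the file system survive.
        return dict(file_system)
    if action == "keep":
        # File system datasets replace their counterparts; the rest are kept.
        merged = dict(current)
        merged.update(file_system)
        return merged
    raise ValueError(f"unknown action: {action!r}")
```

With an Environment holding Exchange Rates, Interest Rates and Employment, and a file system holding only Exchange Rates, the keep action returns all three datasets and the remove action returns only Exchange Rates.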

Full Replace

The -fr argument (full replace) compiles all files in the source file system, ignoring any timestamp on the referenced Environment. No information is copied from the referenced Environment, with the exception of the fes_ledger.json file, which is used as a base for creating the new fes_ledger.json with an updated version and history of previous versions.

Command Line Arguments

The minimum arguments are the source folder which contains the files to compile and the target folder to write the compiled Environment to, along with the signature for the Environment.

Example (minimum arguments)

compileFileSystem.sh -src "SourceFiles" -tgt "CompiledFiles" -sgn "password"

Arguments

Additional Arguments

Argument | Example | Description
f | -f | Optional. Force all files in the source file system to be read regardless of the previous Environment compiled timestamp
fr | -fr | Optional. Full Replace. Read all files in the source file system. Do not merge any metadata or data stores from the previous Environment.
liv | -liv "2020-01-30T00:00.00" | Optional. The go live time (embargo time). The generated Environment will include this timestamp so that it is not released by the Fusion Edge Server until that time. The format is yyyy-mm-ddThh:mm:ss and the timezone is GMT
ra (since 4.4.0) | -ra "UNIT" -ra "OBS_CONF" | Optional. Remove Attribute. This argument removes the named attributes from all Data Structures that contain them, and from all corresponding datasets. This will NOT remove attributes from existing datasets in previously built environments; it only works on newly compiled datasets.
rd | -rd keep | Optional. Do not merge any data stores from the previous Environment. Replace all Datasets with those read in from the file system. Note: from v4.4.0 this argument takes an additional value of 'keep' or 'remove'. keep indicates that any datasets in the current Environment but not in the file system should be kept in the new Environment; remove indicates they should be removed, with only the datasets in the file system being included in the new Environment.
src | -src "/home/compiler/source" | Required. The source directory that contains the files to be compiled (this is the tgt directory in the build file system script)
tgt | -tgt "/home/compiler/compiled" | Required. The target directory to write the files and folders to

Properties File

It is possible to provide all the arguments to the Fusion Edge Compiler via a properties file, referenced by the -prop argument.

An example Properties file is given below:

{
  "Ledger" :        "s3:mybucket",
  "SrcDir" :        "/home/compiler/source",
  "TgtDir" :        "/home/compiler/compiled",
  "ForceRebuild" :  false,
  "Signature" :     "myuser",
  "LiveTime" :      "2020-01-30T00:00.00",
  "S3Region":       "us-east-1",
  "S3SecretKey":    "azxasdasfcvbn",
  "S3AccessKey":    "sxcvbnmu",
  "ReplaceData":    "keep",
  "FullReplace":    false
}

Refresh Content

refreshContent.sh (UNIX) or refreshContent.bat (Windows)

Note: this feature was introduced in version 4.4.0

Overview

The Refresh command updates the Environment directly by pulling SDMX content from a web service, comparing it with what is in the Environment, and refreshing the information. Critically, the refresh command will only update information that already exists in the Environment. For example, a refresh on structures compares the structures in the Environment against those from the web service to ensure the Environment's structures are up to date: any structure differences from the web service are written into the Environment, but structures found in the web service will NOT be written into the Environment if they do not already exist there.

Command Line Arguments

The minimum arguments are the Environment to refresh (along with the signature), which part(s) of the Environment to refresh, where to build the new Environment, and the web service details which will be used as the source of the information written into the new Environment.

Example (minimum arguments)

refreshContent.sh -api "https://demo11.metadatatechnology.com/FusionRegistry/sdmx/v2" -src "CompiledFiles" -tgt "NewCompiled" -sgn "password" -structures

Arguments

Additional Arguments

Argument | Example | Description
structures | -structures | Optional. If provided the structures in the Environment will be refreshed
remove | -remove | Optional. The default behaviour is to keep structures that are in the Environment, even if those structures could not be found in the web service. This argument overrides this behaviour by requesting that structures are removed from the Environment if those structures are not found in the web service. Corresponding data and metadata will also be removed if the underlying structures upon which it relies are removed.
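
The refresh rules for structures can be sketched as follows. Structures are modelled as a mapping of an identifier to its content. The function name refresh_structures is hypothetical and the sketch ignores the removal of dependent data and metadata.

```python
from typing import Dict


def refresh_structures(environment: Dict[str, str], service: Dict[str, str],
                       remove: bool = False) -> Dict[str, str]:
    """Refresh Environment structures from the web service.

    Structures that exist only in the web service are never added. Those
    missing from the web service are kept unless remove is set.
    """
    refreshed = {}
    for key, existing in environment.items():
        if key in service:
            refreshed[key] = service[key]   # take the web service's copy
        elif not remove:
            refreshed[key] = existing       # keep what the Environment had
    return refreshed
```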

Publish Content

publishContent.sh (UNIX) or publishContent.bat (Windows)

Overview

The Publish Content script is used to move an Environment from the local file system to Amazon S3. If the Environment is hosted elsewhere, for example on a private web server or a file system local to the Fusion Edge Server, then a custom process must be put in place to move the Environment.

The Publish Content script ensures the Amazon S3 bucket is updated to reflect the new Environment in the most efficient way. For example, if it detects that a file in the Environment is unchanged, that file will not be moved. The ledger file is always moved last, which ensures the Fusion Edge Server does not attempt to process the Environment until all the files are moved.
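
The ordering described above can be sketched as an upload plan. The sketch assumes that unchanged files are detected by comparing a content digest and that the ledger file is named fes_ledger.json. The function name plan_publish and the digest comparison are illustrative assumptions, since the script's actual change detection is not documented here.

```python
from typing import Dict, List

LEDGER = "fes_ledger.json"


def plan_publish(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return the files to upload, in order, with the ledger file last.

    local and remote map a file name to a content digest. Files whose
    digest already matches the remote copy are skipped.
    """
    changed = sorted(
        name for name, digest in local.items()
        if name != LEDGER and remote.get(name) != digest
    )
    # The ledger is always moved, and always after every other file.
    if LEDGER in local:
        changed.append(LEDGER)
    return changed
```

Moving the ledger last means a server polling the bucket never sees a new ledger version before the files it refers to are in place.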

Command Line Arguments

The minimum arguments are the source folder of the current Environment and the location of the Amazon S3 bucket to publish the Environment to, along with the required credentials.

Example (minimum arguments)

publishContent.sh -src "CompiledFiles" -lgr "s3:mybucket" -s3rgn "us-east-1" -s3sec "azxzcvbnm" -s3acc "azxzcvbnm"

Arguments


Argument | Example | Description
src | -src "/home/compiler/environment" | Required. The directory that contains the Environment to be published (this is the tgt folder in the compile process)

Properties File

It is possible to provide all the arguments to the Fusion Edge Compiler via a properties file, referenced by the -prop argument.

An example Properties file is given below:

{
  "Ledger" :      "s3:mybucket",
  "SrcDir" :      "/home/compiler/compiled",
  "S3Region":     "us-east-1",
  "S3SecretKey":  "azxasdasfcvbn",
  "S3AccessKey":  "sxcvbnmu"
}


Print Content

printContent.sh (UNIX) or printContent.bat (Windows)

Overview

The Print Content script outputs a CSV report on what is contained in an Environment.

Command Line Arguments

The minimum arguments are the Environment to print along with the signature on that Environment

Example (minimum arguments)

printContent.sh -lgr "CompiledFiles" -sgn "password"

Arguments


Argument | Example | Description
out | -out "/home/compiler/report.csv" | Optional. If provided this is the file that the report will be written to, otherwise the report will be written to the console
loc | -loc "fr" | Optional. If provided this is the default locale to use when printing the names of the structures. Unresolvable names will be written in an alternative locale, defaulting to "en". If not provided, "en" will be the default locale.