Edge Server Publication

From Fusion Registry Wiki
Revision as of 00:56, 29 September 2022 by Mnelson (talk | contribs) (Note)


Note

This page applies to Fusion Registry v10 only. Fusion Registry 11 requires the updated Fusion Edge Server to obtain and compile the Environment.

Overview

By design, the Fusion Registry does not integrate directly with the Fusion Edge Server. Instead, it generates content that can be published to it. The generated content consists of a single zip file, as required by the Fusion Edge Server for ingestion.

Edge Server 'environments' can be described in the Fusion Registry. The purpose of describing an environment is to tell the Fusion Registry what content it should include in, or exclude from, the published content. There is no direct or indirect relationship between the Fusion Registry and the Fusion Edge Server, so the 'environment' definition simply helps administrators maintain a packaging definition.

The Fusion Registry User Interface supports the creation of an environment definition, and can be used to generate the publication file for the Fusion Edge Server. It is also possible to drive both processes using the Fusion Registry web services.

The process of publishing the generated content to the Fusion Edge Server is beyond the remit of the Fusion Registry. It is therefore either a manual process, or an automated task using scripts that are external to, and not governed by, the Fusion Registry.

Environment Definition

The Fusion Edge Server Environment Definition consists of the following properties:

Property Purpose
Id Unique Identifier. This is used when performing operations such as changing definitions, or generating a publication
Description Human readable description of the environment
Include Data yes/no/specific list - whether to include datasets in the publication, or structural metadata only
Include Reference Metadata true/false - whether to include reference metadata in the publication
Include Provisions true/false - related to the inclusion of data, whether to include Provision Agreements in the publication, or just the Dataflows
Edge Organisation optional. This can be set to any Data Provider, Data Consumer, or Agency in the Fusion Registry. When generating content for the Fusion Edge Server, the environment will be treated as a user, with the same security rules applied as if the linked Organisation were requesting structures or data. The default setting is anonymous user.
Enforce Embargo if set to true, then an Embargo date/time must be set when generating the Edge Server content

Edge Server Content and Security

The Fusion Registry is responsible for generating Fusion Edge Server content when it receives the command to build a publication for the given environment. The default content, if data and reference metadata are not included, is all structural metadata with the exception of structures restricted by security rules. Security rules may also result in a subset of a list being exported. For example, if a security rule restricts a specific Code in a Codelist, that Code will not be included in the published Codelist.

If data is included, then all datasets will be included, with the exception of those restricted by security rules. Security rules may also result in only a subset of a dataset being included. For example, if a security rule makes any observation with OBS_CONF=C private, those observations will not be included in the published dataset. The environment definition may provide a finite list of Datasets to include, which operates in conjunction with the security rules of the Fusion Registry.

If Provision Agreements are set to be excluded from the publication, then so too are any Content Constraints against a Provision Agreement and any Categorisations that reference a Provision Agreement.

Any content that is written to the publication is first checked against the Fusion Registry's own security rules. If the Fusion Registry server security has been set to Private, then the Environment should be assigned to an Organisation. If the Organisation is a Data Consumer, then the publication will only contain datasets that the Data Consumer has been explicitly granted access to via the Data Consumer Access rules of the Fusion Registry.

Any security rules defined in the Content Security pages of the Fusion Registry will also be enforced. For example, if a Dataflow is linked to a security group of SECURE, and the Environment's linked Organisation is not a member of that group, then the Dataflow and its related dataset will not be included in the publication.

Generating Edge Content

The generation of Edge content requires the following information:

Property Purpose
Publication Type
  • Full Replace - this is the default behaviour. The generated zip file contains the full contents (data/structures/reference metadata) for the Fusion Edge Server. When loading this zip to the Fusion Edge Server all existing content will be replaced.
  • Delta - only relevant for datasets. The generated zip file contains only datasets that have changed since the last publication, and only the data that has changed is included. When loading the file to the Fusion Edge Server, all existing content will be preserved, and the new data will be merged into the existing datasets.
Embargo Optional. If an embargo time is supplied, it is put into the zip's file name as a timestamp. The Fusion Edge Server uses the file name to determine whether there is an embargo time, and will not publish the environment until the embargo time has passed.

The generation of Edge content is a request which is processed asynchronously, and as such a status report is maintained by the Fusion Registry so that progress can be tracked. Once the Edge content is ready, it can be downloaded via a separate call.
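Because the build is asynchronous, a client typically polls the status report until it stops changing. The following is a minimal sketch of such a loop, not the Registry's own client. `fetch_status` is a hypothetical callable supplied by the caller that returns the parsed JSON of the status web service, and treating SUCCESSFUL, ERRORED and CANCELLED as the final statuses is an assumption based on the status values listed on this page.

```python
import time
from typing import Callable, Dict

# Assumption: these statuses mean the build has finished and will not change again.
TERMINAL_STATUSES = {"SUCCESSFUL", "ERRORED", "CANCELLED"}


def wait_for_publication(
    fetch_status: Callable[[], Dict],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() repeatedly until a terminal status is reported.

    fetch_status is a hypothetical hook: it should return the parsed JSON
    status report for one environment.
    """
    for _ in range(max_polls):
        report = fetch_status()
        if report.get("status") in TERMINAL_STATUSES:
            return report
        # Not finished yet (for example PROCESSING or WRITING_DATA): wait and retry.
        sleep(poll_seconds)
    raise TimeoutError("publication did not finish within the polling budget")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running Registry.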

Edge Content Size

The size of the generated zip file depends on the balance between series and observations: many series with few observations each tend to increase the size.

A good rule of thumb for estimating the zip size is 1.5 MB per million observations.
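As a worked example of this rule of thumb, the sketch below turns an observation count into a rough size. The function name is illustrative, and the result is only an estimate, since the actual size also depends on the series and observation balance described above.

```python
MB_PER_MILLION_OBSERVATIONS = 1.5  # rule of thumb quoted on this page


def estimate_zip_size_mb(observation_count: int) -> float:
    """Return a rough zip size in MB for the given number of observations."""
    if observation_count < 0:
        raise ValueError("observation_count must be non-negative")
    return observation_count / 1_000_000 * MB_PER_MILLION_OBSERVATIONS


print(estimate_zip_size_mb(40_000_000))  # 40 million observations -> 60.0
```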

Download Edge Content

Once the Fusion Registry has generated a Fusion Edge Server publication, it can be downloaded. The download remains available until the Fusion Registry is asked to generate a new Fusion Edge Server publication for the same environment.


Web Service

Creating and Configuring Environments

Add Environment

Entry Point /ws/secure/edge/addEnvironment
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error


The POST JSON should look as follows:

{
  "Id" : "EdgeEnvironmentId",
  "Description" : "Environment Description"
}

The Edge Server Environment will be saved, and further web services can be used to configure what is included in the publication file for the Fusion Edge Server.
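The request above could be prepared from a script along the following lines. This is a sketch under assumptions: the base URL `https://registry.example.org` is a placeholder, the web services are assumed to be mounted directly under it, and authentication (required, since the entry point is admin only) is left out because it depends on the deployment.

```python
import json
import urllib.request


def build_add_environment_request(
    base_url: str, env_id: str, description: str
) -> urllib.request.Request:
    """Prepare (but do not send) the addEnvironment POST request."""
    body = json.dumps({"Id": env_id, "Description": description}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ws/secure/edge/addEnvironment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL and environment values, for illustration only.
request = build_add_environment_request(
    "https://registry.example.org", "PROD", "Production Edge Server"
)
# To send it: urllib.request.urlopen(request), once the credentials that the
# Registry deployment requires have been added to the request.
```

The other configuration services on this page take the same shape, differing only in the entry point and the JSON properties.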

Edit Environment

Entry Point /ws/secure/edge/editEnvironment
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error


The POST JSON should look as follows:

{
  "Id" : "NewEnvironmentId",
  "OldId" : "OldEnvironmentId",
  "Description" : "Environment Description"
}

The Edit service allows the Id to be changed. This is optional, and is achieved by specifying an OldId that differs from the Id in the POST request. The description can also be changed.

Configure Include Data

Entry Point /ws/secure/edge/setIncludeData
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "IncludeData"   : true|false
}

If true, then the Edge Server publication will include datasets. By default the publication will include all datasets that it has access to, based on security rules and the Environment's linked Organisation type. If the Environment is not linked to an Organisation, then the Environment will be treated as an anonymous (unauthenticated) user, and only datasets with public access will be included in the publication.

The setIncludedDataflows web service can be used to define an explicit set of datasets that should be included in the publication, with any unspecified datasets being implicitly excluded.

If false, then the Edge Server publication will not include any datasets.

Configure Included Datasets

Entry Point /ws/secure/edge/setIncludedDataflows
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "IncludeDataflows" : ["urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=ESTAT:HC01(2.0)", "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:WDI_HEALTH(1.0)"]
}

The array contains the URN of each Dataflow to include data for.
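Dataflow URNs follow the pattern visible in the sample above (agency, id, then version in brackets), so the payload can be assembled from plain references. The helper names below are illustrative and not part of the Registry API.

```python
from typing import Dict, Iterable, List, Tuple, Union

DATAFLOW_URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow="


def dataflow_urn(agency_id: str, dataflow_id: str, version: str) -> str:
    """Build a Dataflow URN in the form used by the sample request."""
    return f"{DATAFLOW_URN_PREFIX}{agency_id}:{dataflow_id}({version})"


def included_dataflows_payload(
    environment_id: str, dataflows: Iterable[Tuple[str, str, str]]
) -> Dict[str, Union[str, List[str]]]:
    """Build the JSON body for setIncludedDataflows from (agency, id, version) tuples."""
    return {
        "Id": environment_id,
        "IncludeDataflows": [dataflow_urn(*ref) for ref in dataflows],
    }


payload = included_dataflows_payload(
    "EnvironmentId", [("ESTAT", "HC01", "2.0"), ("WB", "WDI_HEALTH", "1.0")]
)
```

Building the URNs programmatically avoids stray whitespace inside the strings, which would otherwise stop a URN from matching its Dataflow.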


Configure Include Reference Metadata

Entry Point /ws/secure/edge/setIncludeMetadata
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "IncludeMetadata"   : true|false
}

If true, then the publication will include any reference metadata that the environment is authorized to see based on the security rules defined in the Fusion Registry.

Configure Include Provision Agreements

Entry Point /ws/secure/edge/setIncludeProvisions
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "IncludeProvisions"   : true|false
}

If true, the publication will store each dataset against the Provision Agreement, and the Provision Agreement structural metadata will be included in the publication. If a dataset has many data providers, then a single dataset will be split across multiple provision agreements. The advantage of this is that the data provider information is preserved in the Edge Server, and the user can query for data by data provider using the standard SDMX data query (data provider is the last path parameter).

If false, the publication will not include Provision Agreements in the structural metadata. Datasets from multiple data providers will be combined into a single table, and not broken down by data provider. This may result in slightly faster queries, as the Fusion Edge Server will not have to combine dataset fragments from multiple providers on data query.
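To illustrate what querying by data provider looks like, the sketch below builds the path portion of a data query, assuming the SDMX 2.1 REST layout `data/{flowRef}/{key}/{providerRef}`. The provider reference `WB` is hypothetical, and the base URL of the Fusion Edge Server is deliberately left out because it depends on the deployment.

```python
def data_query_path(flow_ref: str, key: str = "all", provider_ref: str = "all") -> str:
    """Build the path portion of an SDMX REST data query.

    The data provider is the last path parameter; "all" means no restriction.
    """
    return "/".join(["", "data", flow_ref, key, provider_ref])


# All series of one Dataflow, restricted to a single (hypothetical) data provider.
by_provider = data_query_path("WB,WDI_HEALTH,1.0", provider_ref="WB")
# The same Dataflow across every data provider.
all_providers = data_query_path("WB,WDI_HEALTH,1.0")
```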

Configure Environment's Organisation

Entry Point /ws/secure/edge/setOrganisation
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "OrgansiationUrn"   : "urn"
}

The Organisation URN should link to either an SDMX Agency, Data Provider, or Data Consumer that exists in the Fusion Registry. The Environment will inherit the permissions of that Organisation when an Edge Server publication is generated.

The OrgansiationUrn property can be omitted in order to remove the Organisation from the Environment. In this case, the Environment will assume the same permissions as an anonymous, public user.
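The two cases, linking an Organisation and removing it, differ only in whether the property is present. A small sketch follows, with an illustrative helper name and a placeholder URN value. The property name is spelled exactly as it appears in the sample request on this page.

```python
from typing import Dict, Optional


def organisation_payload(
    environment_id: str, organisation_urn: Optional[str] = None
) -> Dict[str, str]:
    """Build the JSON body for setOrganisation.

    Passing no URN omits the property, which removes the Organisation link.
    """
    payload = {"Id": environment_id}
    if organisation_urn is not None:
        # Property name copied verbatim from the sample request on this page.
        payload["OrgansiationUrn"] = organisation_urn
    return payload


link = organisation_payload("EnvironmentId", "<organisation URN>")  # placeholder URN
unlink = organisation_payload("EnvironmentId")
```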

Configure Enforce Embargo

Entry Point /ws/secure/edge/setEmbargo
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EnvironmentId",
  "Embargo"   : true|false
}

If set to true, then the Registry will enforce that an embargo time is provided when a build publication request is received.

Generating Edge Content

Entry Point /ws/secure/edge/buildEdgeServerPublication
Access Private (admin only)
Http Method POST
Content-Type application/json
Response Format application/json
Response Statuses

200 - Request accepted

400 - Posted content invalid

401 - Unauthorized (if access has been restricted)

500 - Server Error

The POST JSON should look as follows:

{
  "Id" : "EdgeEnvironmentId",
  "PublicationType" : "FullReplace",
  "EmbargoDate" : 1581353225818
}

A PublicationType of Replace is used to indicate that the publication is a delta.

The EmbargoDate is optional, and is expressed as an epoch timestamp in milliseconds (the number of milliseconds since 1970).
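Since the embargo is expressed in epoch milliseconds, a script will usually convert from a date and time first. The sketch below shows one way to do that with integer arithmetic. The helper names are illustrative, and the publication type is passed through unchanged rather than validated.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since 1970 (UTC)."""
    if moment.tzinfo is None:
        raise ValueError("embargo datetime must be timezone-aware")
    # Dividing two timedeltas with // gives an exact integer, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_publication_payload(
    environment_id: str,
    embargo: Optional[datetime] = None,
    publication_type: str = "FullReplace",
) -> Dict[str, Union[str, int]]:
    """Build the JSON body for buildEdgeServerPublication."""
    payload: Dict[str, Union[str, int]] = {
        "Id": environment_id,
        "PublicationType": publication_type,
    }
    if embargo is not None:
        payload["EmbargoDate"] = to_epoch_millis(embargo)
    return payload


embargo = datetime(2020, 2, 10, 16, 47, 5, 818000, tzinfo=timezone.utc)
payload = build_publication_payload("EdgeEnvironmentId", embargo)
```

Requiring a timezone-aware datetime avoids silently interpreting the embargo in the local time of whichever machine runs the script.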


Generating Edge Content - Status

Entry Point /environment/status/{environmentID}
Access Private (admin only)
Http Method GET
Response Format application/json
Response Statuses

200 - Request accepted

401 - Unauthorized (if access has been restricted)

500 - Server Error

The response looks as follows:

{
 "id" : 1,
 "environmentId" : "PROD",
 "processStartDate" : 1581353225818,
 "processEndDate" : 1581353233333,
 "status" : "PROCESSING"|"ERRORED"|"SUCCESSFUL"|"CANCELLED"|"WRITING_DATA"|"WRITING_STRUCTURES",
 "auditTxId" : "cd82166a-fe31-4870-8307-c0f8d00d3eb3",
 "totalDatasets" : 10,
 "processedDatasets" : 8,
 "error" : ""
}


The following table describes the JSON properties in the response message

Property Description
id Unique identifier for this status
environmentId The Id of the environment that the status is for
processStartDate The start date of the Edge Server zip generation process, in epoch time milliseconds
processEndDate The end date of the Edge Server zip generation process, in epoch time milliseconds
status The current status of the Edge Server zip generation
auditTxId The audit transaction id
totalDatasets Total number of datasets that will be included in the output
processedDatasets Total number of datasets that have been written to the output so far
error Optional. Provided if the status is ERRORED.
Example
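A script consuming the status response might reduce it to a one-line progress summary. The following sketch assumes the response has already been parsed into a dictionary with the properties described above. The function name and the output format are illustrative.

```python
from datetime import datetime, timezone
from typing import Dict


def summarise_status(report: Dict) -> str:
    """Turn a parsed status report into a short human-readable line."""
    total = report.get("totalDatasets") or 0
    done = report.get("processedDatasets") or 0
    percent = round(100 * done / total) if total else 0
    # processStartDate is in epoch milliseconds; datetime expects seconds.
    started = datetime.fromtimestamp(report["processStartDate"] / 1000, tz=timezone.utc)
    line = (
        f'{report["environmentId"]}: {report["status"]} '
        f"({done}/{total} datasets, {percent}%), "
        f"started {started:%Y-%m-%d %H:%M:%S} UTC"
    )
    # The error text is only meaningful when the build has failed.
    if report.get("status") == "ERRORED" and report.get("error"):
        line += f', error: {report["error"]}'
    return line


sample = {
    "id": 1,
    "environmentId": "PROD",
    "processStartDate": 1581353225818,
    "processEndDate": 1581353233333,
    "status": "PROCESSING",
    "auditTxId": "cd82166a-fe31-4870-8307-c0f8d00d3eb3",
    "totalDatasets": 10,
    "processedDatasets": 8,
    "error": "",
}
print(summarise_status(sample))
# PROD: PROCESSING (8/10 datasets, 80%), started 2020-02-10 16:47:05 UTC
```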

Downloading Edge Content

Entry Point /environment/export/{environmentID}
Access Private (admin only)
Http Method GET
Response Format application/octet-stream
Response Statuses

200 - Request accepted

401 - Unauthorized (if access has been restricted)

500 - Server Error

A call to this web service results in a file download which can be published to the Fusion Edge Server. The name of the file is important, as it tells the Fusion Edge Server what the contents are and how to treat them. The following table describes the names the file can take:

File Name Description
node.zip Full Replace
node_1581353225818.zip Full Replace with an embargo time of 1581353225818 milliseconds since 1970. This timestamp relates to 10-Feb-2020, 16:47:05 GMT
delta.zip Contains only datasets to be merged with existing datasets in the Fusion Edge Server
delta_1581353225818.zip Contains only datasets to be merged with existing datasets in the Fusion Edge Server, with an embargo time of 1581353225818 milliseconds since 1970.
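The naming convention in the table can be checked from a deployment script before the file is handed to the Fusion Edge Server. The sketch below is an illustration of the convention, not the Edge Server's own parsing logic. It assumes names end in a single `.zip` extension, and the type and function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# "node" or "delta", optionally followed by "_<epoch milliseconds>", then ".zip".
_NAME_PATTERN = re.compile(r"^(node|delta)(?:_(\d+))?\.zip$")


class PublicationFile(NamedTuple):
    publication_type: str          # "Full Replace" or "Delta"
    embargo: Optional[datetime]    # None when the name carries no timestamp


def parse_publication_file_name(file_name: str) -> PublicationFile:
    """Derive the publication type and embargo time from a publication file name."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"not a recognised publication file name: {file_name!r}")
    kind, millis = match.groups()
    publication_type = "Full Replace" if kind == "node" else "Delta"
    embargo = None
    if millis is not None:
        embargo = EPOCH + timedelta(milliseconds=int(millis))
    return PublicationFile(publication_type, embargo)


parsed = parse_publication_file_name("node_1581353225818.zip")
print(parsed.publication_type, parsed.embargo.isoformat())
# Full Replace 2020-02-10T16:47:05.818000+00:00
```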