Difference between revisions of "Edge Server Publication"
Revision as of 09:18, 26 April 2021
Overview
By design, the Fusion Registry does not integrate directly with the Fusion Edge Server; instead, it generates content that can be published to it. The generated content consists of a single zip file, as required by the Fusion Edge Server for ingestion.
Edge Server 'environments' can be described in the Fusion Registry. The purpose of describing an environment is to tell the Fusion Registry what content it should include in, or exclude from, the published content. There is no direct or indirect relationship between the Fusion Registry and the Fusion Edge Server, so the 'environment' definition simply helps administrators maintain a packaging definition.
The Fusion Registry User Interface supports the creation of an environment definition, and can be used to generate the publication file for the Fusion Edge Server. It is also possible to drive both processes using the Fusion Registry web services.
Publishing the generated content to the Fusion Edge Server is beyond the remit of the Fusion Registry. It is therefore either a manual process or an automated task using scripts that are external to, and not governed by, the Fusion Registry.
Environment Definition
The Fusion Edge Server Environment Definition consists of the following properties
Property | Purpose |
---|---|
Id | Unique Identifier. This is used when performing operations such as changing definitions, or generating a publication |
Description | Human readable description of the environment |
Include Data | whether to include datasets in the publication (or structural metadata only) (yes/no/or a specific list) |
Include Reference Metadata | true/false - whether to include reference metadata in the publication |
Include Provisions | true/false - related to the inclusion of data, whether to include Provision Agreements in the publication, or just the Dataflows |
Edge Organisation | Optional. This can be set to any Data Provider, Data Consumer, or Agency in the Fusion Registry. When generating content for the Fusion Edge Server, the environment will be treated as a user, with the same security rules applied as if the linked Organisation were requesting structures or data. The default setting is anonymous user. |
Enforce Embargo | if set to true, then an Embargo date/time must be set when generating the Edge Server content |
Edge Server Content and Security
The Fusion Registry generates Fusion Edge Server content when it receives the command to build a publication for a given environment. By default, if data and reference metadata are not included, the publication contains all structural metadata except structures restricted by security rules. Security rules may also cause only a subset of a list to be exported: for example, if a security rule restricts a specific Code in a Codelist, that Code will not be included in the published Codelist. If data is included, then all datasets are included except those restricted by security rules. Security rules may also cause only a subset of a dataset to be included (for example, if a security rule makes any observation with OBS_CONF=C private, those observations will not be included in the published dataset). The environment definition may provide a finite list of Datasets to include, which operates in conjunction with the security rules of the Fusion Registry. If Provision Agreements are excluded from the publication, then so too are any Content Constraints against a Provision Agreement and any Categorisations that reference a Provision Agreement.
Any content that is written to the publication is first checked against the Fusion Registry's own security rules. If the Fusion Registry server security has been set to Private, then the Environment should be assigned to an Organisation. If the Organisation is a Data Consumer, then the publication will only contain datasets that the Data Consumer has been explicitly granted access to via the Data Consumer Access rules of the Fusion Registry.
Any security rules defined in the Content Security pages of the Fusion Registry will also be enforced. For example, if a Dataflow is linked to a security group of SECURE, and the Environment's linked Organisation is not a member of that group, then the Dataflow and its related dataset will not be included in the publication.
Generating Edge Content
The generation of Edge content requires the following information
Property | Purpose |
---|---|
Publication Type |
|
Embargo | Optional. If an embargo time is supplied, this information is put into the zip's file name as a timestamp. The Fusion Edge Server uses the file name to determine if there is an embargo time, and will not publish the environment until the embargo time has passed |
The generation of Edge content is a request that is processed asynchronously, so the Fusion Registry maintains a status report that allows progress to be tracked. Once the Edge content is ready, it can be downloaded via a separate call.
Edge Content Size
The size of the generated zip file depends on the balance between series and observations: many series, each with few observations, tends to increase the size.
A good rule for estimating zip size is 1.5MB per million observations.
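As a rough illustration of this rule of thumb, the following Python sketch converts an observation count into an estimated zip size. The function and constant names are illustrative only and are not part of the Fusion Registry; the real size will vary with the balance between series and observations.

```python
# Rule of thumb from the text: about 1.5 MB of zip per million observations.
MB_PER_MILLION_OBSERVATIONS = 1.5


def estimate_zip_size_mb(observation_count: int) -> float:
    """Return an approximate publication zip size in megabytes."""
    if observation_count < 0:
        raise ValueError("observation_count must not be negative")
    return observation_count / 1_000_000 * MB_PER_MILLION_OBSERVATIONS


if __name__ == "__main__":
    # Estimate for a publication holding 40 million observations.
    print(estimate_zip_size_mb(40_000_000))
```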
Download Edge Content
Once the Fusion Registry has generated a Fusion Edge Server publication, it can be downloaded. The download remains available until the Fusion Registry is asked to generate a new Fusion Edge Server publication for the same environment.
Web Service
Creating and Configuring Environments
Add Environment
Entry Point | /ws/secure/edge/addEnvironment |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EdgeEnvironmentId", "Description" : "Environment Description" }
The Edge Server Environment will be saved, and further web services can be used to configure what is included in the publication file for the Fusion Edge Server.
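The request described above could be prepared in Python along the following lines. This is a minimal sketch: the base URL is a hypothetical placeholder, and the step of supplying admin credentials is left out because it depends on how the registry is configured. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your own Fusion Registry.
REGISTRY_URL = "http://localhost:8080/FusionRegistry"


def build_add_environment_request(env_id: str, description: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST for /ws/secure/edge/addEnvironment."""
    body = json.dumps({"Id": env_id, "Description": description}).encode("utf-8")
    return urllib.request.Request(
        REGISTRY_URL + "/ws/secure/edge/addEnvironment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_add_environment_request("PROD", "Production Edge Server")
    # Sending it (for example with urllib.request.urlopen) needs admin
    # authentication, which is not shown here.
    print(request.get_method(), request.full_url)
```

The same pattern applies to the other configuration services below; only the entry point and the JSON body change.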
Edit Environment
Entry Point | /ws/secure/edge/editEnvironment |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "NewEnvironmentId", "OldId" : "OldEnvironmentId", "Description" : "Environment Description" }
The Edit service allows the Id to be changed. This is optional and is achieved by specifying an OldId that is different to the Id in the POST request. The description can also be changed.
Configure Include Data
Entry Point | /ws/secure/edge/setIncludeData |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "IncludeData" : true|false }
If true, the Edge Server publication will include datasets. By default the publication will include all datasets that it has access to, based on security rules and the Environment's linked Organisation type. If the Environment is not linked to an Organisation, then the Environment will be treated as an anonymous (unauthenticated) user, and only datasets with public access will be included in the publication.
The setIncludedDataflows web service can be used to define an explicit set of datasets that should be included in the publication, with any unspecified datasets being implicitly excluded.
If false, the Edge Server publication will not include any datasets.
Configure Included Datasets
Entry Point | /ws/secure/edge/setIncludedDataflows |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "IncludeDataflows" : ["urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=ESTAT:HC01(2.0)", "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:WDI_HEALTH(1.0)"] }
The array contains the URN of each Dataflow to include data for.
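As an illustration, the request body could be assembled from a list of Dataflow URNs as in the Python sketch below. The helper name is hypothetical, and trimming whitespace around each URN is a precaution added here, not documented behaviour of the web service.

```python
import json
from typing import List


def build_included_dataflows_payload(env_id: str, dataflow_urns: List[str]) -> str:
    """Return the JSON body for /ws/secure/edge/setIncludedDataflows."""
    # Trim accidental whitespace and drop empty entries before sending.
    cleaned = [urn.strip() for urn in dataflow_urns if urn.strip()]
    return json.dumps({"Id": env_id, "IncludeDataflows": cleaned})


if __name__ == "__main__":
    body = build_included_dataflows_payload(
        "EnvironmentId",
        [
            "urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=ESTAT:HC01(2.0)",
            " urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=WB:WDI_HEALTH(1.0)",
        ],
    )
    print(body)
```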
Configure Include Reference Metadata
Entry Point | /ws/secure/edge/setIncludeMetadata |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "IncludeMetadata" : true|false }
If true, then the publication will include any reference metadata that the environment is authorized to see based on the security rules defined in the Fusion Registry.
Configure Include Provision Agreements
Entry Point | /ws/secure/edge/setIncludeProvisions |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "IncludeProvisions" : true|false }
If true, the publication will store each dataset against the Provision Agreement, and the Provision Agreement structural metadata will be included in the publication. If a dataset has many data providers, then a single dataset will be split across multiple provision agreements. The advantage of this is that the data provider information is preserved in the Edge Server, and the user can query for data by data provider using the standard SDMX data query (data provider is the last path parameter).
If false, the publication will not include Provision Agreements in the structural metadata. Datasets from multiple data providers will be combined into a single table, and not broken down by data provider. This may result in slightly better performance, as the Fusion Edge Server will not have to combine dataset fragments from multiple providers on data query.
Configure Environment's Organisation
Entry Point | /ws/secure/edge/setOrganisation |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "OrgansiationUrn" : "urn" }
The Organisation URN should link to either an SDMX Agency, Data Provider, or Data Consumer that exists in the Fusion Registry. The Environment will inherit the permissions of that Organisation when an Edge Server publication is generated.
The OrgansiationUrn property can be omitted in order to remove the Organisation from the Environment. In this case, the Environment will assume the same permissions as an anonymous, public user.
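The optional nature of the property can be sketched in Python as follows. The helper name and the example URN value are illustrative only; the property name is spelled exactly as it appears in the request body above.

```python
import json
from typing import Optional


def build_set_organisation_payload(env_id: str, organisation_urn: Optional[str] = None) -> str:
    """Return the JSON body for /ws/secure/edge/setOrganisation."""
    payload = {"Id": env_id}
    if organisation_urn is not None:
        # Property name kept exactly as documented for this web service.
        payload["OrgansiationUrn"] = organisation_urn
    # With no URN the property is left out, which unlinks the Organisation.
    return json.dumps(payload)


if __name__ == "__main__":
    # Link the environment to an organisation (illustrative URN value).
    print(build_set_organisation_payload(
        "EnvironmentId", "urn:sdmx:org.sdmx.infomodel.base.Agency=SDMX:AGENCIES(1.0).ESTAT"))
    # Remove the link so the environment is treated as an anonymous user.
    print(build_set_organisation_payload("EnvironmentId"))
```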
Configure Enforce Embargo
Entry Point | /ws/secure/edge/setEmbargo |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EnvironmentId", "Embargo" : true|false }
If set to true, the Registry will require that an embargo time is provided when a build publication request is received.
Generating Edge Content
Entry Point | /ws/secure/edge/buildEdgeServerPublication |
Access | Private (admin only) |
Http Method | POST |
Content-Type | application/json |
Response Format | application/json |
Response Statuses | 200 - Request accepted 400 - Posted content invalid 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The POST JSON should look as follows:
{ "Id" : "EdgeEnvironmentId", "PublicationType" : "FullReplace", "EmbargoDate" : 1581353225818 }
A PublicationType of Replace indicates that the publication is a delta.
The Embargo Date is optional, and is expressed as an epoch timestamp in milliseconds (the number of milliseconds since 1970).
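Converting a calendar date and time into the millisecond timestamp expected by EmbargoDate is easy to get wrong, so a small Python sketch is given here. The helper names are hypothetical. Integer arithmetic on timedelta values is used to avoid floating-point rounding.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since 1970 (UTC)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_publication_payload(env_id: str, publication_type: str,
                              embargo: Optional[datetime] = None) -> str:
    """Return the JSON body for /ws/secure/edge/buildEdgeServerPublication."""
    payload = {"Id": env_id, "PublicationType": publication_type}
    if embargo is not None:
        payload["EmbargoDate"] = to_epoch_millis(embargo)
    return json.dumps(payload)


if __name__ == "__main__":
    embargo = datetime(2020, 2, 10, 16, 47, 5, 818000, tzinfo=timezone.utc)
    print(build_publication_payload("EdgeEnvironmentId", "FullReplace", embargo))
```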
Generating Edge Content - Status
Entry Point | /environment/status/{environmentID} |
Access | Private (admin only) |
Http Method | GET |
Response Format | application/json |
Response Statuses | 200 - Request accepted 401 - Unauthorized (if access has been restricted) 500 - Server Error |
The response looks as follows:
{ "id" : 1, "environmentId" : "PROD", "processStartDate" : 1581353225818, "processEndDate" : 1581353233333, "status" : "PROCESSING"|"ERRORED"|"SUCCESSFUL"|"CANCELLED"|"WRITING_DATA"|"WRITING_STRUCTURES", "auditTxId" : "cd82166a-fe31-4870-8307-c0f8d00d3eb3", "totalDatasets" : 10, "processedDatasets" : 8, "error" : "" }
The following table describes the JSON properties in the response message
Property | Description |
---|---|
id | Unique identifier for this status |
environmentId | The Id of the environment that the status is for |
processStartDate | The start date of the Edge Server zip generation process, in epoch time milliseconds |
processEndDate | The end date of the Edge Server zip generation process, in epoch time milliseconds |
status | The current status of the Edge Server zip generation |
auditTxId | The audit transaction id |
totalDatasets | Total number of datasets that will be included in the output |
processedDatasets | Total number of datasets that have been written to the output so far |
error | Optional. Provided if the status is ERRORED. |
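A client that polls the status service needs to decide when to stop polling and whether a download is worthwhile. The Python sketch below interprets a decoded status response. Treating SUCCESSFUL, ERRORED and CANCELLED as the final states is an assumption based on the status names, and the function name is hypothetical.

```python
# Assumption: these statuses mean that generation is no longer running.
TERMINAL_STATUSES = {"SUCCESSFUL", "ERRORED", "CANCELLED"}


def summarise_status(report: dict) -> dict:
    """Reduce a status response to the facts a polling client needs."""
    total = report.get("totalDatasets", 0)
    processed = report.get("processedDatasets", 0)
    status = report["status"]
    return {
        "finished": status in TERMINAL_STATUSES,
        "ready_for_download": status == "SUCCESSFUL",
        # Fraction of datasets written so far; 0.0 when there are none.
        "progress": processed / total if total else 0.0,
        # The error text is only meaningful when the status is ERRORED.
        "error": report.get("error") or None,
    }


if __name__ == "__main__":
    sample = {"id": 1, "environmentId": "PROD", "status": "WRITING_DATA",
              "totalDatasets": 10, "processedDatasets": 8, "error": ""}
    print(summarise_status(sample))
```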
Downloading Edge Content
Entry Point | /environment/export/{environmentID} |
Access | Private (admin only) |
Http Method | GET |
Response Format | application/octet-stream |
Response Statuses | 200 - Request accepted 401 - Unauthorized (if access has been restricted) 500 - Server Error |
A call to this web service results in a file download which can be published to the Fusion Edge Server. The name of the file is important, as it tells the Fusion Edge Server what the contents are and how to treat them. The following table describes the names the file can take:
File Name | Description |
---|---|
node.zip | Full Replace |
node_1581353225818.zip | Full Replace with an embargo time of 1581353225818 milliseconds since 1970. This timestamp relates to 10-Feb-2020, 16:47:05 GMT |
delta.zip | Contains only datasets to be merged with existing datasets in the Fusion Edge Server |
delta_1581353225818.zip | Contains only datasets to be merged with existing datasets in the Fusion Edge Server, with an embargo time of 1581353225818 milliseconds since 1970. |
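The naming convention can be illustrated with a short Python sketch that reads a file name of the form node.zip, node_<milliseconds>.zip, delta.zip or delta_<milliseconds>.zip and returns the publication type and embargo time. It is based only on the names listed in this table; the parsing rules actually used by the Fusion Edge Server are not described here, and the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# "node" or "delta", optionally followed by "_<epoch milliseconds>", then ".zip".
FILE_NAME_PATTERN = re.compile(r"^(node|delta)(?:_(\d+))?\.zip$")


def parse_publication_file_name(file_name: str) -> Tuple[str, Optional[datetime]]:
    """Return (publication kind, embargo time in UTC or None)."""
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError("unrecognised publication file name: " + file_name)
    kind = "full replace" if match.group(1) == "node" else "delta"
    millis = match.group(2)
    embargo = EPOCH + timedelta(milliseconds=int(millis)) if millis else None
    return kind, embargo


if __name__ == "__main__":
    for name in ("node.zip", "node_1581353225818.zip", "delta.zip"):
        print(name, parse_publication_file_name(name))
```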