Fusion ETL Server Schedule

From Metadata Technology Wiki

Overview

The ETL Server requires at least one schedule configuration, because the schedule is the starting point of all ETL rules. Schedules are created manually, in JSON format, in the schedules directory of the ETL Server Directory.

A Schedule can refer to one or more Dataflows and provides a CRON expression that defines when it fires. A Schedule may use one or more Views to define a subcube of data to pull; each View provides a list of Dimension filters to apply to the data extract process. If a Schedule uses multiple Views, all of their filters are combined into one consolidated filter that is applied to the extract. If a View refers to a Dimension that does not exist in the Data Structure Definition used by the Dataflow being processed, that Dimension filter is ignored for that Dataflow.
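The consolidation rule can be pictured with a minimal Python sketch. It is not the ETL Server's implementation: the in-memory View shape, the dimension names, and the function name `consolidate_filters` are all hypothetical, and the choice to union values when two Views filter the same Dimension is an assumption, since the text above only says that the filters are combined.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory shape for a View: dimension id -> allowed values.
View = Dict[str, Set[str]]


def consolidate_filters(views: Iterable[View], dsd_dimensions: Set[str]) -> View:
    """Merge several Views into one filter for a single Dataflow.

    Assumption: values for the same Dimension in different Views are unioned.
    Dimensions that are not in the Dataflow's DSD are skipped.
    """
    combined: View = {}
    for view in views:
        for dimension, values in view.items():
            if dimension not in dsd_dimensions:
                continue  # filter is ignored for this Dataflow
            combined.setdefault(dimension, set()).update(values)
    return combined


# Hypothetical Views and dimension names, for illustration only.
dza = {"REF_AREA": {"DZA"}}
annual = {"FREQ": {"A"}, "SEX": {"_T"}}

# Hypothetical DSD that has no SEX dimension, so that filter is dropped.
result = consolidate_filters([dza, annual], {"REF_AREA", "FREQ", "SERIES"})
print(result)
```

Because the DSD is passed in per call, the same set of Views can yield a different consolidated filter for each Dataflow in the Schedule.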

A Schedule may also define the data extract as Full. If an extract is not full, only the delta since the last successful publication is taken.
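The difference between a full and a delta extract can be sketched as follows. This is an illustration under stated assumptions, not the server's query logic: the `Observation` class and `select_observations` function are hypothetical, and treating a delta run with no earlier successful publication as a full run is an assumption the text above does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical observation record with a last-updated timestamp."""
    key: str
    last_updated: datetime


def select_observations(
    observations: List[Observation],
    full: bool,
    last_successful_publication: Optional[datetime],
) -> List[Observation]:
    """Return the observations one run would extract."""
    # Assumption: with no earlier successful publication, a delta run takes everything.
    if full or last_successful_publication is None:
        return list(observations)
    # Delta: keep only observations updated after the last successful publication.
    return [o for o in observations if o.last_updated > last_successful_publication]


observations = [
    Observation("A.DZA.POP", datetime(2023, 1, 10)),
    Observation("A.DZA.GDP", datetime(2023, 3, 5)),
]
last_success = datetime(2023, 2, 1)

delta_keys = [o.key for o in select_observations(observations, False, last_success)]
full_keys = [o.key for o in select_observations(observations, True, last_success)]
print(delta_keys, full_keys)
```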

Schedule files are read on Tomcat server startup and can be refreshed by restarting the service.

Schedule File Contents

The schedule file is in JSON format. Each schedule is maintained in its own file; the name of the file does not matter. The files must be created under the schedules folder of the ETL Server Directory.

  • Id: a local identifier, used in the User Interface and in Audit events
  • Cron: the cron expression that defines when the schedule fires; free online tools can help build expressions
  • Full: if true, each run pulls data from the database with no filter on time. If false, each run queries only for observations that have been updated since the last successful publication.
  • Views: an array of the Views whose Dimension filters are applied to the extract
  • Dataflows: an array of Dataflows that will have data extracted and published on each run
{
   "Id": "HEALTH_DELTA",
   "Cron": "55 0 * ? * * *",
   "Full": false,
   "Views": ["DZA", "ANNUAL"],
   "Dataflows": [
      {
         "Agency": "WB",
         "Id": "WDI_HEALTH",
         "Version": "1.0"
      }
   ]
}
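The Cron value in the example above has seven fields and uses "?", which is consistent with Quartz-style cron syntax (seconds, minutes, hours, day of month, month, day of week, year). This page does not name the cron dialect, so that field order is an assumption. The sketch below only labels the fields under that assumption; the helper name `label_cron_fields` is hypothetical.

```python
from typing import Dict

# Assumed Quartz-style field order; the source does not name the cron dialect.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
]


def label_cron_fields(expression: str) -> Dict[str, str]:
    """Pair each whitespace-separated cron field with its assumed meaning."""
    parts = expression.split()
    # Quartz-style expressions have six mandatory fields and an optional year.
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_cron_fields("55 0 * ? * * *")
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```

Read this way, the example would fire at second 55 of minute 0 of every hour, every day.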

Note: to support Full=false, the Dataflow Mapping must have a database column mapped for FR_LAST_UPDATED_OBS.
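Because schedule files are written by hand and are only read at startup, it can help to check a file's shape before restarting the service. The following is a minimal sketch of such a check, not a tool shipped with the ETL Server. The function name `check_schedule` is hypothetical, and the choice of which fields are mandatory (Id, Cron, Full, Dataflows) and which are optional (Views) is an assumption drawn from the example above.

```python
import json
from typing import List

# Field names come from the example schedule; which are mandatory is an assumption.
REQUIRED_FIELDS = {"Id": str, "Cron": str, "Full": bool, "Dataflows": list}
DATAFLOW_KEYS = ("Agency", "Id", "Version")


def check_schedule(text: str) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        schedule = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(schedule, dict):
        return ["top-level value must be a JSON object"]

    problems: List[str] = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in schedule:
            problems.append(f"missing field {key}")
        elif not isinstance(schedule[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")

    dataflows = schedule.get("Dataflows")
    if isinstance(dataflows, list):
        for index, ref in enumerate(dataflows):
            if not isinstance(ref, dict):
                problems.append(f"Dataflows[{index}] should be an object")
                continue
            for key in DATAFLOW_KEYS:
                if key not in ref:
                    problems.append(f"Dataflows[{index}] is missing {key}")

    # Assumption: Views may be omitted when the schedule applies no filters.
    views = schedule.get("Views", [])
    if not isinstance(views, list) or not all(isinstance(v, str) for v in views):
        problems.append("Views should be an array of strings")
    return problems


example = """
{
   "Id": "HEALTH_DELTA",
   "Cron": "55 0 * ? * * *",
   "Full": false,
   "Views": ["DZA", "ANNUAL"],
   "Dataflows": [
      {"Agency": "WB", "Id": "WDI_HEALTH", "Version": "1.0"}
   ]
}
"""
print(check_schedule(example))
```

The check covers structure only; it does not confirm that the referenced Views or Dataflows exist, or that the cron expression is one the server accepts.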

Manual Run

A schedule can also be run manually through the user interface. To do this, navigate to the Schedule definition and click Manual Run.

[Image: Schedule.png]