Difference between revisions of "Caching"

From Fusion Registry Wiki
(Fusion Cache and Locale)
 
(25 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
[[Category:Functions]]
 +
[[Category:How_To]]
 +
[[Category:Fusion Registry Install]]
 
= Overview =  
 
 
<p>The Fusion Registry provides a number of caching solutions to help ensure the performance of both server-side and client-side solutions. The cache layers include:</p>
 
* Caching layer before the Registry via a reverse proxy  (Varnish Cache)
 
* Caching on the Registry web service (If-Not-Modified,  
+
* Caching on the Registry web service (If-Modified-Since and If-None-Match headers)
 
* Caching on server for data responses (pre-cached datasets)
 
 
* Caching with SDMX Queries (updatedAfter parameter)
 
  
= Varnish Cache =
+
= Cache Purge General =
<p>[https://varnish-cache.org/ Varnish] is an HTTP accelerator that caches responses to HTTP requests.  The Varnish server acts as a [https://en.wikipedia.org/wiki/Reverse_proxy reverse proxy], accepting a client's HTTP request and passing it on to the target server (the Fusion Registry).  If Varnish has already cached a response, the client is served from the Varnish cache and the request is not passed on to the Fusion Registry.</p>
+
<p>See [[Cache_Management_Web_Service#Purge_Cache|Purge Cache Web Service]].</p>
 
 
<p>Varnish Cache can be used for both Data and Structural Metadata queries made via the REST API.  Varnish can also cache other web services or HTML pages of the Fusion Registry.  However, the Fusion Registry will not automatically send requests to Varnish to purge these caches, so they must be managed via another process.</p>
 
 
 
== Enabling Varnish Caching ==
 
The Varnish Cache server is a third-party software solution and must be configured by following the vendor's documentation.  The Fusion Registry 'integrates' with Varnish by knowing:
 
* That Varnish is being used as a front end caching solution
 
* The URL of the Varnish server, so that the Fusion Registry can tell Varnish when to purge its cache and which parts to purge
 
 
 
<p>Varnish should be configured to cache any requests to the data or structure web service, and to preserve any Accept-Language HTTP headers.  An [[Varnish_Configuration|example configuration]] is provided, based on Varnish Cache 4.1.2.</p>
 
 
 
<p>The Fusion Registry is told of Varnish through the Settings -> Cache page.  The Fusion Registry only sends purge requests for Data and Structural Metadata queries, so Varnish should only be used to cache these requests unless another purge solution is in place.</p>
 
 
 
== Varnish Cache and Security ==
 
If the Fusion Registry has security rules on specific datasets, Varnish Cache will be unaware of this.  Therefore we do not recommend Varnish Cache as a solution for a Fusion Registry which enforces different data access levels.
 
 
 
== Varnish Cache and Locale ==
 
The Fusion Registry adds a [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary Vary HTTP header] to all responses to indicate that a cached response should not be used if the client's Accept-Language header differs from that of the cached version.  This is relevant for all structure requests, and for some datasets which include code names.
 
 
 
== Varnish Cache and Data Format ==
 
The Fusion Registry adds a [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary Vary HTTP header] to all responses to indicate that a cached response should not be used if the client's Accept header differs from that of the cached version.
 
 
 
== Purging the Cache ==
 
<p>The Fusion Registry is responsible for purging the Varnish cache when structural metadata or data changes. A BAN request is sent to the Varnish server URL. Varnish will consume this request and remove the appropriate values from its cache.</p>
 
 
 
<p>When Fusion Registry structures are changed, a BAN request is sent to Varnish to ban all previously cached structure queries, regardless of the structure.  When Fusion Registry data are changed, a BAN request is sent only for the data queries of the Dataflow whose data changed.</p>
 
 
 
=== Example ===
 
# Request for data <i>https://yoursite.org/sdmx/data/ECB,EXR/A.UK+FR.../</i>
 
# New data is loaded or Registered for EXR
 
# Fusion Registry sends BAN requests to:
 
* https://yoursite.org/sdmx/data/ECB,EXR
 
* https://yoursite.org/sdmx/data/EXR
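
The two BAN targets in the example above can be derived from the public base URL, the agency and the Dataflow identifier. The following is a minimal sketch only: the function name <code>ban_urls</code> and its parameters are hypothetical and are not part of the Fusion Registry API.

```python
def ban_urls(public_base: str, agency_id: str, dataflow_id: str) -> list:
    """Build the data-query URL prefixes to ban for one Dataflow.

    Hypothetical helper: it only mirrors the two URL shapes listed
    in the example (with and without the agency prefix).
    """
    root = public_base.rstrip("/") + "/sdmx/data/"
    return [root + agency_id + "," + dataflow_id, root + dataflow_id]


for url in ban_urls("https://yoursite.org", "ECB", "EXR"):
    print(url)
```

The base URL is passed in explicitly because, as noted below, the Registry needs to be told the URL of the public server.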
 
 
 
<p><b>Note:</b> For Fusion Registry to know the URL of your public server, please configure the Registry Server Settings -> Reverse Proxy Mapping.</p>
 
  
 
= HTTP Cache Headers =
 
Line 50: Line 19:
  
 
== If-Modified-Since ==
 
The Fusion Registry maintains a record of all the timestamps a structure has changed.  This information is persisted to the database, so on application startup the Fusion Registry is able to lookup the timestamps of when structures were last modified.  When the HTTP Header If-Modified-Since is used on a structure query, the Fusion Registry processses the query to determine which structures make up the response. If any of the structures in the query response have been modified since the time passed in the If-Modified-Since header, then the user will get the full query response.  If none of the structures have been modified, then a HTTP 304 (not mofified) response is sent back to the client.
+
The Fusion Registry maintains a record of the timestamps at which each structure has changed.  This information is persisted to the database, so on application start-up the Fusion Registry is able to look up when structures were last modified.  When the HTTP header If-Modified-Since is used on a structure query, the Fusion Registry processes the query to determine which structures make up the response. If any of the structures in the query response have been modified since the time passed in the If-Modified-Since header, then the user will get the full query response.  If none of the structures have been modified, then an HTTP 304 (Not Modified) response is sent back to the client.
  
 
== If-None-Match ==
 
The If-None-Match request makes use of a hashing function.  When the Fusion Registry responds to a structure query, it will add a HTTP Header called <strong>ETag</strong>.  The value of the ETag header is a hash which represents the content of the response.  When the client requests the same resource, it can use the If-None-Match HTTP header, which uses the same hash value from the ETag of the previous request.  The Fusion Registry will process the structure query request, hash the response, and then check it against the hash passed inthe If-None-Match HTTP Header.  If both client and server hashes match, then a HTTP 304 response is sent, otherwise the full query response is sent back to the client with a HTTP 200 status and a new ETag hash to respresent the state of the response.
+
The If-None-Match request makes use of a hashing function.  When the Fusion Registry responds to a structure query, it adds an HTTP header called <strong>ETag</strong>.  The value of the ETag header is a hash which represents the content of the response.  When the client requests the same resource again, it can send the If-None-Match HTTP header, set to the hash value from the ETag of the previous response.  The Fusion Registry will process the structure query request, hash the response, and then check it against the hash passed in the If-None-Match HTTP header.  If the client and server hashes match, then an HTTP 304 response is sent.  Otherwise, the full query response is sent back to the client with an HTTP 200 status and a new ETag hash to represent the state of the response.
  
= Fusion Registry Cache =
+
= Fusion Cache =
<p>The Fusion Registry provides a caching solution for data queries only.  This cache is called the <b>Fusion Cache</b>.  The Fusion Cache and Varnish Cache are mutually exclusive caching solutions.  If Varnish Cache is being used, there is no need for the Fusion Cache.</p>
+
<p>The Fusion Registry provides a caching solution supporting a subset of HTTP requests, in particular data queries and data availability queries.  The Fusion Cache uses the local file system to cache information, and it periodically purges information based on a pre-defined cache size.</p>
  
<p>The Fusion Cache is a file system cache, where the configuration of the cache tells the Registry which folder to use, and how much space it is entitled to use.  The Fusion Registry will cache the response to all data queries in this cache.  The cache contains the response dataset in gzip format, in the format that the user requested.  If the user queries for data in SDMX-ML Generic format, then the cached dataset will be in this format.  If a new request comes in for the same dataset in a different format, then the Fusion Registry will convert the pre-cached data into the format requested, and then cache the response. If the HTTP request Accepts gzip response, then the Fusion Cache provides a very fast caching strategy for data queries.  If the HTTP request does not Accept gzip, then the cache is still fast, but must unzip the cached dataset before writing it to the client.</p>
+
<p>The Fusion Cache and Varnish Cache are mutually exclusive caching solutions.  If Varnish Cache is being used, there is no need for the Fusion Cache. </p>
 +
 
 +
== Where on the file system is Information Cached? ==
 +
<p>The Fusion Cache is a local file system cache.  It uses the Registry temporary directory as a base folder, and creates sub-folders under this directory for each category of cached information.  An example sub-directory is '''FusionDataCache''', which is a category for storing the responses to data queries.</p>
 +
 
 +
== How is Information Cached? ==
 +
Information is cached in '''gzip''' format.  This makes it much faster to respond to HTTP queries that have the HTTP Accept-Encoding header set to gzip.  Most web browsers provide this header by default.  However, if opening up REST APIs for public consumption, we recommend that the web server enforces that clients set this header if you are looking to get fast response times for large concurrent queries.
 +
 
 +
== Managing the herd ==
 +
In Fusion Registry v10.6.12 and higher, the Fusion Cache is used to manage a large number of concurrent requests for the same resource by executing only a single query and then responding to the other requests from the cached information.
  
 
== Enabling the Cache ==
 
Line 64: Line 42:
  
 
== Purging the Cache ==
 
The Fusion Cache is purged for a dataset, when a new dataset is loaded, or new data is registered with the Fusion Registry.
+
The Fusion Cache is automatically deleted on Fusion Registry start-up, so it is not ideal as a shared caching solution.  A configurable cache limit is also set, and when the cache reaches 90% of its limit, the Fusion Cache will start purging files.  The order in which the files are purged is based on which file was requested last, meaning the last accessed cached file will be the first to be deleted.  Files are purged from the cache in short bursts until the required space is recovered.  This ensures that CPU and IO do not become a bottleneck due to a long-running purge process.
 +
 
 +
== Stale Cache ==
 +
The Fusion Cache will never contain stale information, as the system is designed to automatically purge parts of the cache as soon as the information in the database is updated.  For example, if a dataset is updated, the data cache will be purged automatically.
 +
 
 +
== Fusion Cache and Security ==
 +
The Fusion Cache can be used in combination with security rules, as the cache key takes into account the current user and their access rights.
  
 
== Fusion Cache and Locale ==
 
 
Some datasets can change depending on the client's Locale settings.  The Fusion Cache takes the requested Locale into account: each cached item is stored against the data query, the response format, and the client locale.
  
== Fusion Cache and Data Format ==
+
= SDMX Data Query Updated After Request =
The Fusion Cache is data format specific.  If the user requests the same dataset in a different format (using either a query parameter, or the Accept Header) then the Fusion Cache will be used to get the dataset, and the dataset will be converted to the requested format.
+
<p>The SDMX data query supports an updatedAfter query parameter, which tells the server to only return data that matches the query and has been added or updated after the given time.  The time can be provided in any valid [[SDMX_Time_Formats|SDMX Time Format]].  It will always be taken as the start of the period in the GMT time zone, i.e. updatedAfter=2009 will resolve to 2009-01-01T00:00:00GMT.</p>
  
== Fusion Cache and Security ==
+
<p>How this request is handled depends on the data store that contains the data.</p>
If the Fusion Registry has security rules on specific datasets, then the Fusion Registry re-writes the data query, and therefore once it hits the cache, it will only hit on a dataset that the user has permission to see.  This strategy enables cached data to be reused by users with different security privileges.
+
<p> For <b>[[Data_Stores#Registry_Managed_Data_Store|Registry Managed Data Stores]]</b>, which include the <b>[[Data_Stores#Fusion_Data_Store|Fusion Store]]</b> and <b>[[Data_Stores#Registry_Managed_Database|Registry managed Databases]]</b>, the Registry records when each observation is updated, and when each series is updated.  The updatedAfter parameter is then used to remove series from the response if they have not been updated after the specified time (true for both series-only queries and series-with-observations queries).  If observations are in the response, these too will be removed if they have not been updated after the specified time.</p>
 +
<p>For <b>Externally managed SDMX web services</b> which are registered, the query parameter will be passed on to the service, and it is up to the service provider to ensure the query parameters are honoured.</p>
 +
<p>For <b>[[Data_Stores#Externally_Managed_Database|Externally managed Databases]]</b>, if there is a column for the last updated date, then this will be used in the data query; otherwise the updatedAfter parameter will be ignored.</p>

Latest revision as of 04:43, 13 September 2023

Overview

The Fusion Registry provides a number of caching solutions to help ensure the performance of both server-side and client-side solutions. The cache layers include:

  • Caching layer before the Registry via a reverse proxy (Varnish Cache)
  • Caching on the Registry web service (If-Modified-Since and If-None-Match headers)
  • Caching on server for data responses (pre-cached datasets)
  • Caching with SDMX Queries (updatedAfter parameter)

Cache Purge General

See Purge Cache Web Service.

HTTP Cache Headers

The HTTP cache headers If-Modified-Since and If-None-Match are supported for Structural Metadata queries, but not for Data queries.

Enabling HTTP Caching

To use these headers, the Fusion Registry cache settings must be set to enable the HTTP 304 Header.

If-Modified-Since

The Fusion Registry maintains a record of the timestamps at which each structure has changed. This information is persisted to the database, so on application start-up the Fusion Registry is able to look up when structures were last modified. When the HTTP header If-Modified-Since is used on a structure query, the Fusion Registry processes the query to determine which structures make up the response. If any of the structures in the query response have been modified since the time passed in the If-Modified-Since header, then the user will get the full query response. If none of the structures have been modified, then an HTTP 304 (Not Modified) response is sent back to the client.
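
The decision described above can be sketched as follows. This is an illustration only, not the Registry's implementation: the function name and the sample timestamps are hypothetical, and the strict "later than" comparison is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def status_for_if_modified_since(header_value, structure_timestamps):
    """Return 200 if any structure changed after the header time, else 304."""
    since = parsedate_to_datetime(header_value)
    # HTTP dates have one-second resolution, so drop microseconds first.
    changed = any(ts.replace(microsecond=0) > since for ts in structure_timestamps)
    return 200 if changed else 304


# Hypothetical last-modified times of the structures in one query response.
timestamps = [
    datetime(2023, 1, 10, 12, 0, tzinfo=timezone.utc),
    datetime(2023, 6, 1, 8, 30, tzinfo=timezone.utc),
]
header = format_datetime(datetime(2023, 9, 1, tzinfo=timezone.utc), usegmt=True)
print(header, status_for_if_modified_since(header, timestamps))
```

A client would send the header value on a repeat structure query and reuse its local copy whenever the status is 304.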

If-None-Match

The If-None-Match request makes use of a hashing function. When the Fusion Registry responds to a structure query, it adds an HTTP header called ETag. The value of the ETag header is a hash which represents the content of the response. When the client requests the same resource again, it can send the If-None-Match HTTP header, set to the hash value from the ETag of the previous response. The Fusion Registry will process the structure query request, hash the response, and then check it against the hash passed in the If-None-Match HTTP header. If the client and server hashes match, then an HTTP 304 response is sent. Otherwise, the full query response is sent back to the client with an HTTP 200 status and a new ETag hash to represent the state of the response.
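
A minimal sketch of this exchange is shown below. The choice of SHA-256 and the names make_etag and respond are assumptions made for illustration; this page does not state which hash function the Fusion Registry uses.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # SHA-256 is an assumption; any stable content hash would serve here.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a structure query response."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # client copy is still current
    return 200, {"ETag": etag}, body     # full response with a fresh ETag


first_status, first_headers, _ = respond(b"<Structure v1/>")
unchanged_status = respond(b"<Structure v1/>", first_headers["ETag"])[0]
changed_status = respond(b"<Structure v2/>", first_headers["ETag"])[0]
print(first_status, unchanged_status, changed_status)
```

Note that the server still builds and hashes the response before it can answer 304, which matches the processing order described above; the saving is in the bytes sent to the client.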

Fusion Cache

The Fusion Registry provides a caching solution supporting a subset of HTTP requests, in particular data queries and data availability queries. The Fusion Cache uses the local file system to cache information, and it periodically purges information based on a pre-defined cache size.

The Fusion Cache and Varnish Cache are mutually exclusive caching solutions. If Varnish Cache is being used, there is no need for the Fusion Cache.

Where on the file system is Information Cached?

The Fusion Cache is a local file system cache. It uses the Registry temporary directory as a base folder, and creates sub-folders under this directory for each category of cached information. An example sub-directory is FusionDataCache, which is a category for storing the responses to data queries.

How is Information Cached?

Information is cached in gzip format. This makes it much faster to respond to HTTP queries that have the HTTP Accept-Encoding header set to gzip. Most web browsers provide this header by default. However, if opening up REST APIs for public consumption, we recommend that the web server enforces that clients set this header if you are looking to get fast response times for large concurrent queries.
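
The benefit of the Accept-Encoding header can be illustrated with a short sketch. The function serve_cached is hypothetical, and the header parsing is simplified (it ignores quality values such as gzip;q=0).

```python
import gzip


def serve_cached(cached_gzip: bytes, accept_encoding: str):
    """Return (headers, body) for a cache entry that is stored gzip-compressed."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        # Fast path: the stored bytes are sent exactly as they are.
        return {"Content-Encoding": "gzip"}, cached_gzip
    # Slow path: the entry must be decompressed before it is written out.
    return {}, gzip.decompress(cached_gzip)


entry = gzip.compress(b"<GenericData>...</GenericData>")
fast_headers, fast_body = serve_cached(entry, "gzip, deflate")
slow_headers, slow_body = serve_cached(entry, "identity")
print(fast_headers, len(fast_body), slow_headers, len(slow_body))
```

Clients that accept gzip avoid the per-request decompression step, which matters most under many large concurrent queries.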

Managing the herd

In Fusion Registry v10.6.12 and higher, the Fusion Cache is used to manage a large number of concurrent requests for the same resource by executing only a single query and then responding to the other requests from the cached information.
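
This pattern is often called request coalescing. The sketch below shows one possible way to achieve it with an in-memory dictionary and threads; the class SingleFlightCache is hypothetical and says nothing about how the Fusion Registry implements it internally.

```python
import threading
import time


class SingleFlightCache:
    """Run compute() once per key; concurrent callers wait for that result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}
        self._in_flight = {}

    def get(self, key, compute):
        with self._lock:
            if key in self._results:
                return self._results[key]
            done = self._in_flight.get(key)
            leader = done is None
            if leader:
                done = self._in_flight[key] = threading.Event()
        if not leader:
            done.wait()                    # another request is running the query
            return self.get(key, compute)  # read the cached result (or retry)
        try:
            value = compute()
            with self._lock:
                self._results[key] = value
            return value
        finally:
            with self._lock:
                del self._in_flight[key]
            done.set()


calls = []

def slow_query():
    calls.append(1)   # count how often the query really executes
    time.sleep(0.1)
    return "dataset"

cache = SingleFlightCache()
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("q", slow_query)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))
```

If the leading request fails, the waiting requests are released and one of them retries the query, so a single error does not block the others.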

Enabling the Cache

The Fusion Cache is enabled in the Settings -> Cache page.

Purging the Cache

The Fusion Cache is automatically deleted on Fusion Registry start-up, so it is not ideal as a shared caching solution. A configurable cache limit is also set, and when the cache reaches 90% of its limit, the Fusion Cache will start purging files. The order in which the files are purged is based on which file was requested last, meaning the last accessed cached file will be the first to be deleted. Files are purged from the cache in short bursts until the required space is recovered. This ensures that CPU and IO do not become a bottleneck due to a long-running purge process.
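
The threshold and burst behaviour can be sketched as below. Several points are assumptions made for illustration: the function purge_in_bursts is hypothetical, the entries are taken to be already sorted in the Registry's purge order, "required space" is taken to mean dropping back under the 90% threshold, and the pause between bursts is omitted.

```python
def purge_in_bursts(entries, limit_bytes, burst_size=2, threshold=0.9):
    """Plan a purge for a cache given as (name, size) pairs in purge order.

    Returns a list of bursts; each burst is the list of file names
    removed in that step.
    """
    used = sum(size for _, size in entries)
    target = threshold * limit_bytes
    queue = list(entries)
    bursts = []
    while used >= target and queue:
        burst, queue = queue[:burst_size], queue[burst_size:]
        used -= sum(size for _, size in burst)
        bursts.append([name for name, _ in burst])
    return bursts


files = [("a.gz", 40), ("b.gz", 30), ("c.gz", 20), ("d.gz", 5)]
print(purge_in_bursts(files, limit_bytes=100, burst_size=1))
print(purge_in_bursts(files, limit_bytes=20, burst_size=2))
```

Keeping each burst small bounds the amount of file system work done in one step, which is the stated reason for not purging everything in a single pass.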

Stale Cache

The Fusion Cache will never contain stale information, as the system is designed to automatically purge parts of the cache as soon as the information in the database is updated. For example, if a dataset is updated, the data cache will be purged automatically.

Fusion Cache and Security

The Fusion Cache can be used in combination with security rules, as the cache key takes into account the current user and their access rights.

Fusion Cache and Locale

Some datasets can change depending on the client's Locale settings. The Fusion Cache takes the requested Locale into account: each cached item is stored against the data query, the response format, and the client locale.
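
A cache key built from these three components might look like the sketch below. The function cache_key, the use of SHA-256 and the separator character are all assumptions; the actual key format used by the Fusion Cache is not documented on this page.

```python
import hashlib


def cache_key(query: str, response_format: str, locale: str) -> str:
    """Derive one key from the data query, response format and client locale."""
    parts = (query, response_format.lower(), locale.lower())
    # A separator that cannot appear in the parts keeps ("ab", "c") and
    # ("a", "bc") from producing the same joined string.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


key_en = cache_key("data/ECB,EXR/A.UK", "application/json", "en")
key_fr = cache_key("data/ECB,EXR/A.UK", "application/json", "fr")
print(key_en != key_fr)
```

Because the locale is part of the key, a response containing English code names is never served to a client that asked for French.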

SDMX Data Query Updated After Request

The SDMX data query supports an updatedAfter query parameter, which tells the server to only return data that matches the query and has been added or updated after the given time. The time can be provided in any valid SDMX Time Format. It will always be taken as the start of the period in the GMT time zone, i.e. updatedAfter=2009 will resolve to 2009-01-01T00:00:00GMT.
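
The start-of-period rule can be sketched for a few common period shapes. This is not a full SDMX Time Format parser: the function resolve_updated_after is hypothetical and only handles a year, a year and month, a full date, and a quarter written as YYYY-Qn.

```python
import re
from datetime import datetime, timezone


def resolve_updated_after(value: str) -> datetime:
    """Resolve a period string to the start of that period in GMT (UTC)."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", value)
    if match:
        # Missing month or day defaults to 1, i.e. the start of the period.
        year, month, day = (int(g) if g else 1 for g in match.groups())
        return datetime(year, month, day, tzinfo=timezone.utc)
    match = re.fullmatch(r"(\d{4})-Q([1-4])", value)
    if match:
        first_month = 3 * int(match.group(2)) - 2  # Q1->1, Q2->4, Q3->7, Q4->10
        return datetime(int(match.group(1)), first_month, 1, tzinfo=timezone.utc)
    raise ValueError("period format not covered by this sketch: " + value)


for period in ("2009", "2009-05", "2009-Q3"):
    print(period, resolve_updated_after(period).isoformat())
```

Other valid formats, such as weekly or semester periods, would need additional branches.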

How this request is handled depends on the data store that contains the data.

For Registry Managed Data Stores, which include the Fusion Store and Registry managed Databases, the Registry records when each observation is updated, and when each series is updated. The updatedAfter parameter is then used to remove series from the response if they have not been updated after the specified time (true for both series-only queries and series-with-observations queries). If observations are in the response, these too will be removed if they have not been updated after the specified time.
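
The two-level filtering described above can be sketched as follows. The dictionary layout (key, updated, obs) and the function filter_updated_after are hypothetical, and treating "after" as a strict comparison is an assumption.

```python
from datetime import datetime, timezone


def filter_updated_after(series_list, cutoff):
    """Drop series, then observations, that were not updated after the cutoff."""
    kept_series = []
    for series in series_list:
        if series["updated"] <= cutoff:
            continue  # the whole series is unchanged since the cutoff
        obs = [o for o in series.get("obs", []) if o["updated"] > cutoff]
        kept_series.append({**series, "obs": obs})
    return kept_series


def utc(year, month=1, day=1):
    return datetime(year, month, day, tzinfo=timezone.utc)


data = [
    {"key": "A.UK", "updated": utc(2008, 6),
     "obs": [{"period": "2007", "updated": utc(2008, 6)}]},
    {"key": "A.FR", "updated": utc(2010, 3),
     "obs": [{"period": "2008", "updated": utc(2008, 6)},
             {"period": "2009", "updated": utc(2010, 3)}]},
]
result = filter_updated_after(data, utc(2009))
print([(s["key"], [o["period"] for o in s["obs"]]) for s in result])
```

For a series-only query the obs lists would simply be empty, and only the series-level check applies.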

For Externally managed SDMX web services which are registered, the query parameter will be passed on to the service, and it is up to the service provider to ensure the query parameters are honoured.

For Externally managed Databases, if there is a column for the last updated date, then this will be used in the data query; otherwise the updatedAfter parameter will be ignored.