Use of Redis replicas

Although other topologies support the use of replicas, this document is written with clusters in mind.

Failover scenarios

When the Redis cluster detects a master node is down, it initiates failover to one of the master's replicas. Replicas use the replication process to mirror updates from the master node as they happen.

In Kubernetes, when the previously crashed pod recovers and rejoins the cluster, it takes on the replica role for the master that now serves its former slots (the node that used to be its replica).
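The role swap described above can be illustrated with a toy model. This is only a sketch: the Shard class and the node names redis-0 and redis-1 are hypothetical, and real Redis cluster failover involves failure detection and voting among masters that are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy model of one slot range: a master plus its replicas (hypothetical names)."""
    master: str
    replicas: list = field(default_factory=list)

    def fail_master(self) -> str:
        """Promote the first replica when the master goes down; return the failed node."""
        if not self.replicas:
            raise RuntimeError("no replica to promote; slots are unavailable")
        failed = self.master
        self.master = self.replicas.pop(0)
        return failed

    def rejoin(self, node: str) -> None:
        """A recovered node comes back as a replica of the current master."""
        self.replicas.append(node)


shard = Shard(master="redis-0", replicas=["redis-1"])
failed = shard.fail_master()   # redis-1 is promoted to master
shard.rejoin(failed)           # redis-0 returns as a replica of redis-1
print(shard)
```

Without a replica, fail_master raises an error in this model, which corresponds to the case where the slots stay unavailable until the pod is restarted.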

If replicas are not used, Kubernetes still detects (using probes) and restarts unresponsive pods. The slots served by the impacted master are temporarily unavailable, and depending on the duration of the outage, the HCL Cache circuit breakers will activate. It might take a couple of minutes for the Redis node to become available again. This time is extended when persistence is used, because Redis must reload the cache on startup, and the service is unavailable until loading completes.

Scalability scenarios

Besides their role in failover, replicas can increase scalability by handling read (GET) operations. This frees resources on the master node and spreads the load more evenly across servers. The HCL Cache Redis client can be configured to direct read operations to replicas using the readMode configuration.

When replicas are used for read operations, the following considerations apply:

The replication process introduces a delay. If a read operation happens immediately after a write, the read might return stale data, or no data. This could introduce functional issues for certain caches and customization scenarios. The HCL Cache includes a number of configurations that control whether reads are directed to masters or replicas, and how long to wait for replication to complete.
If replicas are used for reads, both master and replica servers must be available for optimal performance: an unavailable replica can lead to WAIT command timeouts during PUT operations (syncReplicas, see below), and to failed read (GET) operations executed on the replicas. When the recovered master is restarted, it reconfigures itself as a replica and starts a new synchronization process. If a full synchronization is required, the replica server might be unavailable for some time while the database is replicated. The system might therefore take longer to recover when read operations are offloaded to replicas.
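The stale-read effect of the replication delay can be shown with a small simulation. This is a minimal sketch, not the HCL Cache or Redis implementation: the Master and Replica classes and the key price:42 are made up for illustration, and replication lag is modelled as a queue that the replica has not applied yet.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.pending = []          # writes received but not yet applied

    def apply_pending(self):
        """Simulate the replica catching up with the master."""
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def get(self, key):
        return self.data.get(key)


class Master:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def put(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            # Asynchronous replication: the write is queued, not applied.
            replica.pending.append((key, value))


replica = Replica()
master = Master([replica])

master.put("price:42", "9.99")
stale = replica.get("price:42")   # read before replication completes
replica.apply_pending()           # replication catches up
fresh = replica.get("price:42")   # read after replication completes
print(stale, fresh)
```

In this model the first read finds no data because the write is still queued, which is the situation the wait and routing settings described below are meant to address.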

Configurations

Using replicas with the HCL Cache might require configuration changes in Redis, the Redis client, or the HCL Cache:

Redis configurations

cluster-replica-validity-factor: xxxxxxx
repl-diskless-sync
client-output-buffer-limit

Redis client configurations: The HCL Cache can be configured to issue read (GET) operations to replica servers with the readMode setting.

HCL Cache configurations

The HCL Cache includes a number of advanced cache-level configurations to control the behaviour of read and PUT operations when replicas are used. These settings are most relevant when readMode: SLAVE is used.

cacheConfigs:
  cacheName:
    remoteConfig:
       forceReadFromMaster: [TRUE|false]
       syncReplicas: [NULL| <number_of_replicas> OR all : timeout_ms]
       limitSyncReplicasToNumberAvailable: [TRUE|false]

forceReadFromMaster

When readMode is set to SLAVE or MASTER_SLAVE, the forceReadFromMaster configuration ensures that reads (GET) for this cache are sent to the master server. Use it for caches that cannot tolerate stale reads caused by replication delay.
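The routing decision can be pictured as follows. This is a sketch under assumptions: it assumes that forceReadFromMaster overrides readMode for reads on the selected cache, the function read_target is hypothetical, and the MASTER value for readMode is assumed here in addition to the SLAVE and MASTER_SLAVE values named above.

```python
def read_target(read_mode: str, force_read_from_master: bool) -> str:
    """Return which kind of node a GET for this cache would be sent to."""
    if force_read_from_master or read_mode == "MASTER":
        return "master"
    if read_mode == "SLAVE":
        return "replica"
    if read_mode == "MASTER_SLAVE":
        return "master-or-replica"
    raise ValueError(f"unknown readMode: {read_mode}")


# A cache that cannot tolerate stale reads keeps reading from the master
# even though the client as a whole offloads reads to replicas.
print(read_target("SLAVE", force_read_from_master=True))
print(read_target("SLAVE", force_read_from_master=False))
```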

syncReplicas

This configuration is disabled by default. If enabled, the HCL Cache invokes the WAIT command after a PUT operation. The WAIT command blocks until the configured number of replicas has processed the change, or until the timeout is reached. Instead of specifying a fixed number of replicas, it is possible to use all, which translates to the number of replicas currently known by the Redis client. Example: With the following configuration, the HCL Cache waits until the change is replicated to all the available replicas, or until 250 milliseconds have elapsed:

  syncReplicas: all:250
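To make the value format concrete, the following sketch turns a syncReplicas value into the two arguments of the Redis WAIT command (number of replicas and timeout in milliseconds). The function parse_sync_replicas is hypothetical and only illustrates the format shown above; it is not the HCL Cache parser.

```python
def parse_sync_replicas(value, known_replicas):
    """Return (num_replicas, timeout_ms) for WAIT, or None when disabled.

    value          -- the syncReplicas setting, e.g. "all:250" or "1:100"
    known_replicas -- replicas currently known by the client (used for "all")
    """
    if value is None:
        return None                       # setting not configured: no WAIT
    count, _, timeout = str(value).partition(":")
    count = count.strip()
    if count.lower() == "all":
        num = known_replicas
    else:
        num = int(count)
    return num, int(timeout)              # raises ValueError if timeout is missing


print(parse_sync_replicas("all:250", known_replicas=2))
print(parse_sync_replicas("1:100", known_replicas=3))
```

With two known replicas, all:250 corresponds to WAIT 2 250 in this sketch.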

limitSyncReplicasToNumberAvailable

When syncReplicas is enabled with a number of replicas, the limitSyncReplicasToNumberAvailable configuration can be used to restrict the configured number to the number of replicas currently known by the Redis client.
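The effect of this setting can be sketched as a simple cap. The function effective_replicas is hypothetical and assumes the limit is applied as a minimum of the configured and the known replica counts.

```python
def effective_replicas(configured: int, known: int, limit_to_available: bool) -> int:
    """Number of replicas to pass to WAIT after applying the optional limit."""
    if limit_to_available:
        return min(configured, known)
    return configured


# Two replicas configured, but only one is currently known to the client.
print(effective_replicas(configured=2, known=1, limit_to_available=True))
print(effective_replicas(configured=2, known=1, limit_to_available=False))
```

Without the limit, a WAIT for two replicas cannot be satisfied while only one replica exists, so each PUT would block until the timeout expires.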