Use of Redis replicas
With Redis Cluster, each master node can be backed by one or more replicas. Replicas are used for failover and scalability.
Although other topologies support the use of replicas, this document is written with clusters in mind.
Failover scenarios
When the Redis cluster detects a master node is down, it initiates failover to one of the master's replicas. Replicas use the replication process to mirror updates from the master node as they happen.
In Kubernetes, when the previously crashed pod recovers and re-joins the cluster, it takes on the replica role for the master that currently serves the slots (the node that used to be its replica).
If replicas are not used, Kubernetes will still detect (using probes) and restart unresponsive pods. The slots served by the impacted master will be temporarily unavailable. Depending on the duration of the outage, the HCL Cache circuit breakers will activate. It might take a couple of minutes for the Redis node to become available again. This time is extended when persistence is used, because Redis needs to reload the cache on start, and the service is unavailable until loading completes.
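The failover behaviour described above can be illustrated with a small toy model. This is only a sketch and not how Redis Cluster is implemented: the `Shard` class and the node names are hypothetical, and the model simply promotes the first replica, whereas a real cluster elects a replica.

```python
class Shard:
    """Toy model of one slot range: a master plus zero or more replicas."""

    def __init__(self, master, replicas=()):
        self.master = master            # node currently serving the slots, or None
        self.replicas = list(replicas)  # nodes mirroring the master
        self.down = []                  # crashed nodes waiting for a restart

    @property
    def available(self):
        """The slots can be served only while some node holds the master role."""
        return self.master is not None

    def fail_master(self):
        """Master crashes: promote a replica if one exists, else leave the slots without a master."""
        if self.master is None:
            return
        self.down.append(self.master)
        self.master = self.replicas.pop(0) if self.replicas else None

    def recover(self, node):
        """A restarted node re-joins: as a replica if failover happened, else as master again."""
        self.down.remove(node)
        if self.master is None:
            self.master = node
        else:
            self.replicas.append(node)


shard = Shard("redis-0", ["redis-1"])
shard.fail_master()
print(shard.master, shard.available)   # redis-1 True
shard.recover("redis-0")
print(shard.replicas)                  # ['redis-0']
```

Without a replica, the same sequence leaves `available` as False until `recover` is called, which corresponds to the window in which the slots cannot be served.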
Scalability scenarios
Besides their role in failover, replicas can increase scalability by handling read (GET) operations. This frees resources on the master node and enables more efficient use of resources. The HCL Cache Redis client can be configured to direct read operations to replicas using the readMode configuration.
When replicas are used for read operations, the following considerations must be taken into account:
- The replication process introduces a delay. If a read operation happens immediately after a write, the read might return stale data, or no data. This could introduce functional issues for certain caches and customization scenarios. The HCL Cache includes a number of configurations that control whether reads are directed to masters or replicas, and how long to wait for replication to complete.
- If replicas are used for reads, both master and replica servers must be available for optimal performance. An unavailable replica can lead to WAIT command timeouts during PUT operations (syncReplicas, see below), and to failed read (GET) operations executed on the replicas. When the recovered master is restarted, it reconfigures itself as a replica and starts a new synchronization process. If a full synchronization is required, the replica server might be unavailable for some time while the database is replicated. The system might therefore take longer to recover when read operations are offloaded to replicas.
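The two considerations above can be illustrated with a minimal simulation of replication lag. It is a sketch under stated assumptions: the `Master` and `Replica` classes are hypothetical, time is a logical tick counter rather than a clock, and `wait` only imitates the Redis WAIT command by returning the number of replicas that caught up before the timeout.

```python
class Replica:
    """Hypothetical replica that applies each update lag_ticks after the master wrote it."""

    def __init__(self, lag_ticks):
        self.data = {}
        self.lag_ticks = lag_ticks
        self.available = True
        self.pending = []  # queued updates as (apply_at_tick, key, value)

    def advance(self, now):
        """Apply queued updates whose delay has elapsed; an unavailable replica applies nothing."""
        if not self.available:
            return
        ready = [p for p in self.pending if p[0] <= now]
        self.pending = [p for p in self.pending if p[0] > now]
        for _, key, value in ready:
            self.data[key] = value


class Master:
    """Hypothetical master that queues every write for asynchronous replication."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.now = 0  # logical clock

    def put(self, key, value):
        self.data[key] = value
        for r in self.replicas:
            r.pending.append((self.now + r.lag_ticks, key, value))

    def tick(self):
        self.now += 1
        for r in self.replicas:
            r.advance(self.now)

    def wait(self, num_replicas, timeout_ticks):
        """Advance time until num_replicas have caught up or the timeout expires."""
        deadline = self.now + timeout_ticks
        while True:
            acked = sum(1 for r in self.replicas if r.available and not r.pending)
            if acked >= num_replicas or self.now >= deadline:
                return acked
            self.tick()


replica = Replica(lag_ticks=2)
master = Master([replica])
master.put("price", "v1")
print(replica.data.get("price"))         # None: a read right after the write finds no data
print(master.wait(1, timeout_ticks=5))   # 1: the replica caught up within the timeout
replica.available = False
master.put("price", "v2")
print(master.wait(1, timeout_ticks=3))   # 0: the wait times out while the replica is down
```

In this model the replica still holds "v1" after the timed-out wait, which is the stale-read case described in the first consideration.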
Configurations
Using replicas with the HCL Cache might require configuration changes in Redis, the Redis client, or the HCL Cache:
- Redis configurations
- cluster-replica-validity-factor: xxxxxxx
- repl-diskless-sync
- client-output-buffer-limit
- Redis client configurations
- The HCL Cache can be configured to issue read (GET) operations to replica servers with the readMode setting.
- HCL Cache configurations
- The HCL Cache includes a number of advanced cache-level configurations to control the behaviour of PUT operations when replicas are used. These settings are more relevant when readMode: SLAVE is used.
    cacheConfigs:
      cacheName:
        remoteConfig:
          forceReadFromMaster: [TRUE|false]
          syncReplicas: [NULL| <number_of_replicas> OR all : timeout_ms]
          limitSyncReplicasToNumberAvailable: [TRUE|false]
- forceReadFromMaster
- When readMode is set to SLAVE or MASTER_SLAVE, the forceReadFromMaster configuration ensures that reads (GET) for this cache are sent to the master server.
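As an illustration of how these cache-level settings might be interpreted, the following sketch models them in Python. It is not the HCL Cache implementation. The helper names `read_target` and `parse_sync_replicas` are hypothetical. The sketch assumes that, as its name suggests, forceReadFromMaster pins GET operations for a cache to the master, that a readMode of MASTER also exists in the Redis client, and that a syncReplicas value has the form `<number_of_replicas>:<timeout_ms>` or `all:<timeout_ms>`.

```python
def read_target(read_mode, force_read_from_master):
    """Return which node role serves a GET for one cache (illustrative only)."""
    if read_mode not in ("MASTER", "SLAVE", "MASTER_SLAVE"):
        raise ValueError("unknown readMode: %r" % (read_mode,))
    # Assumption: the per-cache flag overrides the client-wide readMode.
    if force_read_from_master or read_mode == "MASTER":
        return "master"
    if read_mode == "SLAVE":
        return "replica"
    return "master or replica"


def parse_sync_replicas(value):
    """Parse an assumed '<number_of_replicas>:<timeout_ms>' or 'all:<timeout_ms>' value.

    Returns None when the setting is unset (None or 'NULL'),
    otherwise a (replicas, timeout_ms) tuple where replicas is an int or 'all'.
    """
    if value is None or value.strip().upper() == "NULL":
        return None
    count, _, timeout = value.partition(":")
    count = count.strip()
    replicas = "all" if count.lower() == "all" else int(count)
    return replicas, int(timeout)


print(read_target("SLAVE", force_read_from_master=True))    # master
print(read_target("SLAVE", force_read_from_master=False))   # replica
print(parse_sync_replicas("1:250"))                         # (1, 250)
print(parse_sync_replicas("all : 500"))                     # ('all', 500)
```

A cache that cannot tolerate stale reads would keep forceReadFromMaster enabled, while a cache that tolerates a short delay could disable it and rely on syncReplicas to bound how long a PUT waits for replication.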