Backup snapshot and restore for Elasticsearch
Elasticsearch includes a snapshot capability that can be used to back up and restore indexes to a variety of external snapshot repositories, such as a GCS bucket.
Refer to the Elasticsearch documentation on how to Register a Snapshot Repository. For illustration purposes, this topic uses Google Cloud Storage as the custom snapshot repository.
Configuring Elasticsearch to work with an existing Google Cloud Storage account
The standard Elasticsearch image needs to be customized to enable the repository-gcs plugin and to install the GCS service account key JSON file, which gives access to the GCS bucket where the index snapshots will be stored.
The following commands build and push a custom Elasticsearch 7.17.10 image. The sample Dockerfile that they use is listed after the commands:
docker build --pull -t us.gcr.io/<your path>/elastic/elasticsearch:7.17.10 .
docker push us.gcr.io/<your path>/elastic/elasticsearch:7.17.10
Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:7.17.10
RUN bin/elasticsearch-plugin install --batch repository-gcs
COPY sa-wc-es-snapshot.json /tmp/sa-wc-es-snapshot.json
RUN bin/elasticsearch-keystore add-file --force gcs.client.commerce.credentials_file /tmp/sa-wc-es-snapshot.json
RUN rm /tmp/sa-wc-es-snapshot.json
Note that sa-wc-es-snapshot.json is your GCS service account key JSON file. For more information on how to set up your service account credentials, refer to Using a Service Account.
Creating and registering your index snapshot repository
Example:
PUT /_snapshot/<your-repository-name>
{
"type": "gcs",
"settings": {
"client": "<your-gcs-client-name>",
"bucket": "<your-gcs-bucket-name>",
"base_path": "<your-repository-name>"
}
}
If additional clusters register the same repository, they must register it as read-only to avoid concurrent modification errors. For example:
PUT /_snapshot/<your-repository-name>
{
"type": "gcs",
"settings": {
"client": "<your-gcs-client-name>",
"bucket": "<your-gcs-bucket-name>",
"base_path": "<your-repository-name>",
"readonly":"true"
}
}
For more information about the parameters used in this example, refer to the Elasticsearch Repository Settings documentation.
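The two registration calls above differ only in the readonly setting, so they can be produced by one helper. The following is a minimal Python sketch, not part of the product: the function name, the localhost URL, and the sample repository and bucket names are assumptions for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_register_request(es_url, repository, client, bucket, readonly=False):
    """Build (but do not send) the PUT request that registers a GCS repository."""
    settings = {"client": client, "bucket": bucket, "base_path": repository}
    if readonly:
        # Additional clusters should only read from the shared repository.
        settings["readonly"] = True
    body = json.dumps({"type": "gcs", "settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_snapshot/{repository}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values; replace them with your own cluster URL and names.
req = build_register_request(
    "http://localhost:9200", "my-repo", "commerce", "my-bucket", readonly=True
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send the request: urllib.request.urlopen(req)
```

The client name must match the one used in the keystore entry, which is commerce in the sample Dockerfile (gcs.client.commerce.credentials_file).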
To retrieve the definitions of your registered snapshot repositories, issue the following call:
GET /_snapshot/
Backing up your indexes as a snapshot
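Your system is now ready to create new snapshots. The following example defines a snapshot lifecycle management (SLM) policy that takes a nightly snapshot at 01:30 and retains snapshots for 30 days, keeping at least 5 and at most 50. Alternatively, you can take a snapshot manually with the Create Snapshot API.
PUT /_slm/policy/<your-policy-name>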
{
"schedule": "0 30 1 * * ?",
"name": "<nightly-snap-{now/d}>",
"repository": "<your-repository-name>",
"config": {
"indices": "live.*,.live.*",
"include_global_state": true
},
"retention": {
"expire_after": "30d",
"min_count": 5,
"max_count": 50
}
}
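For a one-off snapshot outside any schedule, the Elasticsearch Create Snapshot API (PUT /_snapshot/<repository>/<snapshot>) can be called directly. The following Python sketch only assembles such a call. The function name and the manual-snap- naming scheme are assumptions made for illustration, and the index pattern is copied from the example above.

```python
import json
from datetime import date


def manual_snapshot_call(repository, day, indices="live.*,.live.*"):
    """Return (method, path, body) for a one-off Create Snapshot API call."""
    # Snapshot names must be lowercase; the date suffix keeps them unique per day.
    snapshot = f"manual-snap-{day:%Y.%m.%d}"
    path = f"/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    body = {"indices": indices, "include_global_state": True}
    return "PUT", path, json.dumps(body)


method, path, body = manual_snapshot_call("my-repo", date.today())
print(method, path)
print(body)
```

With wait_for_completion=true the call returns only after the snapshot finishes, which is convenient for scripts but can take a long time for large indexes.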
Listing available snapshots from your repository
You can list the snapshots that are available in a given repository in either a detailed or a tabulated format. The detailed format is fetched using the following API call:
GET /_snapshot/<your-repository-name>/*?verbose=true
For the same information in tabulated format, issue the following call:
GET /_cat/snapshots/<your-repository-name>?v=true&s=id&pretty
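When the detailed listing is consumed by a script, a common next step is to pick the most recent snapshot that completed successfully. The following Python sketch assumes that the response has already been parsed from JSON and that it carries a top-level snapshots array whose entries include snapshot, state, and start_time_in_millis fields, as in the Elasticsearch 7.17 Get Snapshot API response. The sample data is made up.

```python
def latest_successful_snapshot(response):
    """Return the name of the newest snapshot in state SUCCESS, or None."""
    candidates = [
        s for s in response.get("snapshots", []) if s.get("state") == "SUCCESS"
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s["start_time_in_millis"])
    return newest["snapshot"]


# Made-up sample shaped like a parsed detailed listing.
sample = {
    "snapshots": [
        {"snapshot": "nightly-snap-2024.05.13", "state": "SUCCESS",
         "start_time_in_millis": 1715563800000},
        {"snapshot": "nightly-snap-2024.05.14", "state": "PARTIAL",
         "start_time_in_millis": 1715650200000},
    ]
}
print(latest_successful_snapshot(sample))
```

The sketch skips snapshots that are not in the SUCCESS state, on the assumption that restoring from an incomplete snapshot is not what a script wants by default.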
Restoring your indexes from a snapshot
To restore a snapshot from your repository, use the following API call, which selects the indexes to restore by index name pattern.
POST /_snapshot/<your-repository-name>/<your-snapshot-name>/_restore
{
"indices": "*,.*"
}
A complete description of the parameters used in this example can be found in the Elasticsearch Restore Snapshot API documentation.
Note that this operation only restores the index files. The index aliases do not point to the restored indexes yet.
Activating restored indexes for use by HCL Commerce Search
When Ingest performs a full re-indexing operation, each build is tagged with a unique revision identifier. This identifier, called time.id, is a twelve-digit number in the format YYYYMMDDHHMM. For example, a product index can be named .auth.11.product.202405142058, where 202405142058 is the time.id.
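To find the time.id values present among restored indexes, the identifier can be read off each index name. The following Python sketch assumes that the time.id is always the final dot-separated segment of the index name, which is inferred from the single example above. The function names are illustrative.

```python
import re
from datetime import datetime

# Twelve digits at the very end of the name, preceded by a dot.
TIME_ID_PATTERN = re.compile(r"\.(\d{12})$")


def time_id_of(index_name):
    """Return the trailing twelve-digit time.id, or None if there is none."""
    match = TIME_ID_PATTERN.search(index_name)
    return match.group(1) if match else None


def build_time(time_id):
    """Interpret a time.id (YYYYMMDDHHMM) as a datetime."""
    return datetime.strptime(time_id, "%Y%m%d%H%M")


tid = time_id_of(".auth.11.product.202405142058")
print(tid, build_time(tid))
```

Parsing the identifier as a date also rejects twelve-digit strings that are not valid timestamps, because strptime raises ValueError for them.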
Once you have identified the time.id from your restored indexes that you want to activate, use the following Ingest API to perform the index alias switching:
POST /connectors/<envType>.alias/run?storeId=<storeId>&timeId=<timeId>
where
- envType: Either "auth" or "live".
- storeId: The owning store identifier.
- timeId: The twelve-digit time identifier.
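The alias-switching call can be assembled from these three parameters. The following Python sketch builds the URL and validates the inputs first. The Ingest base URL and the store identifier 11 are placeholder assumptions, and the function name is illustrative.

```python
from urllib.parse import urlencode


def alias_switch_url(ingest_base, env_type, store_id, time_id):
    """Build the Ingest URL that switches index aliases to a given time.id."""
    if env_type not in ("auth", "live"):
        raise ValueError('envType must be "auth" or "live"')
    if not (len(time_id) == 12 and time_id.isascii() and time_id.isdigit()):
        raise ValueError("timeId must be a twelve-digit string")
    query = urlencode({"storeId": store_id, "timeId": time_id})
    return f"{ingest_base.rstrip('/')}/connectors/{env_type}.alias/run?{query}"


# Hypothetical host and port; substitute the address of your Ingest service.
print(alias_switch_url("http://ingest.example:30800", "auth", 11, "202405142058"))
```

The resulting URL is meant to be sent with the POST method, as shown in the call above.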
Removing a snapshot
Use the following API to explicitly remove a snapshot.
DELETE /_snapshot/<your-repository-name>/<your-snapshot-name>