Elasticsearch Cluster Backup
This folder contains a Terraform module that takes snapshots of an Elasticsearch cluster and backs them up to an S3 bucket. The module is a scheduled Lambda function that calls the Elasticsearch API to perform the snapshot and backup tasks described in the Elasticsearch documentation.
Terminologies
- Snapshot: A snapshot represents the current state of the indices in an Elasticsearch cluster. This is the information stored in a backup repository.
- Repository: A repository is an Elasticsearch abstraction over a storage medium like a Shared File System, S3 Bucket, HDFS etc. It's used to identify where snapshot files are stored and doesn't contain any snapshots itself.
Taking Backups
Cluster snapshots are incremental. The first snapshot is always a full dump of the cluster and subsequent ones are a delta between the current state of the cluster and the previous snapshot. Snapshots are typically contained in .dat files stored in the storage medium (in this case S3) the repository points to.
CPU and Memory Usage
Snapshots are usually run on a single node, which automatically coordinates with the other nodes to ensure completeness of data. Backing up a cluster with a large volume of data will lead to high CPU and memory usage on the node performing the backup. This module makes backup requests to the cluster through the load balancer, which routes each request to one of the nodes. During a backup, if the selected node is unable to handle incoming requests, the load balancer directs them to other nodes.
Frequency of Backups
How often you make backups depends entirely on the size of your deployment and the importance of your data. Larger clusters with high volume usage will typically need to be backed up more frequently than low volume clusters because of the amount of data change between snapshots. It's a safe bet to start off running backups on a nightly schedule and then continually tweak the schedule based on the demands of your cluster.
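As a rough illustration of the nightly starting point suggested above, the sketch below defines a schedule and a matching alarm period as Terraform locals. The local names and the 03:00 UTC run time are assumptions chosen for this example, not values the module prescribes. The cron string uses the six-field AWS schedule syntax (minutes, hours, day of month, month, day of week, year).

```hcl
locals {
  # Hypothetical nightly schedule: every day at 03:00 UTC.
  es_backup_schedule_expression = "cron(0 3 * * ? *)"

  # One day in seconds, assuming the alarm window should match the
  # nightly cadence. Adjust if your backups run more or less often.
  es_backup_alarm_period = 24 * 60 * 60
}
```

These values could then be passed to the module as schedule_expression = local.es_backup_schedule_expression and alarm_period = local.es_backup_alarm_period.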
Backup Notification
The time it takes to back up a cluster depends on the volume of data. However, because the backup module is implemented as a Lambda function, which has a maximum execution time of 5 minutes, a backup may outlast the function that started it, so a separate notification Lambda is kicked off. A CloudWatch metric is incremented any time the notification Lambda confirms that a backup occurred, and an alarm connected to that metric notifies you whether or not it was updated.
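The alarm described above delivers its notifications through SNS topics that you supply via alarm_sns_topic_arns. A minimal sketch of such a topic follows, assuming you manage it alongside the module; the resource names, topic name, and email address are all hypothetical.

```hcl
# Hypothetical SNS topic that receives the backup alarm notifications.
resource "aws_sns_topic" "es_backup_alarms" {
  name = "es-backup-alarms"
}

# Hypothetical email subscription. AWS sends a confirmation email that
# must be accepted before any notifications are delivered.
resource "aws_sns_topic_subscription" "es_backup_alarms_email" {
  topic_arn = aws_sns_topic.es_backup_alarms.arn
  protocol  = "email"
  endpoint  = "ops@example.com"
}
```

The topic could then be wired into the module with alarm_sns_topic_arns = [aws_sns_topic.es_backup_alarms.arn].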
Restoring Backups
Restoring snapshots is handled by the elasticsearch-cluster-restore module.
Sample Usage
# ------------------------------------------------------------------------------------------------------
# DEPLOY GRUNTWORK'S ELASTICSEARCH-CLUSTER-BACKUP MODULE
# ------------------------------------------------------------------------------------------------------

module "elasticsearch_cluster_backup" {

  source = "git::git@github.com:gruntwork-io/terraform-aws-elk.git//modules/elasticsearch-cluster-backup?ref=v0.11.1"

  # ----------------------------------------------------------------------------------------------------
  # REQUIRED VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # How often, in seconds, the backup lambda function is expected to run. You should
  # factor in the amount of time it takes to backup your cluster.
  alarm_period = <INPUT REQUIRED>

  # The ARN of SNS topics to notify if the CloudWatch alarm goes off because the
  # backup job failed.
  alarm_sns_topic_arns = <INPUT REQUIRED>

  # The S3 bucket that the specified repository will be associated with and where
  # all snapshots will be stored
  bucket = <INPUT REQUIRED>

  # The name for the CloudWatch Metric the AWS lambda backup function will increment
  # every time the job completes successfully.
  cloudwatch_metric_name = <INPUT REQUIRED>

  # The namespace for the CloudWatch Metric the AWS lambda backup function will
  # increment every time the job completes successfully.
  cloudwatch_metric_namespace = <INPUT REQUIRED>

  # The DNS to the Load Balancer in front of the Elasticsearch cluster
  elasticsearch_dns = <INPUT REQUIRED>

  # The name of the Lambda function. Used to namespace all resources created by this
  # module.
  name = <INPUT REQUIRED>

  # The AWS region (e.g us-east-1) where the backup S3 bucket exists.
  region = <INPUT REQUIRED>

  # The name of the repository that will be associated with the created snapshots
  repository = <INPUT REQUIRED>

  # An expression that defines the schedule for this lambda job. For example,
  # cron(0 20 * * ? *) or rate(5 minutes).
  schedule_expression = <INPUT REQUIRED>

  # ----------------------------------------------------------------------------------------------------
  # OPTIONAL VARIABLES
  # ----------------------------------------------------------------------------------------------------

  # The port on which the API requests will be made to the Elasticsearch cluster
  elasticsearch_port = 9200

  # The runtime to use for the Lambda function. Should be a Node.js runtime.
  lambda_runtime = "nodejs14.x"

  # Specifies the protocol to use when making the request to the Elasticsearch
  # cluster. Possible values are HTTP or HTTPS
  protocol = "http"

  # Set to true to give your Lambda function access to resources within a VPC.
  run_in_vpc = false

  # A list of subnet IDs the Lambda function should be able to access within your
  # VPC. Only used if var.run_in_vpc is true.
  subnet_ids = []

  # The ID of the VPC the Lambda function should be able to access. Only used if
  # var.run_in_vpc is true.
  vpc_id = null

}
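To make the template above more concrete, here is a sketch of what a filled-in call might look like for a cluster that sits behind an internal load balancer in a VPC. Every value is a hypothetical placeholder (bucket name, ARN, DNS name, subnet and VPC IDs, metric names); only the input names and the source URL come from the template itself.

```hcl
module "elasticsearch_cluster_backup_example" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-elk.git//modules/elasticsearch-cluster-backup?ref=v0.11.1"

  # Hypothetical name used to namespace the resources the module creates.
  name   = "es-nightly-backup"
  region = "us-east-1"

  # Hypothetical storage settings: the repository name and the S3 bucket
  # it is associated with.
  bucket     = "example-es-backups"
  repository = "nightly-backups"

  # Hypothetical load balancer DNS name in front of the cluster.
  elasticsearch_dns  = "internal-es-lb.example.internal"
  elasticsearch_port = 9200
  protocol           = "http"

  # Run every night at 03:00 UTC; alarm window of one day, in seconds.
  schedule_expression = "cron(0 3 * * ? *)"
  alarm_period        = 86400

  # Hypothetical metric and SNS topic used for backup notifications.
  cloudwatch_metric_name      = "es-backup-completed"
  cloudwatch_metric_namespace = "Custom/ElasticsearchBackup"
  alarm_sns_topic_arns        = ["arn:aws:sns:us-east-1:123456789012:es-backup-alarms"]

  # Hypothetical VPC settings so the Lambda function can reach the
  # internal load balancer.
  run_in_vpc = true
  vpc_id     = "vpc-0123456789abcdef0"
  subnet_ids = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}
```

Setting run_in_vpc to true is only needed when the load balancer is not reachable from outside the VPC; otherwise vpc_id and subnet_ids can be left at their defaults.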
A Terragrunt example is coming soon!
Reference
Inputs
Required
- alarm_period (number): How often, in seconds, the backup Lambda function is expected to run. You should factor in the amount of time it takes to back up your cluster.
- alarm_sns_topic_arns (list(string)): The ARNs of the SNS topics to notify if the CloudWatch alarm goes off because the backup job failed.
- bucket (string): The S3 bucket that the specified repository will be associated with and where all snapshots will be stored.
- cloudwatch_metric_name (string): The name for the CloudWatch metric the AWS Lambda backup function will increment every time the job completes successfully.
- cloudwatch_metric_namespace (string): The namespace for the CloudWatch metric the AWS Lambda backup function will increment every time the job completes successfully.
- elasticsearch_dns (string): The DNS name of the load balancer in front of the Elasticsearch cluster.
- name (string): The name of the Lambda function. Used to namespace all resources created by this module.
- region (string): The AWS region (e.g. us-east-1) where the backup S3 bucket exists.
- repository (string): The name of the repository that will be associated with the created snapshots.
- schedule_expression (string): An expression that defines the schedule for this Lambda job. For example, cron(0 20 * * ? *) or rate(5 minutes).
Optional
- elasticsearch_port (number): The port on which the API requests will be made to the Elasticsearch cluster. Default: 9200
- lambda_runtime (string): The runtime to use for the Lambda function. Should be a Node.js runtime. Default: "nodejs14.x"
- protocol (string): Specifies the protocol to use when making the request to the Elasticsearch cluster. Possible values are HTTP or HTTPS. Default: "http"
- run_in_vpc (bool): Set to true to give your Lambda function access to resources within a VPC. Default: false
- subnet_ids (list(string)): A list of subnet IDs the Lambda function should be able to access within your VPC. Only used if run_in_vpc is true. Default: []
- vpc_id (string): The ID of the VPC the Lambda function should be able to access. Only used if run_in_vpc is true. Default: null