Launching EC2 Instances with the CrateDB AMI

CrateDB provides custom public AMI s that can be launched by every EC2 user by using the AWS Management Console or AWS CLI. The naming convention is as follows:

crate-<VERSION>-<REVISION>-<AMI_REVISION>-<BASE_AMI>

<VERSION>-<REVISION> is the CrateDB version and its revision in the format w.x.y-z, <AMI_REVISION> is the AMI build revision and <BASE_AMI> is the full name of the Amazon Linux base image.

Example:

crate-0.51.1-1-1-amzn-ami-hvm-2015.03.0.x86_64

The CrateDB AMI is based on comes with Java 8 and CrateDB pre-installed.

Finding the CrateDB AMI

To launch an EC2 instance based on the CrateDB AMI, use the search panel to find the image in the public AMI repository. After choosing the desired AMI by its CrateDB version, it can be launched as an EC2 instance running on your account.

Search for public CrateDB AMI on AWS Management Console

If you are using the AWS CLI you can find the public CrateDB AMI by using the describe-images command. The easiest way to search for it is by using its AMI Name. AMIs are listed in JSON-format and sorted by their creation date. The following command lists every available CrateDB AMI on the AWS platform.

sh$ aws ec2 describe-images --filters "Name=name,Values=crate-0.51.1-*"

Launch the AMI

You can launch one or more instances with the CrateDB AMI using the run-instances command utilizing the --image-id command line option. You can use the parameters of the command to configure its start-up behavior and hardware configuration. A valid key-pair is necessary to connect to the instances later.

sh$ aws ec2 run-instances --image-id <AMI-ID> --count <NR-OF-INSTANCES> --iam-instance-profile Name="<IAM-ROLE>" --instance-type <INSTANCE-TYPE> --user-data <USER-DATA>  --key-name <KEY-NAME>

Cloud-Init

The AMI creates a basic CrateDB configuration at first launch (using Cloud-Init). During startup the Cloud-Init process checks if ephemeral devices are attached to the instance. If the device contains no filesystem it will be formatted with the ext4 file system. Otherwise the filesystem already installed will be used. The mount point of each attached device is defined as /mnt/<DEV-NAME> where <DEV-NAME> is the name of the device as listed in /dev. The configuration settings (defined in /etc/crate/crate.yml are listed below and include:

  • Mount ephemeral devices and set the CrateDB data path on their mount points (Attached Devices)
  • Enable EC2 discovery
  • Set node name to instance hostname

In addition to the preconfigured Cloud-Init setup, it is possible to inject commands as a User-Data shell-script.

Note

Note that User-Data scripts are called before the pre-installed Cloud-Init scripts and therefore the injected User-Data settings might be overridden by Cloud-Init (see Cloud-Init).

EC2 Discovery

The AMI uses EC2 discovery as a default unicast host discovery mechanism. (see CrateDB Setup on Amazon EC2). For security reasons it is strongly recommended to use IAM roles instead of providing your AWS credentials manually on your instances (see Authentication).

Note

EC2 discovery is only available on CrateDB version 0.51.0 or higher.

Adapt CrateDB Configuration

The CrateDB configuration file /etc/crate/crate.yml can be adapted on startup by using the User-Data script (see Cloud-Init). The following example shows how to set the minimum_master_nodes and gateway configuration setting which are essential for a multi node setup. This configuration is used when a cluster with 3 or more nodes is set up.

#!/bin/bash
echo "
discovery.zen.minimum_master_nodes: 2
" >> /etc/crate/crate.yml

echo "
gateway:
  recover_after_nodes: 3
  recover_after_time: 5m
  expected_nodes: 3
" >> /etc/crate/crate.yml

Launch AMI by Using the AWS CLI

Instances which have the IAM role crate-ec2 assigned will securely fetch their credentials which are then used for authentication in the discovery mechanism. The following example shows how to run a single instance of type m3.medium. The adapted CrateDB configuration is passed via user-data.sh file.

sh$ aws ec2 run-instances --image-id ami-544c1923 --count 1 --iam-instance-profile Name="crate-ec2" --instance-type m3.medium --user-data $(base64 user-data.sh) --key-name my_key_pair

Note

In case you want to filter machines by instance tags or security groups, add this configuration to the USER-DATA field parameter (see Filter by Tags).

Instance Types

The instance type specifies the combination of CPU, memory, storage and networking capacity. To receive better performance for running queries select an instance type which gives the possibility to attach ephemeral storage. On newer AWS instance types this storage is covered by Solid-State-Drives (short SSD). By choosing one of those instance types CrateDB will automatically mount and store its data on those devices if they are attached to the instance as a block device mapping (see also Attached Devices). Instance Types with additional instance store volumes (SSD or HDD) are currently all instances of type m3, g2, r3, d2 and i2.

Attached Devices

To add a block device mapping before launching an instance, it is possible to use the block-device-mappings parameter with the run-instances command. In this case ephemeral1 will be added as an instance store volume with the device name /dev/sdc. For additional info see device naming on linux instances.

sh$ aws ec2 run-instances --image-id ami-544c1923 --count 1 --instance-type m3.medium --block-device-mappings "[{\"DeviceName\": \"/dev/sdc\",\"VirtualName\":\"ephemeral1\"}]"

Note

Note that the data stored on ephemeral disks is not permanent and only persists during the lifetime of an instance.

If no block device mapping is configured on the EC2 instance, the default data directory of CrateDB is set to /var/lib/crate. The data paths are set in /etc/crate/crate.yml.