CrateDB and Docker

CrateDB and Docker are a great match thanks to CrateDB’s shared-nothing, horizontally scalable architecture that lends itself well to containerization.

This document covers the essentials of running CrateDB and Docker using the official CrateDB Docker image.

Note

If you’re just getting started with CrateDB and Docker, check out our introductory guides for running Docker locally or Docker Cloud.

Quick Start

Creating a Cluster

To get started with CrateDB and Docker on your dev machine, start by creating a user-defined network, like so:

sh$ docker network create crate

You should then be able to see something like this:

sh$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1bf1b7acd66f        bridge              bridge              local
51cebbdf7d2b        crate               bridge              local
5b8e6fbe9ab6        host                host                local
8baa149b6986        none                null                local

Any CrateDB containers put into the crate network will be able to resolve other CrateDB containers by name.

You can then create your first CrateDB container, like so:

sh$ docker run -d --name=crate01 \
      --net=crate -p 4201:4200 --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_

Breaking it down, that command:

  • Creates and runs a container called crate01 in detached mode
  • Puts the container into the crate network and maps port 4201 on your host machine to port 4200 on the container
  • Runs the command crate process inside the container

Check the node is up with docker ps and you should see something like this:

sh$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                             NAMES
f79116373877        crate               "/docker-entrypoin..."   16 seconds ago      Up 15 seconds       4300/tcp, 5432-5532/tcp, 0.0.0.0:4201->4200/tcp   crate01

You can visit the admin UI in your browser with this URL:

http://localhost:4201/

Select the Cluster icon from the left hand navigation, and you should see a page that lists a single node.

Let’s add another node to our cluster.

Docker automatically resolves hostnames to container names within the user-defined crate network, so any subsequent nodes can access this node using the crate01 hostname.

So, let’s create a second node, called crate02, and configure it to ping crate01 for cluster discovery. Like so:

sh$ docker run -d --name=crate02 \
      --net=crate -p 4202:4200 --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cdiscovery.zen.ping.unicast.hosts=crate01

Notice here how we also updated the port mapping, so that port 4200 on the container is mapped to 4202 on our host machine.

Now, if you go back to the admin UI you already have open, or visit the admin UI of the node you just created (located at http://localhost:4202/) you should see two nodes.

Success! You just created a CrateDB cluster with Docker.

You can now add subsequent nodes, like so:

sh$ docker run -d --name=crate03 \
      --net=crate -p 4203:4200  --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cdiscovery.zen.ping.unicast.hosts=crate01,crate02

Note

This is only an quick start example and you will notice some failing checks in the admin UI. For a cluster that you intend to use seriously, you should, at the very least, configure the Metadata Gateway and Discovery mechanisms.

CrateDB Shell

The CrateDB Shell, crash, is bundled with the Docker image.

If you wanted to run crash inside a user-defined network called crate and connect to three hosts named crate01, crate02, and crate03 (i.e. the example covered in the Creating a Cluster section) you could run:

$ docker run --rm -ti \
    --net=crate crate \
    crash --hosts crate01 crate02 crate03

Production Use

One Container Per Host

For performance reasons, we strongly recommend that you only run one container per host machine.

If you are running one container per machine, you can map the container ports to the host ports so that the host acts like a native installation. You can do that like so:

$ docker run -d -p 4200:4200 -p 4300:4300 -p 5432:5432 crate \
    crate -Cnetwork.host=_site_

Persistent Data Directory

Docker containers are ephemeral, meaning that containers are expected to come and go, and any data inside them is lost when the container goes away. For this reason, it is required that you mount a persistent data directory on your host machine to the /data directory inside the container, like so:

$ docker run -d -v /srv/crate/data:/data crate \
    crate -Cnetwork.host=_site_

Here, /srv/crate/data is an example path, and should be replaced with the path to your host machine data directory.

See the Docker volume documentation for more help.

Custom Configuration

If you want to use a custom configuration, it is strongly recommended that you mount configuration files on the host machine to the appropriate path inside the container. That way, your configuration isn’t lost when the container goes away.

Here’s how you would mount crate.yml:

$ docker run -d \
    -v /srv/crate/config/crate.yml:/crate/config/crate.yml crate \
    crate -Cnetwork.host=_site_

Here, /srv/crate/config/crate.yml is an example path, and should be replaced with the path to your host machine crate.yml file.

Resource Constraints

It is very important that you set resource constraints when you are running CrateDB inside Docker. Current versions of the JVM are unable to detect that they are running inside a container, and as a result, the detected limits are wrong.

Memory

You must calculate and explicitly set the maximum memory the container can use. This is dependant on your host system, and typically, you want to make this as high as possible.

You must then calculate the appropriate heap size (typically half the container memory limit, see CRATE_HEAP_SIZE for details) and pass this to CrateDB, which in turn passes it to the JVM.

Don’t worry about configuring swap. CrateDB does not use swap.

CPU

You must calculate and explicitly set the maximum number of CPUs the container can use. This is dependant on your host system, and typically, you want to make this as high as possible.

Combined Configuration

If the container should use a maximum of 1.5 CPUs and a maximum of 2 GB memory, and if the appropriate heap size should be 1 GB, you could configure everything at once like so:

$ docker run -d \
    --cpus 1.5 \
    --memory 2g \
    --env CRATE_HEAP_SIZE=1g \
    crate \
    crate -Cnetwork.host=_site_