Rolling Upgrade

CrateDB provides an easy way to perform a rolling cluster upgrade with zero downtime.

Warning

Rolling upgrades are only possible if you are using a stable version of CrateDB and are upgrading to a new patch version (see versions).

If you are upgrading to a testing version or are upgrading to a new major version or a new minor version, you must perform a full cluster restart.

Check the Release Notes for the version you are upgrading to for any specific instructions that may override this.

To perform a rolling upgrade of a cluster, one node at a time has to be stopped using the graceful stop procedure (see Signal Handling).

This procedure will disable a node, which will cause it to reject any new requests but will make sure that any pending requests are finished.

Note

Due to the distributed execution of requests, some client requests might fail during a rolling upgrade.

This process will ensure a certain data availability. Which can either be none, primaries, or full and can be configured using SET and RESET.

Using full, all shards currently located on the node will be moved to the other nodes in order to stop gracefully. Using this setting the cluster will stay green the whole time.

Using primaries, only the primaries will be moved to other nodes. Using this setting means that the cluster will go into the yellow warning state if a node that has been stopped contained replicas that are then unavailable.

Using none, there is no data-availability guarantee. The node will stop, possibly leaving the cluster in the critical red state if the node contained a primary that has no replicas that can take over.

Caution

If you are upgrading from 0.54 to 0.55, please read the accompanying notes for the 0.55 release.

Requirements

Full Minimum Data Availability

If the full minimum data availability is configured the cluster needs to contain enough nodes to hold the number of replicas that are configured even if one node is missing.

So for example if there are only two nodes in a cluster and a table has one replica configured the graceful stop procedure will not succeed and abort as it won’t be possible to relocate the replicas.

If a table has a range configured as number of replicas this will take into account the upper number of replicas.

So, with two nodes and 0-1 replicas, the graceful stop procedure will abort.

In short: for the full graceful stop to work the following has to be true:

number_of_nodes > max_number_of_replicas + 1

Primaries Minimum Data Availability

If the primaries minimum data availability is used, take care that there are still enough replicas in the cluster after a node has been stopped so that a write-consistency can be guaranteed.

By default write or delete operations only succeed if a quorum (> replicas / 2 + 1) of active shards is available.

Note

If only 1 replica is configured one active shard suffices in order for write and delete operations to succeed.

Upgrade Process

Warning

Before upgrading, you should back up your data.

Step 1: Disable Allocations

First, you have to prevent the cluster from re-distributing shards and replicas while certain nodes are not available. You can do that by disabling re-allocations and only allowing new primary allocations.

Use the SET and RESET command to do so:

cr> SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';
SET OK, 1 row affected (... sec)

Note

This step may be omited if you set the cluster.graceful_stop.min_availability setting to full.

Step 2: Graceful Stop

The crate process supports multiple signals (see Signal Handling) that invoke different shutdown procedures. To stop a CrateDB process gracefully you need to send the USR2 signal:

sh$ kill -USR2 `cat /path/to/crate.pid`

Note

If your system uses SysVinit (e.g. Debian Wheezy), you can also use service crate graceful-stop.

Depending on the size of your cluster this might take a while. You might want to check your server logs to see if the graceful stop process is progressing well. In case of an error or a timeout, the node will stay up, signaling the error in its log files (or wherever you put your log messages).

Using the default settings the node will shut down by moving all primary shards off the node first. This will ensure that no data is lost. However, the cluster health will most likely turn yellow, because replicas that lived on that node will be missing.

If you want to ensure green health, you need to change the cluster.graceful_stop.min_availability setting to full. This will move all shards off the node before shutting down.

Keep in mind that reallocating shards might take some time depending on the number of shards and the amount and size of records (and/or blob data). For that reason you should set the timeout setting to a reasonable time. By default the shutdown process aborts and the cluster will start distributing shards evenly again. If you want to force a shutdown after the timeout, even if the reallocating is not finished, you can set the force setting to true.

Warning

A forced stop does not ensure the minimum data availability defined in the settings and may result in temporary or even permanent loss of data!

Note

When using cluster.graceful_stop.min_availability=full there have to be enough nodes in the cluster to move shards or else the graceful shutdown procedure will fail!

For example, if there are 4 nodes and 3 configured replicas, there will not be enough nodes to to fulfill the required replicas.

Note

Also, if there is not enough disk space on other nodes to move the shards to the graceful stop procedure will fail.

You can also set the reallocate setting to false if you want to ensure that a node cannot be stopped if the cluster would need to move shards off the node. This can prevent long waiting times until the shutdown process runs into timeout during reallocation.

By default, only the graceful stop command considers the cluster settings described at Graceful Stop.

Observing the Reallocation

If you want to observe the reallocation process triggered by a full or primaries graceful-stop, you can issue the following sql queries regularly.

Get the number of shards remaining on your deallocating node:

cr> SELECT count(*) as remaining_shards from sys.shards
... where _node['name'] = 'your_node_name';
+------------------+
| remaining_shards |
+------------------+
|                0 |
+------------------+
SELECT 1 row in set (... sec)

Get some more details about what shards are remaining on your node:

cr> SELECT schema_name as schema, table_name as "table", id, "primary", state
... FROM sys.shards
... WHERE _node['name'] = 'your_node_name' AND schema_name IN ('blob', 'doc')
... ORDER BY schema, "table", id, "primary", state;
+--------+-------+----+---------+-------+
| schema | table | id | primary | state |
+--------+-------+----+---------+-------+
...
SELECT ... rows in set (... sec)

In the case of primaries availability, only the primary shards of tables with zero replicas will be reallocated. Use this query to find out which shards to look for:

cr> SELECT table_schema as schema, table_name as "table"
... FROM information_schema.tables
... WHERE number_of_replicas = 0 and table_schema in ('blob', 'doc')
... ORDER BY schema, "table" ;
+--------+-------...+
| schema | table ...|
+--------+-------...+
...
+--------+-------...+
SELECT ... rows in set (... sec)

Note

If you observe the graceful-stop process using the admin UI, you might see the cluster turning red for a small instant when a node finally shuts down. This is due to the way the admin UI determines the cluster state.

If a query fails due to a missing node, the admin UI may falsely consider the cluster to be in a critical state.

Step 3: Upgrade CrateDB

After the node is stopped you can safely upgrade your CrateDB installation. Depending on your installation and operating system you can do it by downloading the latest tarball or just use the package manager.

Example for RHEL/YUM:

$sh yum update -y crate

If you are in doubt how to upgrade an installed package, please refer to the man pages of your operating system or package manager.

Step 4: Start CrateDB

Once the upgrade process is completed you can start the CrateDB process again by either invoking the bin/crate executable from the tarball directly:

sh$ /path/to/bin/crate

Or using the service manager of your operating system.

Example for RHEL/YUM:

sh$ service crate start

Step 5: Repeat

Repeat step two, three, and four for all other nodes.

Step 6: Enable Allocations

Last but not least when all nodes are updated you can re-enable allocations again that have been disabled in the first step:

cr> SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
SET OK, 1 row affected (... sec)