Joins

Table of Contents

Cross Joins

Referencing two tables results in a CROSS JOIN.

The result is computed by creating every possible combination (Cartesian Product) of their rows (t1 * t2 * t3 * tn) and then applying the given query operation on it (WHERE clause, SELECT list, ORDER BY clause, ...):

cr> select articles.name as article, colors.name as color, price
... from articles cross join colors
... where price > 5000.0
... order by price, color, article;
+------------------------------+---------------+----------+
| article                      | color         |    price |
+------------------------------+---------------+----------+
| Infinite Improbability Drive | Antique White | 19999.99 |
| Infinite Improbability Drive | Gold          | 19999.99 |
| Infinite Improbability Drive | Midnight Blue | 19999.99 |
| Infinite Improbability Drive | Olive Drab    | 19999.99 |
| Starship Titanic             | Antique White | 50000.0  |
| Starship Titanic             | Gold          | 50000.0  |
| Starship Titanic             | Midnight Blue | 50000.0  |
| Starship Titanic             | Olive Drab    | 50000.0  |
+------------------------------+---------------+----------+
SELECT 8 rows in set (... sec)

Cross Joins can be done explicitly using the CROSS JOIN statement as shown in the example above, or implicitly by just specifying two or more tables in the FROM list:

cr> select articles.name as article, colors.name as color, price
... from articles, colors
... where price > 5000.0
... order by price, color, article;
+------------------------------+---------------+----------+
| article                      | color         |    price |
+------------------------------+---------------+----------+
| Infinite Improbability Drive | Antique White | 19999.99 |
| Infinite Improbability Drive | Gold          | 19999.99 |
| Infinite Improbability Drive | Midnight Blue | 19999.99 |
| Infinite Improbability Drive | Olive Drab    | 19999.99 |
| Starship Titanic             | Antique White | 50000.0  |
| Starship Titanic             | Gold          | 50000.0  |
| Starship Titanic             | Midnight Blue | 50000.0  |
| Starship Titanic             | Olive Drab    | 50000.0  |
+------------------------------+---------------+----------+
SELECT 8 rows in set (... sec)

Inner Joins

Inner Joins require each record of one table to have matching records on the other table:

cr> select s.id, s.table_name, t.number_of_shards
... from sys.shards s, information_schema.tables t
... where s.table_name = t.table_name
... and s.table_name = 'employees'
... order by s.id;
+----+------------+------------------+
| id | table_name | number_of_shards |
+----+------------+------------------+
|  0 | employees  |                4 |
|  1 | employees  |                4 |
|  2 | employees  |                4 |
|  3 | employees  |                4 |
+----+------------+------------------+
SELECT 4 rows in set (... sec)

Left Outer Joins

Left outer join returns tuples for all matching records of the left and right relation like Inner join. Additionally it returns tuples for all other records from left that don’t match any record on the right by using null values for the columns of the right relation:

cr> select e.name || ' ' || e.surname as employee, coalesce(d.name, '') as manager_of_department
... from employees e left join departments d
... on e.id = d.manager_id
... order by e.id;
+--------------------+-----------------------+
| employee           | manager_of_department |
+--------------------+-----------------------+
| John Doe           | Administration        |
| John Smith         | IT                    |
| Sean Lee           |                       |
| Rebecca Sean       |                       |
| Tim Ducan          |                       |
| Robert Duval       |                       |
| Clint Johnson      |                       |
| Sarrah Mcmillan    |                       |
| David Limb         |                       |
| David Bowe         |                       |
| Smith Clark        | Marketing             |
| Ted Kennedy        |                       |
| Ronald Reagan      |                       |
| Franklin Rossevelt |                       |
| Sam Malone         |                       |
| Marry Georgia      |                       |
| Tim Doe            | Human Resources       |
| Tim Malone         | Purchasing            |
+--------------------+-----------------------+
SELECT 18 rows in set (... sec)

Right Outer Joins

Right outer join returns tuples for all matching records of the right and left relation like Inner join. Additionally it returns tuples for all other records from right that don’t match any record on the left by using null values for the columns of the left relation:

cr> select e.name || ' ' || e.surname as employee, d.name as manager_of_department
... from employees e right join departments d
... on e.id = d.manager_id
... order by d.id;
+-------------+-----------------------+
| employee    | manager_of_department |
+-------------+-----------------------+
| John Doe    | Administration        |
| Smith Clark | Marketing             |
| Tim Malone  | Purchasing            |
| Tim Doe     | Human Resources       |
|             | Shipping              |
| John Smith  | IT                    |
+-------------+-----------------------+
SELECT 6 rows in set (... sec)

Full Outer Joins

Full outer join returns tuples for all matching records of the left and right relation like Inner join. Additionally it returns tuples for all other records from left that don’t match any record on the right by using null values for the columns of the right relation. Additionally it returns tuples for all other records from right that don’t match any record on the left by using null values for the columns of the left relation:

cr> select e.name || ' ' || e.surname as employee, coalesce(d.name, '') as manager_of_department
... from employees e full join departments d
... on e.id = d.manager_id
... order by e.id;
+--------------------+-----------------------+
| employee           | manager_of_department |
+--------------------+-----------------------+
| John Doe           | Administration        |
| John Smith         | IT                    |
| Sean Lee           |                       |
| Rebecca Sean       |                       |
| Tim Ducan          |                       |
| Robert Duval       |                       |
| Clint Johnson      |                       |
| Sarrah Mcmillan    |                       |
| David Limb         |                       |
| David Bowe         |                       |
| Smith Clark        | Marketing             |
| Ted Kennedy        |                       |
| Ronald Reagan      |                       |
| Franklin Rossevelt |                       |
| Sam Malone         |                       |
| Marry Georgia      |                       |
| Tim Doe            | Human Resources       |
| Tim Malone         | Purchasing            |
|                    | Shipping              |
+--------------------+-----------------------+
SELECT 19 rows in set (... sec)

Join Conditions

CrateDB supports all operators and scalar functions as join conditions in the WHERE clause.

Example with within scalar function:

cr> select photos.name, countries.name
... from countries, photos
... where within(location, geo)
... order by countries.name, photos.name;
+--------------+---------+
| name         | name    |
+--------------+---------+
| Eiffel Tower | France  |
| Berlin Wall  | Germany |
+--------------+---------+
SELECT 2 rows in set (... sec)

However, there are no specific optimisations for certain join conditions such as = (equals) yet. The JOIN operation is implemented as a generic nested loop that invokes the operation on every record of the left table with every record of the right table. However, these filters are applied in a distributed way for each shard of the used tables.

Limitations

  • Joining more than 2 tables can result in execution plans which perform poorly as there is no query optimizer in place yet.