Basic query operation: the join
SQL syntax
In most queries, we will want to see data from two or more tables.
To do this, we need to join the tables in a way that matches up
the right information from each one to the other—in our example, listing all of the
customer data along with all of the order data for the orders that each customer
has placed. In the latest SQL standard, the join is specified in the FROM clause:
SELECT *
FROM customers
NATURAL JOIN orders;
Customers joined to Orders | | | | | | |
---|
Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-14 | Patrick |
Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 | 2003-07-14 | Patrick |
Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-18 | Kathleen |
Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-20 | Kathleen |
The NATURAL JOIN keyword specifies that the attributes whose values
will be matched between the two tables are those with matching names; with very rare
exceptions, these will be the
pk/fk attributes, and they have to have matching
data types (and domains) as well. The join attributes that are
matched—here {cfirstname, clastname, cphone}—are shown only
once in the result, along with all other attributes of both tables.
Notice that all of the customer information is repeated
for each order that the customer has placed; this is expected because of the
one-to-many relationship between Customers and Orders. Notice also that any
customer who has not placed an order is missing from the results; this is also
expected because there is no fk in the Orders table to match that customer's
pk in the Customer's table.
The NATURAL JOIN shown here matches the RA join definition precisely.
Unfortunately, not all database systems support the NATURAL JOIN keyword, so
you may have to use a different syntax and may see slightly different results.
We will discuss other join types in a later page.
How it works
The easiest way to understand the join is to think of the database
software looking one-by-one at each pair of rows from the two tables.
Customers | | | | |
---|
Tom | Jewett | 714-555-1212 | 10200 Slater | 92708 |
Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 |
Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 |
Orders | | | | |
---|
Alvaro | Monge | 562-333-4141 | 2003-07-14 | Patrick |
Wayne | Dick | 562-777-3030 | 2003-07-14 | Patrick |
Alvaro | Monge | 562-333-4141 | 2003-07-18 | Kathleen |
Alvaro | Monge | 562-333-4141 | 2003-07-20 | Kathleen |
Look at the first row of Orders and the first row of Customers.
Do the join attributes (pk/fk pair) match? No, so nothing happens; these rows are not part of
the join results. Now look at the first row of Orders and the second row of Customers.
Do the join attributes (pk/fk pair) match? Yes, they do. All of the attributes of each
row are pasted together (with one copy of the matching pk/fk pair). You can see this in
the first row of the SQL join example above. First row of Orders and third row of
Customers? No match, so no resulting row.
Next, look at the second row of Orders and the first row of Customers.
No match, no resulting row. Same with second row of Orders to second row of Customers.
Second row of Orders to third row of Customers? Yes; these are
pasted together and included in the result.
Repeat this for the third and fourth rows of Customers and all of the rows of
Orders. Two matches are found, which results in two pasted rows in the result. Notice that
none of the Orders rows have matched the first row of Customers.
If you have taken even the most basic course in algorithms, you will realize
that SQL doesn’t really implement the join in the way that we’ve just described it.
However, this mental model will help you to follow the RA definition (below), with the
inside workings of the database software left to a more advanced course or tutorial.
RA syntax
The RA join of two relations, r over scheme R and s
over scheme S, is written r ⋈ s, or in our example,
customers ⋈ orders.
The scheme of the result,
exactly as you have seen in the SQL syntax, is the union of the two relation
schemes, R∪S. The join attributes are found in the intersection of the two
schemes, R∩S. Clearly, the intersection attributes must inherit the same assignment
rule from R and S; this makes the two schemes compatible.
The result of the RA join consists of the pairwise paste
of all tuples from the two relations, written paste(t,u) for any tuple t from
relation r over scheme R and u from relation s over scheme S. The result
of the paste operation is exactly as explained in the preceeding section.
In the very, very rare case where there is no intersection between
schemes R and S (that is, R∩S = {null}), the schemes are still compatible and every
tuple from relation r is pasted to every tuple from relation s, with all
of the attributes from both schemes contained in the resulting tuples. In set theory, this
is a Cartesian product of the two relations; in practice, it is almost
always nonsense and not what you want. The Cartesian product can also be written in RA as
r × s,
or intentionally specified in SQL with the CROSS JOIN keyword.
You might want to visualize the three possible results of the paste
operation using the graphic representation of schemes and tuples that we presented in
an earlier page.

Other views of this diagram:
Large image -
Description (text)