CREATING ADVANCED JOINS
Using aliases for column names and calculated fields, SQL
also enables you to alias table names. There are two primary
reasons to do this:
* To shorten the SQL syntax
* To enable multiple uses of the same table
within a single SELECT statement
Using Table Aliases
In Tutorial 15, "Creating Calculated Fields," you learned
how to use aliases to refer to retrieved table columns. The
syntax to alias a column looks like this:
• Input
SELECT Concat(RTrim(vend_name), ' (', RTrim(vend_country),
')') AS
vend_title
FROM vendors
ORDER BY vend_name;

In addition to using aliases for column names and calculated
fields, SQL also enables you to alias table names. There are
two primary reasons to do this:
- To shorten the SQL syntax
- To enable multiple uses of the same table within a
single SELECT statement
Take a look at the following SELECT statement. It is
basically the same statement as an example used in the previous
tutorial, but it has been modified to use aliases:
• Input
SELECT cust_name, cust_contact
FROM customers AS c, orders AS o, orderitems AS oi
WHERE c.cust_id = o.cust_id
AND oi.order_num = o.order_num
AND prod_id = 'TNT2';

• Analysis
You'll notice that the three tables in the FROM clauses all
have aliases. customers AS c establishes c as an alias for
customers, and so on. This enables you to use the abbreviated c
instead of the full text customers. In this example, the table
aliases were used only in the WHERE clause, but aliases are not
limited to just WHERE. You can use aliases in the SELECT list,
the ORDER BY clause, and in any other part of the statement as
well.
It is also worth noting that table aliases are only used
during query execution. Unlike column aliases, table aliases
are never returned to the client.
Using Different Join Types
So far, you have used only simple joins known as inner joins
or equiJoins. You'll now take a look at three additional join
types: the self join, the natural join, and the outer join.
Self Joins
As mentioned earlier, one of the primary reasons to use
table aliases is to be able to refer to the same table more
than once in a single SELECT statement. An example will
demonstrate this.
Suppose that a problem was found with a product (item id
DTNTR), and you therefore wanted to know all of the products
made by the same vendor so as to determine if the problem
applied to them, too. This query requires that you first find
out which vendor creates item DTNTR, and next find which other
products are made by the same vendor. The following is one way
to approach this problem:
• Input
SELECT prod_id, prod_name
FROM products
WHERE vend_id = (SELECT vend_id
FROM products
WHERE prod_id = 'DTNTR');

• Output
+---------+----------------+ | prod_id | prod_name | +---------+----------------+ | DTNTR | Detonator | | FB | Bird seed | | FC | Carrots | | SAFE | Safe | | SLING | Sling | | TNT1 | TNT (1 stick) | | TNT2 | TNT (5 sticks) | +---------+----------------+
• Analysis
This first solution uses sub-queries. The inner SELECT
statement does a simple retrieval to return the vend_id of the
vendor that makes item DTNTR. That ID is the one used in the
WHERE clause of the outer query so all items produced by that
vendor are retrieved. (You learned all about sub-queries in
Tutorial 19, "Working with Sub-queries." Refer to that tutorial
for more information.)
Now look at the same query using a join:
• Input
SELECT p1.prod_id, p1.prod_name
FROM products AS p1, products AS p2
WHERE p1.vend_id = p2.vend_id
AND p2.prod_id = 'DTNTR';

• Output
+---------+----------------+ | prod_id | prod_name | +---------+----------------+ | DTNTR | Detonator | | FB | Bird seed | | FC | Carrots | | SAFE | Safe | | SLING | Sling | | TNT1 | TNT (1 stick) | | TNT2 | TNT (5 sticks) | +---------+----------------+
• Analysis
The two tables needed in this query are actually the same
table, and so the products table appears in the FROM clause
twice. Although this is perfectly legal, any references to
table products would be ambiguous because MySQL could not know
to which instance of the products table you are referring.
To resolve this problem, table aliases are used. The first
occurrence of products has an alias of p1, and the second has
an alias of p2. Now those aliases can be used as table names.
The SELECT statement, for example, uses the p1 prefix to
explicitly state the full name of the desired columns. If it
did not, MySQL would return an error because there are two
columns named prod_id and prod_name. It cannot know which one
you want (even though, in truth, they are one and the same).
The WHERE clause first joins the tables (by matching vend_id in
p1 to vend_id in p2), and then it filters the data by prod_id
in the second table to return only the desired data.
Tip
Self Joins Instead of
Sub-queries Self joins are often used to replace statements
using sub-queries that retrieve data from the same table as the
outer statement. Although the end result is the same, sometimes
these joins execute far more quickly than they do sub-queries.
It is usually worth experimenting with both to determine which
performs better.
Natural Joins
Whenever tables are joined, at least one column appears in
more than one table (the columns being joined). Standard joins
(the inner joins you learned about in the previous tutorial)
return all data, even multiple occurrences of the same column.
A natural join simply eliminates those multiple occurrences so
only one of each column is returned.
How does it do this? The answer is it doesn'tyou do it. A
natural join is a join in which you select only columns that
are unique. This is typically done using a wildcard (SELECT *)
for one table and explicit subsets of the columns for all other
tables. The following is an example:
• Input
SELECT c.*, o.order_num, o.order_date,
oi.prod_id, oi.quantity,
OI.item_price
FROM customers AS c, orders AS o, orderitems AS oi
WHERE c.cust_id = o.cust_id
AND oi.order_num = o.order_num
AND prod_id = 'FB';

• Analysis
In this example, a wildcard is used for the first table
only. All other columns are explicitly listed so no duplicate
columns are retrieved.
The truth is, every inner join you have created thus far is
actually a natural join, and you will probably never even need
an inner join that is not a natural join.
Outer Joins
Most joins relate rows in one table with rows in another.
But occasionally, you want to include rows that have no related
rows. For example, you might use joins to accomplish the
following tasks:
- Count how many orders each customer placed, including
customers who have yet to place an order
- List all products with order quantities, including
products not ordered by anyone
- Calculate average sale sizes, taking into account
customers who have not yet placed an order
In each of these examples, the join includes table rows that
have no associated rows in the related table. This type of join
is called an outer join.
The following SELECT statement is a simple inner join. It
retrieves a list of all customers and their orders:
• Input
SELECT customers.cust_id, orders.order_num
FROM customers INNER JOIN orders
ON customers.cust_id = orders.cust_id;

Outer join syntax is similar. To retrieve a list of all
customers, including those who have placed no orders, you can
do the following:
• Input
SELECT customers.cust_id, orders.order_num
FROM customers LEFT OUTER JOIN orders
ON customers.cust_id = orders.cust_id;

• Output
+---------+-----------+ | cust_id | order_num | +---------+-----------+ | 10001 | 20005 | | 10001 | 20009 | | 10002 | NULL | | 10003 | 20006 | | 10004 | 20007 | | 10005 | 20008 | +---------+-----------+
• Analysis
Like the inner join seen in the previous tutorial, this
SELECT statement uses the keywords OUTER JOIN to specify the
join type (instead of specifying it in the WHERE clause). But
unlike inner joins, which relate rows in both tables, outer
joins also include rows with no related rows. When using OUTER
JOIN syntax you must use the RIGHT or LEFT keywords to specify
the table from which to include all rows (RIGHT for the one on
the right of OUTER JOIN, and LEFT for the one on the left). The
previous example uses LEFT OUTER JOIN to select all the rows
from the table on the left in the FROM clause (the customers
table). To select all the rows from the table on the right, you
use a RIGHT OUTER JOIN as seen in this example:
• Input
SELECT customers.cust_id, orders.order_num
FROM customers RIGHT OUTER JOIN orders
ON orders.cust_id = customers.cust_id;

Note
No *= MySQL does not support the use of the simplified *=
and =* syntax popularized by other DBMSs.
Tip
Outer Join Types There are two
basic forms of outer joins the left outer join and the right
outer join. The only difference between them is the order of
the tables they are relating. In other words, a left outer join
can be turned into a right outer join simply by reversing the
order of the tables in the FROM or WHERE clause. As such, the
two types of outer join can be used interchangeably, and the
decision about which one is used is based purely on
convenience.
Using Joins with Aggregate Functions
As you learned in Tutorial 17, "Summarizing Data," aggregate
functions are used to summarize data. Although all the examples
of aggregate functions thus far only summarized data from a
single table, these functions can also be used with joins.
To demonstrate this, let's look at an example. You want to
retrieve a list of all customers and the number of orders that
each has placed. The following code uses the COUNT() function
to achieve this:
• Input
SELECT customers.cust_name,
customers.cust_id,
COUNT(orders.order_num) AS
num_ord
FROM customers INNER JOIN orders
ON customers.cust_id = orders.cust_id
GROUP BY customers.cust_id;

• Output
+----------------+---------+---------+ | cust_name | cust_id | num_ord | +----------------+---------+---------+ | Coyote Inc. | 10001 | 2 | | Wascals | 10003 | 1 | | Yosemite Place | 10004 | 1 | | E Fudd | 10005 | 1 | +----------------+---------+---------+
• Analysis
This SELECT statement uses INNER JOIN to relate the
customers and orders tables to each other. The GROUP BY clause
groups the data by customer, and so the function call
COUNT(orders.order_num) counts the number of orders for each
customer and returns it as num_ord.
Aggregate functions can be used just as easily with other
join types. See the following example:
• Input
SELECT customers.cust_name,
customers.cust_id,
COUNT(orders.order_num) AS
num_ord
FROM customers LEFT OUTER JOIN orders
ON customers.cust_id = orders.cust_id
GROUP BY customers.cust_id;

• Output
+----------------+---------+---------+ | cust_name | cust_id | num_ord | +----------------+---------+---------+ | Coyote Inc. | 10001 | 4 | | Mouse House | 10002 | 0 | | Wascals | 10003 | 1 | | Yosemite Place | 10004 | 1 | | E Fudd | 10005 | 1 | +----------------+---------+---------+
• Analysis
This example uses a left outer join to include all
customers, even those who have not placed any orders. The
results show that customer Mouse House (with 0 orders) is also
included this time.
Using Joins and Join Conditions
Before wrapping up this two-tutorial discussion on joins, it
is worthwhile to summarize some key points regarding joins and
their use:
- Pay careful attention to the type of join being used.
More often than not, you'll want an inner join, but there
are often valid uses for outer joins, too.
- Make sure you use the correct join condition, or you'll
return incorrect data.
- Make sure you always provide a join condition, or
you'll end up with the Cartesian product.
- You may include multiple tables in a join and even have
different join types for each. Although this is legal and
often useful, make sure you test each join separately
before testing them together. This makes troubleshooting
far simpler.
Summary
This tutorial was a continuation of the previous tutorial on
joins. This tutorial started by teaching you how and why to use
aliases, and then continued with a discussion on different join
types and various forms of syntax used with each. You also
learned how to use aggregate functions with joins, and some
important do's and don'ts to keep in mind when working with
joins.
[Next]
|