Ice9web.com

 

JOINING TABLES

One of SQL's most powerful features is the capability to join tables on-the-fly within data retrieval queries. Joins are one of the most important operations you can perform using SQL SELECT, and a good understanding of joins and join syntax is an extremely important part of learning SQL.
Before you can effectively use joins, you must understand relational tables and the basics of relational database design. What follows is by no means a complete coverage of the subject, but it should be enough to get you up and running.

 

Understanding Joins

One of SQL's most powerful features is the capability to join tables on-the-fly within data retrieval queries. Joins are one of the most important operations you can perform using SQL SELECT, and a good understanding of joins and join syntax is an extremely important part of learning SQL.

Before you can effectively use joins, you must understand relational tables and the basics of relational database design. What follows is by no means a complete coverage of the subject, but it should be enough to get you up and running.
 

Understanding Relational Tables

The best way to understand relational tables is to look at a real world example.

Suppose you had a database table containing a product catalog, with each catalog item in its own row. The kind of information you would store with each item would include a product description and price, along with vendor information about the company that creates the product.

Now suppose you had multiple catalog items created by the same vendor. Where would you store the vendor information (things such as vendor name, address, and contact information)? You wouldn't want to store that data along with the products for several reasons:

  • Because the vendor information is the same for each product that vendor produces, repeating the information for each product is a waste of time and storage space.
  • If vendor information changes (for example, if the vendor moves or his area code changes), you would need to update every occurrence of the vendor information.
  • When data is repeated (that is, the vendor information is used with each product), there is a high likelihood that the data will not be entered exactly the same way each time. Inconsistent data is extremely difficult to use in reporting.

The key here is that having multiple occurrences of the same data is never a good thing, and that principle is the basis for relational database design. Relational tables are designed so information is split into multiple tables, one for each data type. The tables are related to each other through common values (and thus the relational in relational design).

In our example, you can create two tables, one for vendor information and one for product information. The vendors table contains all the vendor information, one table row per vendor, along with a unique identifier for each vendor. This value, called a primary key, can be a vendor ID, or any other unique value. (Primary keys were first mentioned in Tutorial 1, "Understanding SQL").

The products table stores only product information, and no vendor specific information other than the vendor ID (the vendors table's primary key). This key, called a foreign key, relates the vendors table to the products table, and using this vendor ID enables you to use the vendors table to find the details about the appropriate vendor.

 

New Term

Foreign Key A column in one table that contains the primary key values from another table, thus defining the relationships between tables.
 

What does this do for you? Well, consider the following:

  • Vendor information is never repeated, and so time and space are not wasted.
  • If vendor information changes, you can update a single record in the vendors table. Data in related tables does not change.
  • As no data is repeated, the data used is obviously consistent, making data reporting and manipulation much simpler.

The bottom line is that relational data can be stored efficiently and manipulated easily. Because of this, relational databases scale far better than non-relational databases.

 

New Term

Scale Able to handle an increasing load without failing. A well-designed database or application is said to scale well.
 

 

 
Why Use Joins?

As just explained, breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. But these benefits come with a price.

If data is stored in multiple tables, how can you retrieve that data with a single SELECT statement?

The answer is to use a join. Simply put, a join is a mechanism used to associate tables within a SELECT statement (and thus the name join). Using a special syntax, multiple tables can be joined so a single set of output is returned, and the join associates the correct rows in each table on-the-fly.

Note

Maintaining Referential Integrity It is important to understand that a join is not a physical entity in other words, it does not exist in the actual database tables. A join is created by MySQL as needed, and it persists for the duration of the query execution.

When using relational tables, it is important that only valid data is inserted into relational columns. Going back to the example, if products were stored in the products table with an invalid vendor ID (one not present in the vendors table), those products would be inaccessible because they would not be related to any vendor.

To prevent this from occurring, MySQL can be instructed to only allow valid values (ones present in the vendors table) in the vendor ID column in the products table. This is known as maintaining referential integrity, and is achieved by specifying the primary and foreign keys as part of the table definitions (as will be explained in Tutorial 26, "Creating and Manipulating Tables").

Creating a Join

Creating a join is very simple. You must specify all the tables to be included and how they are related to each other. Look at the following example:

• Input

SELECT vend_name, prod_name, prod_price
FROM vendors, products
WHERE vendors.vend_id = products.vend_id
ORDER BY vend_name, prod_name;

Creating a Join


• Output

+-------------+----------------+------------+
| vend_name   | prod_name      | prod_price |
+-------------+----------------+------------+
| ACME        | Bird seed      | 10.00      |
| ACME        | Carrots        | 2.50       |
| ACME        | Detonator      | 13.00      |
| ACME        | Safe           | 50.00      |
| ACME        | Sling          | 4.49       |
| ACME        | TNT (1 stick)  | 2.50       |
| ACME        | TNT (5 sticks) | 10.00      |
| Anvils R Us | .5 ton anvil   | 5.99       |
| Anvils R Us | 1 ton anvil    | 9.99       |
| Anvils R Us | 2 ton anvil    | 14.99      |
| Jet Set     | JetPack 1000   | 35.00      |
| Jet Set     | JetPack 2000   | 55.00      |
| LT Supplies | Fuses          | 3.42       |
| LT Supplies | Oil can        | 8.99       |
+-------------+----------------+------------+

• Analysis

Take a look at the preceding code. The SELECT statement starts in the same way as all the statements you've looked at thus far, by specifying the columns to be retrieved. The big difference here is that two of the specified columns (prod_name and prod_price) are in one table, whereas the other (vend_name) is in another table.

Now look at the FROM clause. Unlike all the prior SELECT statements, this one has two tables listed in the FROM clause, vendors and products. These are the names of the two tables that are being joined in this SELECT statement. The tables are correctly joined with a WHERE clause that instructs MySQL to match vend_id in the vendors table with vend_id in the products table.

You'll notice that the columns are specified as vendors.vend_id and products.vend_id. This fully qualified column name is required here because if you just specified vend_id, MySQL cannot tell which vend_id columns you are referring to (as there are two of them, one in each table).

 

 

Caution

Fully Qualifying Column Names You must use the fully qualified column name (table and column separated by a period) whenever there is a possible ambiguity about to which column you are referring. MySQL returns an error message if you refer to an ambiguous column name without fully qualifying it with a table name.

 

 
The Importance of the WHERE Clause

It might seem strange to use a WHERE clause to set the join relationship, but actually, there is a very good reason for this. Remember, when tables are joined in a SELECT statement, that relationship is constructed on-the-fly. Nothing in the database table definitions can instruct MySQL how to join the tables. You have to do that yourself. When you join two tables, what you are actually doing is pairing every row in the first table with every row in the second table. The WHERE clause acts as a filter to only include rows that match the specified filter conditionthe join condition, in this case. Without the WHERE clause, every row in the first table is paired with every row in the second table, regardless of if they logically go together or not.

 

New Term

Cartesian Product The results returned by a table relationship without a join condition. The number of rows retrieved is the number of rows in the first table multiplied by the number of rows in the second table.

To understand this, look at the following SELECT statement and output:

• Input

SELECT vend_name, prod_name, prod_price
FROM vendors, products
ORDER BY vend_name, prod_name;

 Creating a Join
Creating a Join


• Output

+----------------+----------------+------------+
| vend_name      | prod_name      | prod_price |
+----------------+----------------+------------+
| ACME           | .5 ton anvil   | 5.99       |
| ACME           | 1 ton anvil    | 9.99       |
| ACME           | 2 ton anvil    | 14.99      |
| ACME           | Bird seed      | 10.00      |
| ACME           | Carrots        | 2.50       |
| ACME           | Detonator      | 13.00      |
| ACME           | Fuses          | 3.42       |
| ACME           | JetPack 1000   | 35.00      |
| ACME           | JetPack 2000   | 55.00      |
| ACME           | Oil can        | 8.99       |
| ACME           | Safe           | 50.00      |
| ACME           | Sling          | 4.49       |
| ACME           | TNT (1 stick)  | 2.50       |
| ACME           | TNT (5 sticks) | 10.00      |
| Anvils R Us    | .5 ton anvil   | 5.99       |
| Anvils R Us    | 1 ton anvil    | 9.99       |
| Anvils R Us    | 2 ton anvil    | 14.99      |
| Anvils R Us    | Bird seed      | 10.00      |
| Anvils R Us    | Carrots        | 2.50       |
| Anvils R Us    | Detonator      | 13.00      |
| Anvils R Us    | Fuses          | 3.42       |
| Anvils R Us    | JetPack 1000   | 35.00      |
| Anvils R Us    | JetPack 2000   | 55.00      |
| Anvils R Us    | Oil can        | 8.99       |
| Anvils R Us    | Safe           | 50.00      |
| Anvils R Us    | Sling          | 4.49       |
| Anvils R Us    | TNT (1 stick)  | 2.50       |
| Anvils R Us    | TNT (5 sticks) | 10.00      |
| Furball Inc.   | .5 ton anvil   | 5.99       |
| Furball Inc.   | 1 ton anvil    | 9.99       |
| Furball Inc.   | 2 ton anvil    | 14.99      |
| Furball Inc.   | Bird seed      | 10.00      |
| Furball Inc.   | Carrots        | 2.50       |
| Furball Inc.   | Detonator      | 13.00      |
| Furball Inc.   | Fuses          | 3.42       |
| Furball Inc.   | JetPack 1000   | 35.00      |
| Furball Inc.   | JetPack 2000   | 55.00      |
| Furball Inc.   | Oil can        | 8.99       |
| Furball Inc.   | Safe           | 50.00      |
| Furball Inc.   | Sling          | 4.49       |
| Furball Inc.   | TNT (1 stick)  | 2.50       |
| Furball Inc.   | TNT (5 sticks) | 10.00      |
| Jet Set        | .5 ton anvil   | 5.99       |
| Jet Set        | 1 ton anvil    | 9.99       |
| Jet Set        | 2 ton anvil    | 14.99      |
| Jet Set        | Bird seed      | 10.00      |
| Jet Set        | Carrots        | 2.50       |
| Jet Set        | Detonator      | 13.00      |
| Jet Set        | Fuses          | 3.42       |
| Jet Set        | JetPack 1000   | 35.00      |
| Jet Set        | JetPack 2000   | 55.00      |
| Jet Set        | Oil can        | 8.99       |
| Jet Set        | Safe           | 50.00      |
| Jet Set        | Sling          | 4.49       |
| Jet Set        | TNT (1 stick)  | 2.50       |
| Jet Set        | TNT (5 sticks) | 10.00      |
| Jouets Et Ours | .5 ton anvil   | 5.99       |
| Jouets Et Ours | 1 ton anvil    | 9.99       |
| Jouets Et Ours | 2 ton anvil    | 14.99      |
| Jouets Et Ours | Bird seed      | 10.00      |
| Jouets Et Ours | Carrots        | 2.50       |
| Jouets Et Ours | Detonator      | 13.00      |
| Jouets Et Ours | Fuses          | 3.42       |
| Jouets Et Ours | JetPack 1000   | 35.00      |
| Jouets Et Ours | JetPack 2000   | 55.00      |
| Jouets Et Ours | Oil can        | 8.99       |
| Jouets Et Ours | Safe           | 50.00      |
| Jouets Et Ours | Sling          | 4.49       |
| Jouets Et Ours | TNT (1 stick)  | 2.50       |
| Jouets Et Ours | TNT (5 sticks) | 10.00      |
| LT Supplies    | .5 ton anvil   | 5.99       |
| LT Supplies    | 1 ton anvil    | 9.99       |
| LT Supplies    | 2 ton anvil    | 14.99      |
| LT Supplies    | Bird seed      | 10.00      |
| LT Supplies    | Carrots        | 2.50       |
| LT Supplies    | Detonator      | 13.00      |
| LT Supplies    | Fuses          | 3.42       |
| LT Supplies    | JetPack 1000   | 35.00      |
| LT Supplies    | JetPack 2000   | 55.00      |
| LT Supplies    | Oil can        | 8.99       |
| LT Supplies    | Safe           | 50.00      |
| LT Supplies    | Sling          | 4.49       |
| LT Supplies    | TNT (1 stick)  | 2.50       |
| LT Supplies    | TNT (5 sticks) | 10.00      |
+----------------+----------------+------------+

• Analysis

As you can see in the preceding output, the Cartesian product is seldom what you want. The data returned here has matched every product with every vendor, including products with the incorrect vendor (and even vendors with no products at all).

 

 

Caution

Don't Forget the WHERE Clause Make sure all your joins have WHERE clauses, or MySQL returns far more data than you want. Similarly, make sure your WHERE clauses are correct. An incorrect filter condition causes MySQL to return incorrect data.
 

 

Tip

Cross Joins Sometimes you'll hear the type of join that returns a Cartesian Product referred to as a cross join.
 

 

 
Inner Joins

The join you have been using so far is called an equijoina join based on the testing of equality between two tables. This kind of join is also called an inner join. In fact, you may use a slightly different syntax for these joins, specifying the type of join explicitly. The following SELECT statement returns the exact same data as the preceding example:

• Input

SELECT vend_name, prod_name, prod_price
FROM vendors INNER JOIN products
 ON vendors.vend_id = products.vend_id;

 Creating a Join


• Analysis

The SELECT in the statement is the same as the preceding SELECT statement, but the FROM clause is different. Here the relationship between the two tables is part of the FROM clause specified as INNER JOIN. When using this syntax the join condition is specified using the special ON clause instead of a WHERE clause. The actual condition passed to ON is the same as would be passed to WHERE.

Note

Which Syntax to Use? Per the ANSI SQL specification, use of the INNER JOIN syntax is preferable. Furthermore, although using the WHERE clause to define joins is indeed simpler, using explicit join syntax ensures that you will never forget the join condition, and it can affect performance, too (in some cases).
 

 

 
Joining Multiple Tables

SQL imposes no limit to the number of tables that may be joined in a SELECT statement. The basic rules for creating the join remain the same. First list all the tables, and then define the relationship between each. Here is an example:

• Input

SELECT prod_name, vend_name, prod_price, quantity
FROM orderitems, products, vendors
WHERE products.vend_id = vendors.vend_id
  AND orderitems.prod_id = products.prod_id
  AND order_num = 20005;

Creating a Join

 

 

• Output

+----------------+-------------+------------+----------+
| prod_name      | vend_name   | prod_price | quantity |
+----------------+-------------+------------+----------+
| .5 ton anvil   | Anvils R Us |       5.99 |       10 |
| 1 ton anvil    | Anvils R Us |       9.99 |        3 |
| TNT (5 sticks) | ACME        |      10.00 |        5 |
| Bird seed      | ACME        |      10.00 |        1 |
+----------------+-------------+------------+----------+


• Analysis

This example displays the items in order number 20005. Order items are stored in the orderitems table. Each product is stored by its product ID, which refers to a product in the products table. The products are linked to the appropriate vendor in the vendors table by the vendor ID, which is stored with each product record. The FROM clause here lists the three tables, and the WHERE clause defines both of those join conditions. An additional WHERE condition is then used to filter just the items for order 20005.

 

 

Caution

Performance Considerations MySQL processes joins at run-time, relating each table as specified. This process can become very resource intensive, so be careful not to join tables unnecessarily. The more tables you join, the more performance degrades.

Now would be a good time to revisit the following example from Tutorial 19, "Working with Sub-queries." As you will recall, this SELECT statement returns a list of customers who ordered product TNT2:

• Input

SELECT cust_name, cust_contact
FROM customers
WHERE cust_id IN (SELECT cust_id
                  FROM orders
                  WHERE order_num IN (SELECT order_num
                                      FROM orderitems
                                      WHERE prod_id = 'TNT2'));

Creating a Join

 


As mentioned in Tutorial 19, sub-queries might not always the most efficient way to perform complex SELECT operations, and so as promised, here is the same query using joins:

 

• Input

SELECT cust_name, cust_contact
FROM customers, orders, orderitems
WHERE customers.cust_id = orders.cust_id
  AND orderitems.order_num = orders.order_num
  AND prod_id = 'TNT2';

Creating a Join

• Output

+----------------+--------------+
| cust_name      | cust_contact |
+----------------+--------------+
| Coyote Inc.    | Y Lee        |
| Yosemite Place | Y Sam        |
+----------------+--------------+


• Analysis

As explained in Tutorial 19, returning the data needed in this query requires the use of three tables. But instead of using them within nested sub-queries, here two joins are used to connect the tables. There are three WHERE clause conditions here. The first two connect the tables in the join, and the last one filters the data for product TNT2.

 

Tip

It Pays to Experiment As you can see, there is often more than one way to perform any given SQL operation. And there is rarely a definitive right or wrong way. Performance can be affected by the type of operation, the amount of data in the tables, whether indexes and keys are present, and a whole slew of other criteria. Therefore, it is often worth experimenting with different selection mechanisms to find the one that works best for you.

 

Summary

Joins are one of the most important and powerful features in SQL, and using them effectively requires a basic understanding of relational database design. In this tutorial, you learned some of the basics of relational database design as an introduction to learning about joins. You also learned how to create an equi-join (also known as an inner join), which is the most commonly used form of join. In the next tutorial you'll learn how to create other types of joins.

[Next]