Mark’s Site

Pensieve for coding and golf :-)

SQL Joins

By admin • Apr 11th, 2008 • Category: 2.2. PHP and MySQL

Some Background and Example

Relational databases, despite popular belief, are not so termed because of the ability to set keys that link tables together, but because ‘relational’ in mathematics means ‘based on tables’. The model does, however, conveniently allow us to:

  • Encourage common data reuse across different tables.
  • Ensure data integrity by setting up keys.

A key (not to be confused with Foreign Keys) is a type of constraint which ensures certain critical information is not duplicated within a given table. Take for example, usernames to a system. No two users may have the same username, so we setup [username] as a unique key to ensure data integrity. Primary keys also ensure data integrity by ensuring that each object can be uniquely identified for each table.

Please Note: I will not be covering ALL join types, only ones that I think relevant to the majority of web programming.

The JOIN by WHERE

Taking the above examples, how would we return usernames along with their correct gender name? We can link the IDs in a WHERE statement:

SELECT t1.[username], t2.[gender_name]
FROM [user_log] AS t1, [gender] AS t2
WHERE t1.[gender_id] = t2.[gender_id] 

That’s the first way I learned about joining information in SQL, but I don’t want you to remember it. I want you to get into the habit of using JOINs to join tables. That is, after all, their purpose. *

The INNER JOIN

So our same query, using joins, is as follows:

SELECT t1.[username], t2.[gender_name]
FROM [user_log] AS t1
INNER JOIN [gender] AS t2 ON t1.[gender_id] = t2.[gender_id] 

The INNER JOIN combines the records from our two tables (t1 and t2) on the given join condition (t1.[gender_id] = t2.[gender_id]). SQL combines each record in table t1 with every record in t2, and only the records that meet our join condition remain.

You need to be careful with INNER JOINs when you work with data that contains NULL values, as they will not return results on tables where the columns you’re joining contain NULL entries unless you integrate some ‘IS NULL’ join condition (if this is your case, perhaps you should look at OUTER JOINs later in this article).

The NATURAL JOIN

Natural joins are a special type of INNER JOIN that compares all columns in the joined tables t1 and t2 that have the same column name (in our example, [gender_id]). Our above example could be expressed as a natural join in the following way:

SELECT t1.[username], t2.[gender_name]
FROM [user_log] AS t1
NATURAL JOIN [gender] AS t2 

I rarely use natural joins because I tend not to name my column names equally in joined tables. Each to their own. I doubt there’s any performance advantage between the a NATURAL JOIN and a plain INNER JOIN. If anyone has benchmarked any difference, please let me know.

The OUTER JOIN

Unlike INNER JOINs, OUTER JOINs do not require your records to have a matching record in the adjoining table. This becomes useful when our data contains NULL values, as an OUTER JOIN will return the results from t1, and report NULL on the joined column.

The different types of OUTER JOINs dictate which table(s) to retain rows from.

The LEFT OUTER JOIN

This retains records from the left table in our join (in our example, t1), even if the join condition doesn’t find any matching records in t2. It will just return NULL for the empty results of t2. This would be useful if our [user_log] table contained a user that had no gender specified. We could use a LEFT OUTER JOIN to return all users and their genders, showing NULL for the user without any gender specified.

The RIGHT OUTER JOIN

This is just the reverse of the LEFT OUTER JOIN, retaining all records from the right table in our join. So say for example, we added a new gender called ‘Unspecified’, yet we didn’t have any users in the [user_log] that used this. A RIGHT OUTER JOIN would return all genders, leaving us an ‘Unspecified’ gender with no user associated.

The FULL OUTER JOIN

You guessed it, a FULL OUTER JOIN retains all records from both tables, returning NULLs if records do not exist in the adjoining table.

Conclusion

So there you have it, pretty much all joins you can poke a stick at. I rarely find the need to use anything but an INNER JOIN, but when the need arises, you’ll know what JOINs you have in your SQL arsenal.

Good luck :-)
————————-
* As a general rule, use JOIN statement to join information, and leave the WHERE statement free to add conditions to your queries.

admin is
Email this author | All posts by admin

Leave a Reply