Skip to main content

Command Palette

Search for a command to run...

Full Outer Join - How to Compare Tables

This started out as just a quick task of writing a SQL statement to compare two tables, but ultimately defined an approach for doing full outer joins.

Updated
5 min read
Full Outer Join - How to Compare Tables

It’s not all that often than I need to do a full out join. Heck, the “Oracle join syntax” doesn’t even support it. I’ve changed to using the JOIN keyword (sometimes referred to as an ANSI join) which makes things so much more readable and understandable—and you can do a FULL OUTER JOIN.

TLDR

Jump to the solution at the bottom and skip the why and how.

What is a FULL OUTER JOIN?

It’s easiest to describe the results. Let’s say we have two tables as shown below:

CARTRUCK
IDID
MAKEMAKE
MODELMODEL
YEARYEAR
CYLINDERSCYLINDERS
PASSENGERS
TRUNK_SIZE
BED_LENGTH
TONNAGE

If I want to query the database view USER_TAB_COLS to get the results above, A FULL OUTER JOIN will do it. A FULL OUTER JOIN will get all of the data from two data sets, aligning those that match and putting nulls where they do not…but it’s not super easy.

How to Get There

If I run the following query:

select column_name
  from user_tab_cols
  where table_name = 'CAR'

I will get one row for each the columns for the CAR table: 7 rows. And, of course, if I change the table_name to TRUCK I will get 7 rows, but there will be 2 different rows.

If I want all of the columns that exist in both tables I would use an INNER JOIN:

select t1.column_name car_column, t2.column_name truck_column
  from user_tab_cols t1
  inner join user_tab_cols t2
    on t2.column_name = t1.column_name
    and t2.table_name = 'TRUCK'
  where t1.table_name = 'CAR'

I will get one row for each the columns that are in both tables: 5 rows.

But what if I want the results of the table above? In this case I need to have one row for every row that is in either table, but if they match, they should be on the same row. This is where the outer join comes in, but it’s not as simple as you might think. If I just change it to a LEFT OUTER JOIN I get the following:

select t1.column_name car_column, t2.column_name truck_column
  from user_tab_cols t1
  left outer join user_tab_cols t2
    on t2.column_name = t1.column_name
    and t2.table_name = 'TRUCK'
  where t1.table_name = 'CAR'
CAR_COLUMNTRUCK_COLUMN
IDID
MAKEMAKE
MODELMODEL
YEARYEAR
CYLINDERSCYLINDERS
PASSENGERS
TRUNK_SIZE

That happens because the where clause will ONLY bring back the rows “where t1.table_name = ‘CAR’”. I still sometimes think that just using a FULL OUTER JOIN will fix this. But the following query brings back the same 7 rows.

select t1.column_name car_column, t2.column_name truck_column
  from user_tab_cols t1
  full outer join user_tab_cols t2
    on t2.column_name = t1.column_name
    and t2.table_name = 'TRUCK'
  where t1.table_name = 'CAR'

For the same reason as the left outer join, the query above can only bring back 7 rows. This is where things get complicated with a FULL OUTER JOIN. I need a WHERE clause that will allow for any row in either table. And this is where things get a little strange. I would think that this would bring back everything I want:

select t1.column_name car_column, t2.column_name truck_column
  from user_tab_cols t1
  full outer join user_tab_cols t2
    on t2.column_name = t1.column_name
    and t2.table_name = 'TRUCK'
  where (t1.table_name = 'CAR' or t1.table_name is null)
    and (t2.table_name = 'TRUCK' or t2.table_name is null)

But it doesn’t! I still only get 7 rows—the same rows returned with a LEFT OUTER JOIN. If I want to get all 9 rows, I need to fully understand the FULL OUTER JOIN. Well, I’ll admit, I don’t. I just know how to get it to work:

select t1.column_name car_column, t2.column_name truck_column
  from user_tab_cols t1
  full outer join user_tab_cols t2
    on t2.column_name = t1.column_name
    and t2.table_name = 'TRUCK'
    and t1.table_name = 'CAR' -- <= this is key
  where (t1.table_name = 'CAR' or t1.table_name is null)
    and (t2.table_name = 'TRUCK' or t2.table_name is null)

That’s right. You need that extra AND condition in the FULL OUTER JOIN. It seems strange that an AND brings back more rows, but it does. The query above gives all 9 rows to match the very first table in this blog post.

This is great, but that query is slow. It takes over 2 seconds in my environment. That’s because the database has to look through the whole user_tab_cols view to do all of its work.

The Solution

I can speed things up by telling the database to only consider the rows I care about in each table/view (in this case I’m using the same view, USER_TAB_COLS, twice). I can do this with a CTE (a WITH clause):

with 
t1 as (
    select table_name, column_name
      from user_tab_cols
      where table_name = 'CAR'
),
t2 as (
    select table_name, column_name
      from user_tab_cols
      where table_name = 'TRUCK'
)
  select t1.column_name car_column, t2.column_name truck_column
  from t1
  full outer join t2
      on t2.column_name = t1.column_name
-- Now you don't need the following 4 lines
--      and t2.table_name = 'TRUCK'        
--      and t1.table_name = 'CAR' -- <= this is key
--  where (t1.table_name = 'CAR' or t1.table_name is null)
--    and (t2.table_name = 'TRUCK' or t2.table_name is null)

It runs in just 10% of the time—less than .2 seconds in my environment.

In my experience, this is how you should write a FULL OUTER JOIN.

  1. Use CTEs (with clauses) to define the data set of each table in the FULL OUTER JOIN.

  2. Create the main query with just a FULL OUTER JOIN—no where clause required.