Best practice between using LEFT JOIN or NOT EXISTS
Is there a best practice between using a LEFT JOIN or a NOT EXISTS format?
What is benefit to using one over the other?
If none, which should be preferred?
SELECT * FROM tableA A LEFT JOIN tableB B ON A.idx = B.idx WHERE B.idx IS NULL
SELECT * FROM tableA A WHERE NOT EXISTS (SELECT idx FROM tableB B WHERE B.idx = A.idx)
I am using queries within Access against a SQL Server database.
The biggest difference is not in the join vs not exists, it is (as written), the
On the first example, you get all columns from both
B, whereas in the second example, you get only columns from
In SQL Server, the second variant is slightly faster in a very simple contrived example:
Create two sample tables:
CREATE TABLE dbo.A ( A_ID INT NOT NULL PRIMARY KEY CLUSTERED IDENTITY(1,1) ); CREATE TABLE dbo.B ( B_ID INT NOT NULL PRIMARY KEY CLUSTERED IDENTITY(1,1) ); GO
Insert 10,000 rows into each table:
INSERT INTO dbo.A DEFAULT VALUES; GO 10000 INSERT INTO dbo.B DEFAULT VALUES; GO 10000
Remove every 5th row from the second table:
DELETE FROM dbo.B WHERE B_ID % 5 = 1; SELECT COUNT(*) -- shows 10,000 FROM dbo.A; SELECT COUNT(*) -- shows 8,000 FROM dbo.B;
Perform the two test
SELECT * FROM dbo.A LEFT JOIN dbo.B ON A.A_ID = B.B_ID WHERE B.B_ID IS NULL; SELECT * FROM dbo.A WHERE NOT EXISTS (SELECT 1 FROM dbo.B WHERE b.B_ID = a.A_ID);
The second variant does not need to perform the filter operation since it can use the left anti-semi join operator.
Logically they are identical, but
NOT EXISTSis closer to the AntiSemiJoin that you're asking for, and is generally preferred. It also highlights better that you can't access the columns in B, because it's only used as a filter (as opposed to having them available with NULL values).
Many years ago (SQL Server 6.0 ish),
LEFT JOINwas quicker, but that hasn't been the case for a very long time. These days,
NOT EXISTSis marginally faster.
The biggest impact in Access is that the
JOINmethod has to complete the join before filtering it, constructing the joined set in memory. Using
NOT EXISTSit checks for the row but doesn't allocate space for the columns. Plus, it stops looking once it finds a row. Performance varies a bit more in Access, but a general rule of thumb is that
NOT EXISTStends to be a little faster. I'd be less inclined to say it's "best practice", as there are more factors involved.
An exception I've noticed to the
NOT EXISTSbeing superior (however marginally) to
LEFT JOIN ... WHERE IS NULLis when using Linked Servers.
From examining the execution plans, it appears that
NOT EXISTSoperator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).
Linked servers are brutal for that kind of thing. A possible approach to solving that problem is to copy the remote data over the linked server link using a simple `INSERT INTO #t (a,b,c) SELECT a,b,c FROM LinkedServer.database.dbo.table WHERE x=y` then running the `NOT EXISTS (...)` clause against that temporary copy of the database.
In general, the engine will create an execution plan based essentially on:
- The number of rows in A and B
- Whether there is an Index on A and/or B.
- The expected number of result rows (and intermediate rows)
- The form of the input query (i.e. your question)
The "not exists" plan encourages a seek based plan on table B. This is a good choice when table A is small and table B is large (and an index exists on B).
The "antijoin" plan is a good choice when table A is very large or table B is very small or no index on B and returning a large result set.
However it is just an "encouragement", like a weighted input. A strong (1),(2),(3) often makes the choice for (4) moot.
( Ignoring the effect of your example returning different columns due to the *, addressed by @MaxVernon answer. ).