What's the difference between a CTE and a Temp Table?
What is the difference between a Common Table Expression (CTE) and a temp table? And when should I use one over the other?
```sql
WITH cte (Column1, Column2, Column3) AS
(
    SELECT Column1, Column2, Column3
    FROM SomeTable
)
SELECT * FROM cte
```
```sql
SELECT Column1, Column2, Column3
INTO #tmpTable
FROM SomeTable

SELECT * FROM #tmpTable
```
This is pretty broad, but I'll give you as general an answer as I can.

CTEs:

- Are unindexable (but can use existing indexes on referenced objects)
- Cannot have constraints
- Are essentially disposable
- Persist only for the duration of the single statement that defines them
- Can be recursive
- Do not have dedicated stats (rely on stats on the underlying objects)
Temp tables:

- Are real materialized tables that exist in tempdb
- Can be indexed
- Can have constraints
- Persist for the life of the current CONNECTION
- Can be referenced by other queries or subprocedures
- Have dedicated stats generated by the engine
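A minimal sketch of those temp-table properties (table and column names here are hypothetical, loosely following the question's example):

```sql
-- Constraints are allowed on temp tables (PRIMARY KEY here):
CREATE TABLE #tmpTable
(
    Column1 int NOT NULL PRIMARY KEY,
    Column2 varchar(50) NULL,
    Column3 varchar(50) NULL
);

INSERT INTO #tmpTable (Column1, Column2, Column3)
SELECT Column1, Column2, Column3 FROM SomeTable;

-- Temp tables are real tables in tempdb, so they can be indexed:
CREATE NONCLUSTERED INDEX IX_tmp_Column2 ON #tmpTable (Column2);

-- They persist for the connection and can be referenced by later
-- queries, or by procedures called from this scope:
SELECT * FROM #tmpTable WHERE Column2 = 'foo';

DROP TABLE #tmpTable;  -- or let it die with the connection
```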
As far as when to use each, they have very different use cases. If you will have a very large result set, or need to refer to it more than once, put it in a #temptable. If it needs to be recursive, is disposable, or is just to simplify something logically, a CTE is preferred.

Also, a CTE should never be used for performance. You will almost never speed things up by using a CTE, because, again, it's just a disposable view. You can do some neat things with them, but speeding up a query isn't really one of them.
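As a concrete illustration of the recursive case the answer mentions, here is a generic (hypothetical, not from the question) recursive CTE that generates the numbers 1 through 10 — something a temp table cannot express on its own:

```sql
-- A classic recursive CTE: generate the numbers 1 through 10.
WITH Numbers (n) AS
(
    SELECT 1            -- anchor member: the starting row
    UNION ALL
    SELECT n + 1        -- recursive member: re-applied until the WHERE fails
    FROM Numbers
    WHERE n < 10
)
SELECT n FROM Numbers;
```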
Speeding up many queries using CTEs is also a thing, because with CTEs you can add your own business knowledge to outperform the query optimiser. For example, part 1 of your CTE can select from tables where you know the resulting row set will be very small. Inside the same query, you can join this tiny result set to some larger result set and completely bypass issues caused by stale statistics, etc. To do this, you need to add query hints to force the join order. It works, and it improves performance.
"Never be used for performance" is a broad and somewhat subjective statement, although I understand your point. In addition to the other comments, another potential performance gain from using a CTE can occur when switching to a recursive CTE from another form of recursion, such as recursive procedure calls or a cursor.
Please see Martin's comments below:
The CTE is not materialised as a table in memory. It is just a way of encapsulating a query definition. In the case of the OP it will be inlined, the same as just doing `SELECT Column1, Column2, Column3 FROM SomeTable`. Most of the time they do not get materialised up front, which is why this returns no rows: `WITH T(X) AS (SELECT NEWID()) SELECT * FROM T T1 JOIN T T2 ON T1.X = T2.X` (also check the execution plans). Though sometimes it is possible to hack the plan to get a spool. There is a Connect item requesting a hint for this. – Martin Smith Feb 15 '12 at 17:08
A CTE creates the table being used in memory, but is only valid for the specific query following it. When using recursion, this can be an effective structure.
You might also want to consider using a table variable. This is used much as a temp table is, and can be referenced multiple times without needing to be re-materialized for each join. Also, if you need to persist a few records now, add a few more after the next select, add a few more after another operation, and then return just that handful of rows, a table variable can be a handy structure, as it doesn't need to be dropped after execution. It's mostly just syntactic sugar, but if you keep the row count low, it never materializes to disk. See What's the difference between a temp table and table variable in SQL Server? for more details.
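A sketch of that accumulate-then-return pattern (`@Results` and the source tables are hypothetical names):

```sql
DECLARE @Results TABLE (Id int, Label varchar(50));

INSERT INTO @Results SELECT Id, Label FROM FirstSource;   -- persist a few rows
-- ...other work happens here...
INSERT INTO @Results SELECT Id, Label FROM SecondSource;  -- add a few more

SELECT * FROM @Results;  -- no DROP needed; it goes out of scope by itself
```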
A temp table is literally a table created on disk, just in a specific database (tempdb) that everyone knows can be deleted. It is the responsibility of a good dev to destroy those tables when they are no longer needed, but a DBA can also wipe them.
Temporary tables come in two varieties: local and global. In terms of MS SQL Server you use a `#tableName` designation for local and a `##tableName` designation for global (note the use of a single or double # as the identifying characteristic).
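A quick sketch of the two scopes (table names are made up for illustration):

```sql
CREATE TABLE #localTemp  (Id int);   -- one #:  visible only to this session
CREATE TABLE ##globalTemp (Id int);  -- two #s: visible to every session

-- #localTemp is dropped automatically when this session (or the proc that
-- created it) ends; ##globalTemp survives until the creating session ends
-- AND no other session is still referencing it.
```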
Notice that with temp tables, as opposed to table variables or CTEs, you can apply indexes and the like, as these are legitimately tables in the normal sense of the word.
Generally I would use temp tables for longer or larger queries, and CTEs or table variables if I had a small dataset already and wanted to just quickly script up a bit of code for something small. Experience and the advice of others indicate that you should use CTEs where you have a small number of rows being returned from them. If you have a large number, you would probably benefit from the ability to index the temp table.
The accepted answer here says "a CTE should never be used for performance" - but that could mislead. In the context of CTEs versus temp tables, I've just finished removing a swathe of junk from a suite of stored procs because some doofus must've thought there was little or no overhead to using temp tables. I shoved the lot into CTEs, except those which were legitimately going to be re-used throughout the process. I gained about 20% performance by all metrics. I then set about removing all the cursors which were trying to implement recursive processing. This was where I saw the greatest gain. I ended up slashing response times by a factor of ten.
CTEs and temp tables do have very different use cases. I just want to emphasise that, while not a panacea, the comprehension and correct use of CTEs can lead to some truly stellar improvements in both code quality/maintainability and speed. Since I got a handle on them, I see temp tables and cursors as the great evils of SQL processing. I can get by just fine with table variables and CTEs for almost everything now. My code is cleaner and faster.
Now, let's be fair - cursors are the *great* evil; temp tables are at worst a *lesser* evil. :-) It's *really* unfair to put them at the same level, as you saw yourself.
@RDFozz right, hell has 9 circles as we all know. Let's put temp tables at 2nd and cursors at ... 7th? ;)
You know what the 'great evil' is in programming? When people say that a particular technique is evil. There is a place for cursors. They can outperform other techniques in certain scenarios. There's no *evil* here - you need to learn to use the right tool for the job. Measure what you're doing and don't believe the hype that CTE's, Temp Tables or Cursors are evil. Measure - because the truth depends on the scenario.
@DaveHilditch that's a fair comment, but it's also fair comment to assert that in very, very many situations, cursors are not the right solution, so it's a workable generalization to treat them as, well, almost a last resort.
In my experience, a CURSOR is not bad in itself. CURSORs are commonly "wrongly" used by developers because in most programming languages you have to think iteratively, as opposed to SQL, where you mostly have to think in batches. I know this is a common mistake at my workplace, where devs just can't "see" a way out of a problem other than with a CURSOR, which is why a good DBA comes in handy to teach and correct them. @DaveHilditch is totally right: the right tool for the right job is all it takes.
A CTE may be called repeatedly within a query and is evaluated every time it is referenced - this process can be recursive. If it is just referred once then it behaves much like a sub-query, although CTEs can be parameterised.
A temporary table is physically persisted, and may be indexed. In practice the query optimiser may also persist intermediate join or sub-query results behind the scenes, such as in spool operations, so it is not strictly true that the results of CTEs are never persisted to disk.
Table variables, on the other hand, are also backed by tempdb rather than being purely in-memory structures, although small ones may never actually be written to disk.
The primary reason to use CTEs is to filter on the results of window functions such as `row_number()` and various others.
This means you can do things like get the first or last row per group VERY VERY quickly and efficiently - more efficiently than other means in most practical cases.
```sql
WITH reallyfastcte AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY groupingcolumn
                              ORDER BY sortingcolumn) AS rownum
    FROM sometable
)
SELECT *
FROM reallyfastcte
WHERE rownum = 1;
```
You can run a similar query to the above using a correlated subquery or a sub-query, but the CTE will be faster in almost all scenarios.
Additionally, CTEs can really help simplify your code. This can lead to performance gains because you understand the query more and can introduce more business logic to help the optimizer be more selective.
Additionally, CTEs can boost performance if you understand your business logic and know which parts of the query should be run first. Typically, put your most selective queries first, so that they produce result sets that can use an index in their next join, and add the `OPTION (FORCE ORDER)` query hint.
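A hedged sketch of that technique (`SmallLookup` and `BigTable` are hypothetical; the premise is that you know, from business knowledge the optimizer lacks, that the first CTE yields very few rows):

```sql
WITH SmallSet AS
(
    SELECT Id
    FROM SmallLookup
    WHERE Category = 'rare'            -- very selective predicate
)
SELECT b.*
FROM SmallSet s
JOIN BigTable b ON b.LookupId = s.Id   -- tiny set drives the join
OPTION (FORCE ORDER);                  -- pin the join order we chose
```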
Finally, CTEs don't use tempdb by default so you reduce contention on that bottleneck through their use.
Temporary tables should be used if you need to query the data multiple times, or alternatively if you measure your queries and discover that by inserting to a temp table and then adding an index that your performance is improved.
There seems to be a bit of negativity here towards CTEs.
My understanding of a CTE is that it's basically a kind of ad hoc view. SQL is both a declarative and a set-based language. CTEs are a great way of declaring a set! Not being able to index a CTE is actually a good thing, because you don't need to! It's really a kind of syntactic sugar to make the query easier to read/write. Any decent optimizer will work out the best access plan using indexes on the underlying tables. This means you could effectively speed up your CTE query by following the index advice for the underlying tables.
Also, just because you defined a set as a CTE, it doesn't mean that all rows in the set must be processed. Depending on the query, the optimizer might process "just enough" rows to satisfy the query. Maybe you only needed the first 20 or so for your screen. If you built a temp table, then you really do need to read/write all those rows!
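A sketch of that "just enough rows" point (names are hypothetical): with a CTE the optimizer can stop once 20 rows have been produced, whereas a temp table would have forced every row to be materialized first.

```sql
WITH Ordered AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY CreatedAt DESC) AS rn
    FROM SomeTable
)
SELECT TOP (20) *      -- the engine can stop reading once 20 rows are out
FROM Ordered
ORDER BY rn;
```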
Based on this I would say that CTEs are a great feature of SQL and can be used anywhere they make the query easier to read. I would only think about a temp table for a batch process that would really need to process every single record. Even then, afaik, it's not really recommended, because on a temp table it's far harder for the database to help you with caching and indexes. It might be better to have a permanent table with a PK field unique to your transaction.
I have to admit that my experience is mainly with DB2, so I'm assuming that CTEs work in a similar way in both products. I will happily stand corrected if CTEs are somehow inferior in SQL Server. ;)