Combine column from multiple rows into single row
I've got some
customer_comments
split out into multiple rows due to database design, and for a report I need to combine thecomments
from each uniqueid
into one row. I previously tried something working with this delimited list from SELECT clause and COALESCE trick but I can't recall it and must not have saved it. I can't seem to get it to work in this case either, only seems to work on a single row.The data looks like this:
id row_num customer_code comments ----------------------------------- 1 1 Dilbert Hard 1 2 Dilbert Worker 2 1 Wally Lazy
My results need to look like this:
id customer_code comments ------------------------------ 1 Dilbert Hard Worker 2 Wally Lazy
So for each
row_num
there's really only one row of results; the comments should be combined in the order ofrow_num
. The above linkedSELECT
trick works to get all the values for a specific query as one row, but I can't figure out how to make it work as part of aSELECT
statement that spits all these rows out.My query has to go through the whole table on its own and output these rows. I'm not combining them into multiple columns, one for each row, so
PIVOT
doesn't seem applicable.This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:
DECLARE @x TABLE ( id INT, row_num INT, customer_code VARCHAR(32), comments VARCHAR(32) ); INSERT @x SELECT 1,1,'Dilbert','Hard' UNION ALL SELECT 1,2,'Dilbert','Worker' UNION ALL SELECT 2,1,'Wally','Lazy'; SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments FROM @x AS x2 WHERE id = x.id ORDER BY row_num FOR XML PATH('')), 1, 1, '') FROM @x AS x GROUP BY id, customer_code ORDER BY id;
If you have a case where the data in comments could contain unsafe-for-XML characters (
>
,<
,&
), you should change this:FOR XML PATH('')), 1, 1, '')
To this more elaborate approach:
FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')
(Be sure to use the right destination data type,
varchar
ornvarchar
, and the right length, and prefix all string literals withN
if usingnvarchar
.)+1 I creadted a fiddle for that for a quick look http://sqlfiddle.com/#!3/e4ee5/2
Yep, this works like a charm. @MarlonRibunal SQL Fiddle's really shaping up!
@NickChammas - I am going to stick my neck out and say that the order is guaranteed using the `order by` in the sub query. This is building XML using `for xml` and that is *the* way to build XML using TSQL. Order of elements in an XML files is an important matter and can be relied upon. So if this technique does not guarantee order then XML support in TSQL is severely broken.
I've validated that the query will return results in the correct order regardless of the clustered index on the underlying table (even a clustered index on `row_num desc` must obey the `order by` as Mikael suggested). I'm going to remove comments suggesting otherwise now that the query contains the right `order by` and hope that @JonSeigel considers doing the same.
If you're allowed to use CLR in your environment, this is a tailor-made case for a user-defined aggregate.
In particular, this is probably the way to go if the source data is non-trivially large and/or you need to do this type of thing a lot in your application. I strongly suspect the query plan for Aaron's solution will not scale well as the input size grows. (I tried adding an index to the temp table, but that didn't help.)
This solution, like many other things, is a tradeoff:
- Politics/policy for even using CLR Integration in your, or your client's, environment.
- CLR function is likely faster, and will scale better given a real set of data.
- CLR function will be reusable in other queries, and you won't have to duplicate (and debug) a complex subquery every time you need to do this type of thing.
- Straight T-SQL is simpler than writing and managing a piece of external code.
- Perhaps you don't know how to program in C# or VB.
- etc.
EDIT: Well, I went to try to see if this actually was better, and it turns out the requirement that the comments be in a specific order is currently not possible to satisfy using an aggregate function. :(
See SqlUserDefinedAggregateAttribute.IsInvariantToOrder. Basically, what you need to do is
OVER(PARTITION BY customer_code ORDER BY row_num)
butORDER BY
is not supported in theOVER
clause when aggregating. I'm assuming adding this functionality to SQL Server opens a can of worms, because what would need to be changed in the execution plan is trivial. The aforementioned link says this is reserved for future use, so this could be implemented in the future (on 2005 you're probably out of luck, though).This could still be accomplished by packing and parsing the
row_num
value into the aggregated string, and then doing the sort within the CLR object... which seems pretty hackish.In any event, below is the code I used in case anyone else finds this useful even with the limitation. I'll leave the hacking part as an exercise for the reader. Note that I used AdventureWorks (2005) for test data.
Aggregate assembly:
using System; using System.IO; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; namespace MyCompany.SqlServer { [Serializable] [SqlUserDefinedAggregate ( Format.UserDefined, IsNullIfEmpty = false, IsInvariantToDuplicates = false, IsInvariantToNulls = true, IsInvariantToOrder = false, MaxByteSize = -1 )] public class StringConcatAggregate : IBinarySerialize { private string _accum; private bool _isEmpty; public void Init() { _accum = string.Empty; _isEmpty = true; } public void Accumulate(SqlString value) { if (!value.IsNull) { if (!_isEmpty) _accum += ' '; else _isEmpty = false; _accum += value.Value; } } public void Merge(StringConcatAggregate value) { Accumulate(value.Terminate()); } public SqlString Terminate() { return new SqlString(_accum); } public void Read(BinaryReader r) { this.Init(); _accum = r.ReadString(); _isEmpty = _accum.Length == 0; } public void Write(BinaryWriter w) { w.Write(_accum); } } }
T-SQL for testing (
CREATE ASSEMBLY
, andsp_configure
to enable CLR omitted):CREATE TABLE [dbo].[Comments] ( CustomerCode int NOT NULL, RowNum int NOT NULL, Comments nvarchar(25) NOT NULL ) INSERT INTO [dbo].[Comments](CustomerCode, RowNum, Comments) SELECT DENSE_RANK() OVER(ORDER BY FirstName), ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY ContactID), Phone FROM [AdventureWorks].[Person].[Contact] GO CREATE AGGREGATE [dbo].[StringConcatAggregate] ( @input nvarchar(MAX) ) RETURNS nvarchar(MAX) EXTERNAL NAME StringConcatAggregate.[MyCompany.SqlServer.StringConcatAggregate] GO SELECT CustomerCode, [dbo].[StringConcatAggregate](Comments) AS AllComments FROM [dbo].[Comments] GROUP BY CustomerCode
Here's a cursor-based solution that guarantees the order of the comments by
row_num
. (See my other answer for how the[dbo].[Comments]
table was populated.)SET NOCOUNT ON DECLARE cur CURSOR LOCAL FAST_FORWARD FOR SELECT CustomerCode, Comments FROM [dbo].[Comments] ORDER BY CustomerCode, RowNum DECLARE @curCustomerCode int DECLARE @lastCustomerCode int DECLARE @curComment nvarchar(25) DECLARE @comments nvarchar(MAX) DECLARE @results table ( CustomerCode int NOT NULL, AllComments nvarchar(MAX) NOT NULL ) OPEN cur FETCH NEXT FROM cur INTO @curCustomerCode, @curComment SET @lastCustomerCode = @curCustomerCode WHILE @@FETCH_STATUS = 0 BEGIN IF (@lastCustomerCode != @curCustomerCode) BEGIN INSERT INTO @results(CustomerCode, AllComments) VALUES(@lastCustomerCode, @comments) SET @lastCustomerCode = @curCustomerCode SET @comments = NULL END IF (@comments IS NULL) SET @comments = @curComment ELSE SET @comments = @comments + N' ' + @curComment FETCH NEXT FROM cur INTO @curCustomerCode, @curComment END IF (@comments IS NOT NULL) BEGIN INSERT INTO @results(CustomerCode, AllComments) VALUES(@curCustomerCode, @comments) END CLOSE cur DEALLOCATE cur SELECT * FROM @results
-- solution avoiding the cursor ... DECLARE @idMax INT DECLARE @idCtr INT DECLARE @comment VARCHAR(150) SELECT @idMax = MAX(id) FROM [dbo].[CustomerCodeWithSeparateComments] IF @idMax = 0 return DECLARE @OriginalTable AS Table ( [id] [int] NOT NULL, [row_num] [int] NULL, [customer_code] [varchar](50) NULL, [comment] [varchar](120) NULL ) DECLARE @FinalTable AS Table ( [id] [int] IDENTITY(1,1) NOT NULL, [customer_code] [varchar](50) NULL, [comment] [varchar](120) NULL ) INSERT INTO @FinalTable ([customer_code]) SELECT [customer_code] FROM [dbo].[CustomerCodeWithSeparateComments] GROUP BY [customer_code] INSERT INTO @OriginalTable ([id] ,[row_num] ,[customer_code] ,[comment]) SELECT [id] ,[row_num] ,[customer_code] ,[comment] FROM [dbo].[CustomerCodeWithSeparateComments] ORDER BY id, row_num SET @idCtr = 1 SET @comment = '' WHILE @idCtr < @idMax BEGIN SELECT @comment = @comment + ' ' + comment FROM @OriginalTable WHERE id = @idCtr UPDATE @FinalTable SET [comment] = @comment WHERE [id] = @idCtr SET @idCtr = @idCtr + 1 SET @comment = '' END SELECT @comment = @comment + ' ' + comment FROM @OriginalTable WHERE id = @idCtr UPDATE @FinalTable SET [comment] = @comment WHERE [id] = @idCtr SELECT * FROM @FinalTable
You haven't avoided a cursor. You've just called your cursor a while loop instead.
License under CC-BY-SA with attribution
Content dated before 6/26/2020 9:53 AM
MarlonRibunal 8 years ago
+1 I creadted a fiddle for that for a quick look http://sqlfiddle.com/#!3/e4ee5/2