Dealing with Comma Delimited Strings
By Gregory A. Larsen
When dealing with data you come across many different situations. In this article I will discuss how to deal with a few situations that involve working with comma separated data. Comma separated data can come in many forms. It can be input, a text string stored in a column, or a number of other situations. This article will deal with two different comma separated data situations.
Displaying Multiple Records from a Single Record
In this situation there is a column in a table that holds a series of values that are separated by a comma. For each record, the comma separated column needs to be parsed apart and returned as separate row. So the final output record set will contain multiple records for a given single record stored in a SQL Server table.
To demonstrate this I will run the following code:
set nocount on
-- Create Example1 Table and Populate with Data
create table Example1 (Id int,
TypeOfValues varchar(20),
ColumnOfValues char(30))
CREATE UNIQUE CLUSTERED INDEX ID_ind
ON Example1(Id)
WITH IGNORE_DUP_KEY
insert into Example1 values(1, 'Colors','Red,Green,Blue,Black,White')
insert into Example1 values(2, 'Models','Normal,Deluxe,Super Deluxe')
insert into Example1 values(3, 'Years','2004,2005,2006')
-- Create Number Table
SELECT IDENTITY(INT) AS Number
INTO Numbers
FROM sysobjects s1
CROSS JOIN sysobjects s2
CREATE UNIQUE CLUSTERED INDEX Number_ind
ON Numbers(number)
WITH IGNORE_DUP_KEY
SELECT Id,
TypeOfValues,
SUBSTRING( ColumnOfValues, Number,
CHARINDEX( ',', ColumnofValues + ',', Number ) - Number ) as Value
FROM Example1
INNER JOIN Numbers
ON SUBSTRING( ',' + ColumnOfValues, Number, 1 ) = ','
where Number <= Len(ColumnOfValues) + 1
drop table Example1
drop table Numbers
When I run this code on my server I get the following results:
Id TypeOfValues Value
----------- -------------------- ------------------------------
1 Colors Red
1 Colors Green
1 Colors Blue
1 Colors Black
1 Colors White
2 Models Normal
2 Models Deluxe
2 Models Super Deluxe
3 Years 2004
3 Years 2005
3 Years 2006
Here you can see that for each “Id” there are multiple rows. Each row has only one value from the comma delimited column “ColumnOfValues”. To understand how this was accomplished let’s review my code.
First I create the table “Example1”, and then populate it with three different records. Each record contains a type column (“TypeOfValues”) and a value column (“ColumnOfValues”) which contains a comma delimited string of with different values. The “ColumnOfValues” column will be the column that that is parsed apart to create multiple records for each record in the Example1 table.
Next I create a Numbers table, by joining sysobjects to itself. This table will contain a series of sequential numbers starting from 1.
Finally the SELECT statement parses apart the “ColumnOfValue” column into multiple records. It does this by using the Numbers table to identify the offset of each comma. This is done by joining the Numbers table to a single character substring of the “ColumnOfValues” column, starting with the first character, then second, and so on. Whenever the join condition finds a comma, it uses the Number value to identify the starting point of the SUBSTRING and CHARINDEX functions, so these functions can extract a single character string of value out of the “ColumnOfValues” columns. To increase the performance of this statement a WHERE clause is added to reduce the number of rows from the Numbers table that needs to be joined to the Example1 table.
Displaying a Single Record with a Comma Separated Column from Multiple Records
Some times you might have a table that contains a series of records that contain a key and a value column. In your table there might be many different values for a given key. This example will show you how to collapse all those key value pairs into a single record. That single record will contain a unique key followed by a comma separate string composed of all the values associated with the key.
Here is an example of the table I will be using that contains a key (id_no) and a value (item):
id_no item
----------- --------------------
1 Skiing
1 Diving
2 Diving
2 Skiing
2 Hunting
2 Fishing
4 Sailing
4 Skiing
5 Skiing
In this table for each “id_no” there is one or more “items” identified. Each record contains a single “item” value. I will use the code listed below to populate the above table:
-- create example table
CREATE TABLE Example2(id_no int not null, item varchar(20) not null)
-- populate the example table
INSERT INTO Example2 VALUES (1, 'Skiing')
INSERT INTO Example2 VALUES (1, 'Diving')
INSERT INTO Example2 VALUES (2, 'Diving')
INSERT INTO Example2 VALUES (2, 'Skiing')
INSERT INTO Example2 VALUES (2, 'Hunting')
INSERT INTO Example2 VALUES (2, 'Fishing')
INSERT INTO Example2 VALUES (4, 'Sailing')
INSERT INTO Example2 VALUES (4, 'Skiing')
INSERT INTO Example2 VALUES (5, 'Skiing')
The next code snippet returns a record set that contains a single record for each “id_no”, followed by a comma delimited string that concatenates each “item” value together into a single column value:
-- declare local variables
declare @p varchar(1000)
declare @i char(5)
declare @sm int
declare @m int
-- Print Report Heading
print 'id_no' + ' items'
print '----- ' + '------------------------------------------'
set @p = ''
-- set @m to the first id number
select top 1 @m = id_no from Example2
order by id_no
set @sm = 0
-- Process each id_no until no more items
while @m <> @sm
begin
set @sm = @m
-- string together all items with a comma between
select @i = id_no, @p = case when @p = '' then item else @p + ', ' + item end
from Example2 a
where id_no = @m
-- print id_no, and comma delimited string
print @i + ' ' + @p
-- increment id number
select top 1 @m = id_no from Example2
where id_no > @sm
order by id_no
set @p = ''
end
-- remove example table
drop table Example2
When I run this code against my Example2 table I get the following output:
id_no items
----- ------------------------------------------
1 Skiing, Diving
2 Diving, Skiing, Hunting, Fishing
4 Sailing, Skiing
5 Skiing
Let me explain how this code works. This code iteratively process each “id_no” value using a WHILE loop. Each pass through the WHILE loop strings together all the “item” values for a given “id_no”. The variable @m contains the value of the “id_no” for the records being collapsed into a single record. The following SELECT statement does all the work to collapse all the records for a given “id_no” value into a single row in the output:
select @i = id_no, @p = case when @p = '' then item else @p + ', ' + item end
from Example2 a
where id_no = @m
This code concatenates a comma with the value of the “item” column and adds it to the variable @p. This method allows you a way to summarize a character string, in this case the value of the “item” column followed by a comma. After this command has completed execution the variable @i contains the “id_no”, and the @p variable contains a comma delimited string of “item” column values for the give “id_no”.
The PRINT statement is used to display each row of comma delimited values for a given “id_no”. The last SELECT statement in the WHILE loop set the @m variable to the next “id_no” to be processed. This WHILE loop continues to creating comma delimited strings for each “item” column processing one “id_no” at a time until all “id_no” records have been processed.
Conclusion
This article showed you only two examples of how to deal with comma separated data. One example showed you how to break apart comma separated data, where as the other one showed you how to join multiple records into a single record where the data was separated by commas. Hopefully next time you have to deal with comma separated data these examples will give you a jump start on writing your T-SQL code to work with comma separated data.
Posted in | Edit |
0 comments:
Post a Comment