Developing Matt

My Technical Journal

Breaking up is hard to do

leave a comment »

There are a lot of great posts about splitting a table into a list or splitting a list into a table.  Do a search and you will be a little overwhelmed by the good and the bad advice.  Let me turn you on to a couple of excellent articles.  Anith wrote an excellent must read article titled Faking Arrays where he explores the various  sundry ways to do this.  Itzik Ben-Gan (my performance hero) has spent a lot of time determining the most efficient way to split data in a whitepaper titled Arrays and Lists in Sql Server 2005 (it still applies in 2008). 

After gleaning over these articles and countless others I’ve surmised the following four methods. 

1.

The first is splitting your table to a character list.  It’s pretty simple, really.  Just declare your variable and set it to an empty string.  Setting it to an empty string is required unless you want to mess with the ‘SET CONCAT_NULL_YIELDS_NULL’ setting which I don’t recommend…some settings actually modify the database and cause your stats to restart…not sure about all this particular setting though and I’m not really up on this little fact.  Why is it required?  Well, you can’t add a string to a null value, can you.  1 + null just doesn’t make any sense and will always return you a null (unless you mess with the setting above of course.  Anyway, this is how you do it.

declare @d varchar(100)=” 
select @d = @d + ‘ your delimeter ‘ +  field from yourtable
print @d

2. 

The second method is splitting your table into a character list utilizing the new 2005 ‘for xml’ command.  From what I understand this performs better than the first, but it’s not as readable:

select yourfield= STUFF(
             (SELECT  ‘ ‘ + yourfield AS [text()]
              from yourtable 
              for xml PATH(”)),1,1,’,’)

3.

The third method we broach the discussion of splitting a list into a table.  The most efficient method of doing this is using a custom clr.  Text-parsing isn’t sql server’s forte, but it is for iterative-based languages like vb, c#, etc.  But this isn’t the way I do it because it requires modifying the database to allow clr operations and this means getting permission and jumping through hoops, etc.  So the next best method is using a numbers table with a table-valued function that joins to this table.  This is Itzik-tested because it comes from Itzik.  As far as creating a creating a numbers table, don’t waste your time searching, just use this:

CREATE TABLE [dbo].[Numbers](
    [Number] [int] NOT NULL
)
declare @fromNumber int =0, @toNumber int=100,
@byStep int=1

;

WITH CTE_NumbersTable AS (
SELECT @fromNumber AS i
UNION ALL
SELECT i + @byStep
FROM CTE_NumbersTable
WHERE
(i + @byStep) <= @toNumber
)
insert into numbers SELECT * FROM CTE_NumbersTable

Now that this is done here is the function that can be used to split a list into a table (which utilizes the numbers table):

CREATE FUNCTION [dbo].[split_to_table](@array AS VARCHAR(1000), @delimiter as varchar(6)) RETURNS TABLE
AS

RETURN
  SELECT
      ltrim(rtrim(SUBSTRING(@array, number,
      CHARINDEX(@delimiter, @array + @delimiter, number) – number))) AS element
  FROM dbo.Numbers
  WHERE number <= LEN(@array) AND SUBSTRING(@delimiter + @array, number, 1) = @delimiter

4.

Method number 4 applies when you don’t have access to create a numbers table.  Lots of ways to do this and this is one of them.  This was put together with the first article I mentioned by Anith.  I couldn’t get the max recursion option to work because I was in sql 2000 but I left it in.

CREATE FUNCTION [dbo].[Split]
(
@p VARCHAR(500), @delimeter varchar(3)
)
–select * from split(”,’|’)
RETURNS @tblSplit TABLE (value varchar(100))
AS
BEGIN
IF ISNULL(@P,”) <> ”
    BEGIN
        WITH CTE ( pos, pos_begin, pos_end ) AS (
                SELECT 0, 1, CHARINDEX( @delimeter, @p + ‘,’ )
                 UNION ALL
                SELECT pos + 1, pos_end + 1, CHARINDEX( @delimeter, @p + @delimeter, pos_end + 1 )
                  FROM CTE
                 WHERE CHARINDEX( @delimeter, @p + @delimeter, pos_end + 1 ) > 0 )
            insert @tblSplit SELECT  SUBSTRING( @p, pos_begin , pos_end – pos_begin ) AS "value"
              FROM CTE
            –OPTION ( MAXRECURSION 0 ) ;
    END
RETURN
END

 

Lots of ways to do this….these are my top four.  If you have a better way let me know.

Advertisements

Written by matt

June 23, 2009 at 11:39 am

Posted in T-Sql

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: