Tuesday, March 24, 2009

Check Your Lucky Numbers (Random Number Sequences in SQL Server TSQL)

There’s nothin’ like a good SQL puzzle. I was doing some yardwork on Sunday, and for some reason the TSQL function RAND() popped into my head. I don’t know why – it just did. I’m not sure what the relationship is between wet leaves and RAND(), but there clearly is one.

In case you aren’t aware of what RAND() does, it returns a random decimal number between 0 and 1. I’ve used RAND() in the past to generate a random percent for various purposes (SELECT RAND() * 100), but I got to thinking – what if I don’t want numbers between 1 and 100? What if I want numbers between 1 and 10, or 38 and 92? And what if I want more than 1 number? What if I want 10? Or if I want to allow duplicates?

So when I got inside I warmed up the intertubes and started searching to see if anyone other than me had been hit with this streak of brilliance. Apparently, they had:

  • Jonathan Kehayias posted a solution for picking random rows from a table back in November of 2008. Close, but not exactly what I’m looking for:
    • It necessitates a table, and I don’t necessarily have one.
    • It doesn’t allow for duplicates, and I wanted to allow for the possibility of returning duplicate rows, if desired.
    • It doesn’t use RAND(), and remember I was fixated on RAND() for some reason.
    • That said, it’s still a cool solution for returning random rows from an existing table.
  • Itzik Ben-Gan wrote up a solution for the April 2008 edition of SQL Server Magazine. Again, a very cool approach, but a few things were also missing from his solution:
    • It doesn’t allow for duplicates.
    • It doesn’t use RAND().
    • It’s written in Klingon (on purpose, I think).
  • Brian Connoly wrote up a summary of random sampling techniques for the March 2004 edition of Microsoft SQL Server Professional. Very informative, but again it didn’t handle duplicate rows.

Not long into working on a solution, I discovered that the fundamental challenge with creating a set of random numbers using RAND() is that RAND() is only evaluated once per batch. So if I use the RAND() function in a query that returns, say, 10 rows, it will return the same value for each of the rows. What this meant is that I couldn’t use an entirely set-based approach to generating this number list. :-(

I dried my eyes and moved on….

OK….now how do we translate a random decimal between 0 and 1 into a number in a user-specified range? Ask me not why, but this part just clicked – we makes bucketses. We have one bucket for each number in the range. In order for our algorithm to be truly random, the buckets all have to be the same size. How do I do that? Well…… 1 divided by the count of numbers in my range, multiplied by the position of this number, should return the “upper limit” of a given bucket.

In English? Say you want numbers between 5 and 14 (inclusive). This is ten numbers, and in order to calculate the “upper boundary” of a given number, we would take 1.0000000/10, multiplied by the position of a number in the sequence. So 5’s upper boundary would be 1.0000000/10 * 1; 6’s upper boundary would be 1.00000000/10 *2…and so on.

Then we will create a loop, and work out RAND() again and again until we have all of our numbers. If duplicates aren’t allowed, we need to make sure that we don’t pick the same number twice. The finished sproc looks something (well..exactly) like this:

CREATE PROCEDURE GenerateRandomNumbers

(

@StartNumber tinyint,

@EndNumber tinyint,

@QuantityToOutput tinyint,

@AllowDuplicates bit --0 = No, 1 = Yes (Of Course)

)

AS

BEGIN



--Make sure that the input params are valid.


--@StartNumber must be less than @EndNumber

IF @StartNumber >= @EndNumber

PRINT 'You turkey! @StartNumber has to be LESS THAN @EndNumber!'

RETURN


--@EndNumber - @StartNumber must be >= @QuantityToOutput IF @AllowDuplicates = 0

IF @AllowDuplicates = 0 AND @EndNumber - @StartNumber <= @QuantityToOutput

PRINT 'Not enough numbers in the range to satisfy your OUTPUT and DUPLICATE parameter settings!'

RETURN


--We're good to go if we get this far.

--Create a table to hold the "boundary points" of each number.

CREATE TABLE #NumberBoundaries

(

Number tinyint,

UpperLimit decimal (15,10)

)


INSERT #NumberBoundaries(Number, UpperLimit)

SELECT sv.number,
--The line below implements the "end point" math I spoke about earlier.

(1.0000000/(@EndNumber + 1 - @StartNumber) * (sv.Number + 1 - @StartNumber)) AS UpperLimit

FROM master..spt_values sv

WHERE sv.Number <= @EndNumber

AND sv.Number >= @StartNumber

AND sv.type = 'P'


--This holds the generated numbers until we have enough to return a result set

CREATE TABLE #NumbersToOutput

(

Number tinyint NOT NULL

)


--Loop until we're dizzy

WHILE (SELECT COUNT(*) FROM #NumbersToOutput) < @QuantityToOutput

BEGIN

--This CTE fetches the bucket in which a given execution of RAND() falls.

WITH MyNumber (Number)

AS (SELECT TOP 1 nb.Number

FROM #NumberBoundaries nb

WHERE RAND() < nb.UpperLimit

ORDER BY nb.Number ASC)


INSERT #NumbersToOutput

SELECT mn.Number

FROM MyNumber mn

--The line below allows us to either permit duplicates or check for them.

WHERE @AllowDuplicates = 1

OR

mn.Number NOT IN (SELECT Number FROM #NumbersToOutput)


END



--Return our numbers

SELECT Number

FROM #NumbersToOutput

ORDER BY 1

END

Check it out for yourself!



As a final note – I’m sure more than one of you are asking yourselves why I called this article Check Your Lucky Numbers. Originally, when the idea first hit me, I thought it would be neat to build a random lottery number generator. Then I saw that others had already done this, and had done it more elegantly than I could by using RAND(). But I really liked the title, so I wasn’t about to give it up. Technically, the proc above can be used to generate lotto numbers as well. For instance, the most popular lottery where I live is called “6/49” – basically, you select 6 numbers between 1 and 49. If your numbers are drawn, you become unimaginably rich. Not PowerBall rich – there’s no keeping up with those crazy Yankees – but very rich nonetheless. If you call the proc above with the following params, it will return a random set of lotto 6/49 numbers:

EXEC GenerateRandomNumbers 1, 49, 6, 0

Of course, all of the standard warnings apply: the number generator is not guaranteed to produce winning numbers/know your limit & play within it/if you win using my numbers generator it would only be right to give me half/feel free to send me money even if you don’t win…… ;-)


Have fun!

5 comments:

SQL said...

Aaron, take a look at SQL Server - Set based random numbers written by my good friend George

CyberBhai said...

I have a set of numbers, my need is to pick a no. from that set randomly.

for example : I may have 40, 39, 21, 33, 46, 93, 97, 51 as my set. Each instance should give me a no. that was not select earlier (may not be in 4/5 earlier attempts)

aileen said...

I recently came accross your blog and have been reading along. I thought I would leave my first comment. I dont know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.


Susan

http://dclottery.info

هاني العولقي said...

it is very nice ..
I want to ask
how to use output of this stored procedure to Update field in another table?

Price Cheaperton said...

Hi, i am using SQL SERVER 2008 and i when i execute the stored proc it just returns value 1. I also enter the values the same as you did. Any ideas? Props for writing this! It will come in handy when i win the Lotto ! ;)