There’s nothin’ like a good SQL puzzle. I was doing some yardwork on Sunday, and for some reason the TSQL function RAND() popped into my head. I don’t know why – it just did. I’m not sure what the relationship is between wet leaves and RAND(), but there clearly is one.
In case you aren’t aware of what RAND() does, it returns a random decimal number between 0 and 1. I’ve used RAND() in the past to generate a random percent for various purposes (SELECT RAND() * 100), but I got to thinking – what if I don’t want numbers between 1 and 100? What if I want numbers between 1 and 10, or 38 and 92? And what if I want more than 1 number? What if I want 10? Or if I want to allow duplicates?
So when I got inside I warmed up the intertubes and started searching to see if anyone other than me had been hit with this streak of brilliance. Apparently, they had:
- Jonathan Kehayias posted a solution for picking random rows from a table back in November of 2008. Close, but not exactly what I’m looking for:
- It necessitates a table, and I don’t necessarily have one.
- It doesn’t allow for duplicates, and I wanted to allow for the possibility of returning duplicate rows, if desired.
- It doesn’t use RAND(), and remember I was fixated on RAND() for some reason.
- That said, it’s still a cool solution for returning random rows from an existing table.
- Itzik Ben-Gan wrote up a solution for the April 2008 edition of SQL Server Magazine. Again, a very cool approach, but a few things were also missing from his solution:
- It doesn’t allow for duplicates.
- It doesn’t use RAND().
- It’s written in Klingon (on purpose, I think).
- Brian Connoly wrote up a summary of random sampling techniques for the March 2004 edition of Microsoft SQL Server Professional. Very informative, but again it didn’t handle duplicate rows.
Not long into working on a solution, I discovered that the fundamental challenge with creating a set of random numbers using RAND() is that RAND() is only evaluated once per batch. So if I use the RAND() function in a query that returns, say, 10 rows, it will return the same value for each of the rows. What this meant is that I couldn’t use an entirely set-based approach to generating this number list. :-(
I dried my eyes and moved on….
OK….now how do we translate a random decimal between 0 and 1 into a number in a user-specified range? Ask me not why, but this part just clicked – we makes bucketses. We have one bucket for each number in the range. In order for our algorithm to be truly random, the buckets all have to be the same size. How do I do that? Well…… 1 divided by the count of numbers in my range, multiplied by the position of this number, should return the “upper limit” of a given bucket.
In English? Say you want numbers between 5 and 14 (inclusive). This is ten numbers, and in order to calculate the “upper boundary” of a given number, we would take 1.0000000/10, multiplied by the position of a number in the sequence. So 5’s upper boundary would be 1.0000000/10 * 1; 6’s upper boundary would be 1.00000000/10 *2…and so on.
Then we will create a loop, and work out RAND() again and again until we have all of our numbers. If duplicates aren’t allowed, we need to make sure that we don’t pick the same number twice. The finished sproc looks something (well..exactly) like this:
CREATE PROCEDURE GenerateRandomNumbers
@AllowDuplicates bit --0 = No, 1 = Yes (Of Course)
--Make sure that the input params are valid.
--@StartNumber must be less than @EndNumber
IF @StartNumber >= @EndNumber
PRINT 'You turkey! @StartNumber has to be LESS THAN @EndNumber!'
--@EndNumber - @StartNumber must be >= @QuantityToOutput IF @AllowDuplicates = 0
IF @AllowDuplicates = 0 AND @EndNumber - @StartNumber <= @QuantityToOutput
PRINT 'Not enough numbers in the range to satisfy your OUTPUT and DUPLICATE parameter settings!'
--We're good to go if we get this far.
--Create a table to hold the "boundary points" of each number.
CREATE TABLE #NumberBoundaries
UpperLimit decimal (15,10)
INSERT #NumberBoundaries(Number, UpperLimit)
--The line below implements the "end point" math I spoke about earlier.
(1.0000000/(@EndNumber + 1 - @StartNumber) * (sv.Number + 1 - @StartNumber)) AS UpperLimit
FROM master..spt_values sv
WHERE sv.Number <= @EndNumber
AND sv.Number >= @StartNumber
AND sv.type = 'P'
--This holds the generated numbers until we have enough to return a result set
CREATE TABLE #NumbersToOutput
Number tinyint NOT NULL
--Loop until we're dizzy
WHILE (SELECT COUNT(*) FROM #NumbersToOutput) < @QuantityToOutput
--This CTE fetches the bucket in which a given execution of RAND() falls.
WITH MyNumber (Number)
AS (SELECT TOP 1 nb.Number
FROM #NumberBoundaries nb
WHERE RAND() < nb.UpperLimit
ORDER BY nb.Number ASC)
FROM MyNumber mn
--The line below allows us to either permit duplicates or check for them.
WHERE @AllowDuplicates = 1
mn.Number NOT IN (SELECT Number FROM #NumbersToOutput)
--Return our numbers
ORDER BY 1
Check it out for yourself!
As a final note – I’m sure more than one of you are asking yourselves why I called this article Check Your Lucky Numbers. Originally, when the idea first hit me, I thought it would be neat to build a random lottery number generator. Then I saw that others had already done this, and had done it more elegantly than I could by using RAND(). But I really liked the title, so I wasn’t about to give it up. Technically, the proc above can be used to generate lotto numbers as well. For instance, the most popular lottery where I live is called “6/49” – basically, you select 6 numbers between 1 and 49. If your numbers are drawn, you become unimaginably rich. Not PowerBall rich – there’s no keeping up with those crazy Yankees – but very rich nonetheless. If you call the proc above with the following params, it will return a random set of lotto 6/49 numbers:
EXEC GenerateRandomNumbers 1, 49, 6, 0
Of course, all of the standard warnings apply: the number generator is not guaranteed to produce winning numbers/know your limit & play within it/if you win using my numbers generator it would only be right to give me half/feel free to send me money even if you don’t win…… ;-)