Saturday, December 13, 2008

Debunking the Myth: SELECT COUNT(*) vs. SELECT COUNT(1)

When I was first introduced to Transact-SQL, I used to read scripts that other developers had written, in order to increase my exposure to the language and “learn their tricks”. I could write an entire post on the topic of learning from others, and it’s pros and cons, but that is a subject for another day.

One day I read a line in a script that looked something like this:

SELECT COUNT(1) FROM SomeTable

I asked the developer why he used “SELECT COUNT(1)” instead of “SELECT COUNT(*)” – his response was that the former would use an index, whereas the latter required a full table scan. Being brand new to T-SQL, the terms “index” and “table scan” were just enough to cause my eyes to glaze over, a fact that I quickly covered up with a knowing nod and a suitable grunt.

I found myself answering a question on the MSDN SQL Forums yesterday (first day on the new forums platform – congratulations migration team!) in which part of the script I provided was a “SELECT COUNT(1)”. Before submitting the post, I added a comment at the bottom, telling the OP that, although I had used “SELECT COUNT(1)”, it was equivalent to “SELECT COUNT(*)” or “SELECT COUNT(ColumnName)”. After hitting post, I stopped and stared for a moment – then the little voice kicked in.

Little Voice – What have you done??? You’ve broken the unspoken rule! It’s unspoken for a reason you dimwit! You’ve ruined us all!

Aaron – But, but…it seems so wrong! How could the optimizer be able to figure out correlated subqueries and nested joins wrapped in CTE’s, and not be able to figure out that the row count for every column in a table would be the same?

(Conversation with little voice abridged due to post length limitations)

So today I set about to prove the little voice wrong. I brainstormed a few tests and resolved to check each of their SHOWPLAN_XML output to see whether or not there were any differences. The tests were:

  • SELECT COUNT(1) with no WHERE clause
  • SELECT  COUNT(1)
    FROM Sales.SalesOrderDetail

  • SELECT COUNT(*) with no WHERE clause
  • SELECT  COUNT(*)
    FROM Sales.SalesOrderDetail

  • SELECT COUNT(NonIndexedColumnName) with no WHERE clause
  • SELECT  COUNT(UnitPrice)
    FROM Sales.SalesOrderDetail

  • SELECT COUNT(SmallIndexedColumnName) with no WHERE clause
  • SELECT  COUNT(ProductID)
    FROM Sales.SalesOrderDetail

  • SELECT COUNT(LargeIndexedColumnName) with no WHERE clause
  • SELECT  COUNT(rowguid)
    FROM Sales.SalesOrderDetail

  • SELECT COUNT(NonIndexedColumn) with a sargeable WHERE clause based upon a small-column index
  • SELECT  COUNT(UnitPrice)
    FROM Sales.SalesOrderDetail
    WHERE ProductID = 750

  • SELECT COUNT(NonIndexedColumn) with a non-sargeable WHERE clause based upon a large-column index
  • SELECT    COUNT(*)
    FROM Sales.SalesOrderDetail
    WHERE rowguid IN
    ('04C4DE91-5815-45D6-8670-F462719FBCE3',
    '5A74C7D2-E641-438E-A7AC-37BF23280301',
    'CE472532-A4C0-45BA-816E-EEFD3FD848B3',
    '80667840-F962-4EE3-96E0-AECA108E0D4F',
    'E9D54907-E7B7-4969-80D9-76BA69F8A836',
    'AA542630-BDCD-4CE5-89A0-C1BF82747725',
    'AC769034-3C2F-495C-A5A7-3B71CDB25D4E'
    )

  • SELECT COUNT(1) with a non-sargeable WHERE clause
  • SELECT    COUNT(1)
    FROM Sales.SalesOrderDetail
    WHERE UnitPrice < 10

And the results were just as I had hoped (take that little voice!!!). In each case, SQL Server created the most optimal query plan to retrieve the desired information.

  • Each of the queries without a WHERE clause scanned the smallest non-clustered index in the table (in this case, the index on ProductID), regardless of whether they were counting 1, *, a non-indexed column, or an indexed column. The scan took place on the smallest non-clustered index because doing so requires dealing with fewer bytes of data. This index was used even when counting a column used in another index (rowguid).
  • The query which employed a sargeable WHERE clause against ProductID performed an index seek on the ProductID index.
  • The query which employed a sargeable WHERE clause against rowguid performed an index seek on the rowguid index.
  • The query which employed a non-sargeable WHERE clause against a non-indexed column (unforunately) did a table scan against the clustered index.

So there we have it – no black magic, no bookmark lookups or table scans (except when unavoidable), and no more ransom payments to the little voice. Feel free to use COUNT(*), COUNT(1), or COUNT(‘Aaron’). Whatever floats your boat.

3 comments:

FistsOfTinsel said...

In your example

# SELECT COUNT(NonIndexedColumnName) with no WHERE clause

SELECT COUNT(UnitPrice)
FROM Sales.SalesOrderDetail

the column in question (UnitPrice) must be defined as NOT NULL. If the column allows NULL, it is not possible for the COUNT to be computed without visiting the data pages for the table in question, since COUNT(Column) only returns the count of rows that are non-null. In that case, you would have a full table scan, as that would be faster than traversing an index in order to reach the data pages.

Tigger said...

Aaron, which flavorn of SQL Server did you test this on?

Aaron Alton said...

@Tigger Thanks for the comment. I tested this on SQL Server 2008 RTM x64 (Developer Edition)

Why do you ask?