When I was first introduced to Transact-SQL, I used to read scripts that other developers had written, in order to increase my exposure to the language and “learn their tricks”. I could write an entire post on the topic of learning from others, and it’s pros and cons, but that is a subject for another day.
One day I read a line in a script that looked something like this:
SELECT COUNT(1) FROM SomeTable
I asked the developer why he used “SELECT COUNT(1)” instead of “SELECT COUNT(*)” – his response was that the former would use an index, whereas the latter required a full table scan. Being brand new to T-SQL, the terms “index” and “table scan” were just enough to cause my eyes to glaze over, a fact that I quickly covered up with a knowing nod and a suitable grunt.
I found myself answering a question on the MSDN SQL Forums yesterday (first day on the new forums platform – congratulations migration team!) in which part of the script I provided was a “SELECT COUNT(1)”. Before submitting the post, I added a comment at the bottom, telling the OP that, although I had used “SELECT COUNT(1)”, it was equivalent to “SELECT COUNT(*)” or “SELECT COUNT(ColumnName)”. After hitting post, I stopped and stared for a moment – then the little voice kicked in.
Little Voice – What have you done??? You’ve broken the unspoken rule! It’s unspoken for a reason you dimwit! You’ve ruined us all!
Aaron – But, but…it seems so wrong! How could the optimizer be able to figure out correlated subqueries and nested joins wrapped in CTE’s, and not be able to figure out that the row count for every column in a table would be the same?
(Conversation with little voice abridged due to post length limitations)
So today I set about to prove the little voice wrong. I brainstormed a few tests and resolved to check each of their SHOWPLAN_XML output to see whether or not there were any differences. The tests were:
- SELECT COUNT(1) with no WHERE clause
- SELECT COUNT(*) with no WHERE clause
- SELECT COUNT(NonIndexedColumnName) with no WHERE clause
- SELECT COUNT(SmallIndexedColumnName) with no WHERE clause
- SELECT COUNT(LargeIndexedColumnName) with no WHERE clause
- SELECT COUNT(NonIndexedColumn) with a sargeable WHERE clause based upon a small-column index
WHERE ProductID = 750
- SELECT COUNT(NonIndexedColumn) with a non-sargeable WHERE clause based upon a large-column index
WHERE rowguid IN
- SELECT COUNT(1) with a non-sargeable WHERE clause
WHERE UnitPrice < 10
And the results were just as I had hoped (take that little voice!!!). In each case, SQL Server created the most optimal query plan to retrieve the desired information.
- Each of the queries without a WHERE clause scanned the smallest non-clustered index in the table (in this case, the index on ProductID), regardless of whether they were counting 1, *, a non-indexed column, or an indexed column. The scan took place on the smallest non-clustered index because doing so requires dealing with fewer bytes of data. This index was used even when counting a column used in another index (rowguid).
- The query which employed a sargeable WHERE clause against ProductID performed an index seek on the ProductID index.
- The query which employed a sargeable WHERE clause against rowguid performed an index seek on the rowguid index.
- The query which employed a non-sargeable WHERE clause against a non-indexed column (unforunately) did a table scan against the clustered index.
So there we have it – no black magic, no bookmark lookups or table scans (except when unavoidable), and no more ransom payments to the little voice. Feel free to use COUNT(*), COUNT(1), or COUNT(‘Aaron’). Whatever floats your boat.