Select Random Records From a Table using TableSample in SQL Server 2005/2008

In my previous post, I had explained how to use the NEWID() to Delete Random Records From a Table Using SQL Server 2005/2008

In this post, I will show how to select random records using the TABLESAMPLE clause and use it in different ways. The TABLESAMPLE clause takes a parameter that can be a percent or a sample number representing rows. This clause samples a percentage of
pages randomly. You will get a different result set each time you run a query with the TABLESAMPLE clause. That means that even if you specify to return 50 rows, you will only get an approximate number of rows, that can be less or more than 50 due to the random sampling logic.

Select X% of Records

Returns different set of rows each time the query is executed


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 PERCENT);




Select X% of Records with same random sampling multiple times

Specify REPEATABLE with seed. REPEATABLE clause returns same set of rows every time that it is executed


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 PERCENT)


REPEATABLE (100)




Specify Number of Records instead of %(Percentage)


USE Northwind


GO


SELECT * FROM Customers


TABLESAMPLE (50 ROWS)




Important Points about TABLESAMPLE:

1. If you have upgraded from SQL 2000 to SQL Server 2005, make sure that the compatibility level of the database is set to at least 90.

2. TABLESAMPLE cannot be used on derived tables, OPENXML, Inline Table-Valued functions, linked server tables or a View.


Read more on TABLESAMPLE over here http://msdn.microsoft.com/en-us/library/ms189108(SQL.90).aspx

3 comments:

  1. how can you pull random records out but not duplicate any? Meaning,I have records that can have the same name but different dates or states such as
    John Smith 1/2/2010 DE
    John Smith 5/14/2008 NJ

    how can I only show one name within my random so my users don't get confused?

    ReplyDelete
  2. IGotYourDotnet,

    One possible methods is

    select name,min(date) as date from table
    tablesample (50 percent)
    group by name

    Also note that tablesample doesn't work well for small set of data
    If all data of table stored in a single page, either it returns all rows or no rows

    Run this to understand

    SELECT
    city,min(customerid) FROM Northwind..Customers
    group by city It returns 69 rows

    Now

    SELECT city,min(customerid) FROM Northwind..Customers
    TABLESAMPLE (50 percent)
    group by city

    will return all 69 rows or no rows

    So better approach is to use top operator with newid()

    SELECT
    top 50 percent city,min(customerid) FROM Northwind..Customers
    group by city
    order by newid()

    ReplyDelete
  3. Excellent tip. I vote for the top operator with newid()

    ReplyDelete