In this post, I will show how to select random records using the TABLESAMPLE clause and use it in different ways. The TABLESAMPLE clause takes a parameter that can be a percent or a sample number representing rows. This clause samples a percentage of
pages randomly. You will get a different result set each time you run a query with the TABLESAMPLE clause. That means that even if you specify to return 50 rows, you will only get an approximate number of rows, that can be less or more than 50 due to the random sampling logic.
Select X% of Records
Returns different set of rows each time the query is executed
USE Northwind
GO
SELECT * FROM Customers
TABLESAMPLE (50 PERCENT);
Select X% of Records with same random sampling multiple times
Specify REPEATABLE with seed. REPEATABLE clause returns same set of rows every time that it is executed
USE Northwind
GO
SELECT * FROM Customers
TABLESAMPLE (50 PERCENT)
REPEATABLE (100)
Specify Number of Records instead of %(Percentage)
USE Northwind
GO
SELECT * FROM Customers
TABLESAMPLE (50 ROWS)
Important Points about TABLESAMPLE:
1. If you have upgraded from SQL 2000 to SQL Server 2005, make sure that the compatibility level of the database is set to at least 90.
2. TABLESAMPLE cannot be used on derived tables, OPENXML, Inline Table-Valued functions, linked server tables or a View.
Read more on TABLESAMPLE over here http://msdn.microsoft.com/en-us/library/ms189108(SQL.90).aspx
3 comments:
how can you pull random records out but not duplicate any? Meaning,I have records that can have the same name but different dates or states such as
John Smith 1/2/2010 DE
John Smith 5/14/2008 NJ
how can I only show one name within my random so my users don't get confused?
IGotYourDotnet,
One possible methods is
select name,min(date) as date from table
tablesample (50 percent)
group by name
Also note that tablesample doesn't work well for small set of data
If all data of table stored in a single page, either it returns all rows or no rows
Run this to understand
SELECT
city,min(customerid) FROM Northwind..Customers
group by city It returns 69 rows
Now
SELECT city,min(customerid) FROM Northwind..Customers
TABLESAMPLE (50 percent)
group by city
will return all 69 rows or no rows
So better approach is to use top operator with newid()
SELECT
top 50 percent city,min(customerid) FROM Northwind..Customers
group by city
order by newid()
Excellent tip. I vote for the top operator with newid()
Post a Comment