Recently I came across a cool query by Peter where he displayed how to do it based on a category. I have modified the query to show you how random records can be displayed from a table, partitioned on a category. We will also see how to delete these records.
SAMPLE DATA
DECLARE @ErrorLog TABLE
(
ErrorID INT,
ErrorCategory INT,
ErrorDate DATETIME
)
INSERT @ErrorLog
SELECT 101, 1, '20080217' UNION ALL
SELECT 102, 1, '20080217' UNION ALL
SELECT 103, 1, '20080217' UNION ALL
SELECT 104, 1, '20080217' UNION ALL
SELECT 105, 1, '20080217' UNION ALL
SELECT 106, 1, '20080217' UNION ALL
SELECT 107, 2, '20080217' UNION ALL
SELECT 108, 2, '20080217' UNION ALL
SELECT 109, 1, '20080217' UNION ALL
SELECT 110, 2, '20080217' UNION ALL
SELECT 111, 2, '20080217' UNION ALL
SELECT 112, 1, '20080217'
If you observe, there are 8 records of ErrorCategory 1 and 4 records of ErrorCategory 2. In order to display 50% random records from both Error Category 1 and 2, use this query
QUERY to Select Random Records based on a category
SELECT *
FROM
(
SELECT
ErrorID, ErrorCategory,
ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,
COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID
FROM @ErrorLog
) AS TruncTbl
WHERE .50 * maxID >= recID
RESULTS
ErrorID ErrorCategory recID maxID
102 1 1 8
106 1 2 8
104 1 3 8
103 1 4 8
108 2 1 4
107 2 2 4
Similarly, now if you have to delete random records in a table based on a category, here's how to do so
QUERY to Delete Random Records from a table based on a category
DELETE TruncTbl
FROM
(
SELECT
ErrorID,
ROW_NUMBER() OVER (PARTITION BY ErrorCategory ORDER BY NEWID()) AS recID,
COUNT(*) OVER (PARTITION BY ErrorCategory) AS maxID
FROM @ErrorLog
) AS TruncTbl
WHERE .50 * maxID >= recID
RESULTS
ErrorID ErrorCategory ErrorDate
103 1 2008-02-17 00:00:00.000
104 1 2008-02-17 00:00:00.000
108 2 2008-02-17 00:00:00.000
109 1 2008-02-17 00:00:00.000
111 2 2008-02-17 00:00:00.000
112 1 2008-02-17 00:00:00.000
Observe that Both ErrorCategory 1 and 2 had even set of records (8 and 4 each). If there are odd number of records, the results will differ.
Note: SQL Server 2005/2008 also has the TABLESAMPLE clause that can be used to do random sampling. I will explain this feature in my next post
No comments:
Post a Comment