Wednesday, September 17, 2008
I love hockey.
This post should be equally enjoyed by both the SQL nerd and sports fanatic.... and possibly Sidney Crosby's mom. I've included an example of how to calculate percentiles using SQL Server 2005.
The example starts out by populating a table variable with the salaries of the 2008-09 Pittsburgh Penguins!!! Thank God Taffe is in that list....
Since percentiles are a method of ranking data, first we must rank each value in the dataset. This is accomplished in the "SALARIES_RANKED" CTE by using the ROW_NUMBER() function. After ranking the values, we can determine any Nth percentile by using the following formula.
SalaryRank - 1 = ROUND((SalaryCount - 1) * N/100, 0)
The above query produces the following results
If we want to determine what percentile each players salary lies within, we can use the following query. Unlike the first example, this query uses the RANK() function instead of the ROW_NUMBER() function. This will assign the same rank to records containing the same value.
Here is each player's percentile rank...
The example starts out by populating a table variable with the salaries of the 2008-09 Pittsburgh Penguins!!! Thank God Taffe is in that list....
Since percentiles are a method of ranking data, first we must rank each value in the dataset. This is accomplished in the "SALARIES_RANKED" CTE by using the ROW_NUMBER() function. After ranking the values, we can determine any Nth percentile by using the following formula.
SalaryRank - 1 = ROUND((SalaryCount - 1) * N/100, 0)
DECLARE @Salaries TABLE(FullName VARCHAR(30), Salary MONEY)
INSERT INTO @Salaries VALUES('Crosby, Sidney',8700000)
INSERT INTO @Salaries VALUES('Gonchar, Sergei',5000000)
INSERT INTO @Salaries VALUES('Fleury, Marc-Andre',5000000)
INSERT INTO @Salaries VALUES('Whitney, Ryan',4000000)
INSERT INTO @Salaries VALUES('Malkin, Evgeni',3834000)
INSERT INTO @Salaries VALUES('Orpik, Brooks',3750000)
INSERT INTO @Salaries VALUES('Satan, Miroslav',3500000)
INSERT INTO @Salaries VALUES('Sykora, Petr',2625000)
INSERT INTO @Salaries VALUES('Sydor, Darryl',2500000)
INSERT INTO @Salaries VALUES('Fedotenko, Ruslan',2250000)
INSERT INTO @Salaries VALUES('Staal, Jordan',2200000)
INSERT INTO @Salaries VALUES('Gill, Hal',2075000)
INSERT INTO @Salaries VALUES('Eaton, Mark',2000000)
INSERT INTO @Salaries VALUES('Dupuis, Pascal',1400000)
INSERT INTO @Salaries VALUES('Cooke, Matt',1200000)
INSERT INTO @Salaries VALUES('Goligoski, Alex',984000)
INSERT INTO @Salaries VALUES('Letang, Kris',835000)
INSERT INTO @Salaries VALUES('Godard, Eric',750000)
INSERT INTO @Salaries VALUES('Scuderi, Rob',713000)
INSERT INTO @Salaries VALUES('Talbot, Maxime',675000)
INSERT INTO @Salaries VALUES('Beech, Kris',585000)
INSERT INTO @Salaries VALUES('Taffe, Jeff',550000)
INSERT INTO @Salaries VALUES('Kennedy, Tyler',542000)
INSERT INTO @Salaries VALUES('Sabourin, Dany',512000)
INSERT INTO @Salaries VALUES('Minard, Chris',500000);
WITH SALARIES_RANKED AS
(
SELECT FullName
,Salary
,ROW_NUMBER() OVER(ORDER BY Salary) AS SalaryRank
,COUNT(*) OVER(PARTITION BY NULL) AS SalaryCount
FROM @Salaries
)
SELECT '25th' AS Percentile
,Salary AS Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .25, 0)
UNION
SELECT 'Median'
,Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .50, 0)
UNION
SELECT '75th'
,Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .75, 0)
ORDER BY Salary;
The above query produces the following results
Percentile Salary
---------- ---------------------
25th 713000.00
Median 2000000.00
75th 3500000.00
If we want to determine what percentile each players salary lies within, we can use the following query. Unlike the first example, this query uses the RANK() function instead of the ROW_NUMBER() function. This will assign the same rank to records containing the same value.
WITH SALARIES_RANKED AS
(
SELECT FullName
,Salary
,RANK() OVER(ORDER BY Salary) AS SalaryRank
,COUNT(*) OVER(PARTITION BY NULL) AS SalaryCount
FROM @Salaries
)
SELECT FullName
,Salary
,CAST(ROUND((SalaryRank - 1)/CAST(SalaryCount - 1 AS DECIMAL(10,2)) * 100, 0) AS INT) AS Percentile
FROM SALARIES_RANKED
ORDER BY Percentile DESC
Here is each player's percentile rank...
FullName Salary Percentile
------------------------------ --------------------- -----------
Crosby, Sidney 8700000.00 100
Gonchar, Sergei 5000000.00 92
Fleury, Marc-Andre 5000000.00 92
Whitney, Ryan 4000000.00 88
Malkin, Evgeni 3834000.00 83
Orpik, Brooks 3750000.00 79
Satan, Miroslav 3500000.00 75
Sykora, Petr 2625000.00 71
Sydor, Darryl 2500000.00 67
Fedotenko, Ruslan 2250000.00 63
Staal, Jordan 2200000.00 58
Gill, Hal 2075000.00 54
Eaton, Mark 2000000.00 50
Dupuis, Pascal 1400000.00 46
Cooke, Matt 1200000.00 42
Goligoski, Alex 984000.00 38
Letang, Kris 835000.00 33
Godard, Eric 750000.00 29
Scuderi, Rob 713000.00 25
Talbot, Maxime 675000.00 21
Beech, Kris 585000.00 17
Taffe, Jeff 550000.00 13
Kennedy, Tyler 542000.00 8
Sabourin, Dany 512000.00 4
Minard, Chris 500000.00 0
Subscribe to Posts [Atom]