Wednesday, September 17, 2008

I still love hockey.... and Oracle!

As a follow up to my last post, here is an extremely simple way to show each player's salary percentile ranking using Oracle...

A table named SALARIES contains the player's salaries:
FullName                       Salary
------------------------------ ---------------------
Crosby, Sidney 8700000.00
Gonchar, Sergei 5000000.00
Fleury, Marc-Andre 5000000.00
Whitney, Ryan 4000000.00
Malkin, Evgeni 3834000.00
Orpik, Brooks 3750000.00
Satan, Miroslav 3500000.00
Sykora, Petr 2625000.00
Sydor, Darryl 2500000.00
Fedotenko, Ruslan 2250000.00
Staal, Jordan 2200000.00
Gill, Hal 2075000.00
Eaton, Mark 2000000.00
Dupuis, Pascal 1400000.00
Cooke, Matt 1200000.00
Goligoski, Alex 984000.00
Letang, Kris 835000.00
Godard, Eric 750000.00
Scuderi, Rob 713000.00
Talbot, Maxime 675000.00
Beech, Kris 585000.00
Taffe, Jeff 550000.00
Kennedy, Tyler 542000.00
Sabourin, Dany 512000.00
Minard, Chris 500000.00

The following query will rank each salary...
SELECT FullName
,Salary
,ROUND(PERCENT_RANK() OVER(PARTITION BY NULL ORDER BY Salary), 2) * 100 Percentile
FROM SALARIES
ORDER BY Percentile DESC

...and the results are...
FullName                       Salary                Percentile
------------------------------ --------------------- -----------
Crosby, Sidney 8700000.00 100
Gonchar, Sergei 5000000.00 92
Fleury, Marc-Andre 5000000.00 92
Whitney, Ryan 4000000.00 88
Malkin, Evgeni 3834000.00 83
Orpik, Brooks 3750000.00 79
Satan, Miroslav 3500000.00 75
Sykora, Petr 2625000.00 71
Sydor, Darryl 2500000.00 67
Fedotenko, Ruslan 2250000.00 63
Staal, Jordan 2200000.00 58
Gill, Hal 2075000.00 54
Eaton, Mark 2000000.00 50
Dupuis, Pascal 1400000.00 46
Cooke, Matt 1200000.00 42
Goligoski, Alex 984000.00 38
Letang, Kris 835000.00 33
Godard, Eric 750000.00 29
Scuderi, Rob 713000.00 25
Talbot, Maxime 675000.00 21
Beech, Kris 585000.00 17
Taffe, Jeff 550000.00 13
Kennedy, Tyler 542000.00 8
Sabourin, Dany 512000.00 4
Minard, Chris 500000.00 0

I love hockey.

This post should be equally enjoyed by both the SQL nerd and sports fanatic.... and possibly Sidney Crosby's mom. I've included an example of how to calculate percentiles using SQL Server 2005.

The example starts out by populating a table variable with the salaries of the 2008-09 Pittsburgh Penguins!!! Thank God Taffe is in that list....

Since percentiles are a method of ranking data, first we must rank each value in the dataset. This is accomplished in the "SALARIES_RANKED" CTE by using the ROW_NUMBER() function. After ranking the values, we can determine any Nth percentile by using the following formula.

SalaryRank - 1 = ROUND((SalaryCount - 1) * N/100, 0)
DECLARE @Salaries TABLE(FullName VARCHAR(30), Salary MONEY)

INSERT INTO @Salaries VALUES('Crosby, Sidney',8700000)
INSERT INTO @Salaries VALUES('Gonchar, Sergei',5000000)
INSERT INTO @Salaries VALUES('Fleury, Marc-Andre',5000000)
INSERT INTO @Salaries VALUES('Whitney, Ryan',4000000)
INSERT INTO @Salaries VALUES('Malkin, Evgeni',3834000)
INSERT INTO @Salaries VALUES('Orpik, Brooks',3750000)
INSERT INTO @Salaries VALUES('Satan, Miroslav',3500000)
INSERT INTO @Salaries VALUES('Sykora, Petr',2625000)
INSERT INTO @Salaries VALUES('Sydor, Darryl',2500000)
INSERT INTO @Salaries VALUES('Fedotenko, Ruslan',2250000)
INSERT INTO @Salaries VALUES('Staal, Jordan',2200000)
INSERT INTO @Salaries VALUES('Gill, Hal',2075000)
INSERT INTO @Salaries VALUES('Eaton, Mark',2000000)
INSERT INTO @Salaries VALUES('Dupuis, Pascal',1400000)
INSERT INTO @Salaries VALUES('Cooke, Matt',1200000)
INSERT INTO @Salaries VALUES('Goligoski, Alex',984000)
INSERT INTO @Salaries VALUES('Letang, Kris',835000)
INSERT INTO @Salaries VALUES('Godard, Eric',750000)
INSERT INTO @Salaries VALUES('Scuderi, Rob',713000)
INSERT INTO @Salaries VALUES('Talbot, Maxime',675000)
INSERT INTO @Salaries VALUES('Beech, Kris',585000)
INSERT INTO @Salaries VALUES('Taffe, Jeff',550000)
INSERT INTO @Salaries VALUES('Kennedy, Tyler',542000)
INSERT INTO @Salaries VALUES('Sabourin, Dany',512000)
INSERT INTO @Salaries VALUES('Minard, Chris',500000);

WITH SALARIES_RANKED AS
(
SELECT FullName
,Salary
,ROW_NUMBER() OVER(ORDER BY Salary) AS SalaryRank
,COUNT(*) OVER(PARTITION BY NULL) AS SalaryCount
FROM @Salaries
)
SELECT '25th' AS Percentile
,Salary AS Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .25, 0)
UNION
SELECT 'Median'
,Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .50, 0)
UNION
SELECT '75th'
,Salary
FROM SALARIES_RANKED
WHERE SalaryRank - 1 = ROUND((SalaryCount - 1) * .75, 0)
ORDER BY Salary;

The above query produces the following results
Percentile Salary
---------- ---------------------
25th 713000.00
Median 2000000.00
75th 3500000.00

If we want to determine what percentile each players salary lies within, we can use the following query. Unlike the first example, this query uses the RANK() function instead of the ROW_NUMBER() function. This will assign the same rank to records containing the same value.
WITH SALARIES_RANKED AS
(
SELECT FullName
,Salary
,RANK() OVER(ORDER BY Salary) AS SalaryRank
,COUNT(*) OVER(PARTITION BY NULL) AS SalaryCount
FROM @Salaries
)
SELECT FullName
,Salary
,CAST(ROUND((SalaryRank - 1)/CAST(SalaryCount - 1 AS DECIMAL(10,2)) * 100, 0) AS INT) AS Percentile
FROM SALARIES_RANKED
ORDER BY Percentile DESC

Here is each player's percentile rank...
FullName                       Salary                Percentile
------------------------------ --------------------- -----------
Crosby, Sidney 8700000.00 100
Gonchar, Sergei 5000000.00 92
Fleury, Marc-Andre 5000000.00 92
Whitney, Ryan 4000000.00 88
Malkin, Evgeni 3834000.00 83
Orpik, Brooks 3750000.00 79
Satan, Miroslav 3500000.00 75
Sykora, Petr 2625000.00 71
Sydor, Darryl 2500000.00 67
Fedotenko, Ruslan 2250000.00 63
Staal, Jordan 2200000.00 58
Gill, Hal 2075000.00 54
Eaton, Mark 2000000.00 50
Dupuis, Pascal 1400000.00 46
Cooke, Matt 1200000.00 42
Goligoski, Alex 984000.00 38
Letang, Kris 835000.00 33
Godard, Eric 750000.00 29
Scuderi, Rob 713000.00 25
Talbot, Maxime 675000.00 21
Beech, Kris 585000.00 17
Taffe, Jeff 550000.00 13
Kennedy, Tyler 542000.00 8
Sabourin, Dany 512000.00 4
Minard, Chris 500000.00 0

Monday, September 8, 2008

Comma Separated Madness...

Here is a good example of the power of Common Table Expressions in SQL Server 2005. This feature has saved me many times by providing clean solutions to somewhat complex SQL problems.

The example starts out by declaring a table variable and inserting several names and genders into the table. Now, suppose you have to generate a report which groups each individual by gender, and displays them in a comma separated list. This can be done quite easily by using a recursive Common Table Expression.

Before we generate the names list, we need to give each row an identifier so we can keep track where we are when recursively calling the CTE. This is accomplished by using the ROW_NUMBER() function which is implemented below in the "NAMES_RANKED" CTE.

After this, the "GENERATE_NAMES_LIST" CTE begins by defining the initial value followed by a UNION ALL statement. The next query in the CTE recursively calls the "GENERATE_NAMES_LIST" CTE to concatenate the current value with the previous value. By itself, this CTE would produce the following results:

Male 1 Bob
Male 2 Bob, Dave
Male 3 Bob, Dave, Jack
Male 4 Bob, Dave, Jack, Tim
Female 1 Amber
Female 2 Amber, Jane
Female 3 Amber, Jane, Sue

As you can see, the values are accumulated with each call to the CTE. All we have to do now is determine the highest ranked for each gender, and display the values. This is accomplished by the CTE "MAX_RANK" and the final query.

Go crazy with this!

DECLARE @Names TABLE(FirstName VARCHAR(30), Gender VARCHAR(10))

INSERT INTO @Names VALUES('Bob','Male')
INSERT INTO @Names VALUES('Dave','Male')
INSERT INTO @Names VALUES('Jack','Male')
INSERT INTO @Names VALUES('Tim','Male')
INSERT INTO @Names VALUES('Jane','Female')
INSERT INTO @Names VALUES('Amber','Female')
INSERT INTO @Names VALUES('Sue','Female');

WITH NAMES_RANKED AS
(
SELECT FirstName
,Gender
,ROW_NUMBER() OVER(PARTITION BY Gender ORDER BY FirstName) RowNumber
FROM @Names
),
GENERATE_NAMES_LIST(Gender, RowNumber, NameList) AS
(
SELECT Gender
,RowNumber
,CAST(FirstName AS VARCHAR(MAX))
FROM NAMES_RANKED
WHERE RowNumber = 1
UNION ALL
SELECT n.Gender
,n.RowNumber
,CAST(g.NameList + ', ' + n.FirstName AS VARCHAR(MAX))
FROM GENERATE_NAMES_LIST g
INNER JOIN NAMES_RANKED n ON n.Gender = g.Gender
AND n.RowNumber = g.RowNumber + 1
),
MAX_RANK AS
(
SELECT Gender
,MAX(RowNumber) MaxRank
FROM NAMES_RANKED
GROUP BY Gender
)
SELECT g.Gender
,NameList
FROM GENERATE_NAMES_LIST g
INNER JOIN MAX_RANK m ON m.Gender = g.Gender
AND m.MaxRank = g.RowNumber

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]