
Draft Science

27 Jun 2007 10:33 am

John Hollinger's gone where stat guys normally fear to tread and attempted to devise a formula aimed at projecting college players' likely levels of NBA success. He backs himself up, naturally, with historical arguments looking retrospectively at what his formula says GMs should have done. Here are his results for the 2003 draft:

[Image: collegeresults.png]

Obviously, if you passed on Dwyane Wade and Chris Bosh to grab Michael Sweetney, you'd feel pretty dumb, but for every Sweetney there's a Darko. All in all, the results seem to be okay. The interesting thing, though, is that the formula winds up giving a ton of weight to steals ("This is the one item that gets the most weight, actually -- it's even more important than PER!"), to blocks, and to offensive boards, along with giving players a big bonus for being young. The formula also tells you to give a boost to guys who nail three pointers, and to watch out for people who are too short -- or too ineffective on the boards -- for their position.

At the end of the day, the method winds up sounding a lot like "scouting." Instead of relying too heavily on how successful a college player the guy was, you look at his game results for signs of athleticism (steals, blocks, offensive boards), specific skills (pure shooting), and appropriate physical assets. You give a big bonus to younger guys, because you figure they'll learn. The formula doesn't really have much to add to this. It doesn't, in particular, do much to resolve any draft conundrums. Faced with an undersized power forward who was a successful rebounder in college, do you think he'll be one of those guys who continues to enjoy rebounding success despite being short (Paul Millsap) or one of those guys who's too short to handle the pro game?

A formula that helped answer questions like that would be tremendous. This one, not so much. It does, however, do the good service of cautioning against drafting the Adam Morrisons of the world -- guys who seem like unpromising pro prospects but who random sportswriters will just assert possess the "will to win" or something.


Comments (36)

if a "formula" could do the trick, we wouldn't need to play the games at all, would we? that's why we need scouting (why, the other day in the wall street journal bill james allowed as how he learns from scouts!).

The actual picks were much more prescient than the formula. It looks like the GMs are doing something right.

I agree. Stuff like this is so annoying. It's basically scouting except you limit the universe of information you're willing to consider, and for no apparent reason. What if the player, like Greg Oden or Spencer Hawes, played much of the season injured? Shouldn't that be taken into consideration? Or what if the player's "rebound rate" is low because he happened to play alongside monster rebounders? That was certainly the case for Corey Brewer, who played alongside Joakim Noah, Al Horford, and Chris Richard. The same is true of Spencer Hawes, who played alongside Jon Brockman.


I would tend to think of steals, blocks and rebounds as indicators not of pure athleticism, but of court sense and a knowledge of how to play the game. If you want pure athleticism, you can time sprints and measure vertical leap. In the course of a season, you get steals, blocks and rebounds from anticipating what's going to happen and getting to the ball.

You guys are acting like GMs should take this analysis and adhere to it rigidly. That's absurd. It's a tool, just like scouting, to be used in your analysis.

That's what drives me nuts about criticism of sports analysis. Of COURSE it doesn't take into account injuries or heart or intangibles. IT'S NOT SUPPOSED TO. Your scouting is supposed to take that into account.

But a formula that does a job as well as months of labor-intensive scouting seems like an improvement, doesn't it? Look, no method will be perfect, but a good statistical method can point to players that seem under or overrated and force you to look at them harder.

I think Carlos has it right--this would at best be one tool that would be used as an indicator, to be backed up (or not) by more detailed scouting.

A couple of caveats, though. First, I found his draft predictions from the past 5 years to be a bit underwhelming, especially when you consider that this was the data set Hollinger used to fine-tune his algorithm. So these aren't really predictions at all--they're a best-case curve fit of his formula to existing data.

Second, while it might make sense for the run of the mill recruit, the idea of penalizing Oden because he's 7 feet tall just seems bizarre and idiotic to me.

The problem with the system I think is that it punishes good teams with balance.

Kevin Durant has monster numbers because the rest of his team is not very good.

Durant grabs all the rebounds for his team because the rest of his team are lousy rebounders.

There is also a problem with the steals formulation, as steals also come from playing crappy defense.

And that is the ultimate problem of the system - defense is not really taken into account.

I found the exercise pretty useless, frankly, especially because of the short time frame and its failure to test the viability of the system historically, even though Hollinger could easily have done this.

Man, before you guys get so critical you should actually, like, read the whole article. And instead of just looking at one year, 2003, why not consider all the years he does present data for (2002-2006)? Fact of the matter is, Hollinger's system doesn't miss out on any key players and contains very few busts. And he acknowledges that it should be accompanied by scouting: Borchardt, for example, had an injury history that correctly drove down his draft day value.

Basically, you could take Hollinger's system straight up over the actual results over that time span and do better figuring out roughly where guys should get taken. A system like this presents tremendous value to a GM. Instead of just going by gut feeling, when you are looking at a player and trying to justify drafting him earlier because of 'intangibles' you had better be aware of how strong those intangibles need to be. This is one of the best (and most useful) stats-oriented articles I've read recently.

I hate to bring the whole "Moneyball" argument to the NBA, but it always seems as though people are way too slow in adopting statistical tools for evaluation purposes. I'm not saying that Hollinger's PER stats are perfect, but if you look at the article his method seemed much better than the average GM's. For his evaluation of Boozer alone as the best player in his draft class, while teams passed on him with 34 picks, you should at least consider that statistics can eliminate some of the biases inherent in traditional scouting.

i can't speak for others, but i'm certainly not opposed to utilizing statistical analysis to improve our ability to judge players. my first comment was to poke fun at our host for suggesting that there might be a "formula" that would simply tell us.

and there isn't, and can't be, for a couple of very good reasons: a game like baseball, with its several hundred discrete events per game, is a natural for statistical analysis. even so, one of the pioneers of such analysis, bill james, noted in the wsj the other day that he has learned to learn from scouts.

and second, college basketball doesn't allow us to normalize. teams do not play the same opponents, so who knows how that influences the stats? (and that's putting aside the uneven quality of scorers' decisions in college games and the much greater variety of tempo in the college game.)

i realize i got mentally distracted by my second point and didn't finish the first: "flow" games, like basketball and soccer, don't lend themselves as readily to statistical analysis.

An example of the team effect. Corey Brewer's season rebounding stats were not as high as they would have been had he played with different teammates. Here is the box score for the national championship game:

Ohio State Buckeyes
STARTERS MIN FGM-A FTM-A OFF REB AST PF PTS
I. Harris, F 26 2-8 1-2 1 5 0 2 7
G. Oden, C 38 10-15 5-8 4 12 1 4 25
M. Conley Jr., G 34 7-13 5-6 1 3 6 3 20
R. Lewis, G 34 6-13 0-1 2 3 0 3 12
J. Butler, G 36 1-7 0-0 0 2 1 3 3

BENCH MIN FGM-A FTM-A OFF REB AST PF PTS
D. Lighty, G-F 13 2-3 0-0 0 0 1 1 4
D. Cook, G 9 1-2 0-0 0 0 1 1 2
M. Terwilliger, F-C 5 1-1 0-0 0 0 0 1 2
O. Hunter, F 5 0-2 0-0 2 2 0 2 0
TOTALS FGM-A FTM-A OFF REB AST PF PTS
30-64 11-17 10 27 10 20 75
46.9% 64.7%
TEAM REBS: 1
TURNOVERS: 7 (J Butler 1, M Conley Jr. 2, O Hunter 1, G Oden 2, R Lewis 1)
BLOCKED SHOTS: 4 (G Oden 4)
STEALS: 11 (J Butler 2, D Cook 1, M Conley Jr. 4, I Harris 1, D Lighty 1, G Oden 1, R Lewis 1)
3-PT FGS: 4-23, .174 (J Butler 1-6, D Cook 0-1, M Conley Jr. 1-3, I Harris 2-8, D Lighty 0-1, R Lewis 0-4)

Florida Gators
STARTERS MIN FGM-A FTM-A OFF REB AST PF PTS
C. Brewer, F 36 4-12 2-2 0 8 1 2 13
J. Noah, F-C 21 1-3 6-6 0 3 0 4 8
A. Horford, F-C 34 6-15 6-8 4 12 3 3 18
T. Green, G 38 4-6 5-5 0 3 6 0 16
L. Humphrey, G 34 5-8 0-0 0 1 1 1 14
BENCH MIN FGM-A FTM-A OFF REB AST PF PTS
W. Hodge, G 11 2-2 1-1 0 1 0 1 5
C. Richard, F-C 20 3-5 2-3 5 8 0 5 8
M. Speights, F-C 6 1-2 0-0 1 2 0 3 2
TOTALS

Brewer grabbed 8 rebounds, the third highest total in the game, behind Oden and Horford. Why? Noah was in foul trouble, Oden was giving the Gator big men fits inside, and OSU was behind and shooting 3s, creating long rebounds. For the season, Brewer averaged 4.7 boards per game, while Horford and Noah averaged 9.5 and 8.5.

Compare Durant and Texas. Durant averaged 11 rebounds a game. Damion James was the next best rebounder at 7.2. No one else averaged more than 4. In short, Texas was not a team deep in rebounders.

In its NCAA loss to USC, getting beat by 19, Durant had 9 rebounds in 40 minutes. In Texas' losses to Kansas, Durant had 9 and 10 rebounds in 34 and 43 minutes. And these were monster games for Durant scoring.

My point is that without taking into account the context of the team the player is playing for, such statistics are, at best, misleading.

My own view is that the competition Durant played, coupled with the relatively lower quality of his teammates, has led to Kevin Durant being overrated.

He is clearly the No. 2 pick here, but it is ridiculous to consider him Oden's equal.

To my mind, the idea that anyone would consider picking Durant over Oden is simply absurd.

For no reason in particular, I hate comments like this:

"Man, before you guys get so critical you should actually, like, read the whole article."

Hey Man, I think we did. The question is, after reading the article, what is the significance of your statement:

"Fact of the matter is, Hollinger's system doesn't miss out on any key players and contains very few busts."

You could say the same about the draft itself. Statistical analysis as presented by Hollinger was supposed to be BETTER. One of the funniest things he wrote was how statistics could avoid making Jared Jeffries a lottery pick. And then I READ his ratings and lo and behold, he had Jared Jeffries as a lottery pick!

In the end, that is what is most annoying about Hollinger: he presents systems that do not do what he claims they do.

"Fact of the matter is, Hollinger's system doesn't miss out on any key players and contains very few busts." "You could say the same about the draft itself". Not, you could not. It's not unusual to see undrafted guys become good players (Udonis Haslem, for example). And it seems to me you are cherry picking here. Looking at Hollinger ratings, it seems to me they do better than the actual draft picks (we could try a PER comparison, but I don't have the time right now).

Carlos:

It seems to me you need to actually do the comparison before saying "they do better."

Please do.

As for Haslem, a Gator and thus one of my favorite players, to call him a KEY player seems a tad strong.

Shall we look for similar players that Hollinger's system no doubt missed? What you have done is the very essence of cherry picking.

"A formula that helped answer questions like that would be tremendous. This one, not so much."

But if Hollinger's formula outperforms actual drafts, which it clearly seems to do, one can indeed say it's quite tremendous, despite the standard drawbacks to statistical models of individual basketball players.

The only real problem I can see here is if you think the formula's prescience is due to merely fitting existing data anomalies, rather than having any actual predictive ability for future drafts.

I can certainly come up with a formula that states: in the 21st century, teams without a dominant center will win titles in years divisible by four. But my suspicion is that Hollinger's formula is quite a bit more useful than my example.

"But if Hollinger's formula outperforms actual drafts, which it clearly seems to do"

How did it do that exactly? Take 2003:

2003 Draft: Top 12 rated players
NO. PLAYER SCHOOL SCORE PICKED* ACTUAL ORDER*
1. Carmelo Anthony Syracuse 781.3 1 Carmelo Anthony
2. Mike Sweetney Georgetown 702.8 7 Chris Bosh
3. Chris Bosh Georgia Tech 688.4 2 Dwyane Wade
4. Dwyane Wade Marquette 600.4 3 Chris Kaman
5. Nick Collison Kansas 553.4 9 Kirk Hinrich
6. T.J. Ford Texas 549.5 6 T.J. Ford
7. Kirk Hinrich Kansas 504.0 5 Michael Sweetney
8. Josh Howard Wake Forest 501.4 17 Jarvis Hayes
9. Kyle Korver Creighton 499.7 31 Nick Collison
10. David West Xavier 494.7 14 Marcus Banks
11. Troy Bell Boston College 481.5 13 Luke Ridnour
12. Jarvis Hayes Georgia 478.9 8 Reece Gaines

The actual draft was clearly superior to Hollinger's system, quite an accomplishment considering Hollinger is building his system after the fact.

i'm sure there are many good counterexamples, but this draft seems easy to pick: very good big men/centers seem more often to give you a very good chance to win it all (shaq, duncan, hakeem, patrick, zo, earlier you've got kareem, moses, walton, etc.) than very good wing players do (vince, kobe alone, tmac, etc.). yes, there's jordan, but he had significant help and is generally thought to be GOAT (there's also magic and bird, but they had lots of help, particularly from great big men). yes, there are some candymen out there, but no one really thought ostertag or bradley or rik smits could alone get you over the hump.

it just seems like there are far too many high-scoring wing men who may seem dominant in some sense -- because they can score and make highlight reels -- but aren't going to take you far in the playoffs. by contrast, you get a howard and you seem guaranteed to get somewhere. go big.

Armando, the reason I wondered if you had read the article is that you continue to make the claim that Hollinger's system does no better than the actual drafts. So I repeat: read the article. His system does better. The article itself performs the comparison. I'll summarize. His approach narrowly misses: Foye, Villanueva and Kaman. His approach avoids: Dajuan Wagner, Melvin Ely, Marcus Haislip, Reece Gaines, Rafael Araujo, Antoine Wright, not to mention Adam Morrison, J.J. Redick, Hilton Armstrong and Cedric Simmons who look like they were overrated on draft day as well. He also picks up Carlos Boozer, Udonis Haslem, Tayshaun Prince, David West, Josh Howard, Kevin Martin, Delonte West, and Danny Granger where the actual draft overlooked these guys. Maybe you disagree with this assessment b/c you think the method he uses is poor (or is somehow cherry picking). But in that case I think you are obliged to ACTUALLY MAKE AN ARGUMENT as to why this is so, instead of just claiming that Hollinger hasn't put one forth.

Petey points out the important concern: is the formula prescient or is it just curve-fitting? There are a lot of players considered, but many of them will be non-controversial picks regardless of the statistical method used. The fact that Hollinger emphasizes steals, blocks and rebounds is a big part of why he is able to do better than the actual drafts, and it leaves one to wonder whether those stats just happen to explain the oddballs of the past 5 years or whether they have explanatory power for the future.

Armando, so I didn't refresh before posting, but allow me to point out that you stubbornly refuse to consider the entire article. Instead you are focusing on one draft (2003).

I'm not so convinced his system outperforms the regular drafts. He does pick up on some players and misses some busts. On the other hand, he includes some players that would be viewed as busts if they were picked as high as his system says, and his relative rankings also seem off in some cases.

The 02 draft just wasn't very good, from appearances. And he "hits" Tayshaun Prince only by moving him up from 16th to 11th, which is not that great of a bump. He also bumped up Casey Jacobsen and Vincent Yarbrough, neither of whom has done much of anything.

03 has been discussed. He does hit on Josh Howard, but also has overranked Kyle Korver and badly overrated Sweetney (although he tries to downplay that by saying he had a decent rookie year.)

In 04, the biggest differences between his ranking and the actual draft are under-rating Okafor and over-rating Delonte West. I'd say the actual order is a better match than his projected order.

In 05, while it's a bit early to evaluate, he's got May and McCants both bumped up too high, and Deron Williams much too low. His big bump is Chris Taft (who would have been a huge bust taken at this level.)

So, even with his rankings designed to fit this data, it doesn't look to me like his system has performed that much better than the actual picks. The one big success story for positive picks is Boozer. He's avoided some busts, but also added a few busts. And I don't think he's done a very good job with the relative ranking of the top players. (And this is at least as important as identifying who the best players are. You think fans would be happy with a GM who picked Sweetney over Wade, or Delonte West over Okafor?)

It seems like a system like this one could be a valuable tool to help GMs (and I'd hope most GMs already have something like this, which includes a lot more details from workouts and other inputs) mostly as a red flag, to point out question mark players, and also to maybe make you take a second look at some players (like Boozer). But it's not a great stand-alone tool.

mpowell:

How many drafts are there for me to consider? To draw conclusions about the 2006 draft ALREADY is just plain foolish, frankly.

So that's one.

2005 is a little better I suppose. Let's look at it. Hollinger has Chris Paul at 1, Bogut at 5 and Deron Williams at 11.

The draft had Andrew Bogut at 1, Deron Williams at 2 and Paul at 4.

Hollinger has Sean May at 3. The draft had him at 9.

Hollinger had Rashad McCants at 4. The draft at 10.

The rest of the draft was pretty even.

Hollinger loses. That makes 2003 and 2005 that Hollinger's system underperforms the Draft.

2002 can go to Hollinger.

That leaves 2004. Looks like a wash to me. Luol Deng should have been 1, says Hollinger. He went 5. The jury is out, no?

Hollinger says Ben Gordon 7. Draft said 2.

Delonte West is the big jumper for Hollinger, all the way to 2. Draft said 12. I think the draft wins on that one.

So where is the evidence that Hollinger was better?

"I'm not so convinced his system outperforms the regular drafts."

But that's only because the basketball acumen on display in the rest of your post is lacking.

"and badly overrated Sweetney"

Hollinger admits that his system doesn't take into account players with a propensity to injury. He should also admit that his system doesn't take into account players with a propensity toward getting fat.

For example, I'd disregard his formula's advice to select Glen Davis with a lottery pick this year, as Davis seems to also be at prime risk of being too fat to perform at the NBA level.

Just because Armando asked, I did a quick PER comparison between Hollinger picks and the actual draft for the 2003 draft.
NO. PLAYER PER PICKED* ACTUAL ORDER* PER
1. Carmelo Anthony 22.28 1 Carmelo Anthony 22.28
2. Mike Sweetney 11.01 7 Chris Bosh 22.83
3. Chris Bosh 22.83 2 Dwyane Wade 29.18
4. Dwyane Wade 29.18 3 Chris Kaman 13.04
5. Nick Collison 14.34 9 Kirk Hinrich 17.20
6. T.J. Ford 18.41 6 T.J. Ford 18.41
7. Kirk Hinrich 17.20 5 Michael Sweetney 11.01
8. Josh Howard 20.16 17 Jarvis Hayes 10.84
9. Kyle Korver 14.34 31 Nick Collison 14.34
10. David West 19.11 14 Marcus Banks 11.43
11. Troy Bell -4.05(04) 13 Luke Ridnour 13.82
12. Jarvis Hayes 10.84 8 Reece Gaines 6.07 (06)

Hollinger's system outperforms the actual draft slightly here. The average Hollinger pick has a PER of 16.30 versus 15.87 of the real picks. Not bad for a formula.
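The arithmetic behind that comparison is easy to re-check. Here is a minimal sketch that copies the PER values from the table above, in rank order, and averages them; the helper name `mean_per` and its `top_n` parameter are made up for illustration and are not part of anything Hollinger published.

```python
# PER values copied from the 2003 comparison table, in rank order.
# Left column of the table: Hollinger's top 12. Right column: the actual order.
hollinger_per = [22.28, 11.01, 22.83, 29.18, 14.34, 18.41,
                 17.20, 20.16, 14.34, 19.11, -4.05, 10.84]
actual_per = [22.28, 22.83, 29.18, 13.04, 17.20, 18.41,
              11.01, 10.84, 14.34, 11.43, 13.82, 6.07]


def mean_per(values, top_n=None):
    """Average PER over the first top_n entries (all entries if top_n is None).

    Hypothetical helper, named here only for this illustration.
    """
    picked = values if top_n is None else values[:top_n]
    return sum(picked) / len(picked)


if __name__ == "__main__":
    for label, n in (("all 12", None), ("top 5", 5)):
        print(label,
              round(mean_per(hollinger_per, n), 2),
              round(mean_per(actual_per, n), 2))
```

Over all 12 slots this gives 16.30 for the formula against 15.87 for the actual order. Restricting the same lists to the first five slots gives 19.93 against 20.91, so the direction of the result depends on how far down the list you average.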

Thanks, Carlos. Assuming the perfection of Hollinger's PER, the aggregate of the first 12 picks was slightly in Hollinger's favor.

In the top 5, it is not helpful, rather it hurts.

At best, this seems a "tie-breaker" statistical formula for when you are close on two players.

It simply does not convince as something meaningful to me.

Add to that the fact that this is a system created AFTER the fact with limited data, and Hollinger should not have ballyhooed this as he did. He has completely oversold it.

BTW, anyone following the Simmons-Forde mock draft?

And some comments by Hollinger about his system in the Apbrmetrics forum a while ago: "- There is a fairly large SOS component, done by taking Jeff Sagarin's rating of a team's sked strength for that season. The results were also pace adjusted.

-- PER actually played a very small role in this. It turns out that it's helpful, but other things are much better predictors.

-- In response to Gabe's question on my approach, that's absolutely what I did -- back-tested it against previous drafts to see what works. I have no doubt that as we learn more from subsequent drafts this will need further revision, perhaps a lot. I realize there's a danger of confirmation bias by approaching it this way, but I saw no other reasonable approach.

-- As for how I chose what stats to weight -- lots and lots of trial and error, mostly, but based both on my own experience and some previous studies done by other people, I also had a pretty good idea where to look.

-- Somebody commented that I just picked different dogs in these drafts and mentioned Borchardt, Taft, Sweetney and Luke Jackson. Actually all these guys got hurt except Sweetney, who couldn't stop eating. Borchardt dropped like a stone in the real draft because he'd had injuries in college, and folks were worried Sweetney would put on pounds (much like Big Baby this year) -- these are the types of areas where scouting really can augment numbers a lot. But Vincent Yarbrough, Andre Emmett and Troy Bell ... now those guys were dogs."

Armando, I certainly wouldn't use it as a "tie breaker" system. It seems to me that the utility is exactly the opposite: you get a reasonably good idea of who is good and who is not, compare it with the opinions of scouts and the media, and use deeper scouting on the guys who look over or underrated, or who seem very close in ability.

Carlos:

Who's good and who's not? So it is not helpful in deciding who should be 3rd 4th or 5th? Just who generally should be in the top 15 and who should not?

Is that Hollinger's idea? So Hollinger limits "busts"?

I don't see that at all. No, I must reject that view, it seems no more than a tie-breaker to me at best.

Well, the real test will be to check his predictions vs. this year's draft. It's hard to be sure since he only showed the lottery picks for previous years, but it certainly seems like his formula is producing picks much more divergent from expert opinion than was the case for previous years. (This year he has 4 lottery picks that are ranked 25 or below by Chad Ford. For the previous lotteries, he never had more than 1, and for several years had none in his lottery that were picked 25th or below.)

That suggests to me that Hollinger's past success is more of a case of curve fitting rather than inherent value of his methodology. When he could adjust the formula, it gave him results that, while somewhat different, weren't all that far off from the actual drafts. Now, when he has to predict the future, his formula suddenly sharply diverges from expert opinion.
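The curve-fitting worry raised in this thread can be illustrated with a toy simulation. This is only a sketch under loud assumptions: Hollinger's actual formula is not published, so the "stats" and "outcomes" below are pure random noise, the candidate "formulas" are random weight vectors, and every name (`pearson`, `backtest_then_forecast`, the parameters) is invented for the example. It picks whichever formula best matches a past draft class, then scores that same formula on a fresh class.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def backtest_then_forecast(n_players=12, n_stats=6, n_formulas=500, seed=0):
    """Return (best back-test fit, that formula's fit on new data, all back-test fits)."""
    rng = random.Random(seed)

    def make_class():
        # Assumption: stats and pro outcomes are independent noise,
        # so no formula has any real predictive power.
        stats = [[rng.gauss(0, 1) for _ in range(n_stats)] for _ in range(n_players)]
        outcomes = [rng.gauss(0, 1) for _ in range(n_players)]
        return stats, outcomes

    def ratings(weights, stats):
        # A "formula" here is just a weighted sum of a player's stats.
        return [sum(w * s for w, s in zip(weights, row)) for row in stats]

    past_stats, past_outcomes = make_class()
    future_stats, future_outcomes = make_class()

    formulas = [[rng.uniform(-1, 1) for _ in range(n_stats)] for _ in range(n_formulas)]
    fits = [pearson(ratings(w, past_stats), past_outcomes) for w in formulas]
    best = max(range(n_formulas), key=lambda i: fits[i])
    forecast = pearson(ratings(formulas[best], future_stats), future_outcomes)
    return fits[best], forecast, fits


if __name__ == "__main__":
    back, forward, _ = backtest_then_forecast()
    print("best back-test correlation:", round(back, 2))
    print("same formula on a new class:", round(forward, 2))
```

Because the winning formula is selected for matching the past class, its back-test correlation is the largest of all candidates by construction, even though nothing in the data is predictable. Its correlation on the new class carries no such guarantee, which is why a check against this year's draft is the more meaningful test.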

This thread is probably long dead, but I don't plan to pay for ESPN Insider and am curious: how does Hollinger pick the players to rate? Is he looking at only US college players, or does he include foreign players as well? (I'm assuming no US high school players, since I see no LeBron on the 2003 draft.)

"BTW, anyone following the Simmons-Forde mock draft?"

I started to, but was immediately tripped up by this Simmons line:

"The thing is, all these franchise centers are basically the same -- it just comes down to their inherent will to dominate a game. Hakeem had that will, Duncan has it, Moses had it, Shaq had it in 2000 and 2001 ... for whatever reason, Ewing didn't have it, and neither did Mourning or Robinson."

That's just moronic. Hakeem, Duncan, and Shaq are all better than Ewing or Mourning, but I've never seen anyone argue that Ewing or Mourning lacked "will." Both of them were as intense as it gets.

This column, although marked insider, is actually free, or else I wouldn't have been commenting on it either. Hollinger gives a general overview of his methods, but doesn't present his formula.

The gist is that he uses PER combined with a number of other stats ("athletic markers" like rebound rate and number of steals), height (being too short or above 7 feet is a negative), and age to rank players.

Huh. I can't see it on my computer, anyway. My question was not so much about the formula he uses but the universe of basketball players he considers. Does he look at all US college players? Only those from Division I? Only those from good teams in Division I?

Only US college players. I think from all divisions, but I'm not 100% sure.


Comments closed July 11, 2007.

Copyright © 2008 by The Atlantic Monthly Group. All rights reserved.