How Do We Measure Mock Draft Quality?
The One Where I Try to Become an Honest-to-Goodness Sports Analytics Blogger
It has been almost a year since
and his cohost released an episode of the podcast, but I am still drawn back to its thesis. The podcast examines how soccer analytics evolved through the dissemination of ideas written by dedicated folks using the best data available. The importance of communicating ideas through writing came to mind again earlier this year when I made my annual pilgrimage to Indianapolis for the NFL Combine. I was there to spread the gospel of Grinding the Mocks to current and potential team clients.

Michael Lopez, the NFL’s Senior Director of Football Data and Analytics, invites someone from outside of football analytics to speak at the Big Data Bowl, the NFL’s annual tracking data science competition. This year, the speaker was Eric Tulsky, the General Manager of the NHL’s Carolina Hurricanes. Dr. Tulsky spoke about the importance of being inquisitive and humble as an analyst, mixing in anecdotes about his time as a proto-hockey (and football) analytics blogger who eventually caught on in an NHL front office and rose to the highest position in the organization. Listening to his talk inspired me to jump more into writing during my “offseason” (aka from June through December) and to do some honest-to-goodness R&D on the ideas that have been brewing in my head.
Thank you for coming along for the ride this summer. Now, on with the show!
The Mock Draft Accuracy Industrial Complex
“How do we measure mock draft quality?” seems like a simple question. However, if we’re honest, answering that question means assuming some things. And “when you assume, you make an ass out of you and me”? Not necessarily! As with most things, it depends on what the assumptions are. If you take a simplistic approach, like what most “mock draft accuracy” contests1 do, you come up with a scoring system that gives out arbitrary points for things like player/team or even team/position matches.
You can probably see where I’m going with this. There are already a lot of assumptions embedded in this type of analysis! For example, the scoring system that my arch-nemesis2 at the NFL Mock Draft Database uses to rank mock drafts is as follows:
Here are just a few of the multitude of assumptions underlying this scoring system (a rough code sketch of this style of scoring follows the list):
Correctly predicting the team, the position, and the player are each worth the same amount (10 points).
A 100-point bonus assumes a perfect prediction is 10x more valuable than a partial one.
Zero credit for "close" scenarios, such as being one pick off in draft position.
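To make those assumptions concrete, here is a rough sketch, in R, of what this style of scoring looks like. The function and column names are hypothetical illustrations rather than the NFL Mock Draft Database’s actual implementation; only the point values come from the bullets above.

```r
# Rough sketch of a "points for matches" scoring system.
# Column names (pred_player, actual_team, etc.) are hypothetical,
# not the NFL Mock Draft Database's actual code.
score_mock_simple <- function(mock) {
  player_hit   <- mock$pred_player   == mock$actual_player
  team_hit     <- mock$pred_team     == mock$actual_team
  position_hit <- mock$pred_position == mock$actual_position

  pts   <- 10 * (player_hit + team_hit + position_hit)   # 10 points per match
  bonus <- 100 * (player_hit & team_hit & position_hit)  # bonus for a perfect pick

  sum(pts + bonus)
}
```

Note how binary the whole thing is: a pick is either a match or it isn’t, and being one slot off earns the same zero as being twenty slots off.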
Grinding the (Algorithmic) Superiority Complex
So, how do I think about mock draft quality? I’m a man of simple pleasures:
I don’t consider any type of mock draft team or positional match at all.3
I value earlier picks higher than later picks.4
It matters how far off a prediction is from the actual draft.5
In the rest of this blog post, I will focus largely on the last assumption, since how one measures the accuracy of a mock draft is the driving force behind quantifying mock draft quality. Let’s walk through a couple of methods6 to showcase my thinking:
Mean Squared Error
Mean Squared Error (MSE) is a pretty classic model evaluation metric, but it has some drawbacks. MSE punishes predictions much more for being very off than for being only a little off. If a draftnik were to predict that a player will go 5th overall, but they go 10th, that's not too bad, right? However, if that same draftnik were to predict that player will go 5th and they go 15th, the penalty becomes much harsher. Not just twice as bad (-5 pts vs. -10 pts), but four times worse (-25 pts vs. -100 pts)! This assumes that being way off on a pick is quadratically more wrong than being slightly off. Add in a pick-value adjustment, and we’ve got a good start. However, we can surely do better…
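For concreteness, here is a minimal sketch of (optionally weighted) MSE on mock draft picks; the function name and the weight argument are mine for illustration, and the weights stand in for whichever pick-value adjustment you prefer.

```r
# Minimal sketch: (weighted) mean squared error for a mock draft.
# predicted and actual are vectors of draft slots for the same set of players;
# weights stand in for your preferred pick-value adjustment.
mock_mse <- function(predicted, actual, weights = rep(1, length(predicted))) {
  weighted.mean((predicted - actual)^2, w = weights)
}

# The example from the text: mocked 5th but drafted 10th costs 25,
# while mocked 5th but drafted 15th costs 100 -- four times the penalty.
(5 - 10)^2  # 25
(5 - 15)^2  # 100
```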
Weighted Spearman Correlation
I was posting about my work on X (or the site formerly known as Twitter) when Christopher Long, a Principal Optimization Specialist at SumerSports, shared an interesting approach that I hadn’t thought about, which I’ve posted below. He smartly pointed out that a measure of correlation with the actual draft order might capture something that a more traditional metric would miss, especially when it comes to rankings.
When statisticians want to understand whether two variables are connected, they face a choice of correlation measures that can dramatically impact their findings. Spearman correlation, the method that Long references, assesses whether rankings in one variable correspond with the rankings in another. This makes it excellent at detecting relationships that don't follow straight lines, like in mock drafts. By contrast, Pearson correlation, the most common measure of correlation, operates strictly on numerical values (not rankings) and specifically hunts for straight-line relationships. This makes Spearman correlation a better tool for measuring unconventional patterns, while Pearson provides better insights into linear trends.
In the example above, the data has a Spearman correlation of 1 because the rankings on the X and Y axes follow each other perfectly, even though the metrics don’t display an exactly 1-to-1 relationship (aka a straight diagonal line). For our use case with Grinding the Mocks, mock draft rankings are what we want to optimize our analysis for!
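To see that distinction in miniature, here is a quick sketch with made-up monotonic data (not the values from the figure above): the rankings agree perfectly, so Spearman returns 1 even though the relationship is not a straight line.

```r
# Made-up monotonic data: the ordering is preserved exactly,
# but the relationship is curved rather than a straight diagonal line.
x <- 1:32   # e.g., slots 1 through 32
y <- x^2    # strictly increasing, so the rankings match x perfectly

cor(x, y, method = "spearman")  # 1: the rankings agree exactly
cor(x, y, method = "pearson")   # less than 1: the straight-line fit is imperfect
```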
Get to the Point Already!
So, back to my original question: “How do we measure mock draft quality?”. I propose that using Weighted Spearman Correlation,7 where we compare mock draft predictions to actual draft results, is the optimal approach. Spearman correlation handles rankings and the kinds of non-linear trends that show up in mock drafts, allows for weights that value earlier picks higher than later ones, and only gives credit for player-pick matches in the results (no partial credit allowed).
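As a concrete sketch of that proposal, the `weightedCorr` function from the `wCorr` package (see footnote 7) handles the weighted Spearman calculation. Everything else below is illustrative: the data frame columns are hypothetical, and the default weights are a simple decaying curve standing in for whichever draft trade value chart you prefer.

```r
# install.packages("wCorr")  # if not already installed
library(wCorr)

# Illustrative sketch: mock is a data frame with one row per drafted player
# and hypothetical columns predicted_pick and actual_pick.
score_mock_wspearman <- function(mock, weights = NULL) {
  if (is.null(weights)) {
    # Placeholder weights that value earlier picks more than later ones.
    # Swap in values from your favorite draft trade chart instead.
    weights <- 1 / sqrt(mock$actual_pick)
  }
  weightedCorr(x = mock$predicted_pick,
               y = mock$actual_pick,
               method = "Spearman",
               weights = weights)
}
```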
With that being said, let’s look at a real-life example. The two highest-scoring mock drafts in the history of The Huddle Report, the draft industry’s most revered mock draft accuracy contest, are the 2021 mock from Josh Norris of Underdog Fantasy and the 2024 mock from Jason Boris of the Times News in Northeastern Pennsylvania. Norris’s mock correctly matched 16 players to teams and identified 27 of the 32 players drafted in 2021’s first round, while Boris’s had 15 correct matches and 29 of the top 32 in 2024.
According to our weighted mock draft correlation metric, using the Fitzgerald-Spielberger draft trade value chart as the pick-value adjustment, Norris’s mock scored a .956 while Boris’s landed at .867. That puts Norris’s "King of the Hill" mock at 6th overall in Grinding the Mocks data, while Boris sits at 2,167th (!!!). Why is this the case? It’s high time for a graph to help us tell the story!
Looking at the graph, you can see why Norris’s mock draft scored better than Boris’s despite scoring the same in The Huddle Report’s grading system. Norris’s picks (in blue on the chart) cluster much closer to how the actual draft unfolded, while Boris has many more picks (in orange on the chart) that land farther from the draft outcome. This is because we don’t give credit for inexact matches like Quinyon Mitchell to the Eagles (at pick 12 instead of where he went at pick 22) or Bo Nix to the Broncos (at pick 22 instead of where he went at pick 12).
Which mock draft scored the highest ever in Grinding the Mocks data, you ask? The final 2021 mock from Walter Cherepinsky of the eponymous Walter Football, which comes in at a cool .9628. I’m not a fan of his takes on football, politics, or almost anything else, but they (both Walter himself and draft analyst Charlie Campbell) are surely well-sourced and connected in NFL Draft circles. Their place atop the rankings might be surprising, but it is undeniable when looking through the lens of mock draft data and statistical analysis. It’s also not a surprise to me that the best-ranked mocks in our quality metrics come from the 2021 NFL Draft, which occurred during the COVID-19 pandemic, but that’s a story for another time. For now, I hope to keep the Grinding the Mocks data nuggets coming!
Some examples: The Huddle Report, Fantasy Pros, Draftcuracy, or NFL Mock Draft Database.
This is meant as a joke, but I’d probably have more NFL team clients if his site didn’t exist!
Why? Getting the pick and player match is hard enough.
You can adjust picks for value using your favorite draft trade chart!
Prediction error is crucial in measuring mock draft quality and isn’t binary.
I could cover more than a couple of evaluation metrics, but you get the point!
The `weightedCorr` function in the `wCorr` package is a good place to start for R users.
Walter Cherepinsky’s top-ranked score is only 0.0066612 points greater than Josh Norris’s sixth-ranked one. How do you like them apples? The margins are razor-thin at the top!