Underdog Simulations

Last year I used Underdog data to try to determine the optimal draft strategy. Overall, I think that analysis was good, and it’s still useful to look at. The more I thought about it, though (and the more I read other people’s analysis), the less I liked it. It’s not that the data is bad, it’s just that there isn’t a whole lot of it! Underdog is the only platform that provides us with their data, and since they’re a newer company, we only have a few seasons to work with. So when we’re looking to identify optimal strategies, we’re really only going to find what’s been optimal over the last few seasons!

IMO that’s a big problem! What if Kelce had gotten hurt last year? Or what if a late-round QB had truly broken out? Remember the season Lamar Jackson had in 2019? Well…Underdog wasn’t a company back then! So we don’t even have something that recent to use in our analysis! What this leads to is a lot of “overfitting” in our breakdowns. It’s not overfitting in the traditional machine learning sense…but it has the same result. We’re looking at a sample size of less than a handful of seasons and trying to draw overarching conclusions from it. And quite frankly, that’s just not enough data to truly know what works over a long period of time.

Oh, and if you think that’s depressing, then I’ll do you one better! Not only do we not have enough seasons…I don’t think there’s enough data WITHIN each season. I mean, think about it! How many 4th-round QBs have there been over the last 3 years? The answer…just three! Zero in 2020, one in 2021, two in 2022. So in three years’ worth of data, we have 3 total data points about what might happen if you draft a QB in the 4th round. I don’t think you’ll find anyone who thinks that’s a large enough sample size.

So…then there’s nothing we can do?

Of course not! This would be a pretty lame study if that were my final conclusion! What it means is that we can’t just use Underdog data in our analysis. And more importantly, we can’t only use data from seasons that actually happened. You read that right! Not only do we need more seasons’ worth of data, we also need more outcomes than the ones we’ve seen happen over the last 20 years!

Here are Mark Andrews’ game logs (half-PPR points) from the last three seasons:

2020:  20.3___3.4___3.7___19.2___14.6___3.1___0___4.7___3.7___9.6___18.1___0___0___10.3___15.1___10.6___4.7 – – 141.1 Points – – Cluster 2 – – ADP: Rd 4

2021:  3.5___8.2___13.4___9.2___36.2___15.3___6.3___0___6.9___15.3___11.3___14.5___7___23___30.6___22.5___11.9 – – 235.1 Points – – Cluster 5 – – ADP: Rd 5

2022:  7.7___21.2___24.9___2.5___18.9___20.1___0.4___4.9___0___0___9.3___9___7.4___2.7___4.6___6___14.5 – – 154 Points – – Cluster 2 – – ADP: Rd 3

As you can see, drafting Mark Andrews each of the last 3 seasons has given you a few different results. But are these the only possible results that could have come from drafting Mark Andrews each of the last three years? Definitely not! I would argue that the seasons below were also possible:

2020:  0.5___6.9___20.5___16.7___4.97___12.15___5.59___7___5.68___7.1___3.14___0.1___15.99___22.7___4.9___17.8___11.94 – – 140 Points – – Cluster 2

2021:  3.3___15.7___11.2___7.5___12.56___2.8___10.52___28.14___18.4___16.65___3.3___6.08___23.12___14.37___27.17___11.6___6.7 – – 236 Points – – Cluster 5

2022:  16.6___16.6___14.3___2.2___3.32___5.07___0.55___22___1.08___0.2___29.8___24.71___12.64___4___2.2___0.8___2.2 – – 157 Points – – Cluster 2

No TE in the NFL actually produced the seasons above. They’re synthetic, created by my machine learning algorithm based on actual seasons we’ve seen over the last 20 years. My model looks at the game logs of every player from 2000 on and attempts to create hundreds of thousands of possible seasons from that data.

Ok…so we’ve got a bunch of simulated data. How is that useful?

The way I see it, there are three main problems:

1) We don’t have enough seasons worth of data.

2) Within each season, we only have one potential outcome for each player (among the millions of outcomes that could have occurred).

3) Because of 1 and 2, we have very few data points for what happens when you use different strategies.

If we can solve for each one of these problems, then we can feel a lot better about any sort of analysis that comes as a result.

How I Solved This Problem

The end result is a model that can create simulated seasons that we can test different builds on.

If we think about this problem from a high-level perspective, the steps are fairly simple.

1) Gather play-by-play data for every player since 2000. Then create a game log of Underdog Fantasy points by week for each player, along with their total points over the entire season.
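
To make step 1 concrete, here’s a rough sketch in pandas. The file and column names are placeholders for whatever your stats source actually provides, and you should verify the scoring values against Underdog’s current rules:

```python
import pandas as pd

# Placeholder file and column names -- swap in whatever your
# play-by-play source actually provides.
weekly = pd.read_csv("weekly_stats_2000_on.csv")

# Underdog-style half-PPR scoring (verify against the current rules).
weekly["points"] = (
    0.04 * weekly["pass_yds"]
    + 4.0 * weekly["pass_tds"]
    - 1.0 * weekly["interceptions"]
    + 0.1 * (weekly["rush_yds"] + weekly["rec_yds"])
    + 6.0 * (weekly["rush_tds"] + weekly["rec_tds"])
    + 0.5 * weekly["receptions"]
)

# One game log per player-season, plus total points for the year.
logs = (
    weekly.sort_values("week")
          .groupby(["player_id", "season", "position"])["points"]
          .agg(game_log=list, season_points="sum")
          .reset_index()
)
```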

2) Cluster players by position and performance, looking not only at end-of-season points but also at weekly outcomes.
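
Here’s a toy version of that clustering for TEs (since Andrews is the running example), continuing from the `logs` table above. The specific features are just illustrative; the point is to capture both the season total and the weekly shape:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

te_logs = logs[logs["position"] == "TE"]

# One feature row per TE season: total, average, volatility, ceiling,
# and number of zero-point weeks (missed games, duds).
raw = np.array([
    [np.sum(g), np.mean(g), np.std(g), np.max(g), np.sum(np.array(g) == 0)]
    for g in te_logs["game_log"]
])

# Scale first so the big season-total numbers don't dominate.
X = StandardScaler().fit_transform(raw)

# Seven clusters, to match the example odds later in this post.
kmeans = KMeans(n_clusters=7, n_init=10, random_state=0)
te_logs = te_logs.assign(cluster=kmeans.fit_predict(X))
```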

3) Create thousands of simulated player results for each position/cluster combination. So you have thousands of simulated seasons of elite RBs, mediocre RBs, RBs who started great and then got hurt, RBs who were trash all season, etc. Then the same for every other position.
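
My actual model is more involved than anything I can fit in a snippet, but the simplest version of the idea looks like this: start from a real season in the cluster so the week-to-week shape stays realistic, then swap in weekly scores drawn from the rest of the cluster:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_season(cluster_logs, swap_prob=0.5):
    """Synthesize one season from a cluster's real game logs
    (a toy stand-in for the real model, not the model itself)."""
    # Start from a real season so hot streaks and injury stretches
    # keep a realistic shape...
    base = list(cluster_logs[rng.integers(len(cluster_logs))])
    # ...then swap individual weeks with scores from the whole cluster.
    pool = [wk for log in cluster_logs for wk in log]
    return [float(rng.choice(pool)) if rng.random() < swap_prob else wk
            for wk in base]

# e.g. thousands of synthetic "Cluster 2 TE" seasons:
cluster2 = list(te_logs.loc[te_logs["cluster"] == 2, "game_log"])
synthetic = [simulate_season(cluster2) for _ in range(10_000)]
```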

4) Calculate the odds of drafting a player from each cluster depending on where you’re picking in the draft. This is a super important part that not only needs to be done correctly, but also can’t just be based on the last three seasons. We need to do our best to pull in data from other sources.

5) Create a DataFrame with every possible draft combination (WR-WR-WR-RB-QB-QB-RB-TE-WR-RB-RB-TE-QB-TE-RB-WR-WR-RB being an example combination).
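
Enumerating the combinations is one line with itertools, though as we’ll see in Roadblock #1, you never want to materialize the whole thing:

```python
from itertools import product

# Lazily enumerate every 18-round positional sequence. There are
# 4 ** 18 of them (about 69 billion) -- see Roadblock #1 below.
all_builds = product(["QB", "RB", "WR", "TE"], repeat=18)
print(next(all_builds))  # ('QB', 'QB', ..., 'QB')
```
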
6) For each combination, simulate X seasons’ worth of outcomes. For example, in step 5 our draft has a WR being taken in round 1. Let’s say we have the 12th pick, and the odds of drafting a WR from each cluster at that pick are:

Cluster 1: 5%

Cluster 2: 10%

Cluster 3: 7%

Cluster 4: 20%

Cluster 5: 30%

Cluster 6: 14%

Cluster 7: 14%

(Those are completely made up so don’t read into the numbers.)

So given these example odds, the program needs to “roll the dice” and give us a random player from the selected cluster for that simulation. This process is repeated for every round until we have a complete team of 18 players.
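
In code, that dice roll is just a weighted random choice. Here `simulated_seasons` is assumed to be a dict of the position/cluster pools from step 3, and `build` is one positional sequence from step 5:

```python
import numpy as np

rng = np.random.default_rng()

clusters = [1, 2, 3, 4, 5, 6, 7]
# Made-up odds, same as above. In the real table they'd vary by
# position, round, and pick, and each row must sum to 1.
odds = [0.05, 0.10, 0.07, 0.20, 0.30, 0.14, 0.14]

def roll_for_player(position):
    """Pick a cluster by its odds, then grab one simulated season
    from that position/cluster pool (built in step 3)."""
    cluster = rng.choice(clusters, p=odds)
    pool = simulated_seasons[(position, cluster)]
    return pool[rng.integers(len(pool))]

# One simulated draft: 18 rounds, one roll per round.
roster = [(pos, roll_for_player(pos)) for pos in build]
```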

7) Calculate the points your roster would score each week, and then its total across the entire season.
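
Underdog is best ball, so each week your optimal lineup is started automatically: 1 QB, 2 RB, 3 WR, 1 TE, and a flex (double-check those slots against the current rules). A sketch of that weekly scoring:

```python
LINEUP = {"QB": 1, "RB": 2, "WR": 3, "TE": 1}  # plus 1 FLEX (RB/WR/TE)

def week_score(roster, wk):
    """Best-ball points for one week. roster is a list of
    (position, game_log) pairs like the one built in step 6."""
    by_pos = {"QB": [], "RB": [], "WR": [], "TE": []}
    for pos, log in roster:
        by_pos[pos].append(log[wk])
    for scores in by_pos.values():
        scores.sort(reverse=True)  # best scorers fill the slots
    total = sum(sum(by_pos[p][:n]) for p, n in LINEUP.items())
    flex = by_pos["RB"][2:] + by_pos["WR"][3:] + by_pos["TE"][1:]
    return total + (max(flex) if flex else 0.0)

season_points = sum(week_score(roster, wk) for wk in range(17))
```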

8) Repeat this simulation a bunch of times for every single draft combination. You can think of this as simulating a ton of different seasons.

9) Cycle through a bunch of different draft combinations.

10) Analyze which draft strategies produce the highest average points!
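
Once every simulation is logged, step 10 is the easy part: group by build and sort. Here `records` is a stand-in for whatever structure you logged the simulations into:

```python
import pandas as pd

# One row per (build, simulated season) from steps 6-8.
results = pd.DataFrame(records)  # columns: build, sim, season_points
summary = (results.groupby("build")["season_points"]
                  .agg(["mean", "std", "count"])
                  .sort_values("mean", ascending=False))
print(summary.head(10))  # the builds that score the most on average
```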

Roadblock #1 – Storage

In practice, there are more roadblocks than you’d expect. The biggest issue is the number of possible combinations. Just look at the 18-round example draft above. You don’t have to be a math wizard to realize that there are A LOT of different combinations: 4 positions across 18 rounds is 4^18, or roughly 69 billion possible sequences. I don’t care how good of a computer you have, you can’t create a DataFrame that has every combination. And even if you did…how would you even store it? There’s just no way to create a file with tens of billions of rows, let alone run calculations on it!

To solve this, we need to reduce the possible combinations (shout out to my fiance for coming up with this solution!). The most important rounds of the draft are the early and middle ones. After that, it’s not super important whether you draft an RB in the 15th and a WR in the 16th, or the reverse. All that matters is that you took a WR and an RB late. So, I created a DataFrame that has every explicit combination for rounds 1-9, and then just the positional counts for the double-digit rounds. So instead of our example draft looking like this:

WR-WR-WR-RB-QB-QB-RB-TE-WR-RB-RB-TE-QB-TE-RB-WR-WR-RB

It looks like this:

WR-WR-WR-RB-QB-QB-RB-TE-WR-1QB-4RB-2WR-2TE

Shortening the data like that keeps your computer from self-destructing, and it also lets you actually run simulations on the dataset!
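
If you want to see exactly how much that shortening buys us, the math is quick (roughly a 1,200x reduction):

```python
from math import comb

full = 4 ** 18                 # every 18-round positional sequence
early = 4 ** 9                 # rounds 1-9 stay order-sensitive
late = comb(9 + 4 - 1, 4 - 1)  # ways to split 9 late picks across
                               # 4 positions ("stars and bars") = 220
print(f"{full:,} -> {early * late:,}")
# 68,719,476,736 -> 57,671,680 (about 1,200x fewer rows)
```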

Roadblock #2 – Time

Another issue we run into is time. Unfortunately, I don’t have a supercomputer. And I also don’t have years to complete these studies. I complete 7-8 different projects each offseason, so even if my computer is running 24/7 (it is), there just isn’t enough time to run through every combination we create. This is an easier problem to solve, though. All we have to do is randomly sample our millions of records and run through as many combinations as we can. As I said above, there’s really not much difference between selecting a WR in round 15 and then an RB in round 16 as opposed to the other way around. So as long as the sample is random, we’re going to see trends emerge very quickly. Even after as little as 10,000 combinations, we can see the results stabilize. This is another reason we don’t need to get too hung up on roadblock #1. If we can see trends in success after just 10,000 combinations, then we don’t need to worry about creating the billions of possible combos. We can cut that sample way down, since we’re never actually going to use all of it.
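
The sampling itself is nothing fancy: a seeded random sample of the combination table, plus a quick check that the leaders stop moving as the sample grows. Here `combos_df` and `simulate_builds` are stand-ins for the table from Roadblock #1 and the step 6-8 loop:

```python
# Random, reproducible sample of the reduced combination table.
sample = combos_df.sample(n=10_000, random_state=0)

# Score growing slices and watch the leaderboard settle.
for n in (1_000, 2_500, 5_000, 10_000):
    scored = simulate_builds(sample.head(n))  # the step 6-8 loop
    print(n, scored.nlargest(25, "mean_points").index.tolist())
# If the top 25 barely changes between n=5,000 and n=10,000, the
# sample is big enough to trust.
```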

The Results:

Full Results
Robust Running Back
Hero RB – WR God Mode
Hero RB – Robust WR Method
Hero RB – Mid 1st Round Pick
Hero RB – The Brady Method
Hero RB – Double Late TE Method
Extreme Robust Wide Receiver
The 2-4-10-2 Build
RB Zero
Robust Tight End