Coding a High School Fantasy Football League/Draft Simulator using ML and Python (2024)

Sreekarkutagulla

Published in

Visionary Hub

Jun 15, 2022

Overview

Around 42 million people play fantasy football every fall, hoping to outsmart their friends and family and win a coveted fantasy championship. Fantasy football usually uses stats from the NFL, the highest level of football play in the world. As a result, college and high school level football are rarely included in fantasy football apps like ESPN and Sleeper. So, being a high-school football fanatic, I decided to create a fantasy football “app” using stats from my high-school’s league. This article will detail my process in creating the app, my challenges, and my sources.

Background

Most fantasy apps create their own rankings using previous player performances and free agency. Once these rankings are released, teams will draft a quarterback, running-back, wide-receiver, tight-end, kicker, a defense, and multiple bench players. When the draft finishes, each team will receive a draft grade. Draft grades vary based on the fantasy app but they all use their projected rankings in their calculations. Once a grade is given, players will use that information to better their team by trading and picking up new players in free agency.

My Plan

Predict fantasy performance of the players using training data from the NFL
Create my own system of ranking players using prediction and position value
Find a way to grade the draft picks
Create a draft simulator

The Prediction

Advice: Predicting NFL player performance is a much easier starting project (I recommend it). With hundreds of datasets and analyses available, NFL ML projects can be easier to code, and there’s less hassle with formatting data.

Now, let’s get into the good stuff.

When I was beginning this project, I found enough data to create a list of draftable players but not enough to predict player performance. So, I decided to use NFL data instead to train my model. Thanks to Benjamin Abraham’s fantasy football project, I was able to collect data from the last 10 years and create a model with 66% testing accuracy using a simple linear regression. However, I found the accuracy to be too low, so I decided to make some changes.

My first change was to remove the age parameter, which I considered unrelated to my current goal. Age matters a lot in the NFL because as you get older, your body deteriorates, which then hurts your performance. At the high school level, all the players are relatively young and have had fewer injuries than their NFL counterparts. As a result, I removed age from my list of parameters for all positions.

# removed age parameter
if position == 'QB':
    X = df[['PassAtt/G','PassYds/G', 'PassTD/G', 'RushAtt/G', 'Y/A','RushYds/G','RushTD/G', 'TotTD/G','PPG','VBD']]
elif position == 'RB':
    X = df[['RushAtt/G', 'Y/A','RushYds/G', 'RushTD/G','Rec/G','RecYds/G','Y/R','RecTD/G','TotTD/G','PPG','VBD']]
elif position in ('WR', 'TE'):  # WRs and TEs share the receiving features
    X = df[['Rec/G','RecYds/G','Y/R', 'RecTD/G','TotTD/G','PPG','VBD']]

Second, I decided to use a voting-regressor model instead due to the linear regression’s low accuracy. The voting regressor makes a prediction by finding the average of multiple other regressions. As a result, different “views” and results are used and a more accurate prediction is made.

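from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# excerpt from the model-building function; the ensemble averages the three models' predictions
reg1 = GradientBoostingRegressor(random_state=1)
reg2 = RandomForestRegressor(random_state=1)
reg3 = LinearRegression()
regr = VotingRegressor(estimators=[('gb', reg1), ('rf', reg2), ('lr', reg3)])
regr.fit(X, y)
return regr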

During training with the NFL data, the model had an accuracy of 88% and during the test, it had an accuracy of 86%. When tested with the smaller high-school dataset, the model had an accuracy of 92%.
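
The training and testing figures above can be checked with a held-out split. Here is a minimal sketch on synthetic data, not the NFL dataset. It assumes that "accuracy" means the R² value returned by scikit-learn's `score` method, which the article does not confirm. The synthetic features, the 80/20 split, and the helper name `build_voting_regressor` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def build_voting_regressor(X, y):
    """Fit an ensemble that averages the predictions of three regressors."""
    reg1 = GradientBoostingRegressor(random_state=1)
    reg2 = RandomForestRegressor(random_state=1)
    reg3 = LinearRegression()
    regr = VotingRegressor(estimators=[('gb', reg1), ('rf', reg2), ('lr', reg3)])
    regr.fit(X, y)
    return regr


# Synthetic stand-in data (assumption): 300 "players" with 5 per-game stats.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, 1.5, 0.5, 2.0, 1.0]) + rng.normal(scale=0.5, size=300)

# Hold out 20% of the rows so the test score reflects unseen players.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

model = build_voting_regressor(X_train, y_train)
train_r2 = model.score(X_train, y_train)  # R^2 on the training split
test_r2 = model.score(X_test, y_test)     # R^2 on the held-out split
print(f"train R^2: {train_r2:.2f}, test R^2: {test_r2:.2f}")
```

A large gap between the two scores would suggest overfitting, so it helps to report both, as the article does.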

Creating a Ranking System

You might have a few questions at this point about the need for a ranking system. Well, it all comes down to availability and positional value. In fantasy football, each position has a different value. Usually, WRs and RBs have more value than QBs and TEs due to their ability to catch the ball reliably. On top of that, there are fewer high-scoring RBs and WRs than QBs: low supply but high demand. As a result, running-backs and wide-receivers should have a higher value than other positions. This idea is known as biased draft value.

VBD

In order to replicate biased draft value, I found VBD, or value-based drafting, to be very helpful. VBD is a strategy used by many fantasy football fanatics to find the most valuable player for their draft pick. In the simplified version I used, you find each player’s VBD by multiplying the player’s projected points by a positional weight. For example, if running-back A is projected 10.0 points and the RB weight is 4.5, then RB A’s VBD is 45.

So, I decided to give each position a weight: RB(4.5), WR(3.5), QB(2), TE(1.5). Once I multiplied all the players’ projections by their respective weights, I noticed that all the QBs had risen in the rankings and the RBs and WRs barely moved up. I then realized that the VBD formula did not account for positional availability.

Because there are fewer running-backs and wide-receivers available, their values should be higher. To account for this, I divided the players’ VBDs by the number of “good” players (projected points of 12.5 or more) in their respective positions. I called this new value “TrueVBD.”
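
To make the two steps concrete, here is a small sketch of the VBD and TrueVBD calculation in pandas. The weights and the 12.5-point threshold come from the article; the player names, projections, and column names are made up for illustration. The fallback to 1 for a position with no "good" players is my own assumption, since the article does not cover that case.

```python
import pandas as pd

# Positional weights from the article.
WEIGHTS = {'RB': 4.5, 'WR': 3.5, 'QB': 2.0, 'TE': 1.5}
GOOD_THRESHOLD = 12.5  # projected points needed to count as a "good" player

# Made-up projections, for illustration only.
players = pd.DataFrame({
    'Name': ['RB A', 'RB B', 'WR A', 'QB A', 'QB B', 'QB C', 'TE A'],
    'Pos': ['RB', 'RB', 'WR', 'QB', 'QB', 'QB', 'TE'],
    'Proj': [10.0, 14.0, 13.0, 20.0, 18.0, 15.0, 9.0],
})

# Step 1: weighted value (the article's VBD).
players['VBD'] = players['Proj'] * players['Pos'].map(WEIGHTS)

# Step 2: count the "good" players at each position.
good_counts = (players[players['Proj'] >= GOOD_THRESHOLD]
               .groupby('Pos').size())

# Step 3: divide VBD by that count. Positions with no "good" players
# fall back to 1 to avoid dividing by zero (assumption, see above).
players['GoodAtPos'] = players['Pos'].map(good_counts).fillna(1)
players['TrueVBD'] = players['VBD'] / players['GoodAtPos']

ranking = players.sort_values('TrueVBD', ascending=False).reset_index(drop=True)
print(ranking[['Name', 'Pos', 'VBD', 'TrueVBD']])
```

With these toy numbers, the three "good" QBs split their value three ways, so they drop below the running-backs and wide-receiver even though their raw projections are higher.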

Draft Grading Algorithm

The basis for the draft grading algorithm is simple: compare the value of the drafted player to the TrueVBD at that draft pick. However, implementing this took some work. Because the data set only included enough players for 8 rounds, teams could pick players not included in the data set for 1 round. As a result, comparing the values of drafted players to TrueVBD would not work because the dataset would run out of TrueVBD data points, resulting in an index error.

In order to fix this, I had to fit a regression to my TrueVBD data points. With a regression, I could compare against as many picks as I want because the fitted curve is defined for any pick number.

Because the dataset follows a clear logarithmic trend, I decided to fit a logarithmic regression by running numpy’s polyfit on the log of the pick number.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 63, 1)  # pick numbers 1 through 62
y = np.array(df.TrueVBD)
plt.scatter(x, y)
plt.savefig("plot.png")
fit = np.polyfit(np.log(x), y, 1)  # returns [slope, intercept]

Using the outputted intercepts and slopes, I created a function that finds specific values of the regression:

def calc(x):
    # intercept + slope * ln(x), using the fitted coefficients
    value = 9.84435213 - 2.51900962 * np.log(x)
    return value
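
Hard-coding the coefficients means they have to be retyped whenever the data changes. As an alternative, here is a sketch that builds the function directly from the output of `np.polyfit`. The helper name `fit_log_curve` is hypothetical, and the synthetic values stand in for `df.TrueVBD`, which is not available here.

```python
import numpy as np


def fit_log_curve(picks, values):
    """Fit values ~ intercept + slope * ln(pick) and return a callable curve."""
    # For degree 1, np.polyfit returns [slope, intercept] (highest power first).
    slope, intercept = np.polyfit(np.log(picks), values, 1)

    def curve(pick):
        return intercept + slope * np.log(pick)

    return curve, slope, intercept


# Synthetic stand-in for df.TrueVBD (assumption): points lying exactly
# on the curve reported in the article.
picks = np.arange(1, 63)
true_vbd = 9.84435213 - 2.51900962 * np.log(picks)

calc, slope, intercept = fit_log_curve(picks, true_vbd)
print(round(slope, 4), round(intercept, 4))
print(calc(70))  # the curve is defined past the 62 players in the data
```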

With that function, I could now compare the two values and store the difference in a variable called “change”. Then, using the variable, I reduce the overall draft grade.

grade = 100.0
change = calc(count) - vbd
grade = grade - (4*change)
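
To show how the grade moves over several picks, here is a short sketch that applies the same update to a list of picks. The coefficients and the penalty factor of 4 are from the article; the function name `grade_draft` and the three example picks are hypothetical.

```python
import math


def calc(pick):
    """Expected TrueVBD at an overall pick number (coefficients from the article)."""
    return 9.84435213 - 2.51900962 * math.log(pick)


def grade_draft(picks, start=100.0, penalty=4.0):
    """Grade a team from a list of (overall_pick_number, player_true_vbd) tuples."""
    grade = start
    for pick_number, vbd in picks:
        # Positive when the player is worth less than the curve expects.
        change = calc(pick_number) - vbd
        grade -= penalty * change
    return grade


# Hypothetical team: one pick exactly on the curve, one reach, one steal.
team = [(1, calc(1)), (16, calc(16) - 1.0), (17, calc(17) + 0.5)]
print(round(grade_draft(team), 2))  # 98.0
```

A pick below the curve lowers the grade and a pick above it raises the grade, so a team that consistently finds value can finish above 100.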

Draft Simulator

For the simulator, I had to store and update each team regularly so I decided to use csv files. Because there are 8 teams, I had to create 8 separate csv files with the headers: Position, Name, VBD, and Score.

Position,Name,VBD,Score
QB,,,
RB,,,
WR,,,
TE,,,
FLEX,,,
K,,,
DST,,,
Bench1,,,
Bench2,,,
Bench3,,,
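
The eight files do not have to be created by hand. Here is a minimal sketch that writes the template above once per team. The `t1.csv` naming follows the article; the function name `create_team_files` and the use of a temporary directory for the demo are my own assumptions.

```python
import csv
import tempfile
from pathlib import Path

HEADERS = ['Position', 'Name', 'VBD', 'Score']
SLOTS = ['QB', 'RB', 'WR', 'TE', 'FLEX', 'K', 'DST',
         'Bench1', 'Bench2', 'Bench3']


def create_team_files(directory, n_teams=8):
    """Write one empty roster CSV per team and return the file paths."""
    paths = []
    for team in range(1, n_teams + 1):
        path = Path(directory) / f't{team}.csv'
        with open(path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(HEADERS)
            for slot in SLOTS:
                writer.writerow([slot, '', '', ''])  # empty Name, VBD, Score
        paths.append(path)
    return paths


# Demo in a temporary directory so no real files are overwritten.
with tempfile.TemporaryDirectory() as tmp:
    files = create_team_files(tmp)
    first_lines = files[0].read_text().splitlines()[:2]
    print(len(files), first_lines)
```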

Using pandas and the .read_csv function, I stored the csv data in a dataframe, making it easier for data manipulation. Then, each team has its own function, taking in 3 inputs: Position, Name, and VBD. I then add the inputs into the csv and save it. So, if the program crashes in the middle of the draft, all the data will be saved. After the csv is updated with the draft pick, using the draft analysis algorithm, the script calculates the draft score and saves it in the score column.

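def t1():
    global t1_score
    position = input("Enter position: ")
    name = input("Enter name: ")
    vbd = float(input("Enter VBD: "))
    # assumes df_t1 is indexed by the Position column
    df_t1.loc[position, 'Name'] = name
    df_t1.loc[position, 'VBD'] = vbd
    df_t1.to_csv('t1.csv')  # save the pick right away in case of a crash
    # compare the pick to the expected TrueVBD at this pick number
    change = calc(count) - vbd
    t1_score = t1_score - (4*change)
    df_t1.loc['QB','Score'] = t1_score  # running team score is kept in the QB row
    df_t1.to_csv('t1.csv')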
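
Since every team's function does the same thing, one parameterized function could cover all eight. Here is a sketch of that idea, not the code from the project: `record_pick` is a hypothetical name, it takes arguments instead of calling `input()`, and the demo uses a shortened four-slot template written to an in-memory buffer rather than a real file.

```python
import io
import math

import pandas as pd

TEMPLATE = "Position,Name,VBD,Score\nQB,,,\nRB,,,\nWR,,,\nTE,,,\n"


def calc(pick):
    """Expected TrueVBD at an overall pick number (coefficients from the article)."""
    return 9.84435213 - 2.51900962 * math.log(pick)


def record_pick(roster, score, pick_number, position, name, vbd):
    """Fill one roster slot and return the team's updated score."""
    roster.loc[position, 'Name'] = name
    roster.loc[position, 'VBD'] = vbd
    score -= 4 * (calc(pick_number) - vbd)
    roster.loc['QB', 'Score'] = score  # the article keeps the score in the QB row
    return score


# Index by Position so .loc[position, ...] works; set the column types up
# front so the empty columns accept strings and floats.
roster = pd.read_csv(io.StringIO(TEMPLATE), index_col='Position',
                     dtype={'Name': str, 'VBD': float, 'Score': float})

score = 100.0
score = record_pick(roster, score, 1, 'RB', 'RB A', calc(1))          # on the curve
score = record_pick(roster, score, 16, 'QB', 'QB A', calc(16) - 1.0)  # a reach

buf = io.StringIO()
roster.to_csv(buf)  # in the real script this would be a path such as 't1.csv'
print(buf.getvalue())
```

Keeping one roster and one score per team in a list or dictionary would then replace the eight global variables.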

Then, using two functions called firstround() and secondround(), the simulator imitates the snake draft.

def firstround():
 global count
 t1()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t2()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t3()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t4()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t5()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t6()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t7()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t8()
 count+=1

def secondround():
 global count
 print("------Pick Number: " + str(count) + " ------")
 t8()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t7()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t6()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t5()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t4()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t3()
 count+=1
 print("------Pick Number: " + str(count) + " ------")
 t2()
 count+=1 
 print("------Pick Number: " + str(count) + " ------")
 t1()
 count+=1
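
The same snake order can be produced with a loop instead of writing out every call. Here is a minimal sketch; the function name `snake_order` is hypothetical, and it assumes pick numbers start at 1, since the article does not show the initial value of `count`.

```python
def snake_order(n_teams=8, n_rounds=2):
    """Yield (overall_pick, round_number, team_number) in snake-draft order."""
    pick = 1
    for rnd in range(1, n_rounds + 1):
        teams = range(1, n_teams + 1)
        if rnd % 2 == 0:
            teams = reversed(teams)  # even rounds run from the last team back
        for team in teams:
            yield pick, rnd, team
            pick += 1


order = list(snake_order())
# Show the turn between rounds: team 8 picks twice in a row.
for pick, rnd, team in order[6:10]:
    print(f"------Pick Number: {pick} ------ round {rnd}, team {team}")
```

In the simulator, the loop body would call the pick function for `team` in place of the print statement, and raising `n_rounds` would extend the draft without adding new functions.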

Takeaways

Even though gathering and formatting the data took quite some time, I had a great time working on this project. I learned a lot about building models and creating new tools for data analysis using Python.

GitHub - SreekarKutagulla/WCAL_FantasyFootball: Fantasy Football for WCAL

github.com

Hi, my name is Sreekar Kutagulla. I am a 16-year-old developer from the Bay Area. Check out my sports analytics work with Lancer Analytics and my GitHub. Thanks for reading!