
Get a feel for what the class is all about, including who it's for and any prerequisites
Discuss the class project, building an NBA Projection Model for 2018-19 Season Stats
Review what you will learn: Pandas Basics, NBA Stats Projection Model & NBA Fantasy Glory
Installing Homebrew and Xcode
Installing pyenv using Homebrew on macOS
Installing Python using pyenv
Overview of virtual environments and why they're important
Creating a virtual environment for our project
Activating and deactivating our virtual environments
How to install modules and packages
Downloading a package with pip package management
Installing necessary modules for our projection model
Jupyter Notebooks overview
Installing & launching Jupyter
Creating a New Notebook
Code & markdown cell types
Selecting the appropriate kernel
Restarting the kernel
Command mode vs edit mode
Two ways to run code
Order of execution within the notebook
Python Building Blocks
Importing modules
Printing
Variables & Raw Inputs
Lists
Dictionaries
For Loops
If, Else Statements
Functions
Arrays
How to import modules
Giving modules an alias
Calling specific values from a module
Using the print statement to output variables
The value in printing
Assigning values to a variable
Changing values in a variable
Naming variables
Case sensitivity with variables
Definition and examples of python lists
How to write a list in python
Data types within a list
Accessing items in a list
Appending to lists
Slicing lists
Definition and examples of python dictionaries
Looking up values from keys
Iterating over a list and dictionary
Printing values with a for loop
Utilizing break statements
Using the continue statement
Pairing if statements with logical conditions from mathematics
Understanding indentation
And / Or operators
Examples of various if, else statements
How to create a python function
Creating a points per game function
Arrays as a special variable
The benefits of using arrays
Applying mathematical statements to arrays
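The function and array lectures above can be previewed with a short sketch. It assumes NumPy arrays are what the "Arrays" lessons refer to and uses made-up season totals; points_per_game() is a hypothetical helper for illustration, not the course's own code.

```python
import numpy as np

def points_per_game(points, games_played):
    """Return points per game, guarding against a player with zero games."""
    if games_played == 0:
        return 0.0
    return points / games_played

# Hypothetical season totals (illustrative numbers, not real stats)
season_points = [2000, 1500, 900]
season_games = [80, 75, 0]

# A for loop over two lists in parallel, appending each result to a list
ppg_list = []
for pts, gp in zip(season_points, season_games):
    ppg_list.append(points_per_game(pts, gp))

# The same idea with NumPy arrays: arithmetic is applied element by element
points_arr = np.array([2000, 1500, 900])
games_arr = np.array([80, 75, 60])
ppg_arr = points_arr / games_arr
```

The array version needs no loop at all, which is one of the benefits of arrays the lectures point to.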
If you are coding in Python and you come across some type of data, in our case NBA stats, you would be hard-pressed to find a better tool for wrangling, munging and analyzing it than pandas. The best part is that it’s open source and free to use.
Essentially, Pandas takes data (like a CSV file or the output of a SQL query) and creates a Python object with rows and columns (called a dataframe) that looks very similar to a table you’d see in Excel. It’s easy to work with and has a lot of built-in methods that make it super useful.
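To make the rows-and-columns idea concrete, here is a minimal sketch that builds a dataframe from a dictionary of lists, then reads the same data back as CSV text with pd.read_csv(), keeping only two columns via usecols. The player names and numbers are placeholders, and io.StringIO stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
data = {
    "player": ["Player A", "Player B", "Player C"],
    "team": ["LAL", "BOS", "LAL"],
    "pts": [25.1, 20.3, 12.7],
}
df = pd.DataFrame(data)

# Write the dataframe out as CSV text, then read it back as if it were a file,
# loading only the columns we care about
csv_text = df.to_csv(index=False)
df_from_csv = pd.read_csv(io.StringIO(csv_text), usecols=["player", "pts"])

print(df.shape)            # (rows, columns)
print(df_from_csv.head())  # first rows of the two-column dataframe
```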
The two primary types of data structures provided by pandas
Various inputs that can be used to create a dataframe
Various file types you can read in to create a dataframe
How series are different than dataframes
Creating an empty dataframe
Creating a dataframe with a dictionary & list
Using the pd.read_csv() function
Reading in csv files from a different folder
Loading a csv with specific columns and indexes
Using the usecols parameter
Understanding attributes and methods
Running through typical attributes (shape, dtypes)
Head, tail and sample methods
Using info() and describe() as summary statistics
Built in pandas dataframe methods
Calling the sum() method on a single column
How to select single columns and multiple columns
Listing columns with list()
Selecting a single column with dot notation and square brackets
Selecting multiple columns with a list
Selecting columns with .iloc and .loc
Using = syntax to assign values to a column that doesn't exist
The insert() method
Deleting columns with del
Deleting columns with drop() method
Selecting a subset of columns
Renaming columns with a list
Setting new column names with df.columns
Using the rename() method
Selecting rows by passing the index location and .iloc
Using square brackets
Selecting multiple rows using the slice operator
Selecting rows using .loc
The .ix indexer (deprecated in pandas 0.20 and removed in pandas 1.0)
How to remove null values or rows with missing data
Dropping rows with .dropna()
The .dropna() parameters
Adding rows with the .append() method (removed in pandas 2.0 in favor of pd.concat())
What exactly is the inplace parameter and why you should care
Sorting dataframes based on index with .sort_index()
Sorting dataframe by a specific column using .sort_values()
The ascending parameter for ascending and descending order
Sorting by multiple column values
How to filter dataframes based on specific conditions (and / or)
Using lists to make code more readable
Examples filtering dataframes for specific teams and stats
Overview of the .groupby() method
Grouping by season
Inspecting the groupby object
Example getting the mean age for each team
The split-apply-combine framework
Finding the sum of a specific column of a groupby object
Using the .describe() method on a groupby object
Concatenating dataframes from different sources
Using the pd.concat() function with a list of dataframes in square brackets
The ordering difference for .concat()
Difference between concat and append
Using the ignore_index parameter with append
How to join two dataframes based on common columns
The .merge() method
The "how" parameter and various merge types
Calling the merge directly on a dataframe
Using itertuples() and iterrows() to loop over dataframe rows
Returning the data as a series
How to print a dataframe within a for loop
Appending values from a dataframe to a list
How to use the .apply() method
Applying to an entire dataframe
Applying to a single column within a dataframe
Applying to a groupby object
The axis parameter within apply()
Exploring the numpy array
Getting the shape of arrays
Applying mathematical operations to arrays
Inspiration from the k-nearest neighbors (KNN) model
Why we need to normalize player data across seasons
Breaking down the steps in our model
Reading in our player data CSV
Dropping rows with missing values
Plotting games played histogram with matplotlib
Function to normalize data
For loop to apply normalized data function in a groupby object
Function to calculate Euclidean distance
Testing the function on three NBA players
Function to find player row with two inputs
For loop over dataframe to save data into new variable
Converting a dataframe to a numpy array
Finding the difference between two arrays
Vectorizing a function
For loop to compare a player to multiple players using function
Dropping duplicates after merging dataframes
Accessing values in a season list
Getting the next value in a list based on an input and index
Appending values to an empty dictionary in a for loop
Sorting a dataframe based on percent error
Looping over ten rows of a dataframe
How to use the built-in getattr() function
Embedding multiple for loops for various players and seasons
Creating a function to find a specific player
Adding a break statement
Adding in weighted variables to a given function
Adding a dataframe as an input variable
Bringing together all the building blocks into a final function
Looping over ten players to test the projection model
Creating a dataframe from a dictionary
Saving a dataframe out as a CSV file
How to manually calculate the root mean squared error
Calculating RMSE using scikit-learn
Using real stats to measure the effectiveness of the model
Reading in and munging data from a competitor
Updating dataframe to have identical columns
Converting a column datatype using .astype()
Creating new columns with current columns
Getting the RMSE for competitor projections
Modify projection model data to match competitor projections
Running RMSE on both models using same set of players
Merging dataframes to get player names
Which variables to modify for new insights
Function to calculate fantasy points for each player
Applying function to entire dataframe
Sorting dataframe based on projected fantasy points
Saving out dataframe to CSV file
Overview of value-based drafting (VBD)
Theoretical example of why VBD is effective
Function to get our baseline number
Saving out dataframe as a CSV file for Google Sheets import
VLOOKUP in Google Sheets to calculate value over replacement player
Sorting based on value-based points
Known blind spots for our current model
Conclusion and next steps
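The model lectures above (normalizing player data within each season, finding the most similar players by Euclidean distance in the spirit of KNN, and scoring projections with RMSE) can be sketched roughly as follows. This is a minimal illustration, not the course's exact implementation: it assumes min-max scaling as the normalization, uses hypothetical column names (season, player, pts, reb, ast) with made-up numbers, and hand-rolls RMSE with NumPy rather than scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical per-game stats for two seasons (made-up numbers)
df = pd.DataFrame({
    "season": [2017, 2017, 2017, 2018, 2018, 2018],
    "player": ["A", "B", "C", "A", "B", "C"],
    "pts": [25.0, 15.0, 10.0, 27.0, 14.0, 11.0],
    "reb": [8.0, 4.0, 10.0, 7.0, 5.0, 12.0],
    "ast": [7.0, 3.0, 1.0, 8.0, 4.0, 2.0],
})
stat_cols = ["pts", "reb", "ast"]

def min_max(col):
    """Scale values to 0-1 (assumed normalization; assumes max != min)."""
    return (col - col.min()) / (col.max() - col.min())

# Normalize within each season so seasons with different scoring levels compare fairly
norm = df.copy()
norm[stat_cols] = df.groupby("season")[stat_cols].transform(min_max)

def euclidean_distance(a, b):
    """Straight-line distance between two stat vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def rmse(actual, predicted):
    """Root mean squared error between actual and projected values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Compare player A's normalized 2017 line to every other player's 2017 line
season_2017 = norm[norm["season"] == 2017].set_index("player")[stat_cols]
target = season_2017.loc["A"]
distances = {
    name: euclidean_distance(target, row)
    for name, row in season_2017.drop(index="A").iterrows()
}
closest = min(distances, key=distances.get)  # most similar player

# Score some hypothetical point projections against the actual 2018 values
actual_pts = df.loc[df["season"] == 2018, "pts"]
projected_pts = [26.0, 15.0, 10.0]
error = rmse(actual_pts, projected_pts)
```

If you prefer scikit-learn for the last step, np.sqrt(mean_squared_error(actual_pts, projected_pts)) from sklearn.metrics gives the same number.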
What Is This Course?
Let me start off by saying that my first love has always been the NBA and my second love is coding. As such, I think this class will be a lot of fun for passionate NBA fans who also happen to be aspiring coders. This is the premier Udemy class out there that uses strictly NBA stats as data to help you wrap your head around concepts in the Python programming language.
While I have found it helpful to read textbooks and watch online tutorials to get a better understanding of the basics for any subject, nothing beats project-based learning. Actually getting your hands dirty and running into real problems that require specific solutions has been my ideal way to learn something new.
With that being said, the hardest question typically is, what project should I focus on? From my personal experience, I’ve found it beneficial to focus on something you are passionate about. To find that something, just think of what you frequently pay attention to in your spare time, when no one is paying you...to pay attention to it. For me, that something is the NBA. I’m a proud subscriber to League Pass. It didn’t take long for me to realize that using NBA stats was going to be the best way for me to learn how to code.
"For one, sports has served as an entry point to data analysis for many. Sports is interesting and has great data relative to other fields, so it can teach skills and methods of thought that are then more broadly applicable. Personally, I learned how to program, a skill that has been enormously valuable to me, specifically to analyze basketball stats. And I'm far from the only story like this." -Ben Falk, Cleaning The Glass
The Project
Using the NBA to learn how to code sounds like a good start, but it is still missing a key piece to turn it into an actual project. That key piece is a goal. Tiago Forte defines a project as “a series of tasks linked to a goal, with a deadline.”
So what is our goal? Well, for those of you who have played fantasy basketball before, you may have learned how important the draft is. Your team’s success is often linked directly to your success in the draft. And your success in the draft is often linked to how effectively you can project player stats for the upcoming year. If you know LeBron James is going to score more fantasy points than Anthony Davis, then you will want LeBron James on your fantasy team.
After years of blindly turning to the internet for projection models that weren’t made by the oafs at ESPN or Yahoo, it dawned on me that those models had to come from someone’s brain. My thinking from there was, “What’s stopping me from building my own projection model?”
Voilà! We have our class project! We are going to build an NBA fantasy projection model so you can win your NBA fantasy league! And how will we do that? By learning to code!
What Will You Learn?
This is another reminder that everything I’ve done to date has been a combination of self-teaching and learning from a friend who also happens to be a talented engineer.
For our purposes, we are going to focus on Python. I’ve been hooked on it ever since I took the class Automate the Boring Stuff with Python. It's undoubtedly a popular programming language, so I think it will be beneficial for many years to come.
This class is not meant to be an introduction to programming or Python, so my assumption is that you understand some basics. This class is geared more towards helping you apply Python programming to an actual project, so you better retain information while having fun in the process.
Since this class is primarily focused on data (in the form of NBA stats), we will need to manipulate the data in various ways. To help with this, we’ll use the Pandas library. Pandas is extremely powerful and can be used for much more than building NBA fantasy projection models, so I think you will find it well worth learning.
In his book, the Python Data Science Handbook, Jake VanderPlas describes Pandas as “a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame. DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.” Said another way, Pandas is SQL and Excel on steroids!
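To ground the SQL and Excel comparison, here is a small sketch of three everyday operations: groupby() plays the role of SQL's GROUP BY, merge() the role of a JOIN, and a boolean mask the role of an Excel filter. The tables and values are made up for illustration.

```python
import pandas as pd

# Hypothetical player and team tables (illustrative data only)
players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "team": ["LAL", "LAL", "BOS", "BOS"],
    "age": [34, 24, 27, 21],
})
teams = pd.DataFrame({
    "team": ["LAL", "BOS"],
    "conference": ["West", "East"],
})

# Like SQL: SELECT team, AVG(age) FROM players GROUP BY team
mean_age = players.groupby("team")["age"].mean()

# Like SQL: SELECT * FROM players INNER JOIN teams USING (team)
joined = players.merge(teams, on="team", how="inner")

# Like an Excel filter: keep only rows where age is under 25
young = players[players["age"] < 25]
```

Each result is itself a pandas object, so these steps can be chained together in a way that is awkward in a spreadsheet.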
By the end of this course, you will be ready to win your NBA fantasy league by building the best fantasy projection model using Python, and more specifically Pandas. All of this will be done in a Jupyter Notebook so you can share your work and improve on it over the years.