Introduction

This is the second post in the series on using various Supervised Learning techniques to classify NBA All-Stars. In the previous post, I introduced the problem, the data, and the general approach I took for each algorithm. In this post, I will discuss how [Decision Trees](https://en.wikipedia.org/wiki/Decision_tree) performed on this problem.

Decision Trees

Decision Trees (DTs) classify samples by inferring simple decision rules from the data features. They build a tree structure in which each internal node represents an if/then rule based on some attribute; new samples are classified by traversing the tree from the root to a leaf and using the leaf node's value. Some advantages of DTs are that they are easy to understand, implement, and visualize. One big disadvantage is that they are prone to overfitting on the training data: the algorithm can build a tree that performs very well on the training data by capturing its noise and irrelevant features. For my experiments, I mitigated overfitting by pruning the tree (limiting its max_depth). In scikit-learn's DecisionTreeClassifier class, I used Gini impurity as the function to measure the quality of a split and specified balanced class weights. All other parameters used the defaults provided by scikit-learn.
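
For reference, here is a minimal sketch of roughly how that classifier could be configured. The synthetic dataset is only a stand-in for my NBA feature matrix (which is not reproduced here), with the class imbalance mimicking the small fraction of All-Stars:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the real dataset has ~50 per-season player features and an All-Star label
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Gini impurity for split quality, balanced class weights, tree pruned to 3 levels
clf = DecisionTreeClassifier(criterion="gini", class_weight="balanced", max_depth=3)
clf.fit(X_train, y_train)

print(f"training accuracy: {clf.score(X_train, y_train):.4f}")
print(f"testing accuracy:  {clf.score(X_test, y_test):.4f}")
```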

3-level Decision Tree for NBA Data

Model Complexity Curve

The figure above shows the final 3-level DT generated for the dataset I supplied. I was surprised to see that only 5 of the 50 attributes were considered in the tree: PER (Player Efficiency Rating), FTA (Free Throws Attempted), PPG (Points Per Game), 2P (Two Pointers Made), and DWS (Defensive Win Shares). I decided on a tree with only 3 levels after generating the Model Complexity Curve; see the figure below for those results. That graph shows that a max_depth of 3 produced the highest cross-validation score. As the depth of the tree increases, the training score continues to improve while the validation score declines. This demonstrates that adding more levels to the tree leads to overfitting on the training data, which prevents the tree from generalizing well to unseen data.

Model Complexity Curve
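
A complexity curve like this can be produced with scikit-learn's validation_curve by sweeping max_depth; a rough sketch, again on stand-in data since the actual NBA dataset is not included here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Stand-in data in place of the NBA feature matrix
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.95, 0.05], random_state=0)

# Score the tree at each candidate depth with 5-fold cross-validation
depths = np.arange(1, 11)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(criterion="gini", class_weight="balanced"),
    X, y, param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.3f}  cv={va:.3f}")
```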

Learning Curve

Next, I ran a Learning Curve experiment on the 3-level Decision Tree; the results are captured in the figure below. The graph shows that both the training and cross-validation scores converge to around the same value, 90-92% accuracy. This suggests the model is well balanced, suffering from neither high bias nor high variance, and should generalize well to previously-unseen testing data, which the final testing results confirm. The final training accuracy was 92.4931% and the testing accuracy was 89.2402%. Additionally, this model required only about 3000 samples to reach roughly 90% accuracy, which shows that it generalizes well without requiring much more data.

Learning Curve
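
A sketch of how such a learning curve might be generated with scikit-learn's learning_curve (stand-in data again):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Stand-in data in place of the NBA feature matrix
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.95, 0.05], random_state=0)

# Train on increasing fractions of the data and cross-validate each fit
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(criterion="gini", class_weight="balanced", max_depth=3),
    X, y, train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:5d}  train={tr:.3f}  cv={va:.3f}")
```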

Timing Curve

The last figure below shows the timing curves for training versus prediction. Decision Trees are an eager learning model: the algorithm spends time up front learning a model from the training data and then returns classifications very quickly, since predicting only requires traversing the tree. This is evident in the figure, which shows training time increasing roughly linearly with the amount of data while the prediction time remains essentially constant.

Timing Curve
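
The timing curve can be approximated with a simple loop over increasing training-set sizes; a rough sketch using wall-clock timings from time.perf_counter, on stand-in data:

```python
import time
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Stand-in data in place of the NBA feature matrix
X, y = make_classification(n_samples=6500, n_features=50, weights=[0.95, 0.05], random_state=0)
X_query = X[:500]  # fixed query set, so prediction times are comparable across runs

for n in [1000, 2000, 4000, 6000]:
    clf = DecisionTreeClassifier(criterion="gini", class_weight="balanced", max_depth=3)

    t0 = time.perf_counter()
    clf.fit(X[500:500 + n], y[500:500 + n])   # training time grows with the sample count
    t1 = time.perf_counter()
    clf.predict(X_query)                      # prediction time stays roughly flat
    t2 = time.perf_counter()

    print(f"n={n:5d}  fit={t1 - t0:.4f}s  predict={t2 - t1:.4f}s")
```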

Overall, Decision Trees performed well for this problem; the final testing accuracy was higher than I expected. In the conclusion post, I will apply this model to the stats from 2018 to see how well it does on recent data.

This concludes my investigation of Decision Trees to classify NBA All-Stars. The next post will cover k-Nearest Neighbors and its performance on the same problem.

NOTE: This project was completed as part of my Master's coursework, specifically for the Machine Learning course.