This post is the fourth in my series on using various Supervised Learning techniques to classify NBA All-Stars. In a previous post, I introduced the problem, the data, and the general approach I took for each algorithm. In this post, I will discuss how Boosted Decision Trees performed on this problem.
Boosting is an ensemble learning technique that combines the predictions of several weak learners to produce a generalized prediction. Weak learners are models whose predictions are only slightly better than random guessing. The final prediction is usually made by a weighted majority vote of all estimators. For my experiments, I used a specific boosting algorithm, AdaBoost. AdaBoost iteratively applies the learning process to a distribution of weights over the training examples. At each step, the examples that were classified incorrectly are weighted higher for the next iteration, and the examples that were classified correctly are weighted lower.

For my experiments, I used a single-level Decision Tree (also known as a decision stump) as the base estimator for AdaBoost. I also experimented with various numbers of estimators to identify the best estimator count for my model. One interesting observation was that using deeper Decision Trees (depth 3 or greater) caused my boosting algorithm to underperform. After some research and reasoning, I discovered that the larger, more complex underlying decision tree was overfitting, causing the boosted ensemble to overfit as well. Using a less complex base estimator, one that was less likely to overfit, improved the performance of the boosted model.
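The setup described above can be sketched with scikit-learn's `AdaBoostClassifier`. This is a minimal illustration, not my actual experiment code: the synthetic dataset stands in for the NBA player stats, and the estimator count of 18 anticipates the tuning result discussed below.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the NBA All-Star dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 tree (decision stump) as the weak base estimator; passing it
# positionally keeps this compatible across scikit-learn versions.
stump = DecisionTreeClassifier(max_depth=1)
clf = AdaBoostClassifier(stump, n_estimators=18, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Swapping `max_depth=1` for a deeper tree is a one-line change, which made it easy to verify the observation that more complex base estimators hurt the ensemble.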
Model Complexity Curve
See the figure below for the Model Complexity curve for Boosting. Here, we see that varying the number-of-estimators hyperparameter caused the training score to increase continuously while the validation score stayed relatively steady. This graph shows that overfitting becomes likely as the number of estimators grows. I chose 18 as the optimal number of estimators to use with a 1-level decision tree as the base estimator.
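A model complexity curve like this can be generated with scikit-learn's `validation_curve`, sweeping `n_estimators` and comparing mean training versus cross-validation accuracy. Again, this is a sketch on synthetic data rather than the actual experiment:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
param_range = np.arange(2, 41, 4)

# For each candidate n_estimators, compute 5-fold training and validation scores.
train_scores, val_scores = validation_curve(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), random_state=0),
    X, y, param_name="n_estimators", param_range=param_range, cv=5)

for n, tr, va in zip(param_range, train_scores.mean(1), val_scores.mean(1)):
    print(f"n_estimators={n:2d}  train={tr:.3f}  val={va:.3f}")
```

Plotting the two mean-score columns against `param_range` yields the complexity curve; the chosen value is where validation accuracy plateaus while training accuracy keeps climbing.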
In the Learning Curve plot below, we also see that the training and validation scores converge to similar values as the size of the training set increases. This shows that our model has low bias and low variance, so it generalizes well to new data. The final training accuracy was 92.3338% and the testing accuracy was 87.7099%. While it performed better than k-NN, the Boosting model had around the same accuracy score as a 3-level Decision Tree. This could be because of outliers in the data (All-Stars who didn't have good stats but were voted in due to popularity), whose importance Boosting keeps increasing.
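The learning curve itself comes from scikit-learn's `learning_curve` helper, which refits the model on progressively larger training subsets. A hedged sketch, again on synthetic placeholder data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, random_state=0)

# Score the boosted stump model at 5 training-set sizes, with 5-fold CV.
sizes, train_scores, val_scores = learning_curve(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                       n_estimators=18, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for s, tr, va in zip(sizes, train_scores.mean(1), val_scores.mean(1)):
    print(f"train_size={s:3d}  train={tr:.3f}  val={va:.3f}")
```

Convergence of the two mean-score curves as `train_sizes` grows is the low-bias, low-variance signal described above.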
The last figure below shows the timing curves for training vs. prediction. The timing curve shows that Boosting is an eager learner: prediction takes roughly constant time, while training time grows linearly, i.e. O(n), with the size of the training set.
Overall, Boosting performed well for this problem. The final testing accuracy was around the same as for Decision Trees, yet Boosting was considerably slower without providing significant gains in accuracy. This led me to conclude that regular Decision Trees were sufficient for this problem and that boosting was not strictly necessary.
This concludes my investigation of Boosting to classify NBA All-Stars. The next post will cover Artificial Neural Networks and their performance on the same problem.
NOTE: This project was completed as part of my Master's coursework, specifically for the Machine Learning course.