This post is the fifth in the series on using various Supervised Learning techniques to classify NBA All-Stars. In a previous post, I introduced the problem, the data, and the general approach I took for each algorithm. In this post, I will discuss how Neural Networks performed on this problem.
Artificial Neural Networks (ANN) learn a nonlinear function using three types of connected nodes: input, hidden, and output. These nodes are connected in a manner similar to neurons in the human brain. The algorithm takes input features in the leftmost input layer; each hidden layer computes a weighted linear summation of the previous layer's outputs and applies a non-linear activation function; finally, the output layer transforms the values from the last hidden layer into output values. These output values are compared against the actual known output values, and the error is propagated back through the layers so that subsequent iterations can adjust the weights to continuously reduce the error.
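The forward/backward cycle described above can be sketched in a few lines of NumPy. This is a toy illustration on XOR data, not the model used for the NBA problem; the layer sizes, learning rate, and iteration count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn XOR with one hidden layer, just to show the mechanics.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights: 2 inputs -> 8 hidden units (tanh) -> 1 sigmoid output.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

lr, losses = 0.5, []
for _ in range(2000):
    # Forward pass: each layer takes a weighted sum of the previous
    # layer's outputs and applies a non-linear activation.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(log_loss(out))

    # Backward pass: the output error is propagated back through the
    # layers (chain rule) to get a gradient for every weight.
    d_out = out - y                    # dLoss/d(pre-sigmoid) for log loss
    d_h = (d_out @ W2.T) * (1 - h**2)  # chain rule through tanh

    # Gradient-descent updates shrink the error on later iterations.
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)
```

After training, the loss should be far lower than at the first iteration, which is the whole point of propagating the error backward.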
In scikit-learn, I used the MLPClassifier (Multilayer Perceptron), which trains with the Backpropagation algorithm described above. I kept the default hidden layer configuration of one hidden layer with 100 units for the NBA dataset, since my classification problem was not too complex, and experimented with the alpha hyperparameter. alpha is an L2 penalty term that constrains the size of the weights. Increasing alpha can reduce high variance in the model, which results in fewer curves in the decision boundaries. Decreasing alpha can help reduce high bias in the model, resulting in a more complex, curvier decision boundary.
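A minimal sketch of this setup, using synthetic data as a stand-in for the NBA statistics (the real features came from the dataset introduced earlier in the series):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the NBA per-season stats.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# MLPs are sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# One hidden layer of 100 units (the scikit-learn default); alpha is
# the L2 penalty that constrains the size of the weights.
clf = MLPClassifier(hidden_layer_sizes=(100,), alpha=1e-5,
                    max_iter=1000, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```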
Model Complexity Curve
See the figure below for the Model Complexity curve for varying values of alpha for the NBA problem. As alpha increases, the accuracy scores remain consistent up to 0.1; beyond that, accuracy declines, which shows that an alpha greater than 0.1 over-constrains the weights and causes the model to underfit.
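A model complexity curve like this can be generated with scikit-learn's validation_curve helper, sweeping alpha on a log scale. The data here is a synthetic stand-in, and the alpha range and fold count are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the NBA dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Cross-validate the model at each alpha on a log-spaced grid.
alphas = np.logspace(-5, 1, 5)
train_scores, val_scores = validation_curve(
    MLPClassifier(max_iter=200, random_state=0),
    X, y, param_name="alpha", param_range=alphas, cv=3)

# Mean validation accuracy per alpha is what gets plotted.
for a, v in zip(alphas, val_scores.mean(axis=1)):
    print(f"alpha={a:g}: mean CV accuracy={v:.3f}")
```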
For the learning curve, I used alpha=10^-5 as the hyperparameter, one of the values in the flat region of the complexity curve, since any alpha below 0.1 scored comparably. The learning curve shown below exhibits high variance, since the validation accuracy is still increasing with more data. This suggests that either giving this model more data or reducing the number of features might improve accuracy. The model also shows high bias, since the accuracy scores are relatively low; one option to reduce the high bias might be to get new features from another source. The final training accuracy was 84.5597% and testing accuracy was 76.8251%. This model showed similar accuracy scores to KNN but performed worse than Decision Trees or Boosting.
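The learning curve itself can be produced with scikit-learn's learning_curve helper, again sketched on synthetic stand-in data with the chosen alpha:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the NBA dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Fit the model on increasing fractions of the training data and
# cross-validate each fit; a widening/narrowing gap between the two
# score arrays is what diagnoses variance vs. bias.
sizes, train_scores, val_scores = learning_curve(
    MLPClassifier(alpha=1e-5, max_iter=200, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=3)

for n, t, v in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n}: train={t:.3f}, validation={v:.3f}")
```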
Timing and Iteration Curves
The figure below shows the iteration and timing curves for Neural Networks. Here we see that we need at least 500 iterations of the algorithm to classify the NBA data with better than 50% accuracy. We also see that, since Neural Networks are eager learners, more time is spent in training (O(n)) than during prediction (O(1)).
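One convenient way to build an iteration curve for MLPClassifier is its loss_curve_ attribute, which records the training loss at each iteration when the default adam (or sgd) solver is used, so no refitting is needed. A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the NBA dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# loss_curve_ holds one training-loss value per iteration actually run;
# n_iter_ reports how many iterations that was.
clf = MLPClassifier(max_iter=300, random_state=0).fit(X, y)
print(f"iterations run: {clf.n_iter_}, "
      f"first loss: {clf.loss_curve_[0]:.4f}, "
      f"final loss: {clf.loss_curve_[-1]:.4f}")
```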
Overall, Neural Networks did not perform too well for this problem. The final testing accuracy was lower than for Decision Trees. The model I chose had a single hidden layer with the default 100 units. In future experiments, I plan to modify the structure of the network to elicit better accuracy.
This concludes my investigation of Neural Networks to classify NBA All-Stars. The next post will cover Support Vector Machines and their performance on the same problem.
NOTE: This project was completed as part of my Master's coursework, specifically for the Machine Learning course.