Machine Learning with R

Cross-Validation of Tuning Parameters

Cross-validation can be used to tune hyperparameters, selecting the values that give the best estimated out-of-sample performance for the model.

In choosing an optimal model, remember that the model containing all the predictors always has the smallest training RSS and the largest R-squared. Since we want a model with low test error, training RSS and R-squared are not suitable criteria for choosing among models with different numbers of predictors.

There are other measures, such as adjusted R-squared, that take this into account. Mallows's Cp is another criterion that can be useful: it is an adjustment to the training RSS that gives an estimate of the test RSS.
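For a least-squares fit with $d$ predictors, Mallows's Cp is commonly written (for example, in ISLR) as

$$C_p = \frac{1}{n}\left(\mathrm{RSS} + 2\,d\,\hat{\sigma}^2\right),$$

where $\hat{\sigma}^2$ is an estimate of the error variance. The penalty term $2d\hat{\sigma}^2$ grows with model size, compensating for the fact that training RSS understates test error.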

While logistic regression seemed to perform well in a previous post, here we are interested in classification trees for the same problem. Let's build a CART model and choose a cp value by cross-validation.

While caret will choose a default grid of tuning-parameter values on its own, we can also supply the candidate values ourselves and select among them in R using cross-validation. Let's use 10-fold cross-validation to tune the cp hyperparameter; this is rpart's complexity parameter, which governs pruning, not Mallows's Cp, despite the similar name. We select the cp value with the smallest cross-validated RMSE, while also wanting R-squared to be as large as possible.

Now we'll define the cross-validation experiment and specify a range of cp values for tuning.
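The code chunk itself isn't preserved in this extract, so here is a minimal sketch of a caret call that would produce a summary like the one below. The data-frame name `censusTrain` and outcome `over50k` are assumptions; the RMSE/R-squared metrics in the output indicate the outcome was stored as numeric 0/1, so caret treated this as a regression problem.

```r
library(caret)
library(rpart)

set.seed(111)  # hypothetical seed, for reproducible fold assignment

# Define the cross-validation experiment: 10 folds
fitControl <- trainControl(method = "cv", number = 10)

# Candidate cp values: 0.002 to 0.1 in steps of 0.002 (50 values, as in the output)
cartGrid <- expand.grid(cp = seq(0.002, 0.1, by = 0.002))

# Fit a CART model for each candidate cp and resample its performance
tunedCART <- train(over50k ~ ., data = censusTrain, method = "rpart",
                   trControl = fitControl, tuneGrid = cartGrid)
tunedCART
```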

## CART 
## 
## 18722 samples
##    25 predictors
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## 
## Summary of sample sizes: 16849, 16850, 16850, 16850, 16850, 16850, ... 
## 
## Resampling results across tuning parameters:
## 
##   cp     RMSE  Rsquared  RMSE SD  Rsquared SD
##   0.002  0.3   0.4       0.005    0.02       
##   0.004  0.3   0.4       0.005    0.03       
##   0.006  0.3   0.4       0.005    0.02       
##   0.008  0.3   0.4       0.004    0.02       
##   0.01   0.3   0.3       0.004    0.02       
##   0.01   0.3   0.3       0.004    0.02       
##   0.01   0.3   0.3       0.004    0.02       
##   0.02   0.3   0.3       0.004    0.02       
##   0.02   0.3   0.3       0.004    0.02       
##   0.02   0.3   0.3       0.004    0.02       
##   0.02   0.3   0.3       0.004    0.02       
##   0.02   0.3   0.3       0.004    0.02       
##   0.03   0.3   0.3       0.004    0.02       
##   0.03   0.3   0.3       0.004    0.02       
##   0.03   0.3   0.3       0.004    0.02       
##   0.03   0.3   0.3       0.004    0.02       
##   0.03   0.3   0.3       0.004    0.02       
##   0.04   0.3   0.3       0.004    0.02       
##   0.04   0.3   0.3       0.004    0.02       
##   0.04   0.3   0.3       0.004    0.02       
##   0.04   0.3   0.3       0.004    0.02       
##   0.04   0.3   0.3       0.004    0.02       
##   0.05   0.3   0.3       0.004    0.02       
##   0.05   0.3   0.3       0.004    0.02       
##   0.05   0.3   0.3       0.004    0.02       
##   0.05   0.3   0.3       0.004    0.02       
##   0.05   0.3   0.3       0.004    0.02       
##   0.06   0.3   0.3       0.004    0.02       
##   0.06   0.3   0.3       0.004    0.02       
##   0.06   0.3   0.3       0.004    0.02       
##   0.06   0.3   0.3       0.004    0.02       
##   0.06   0.3   0.3       0.004    0.02       
##   0.07   0.3   0.3       0.004    0.02       
##   0.07   0.3   0.3       0.004    0.02       
##   0.07   0.3   0.3       0.004    0.02       
##   0.07   0.3   0.3       0.004    0.02       
##   0.07   0.3   0.3       0.004    0.02       
##   0.08   0.3   0.3       0.009    0.04       
##   0.08   0.3   0.3       0.008    0.02       
##   0.08   0.3   0.3       0.006    0.02       
##   0.08   0.3   0.3       0.004    0.009      
##   0.08   0.3   0.3       0.004    0.009      
##   0.09   0.3   0.3       0.004    0.009      
##   0.09   0.3   0.3       0.004    0.009      
##   0.09   0.3   0.3       0.004    0.009      
##   0.09   0.3   0.3       0.004    0.009      
##   0.09   0.3   0.3       0.004    0.009      
##   0.1    0.3   0.3       0.004    0.009      
##   0.1    0.3   0.3       0.004    0.009      
##   0.1    0.3   0.3       0.004    0.009      
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final value used for the model was cp = 0.002.
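Once trained, the selected value can also be read directly from the fitted object (assuming it was stored as `tunedCART`, as in the sketch above):

```r
tunedCART$bestTune  # the cp value with the smallest cross-validated RMSE
```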

Build the CART model with the selected cp value and calculate the prediction accuracy on the test set.
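The model-building chunk is likewise missing from this extract; a minimal sketch follows, assuming the same hypothetical `censusTrain`/`censusTest` data frames. The prediction vector is named `PredictCARTcpcv` to match the confusion matrix below.

```r
library(rpart)

# Refit a classification tree at the cross-validated cp value
CARTcpcv <- rpart(over50k ~ ., data = censusTrain,
                  method = "class", cp = 0.002)

# Class predictions on the held-out test set, then the confusion matrix
PredictCARTcpcv <- predict(CARTcpcv, newdata = censusTest, type = "class")
table(censusTest$over50k, PredictCARTcpcv)
```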

##    PredictCARTcpcv
##        0    1
##   0  742  726
##   1  456 6100

Plot the tree. CART models are more interpretable than logistic regression models, and the accuracy of (742 + 6100) / 8024 ≈ 85% is comparable to the accuracy from the logistic regression model.
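The plotting chunk isn't shown either; one way to draw the fitted tree is `prp` from the rpart.plot package (one common choice, not confirmed by the original; base `plot` plus `text` on the rpart object also works):

```r
library(rpart.plot)

prp(CARTcpcv)  # draw the fitted classification tree
```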

[Figure: plot of the fitted CART tree]