The chapter is 28 pages long and covers the following topics:
- Resampling Methods
- 5.1 Cross-Validation
- 5.1.1 The Validation Set Approach
- 5.1.2 Leave-One-Out Cross-Validation
- 5.1.3 k-Fold Cross-Validation
- 5.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
- 5.1.5 Cross-Validation on Classification Problems
- 5.2 The Bootstrap
- 5.3 Lab: Cross-Validation and the Bootstrap
- 5.3.1 The Validation Set Approach
- 5.3.2 Cross-Validation
- 5.3.3 The Bootstrap
- 5.4 Exercises
5.1 Cross-Validation
“The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.”
“LOOCV sometimes useful, but typically doesn’t shake up the data enough. The estimates from each fold are highly correlated and hence their average can have high variance.”
“In cross-validation, each of the K validation folds is distinct from the other K − 1 folds used for training: there is no overlap. This is crucial for its success.”
- Model assessment
- the process of evaluating a model’s performance.
- Model selection
- the process of selecting the proper level of flexibility for a model.
- Test error rate
- the average error that results from using an ML method to predict the response on a new observation, that is, a measurement that was not used in training the method.
- Training error rate
- the error calculated by applying the ML method to the observations used in its training. This is often dramatically lower than the test error rate.
- Cross-validation
- a resampling method used to estimate the test error associated with a given ML method, either to evaluate its performance or to select the appropriate level of flexibility.
- Validation set approach
- a method for estimating the test error associated with fitting a particular ML method on a set of observations; it involves randomly dividing the available observations into two parts, a training set and a validation set.
- Validation set error rate
- the error rate on the validation set, typically assessed using the MSE in the case of a quantitative response; it provides an estimate of the test error rate.
- Leave-one-out cross-validation (LOOCV)
- a method that attempts to address the validation set approach’s drawbacks of high variability and overestimation of the test error rate. It also splits the observations into two parts, but instead of creating two subsets of comparable size, a single observation serves as the validation set and the remaining observations make up the training set.
- Resampling methods
- involve repeatedly drawing samples from a training set and refitting a model on each sample to get more information about the fitted model.
- Bootstrap
- a resampling method used to provide a measure of accuracy of a parameter estimate or of a given statistical learning method.
- Hold-out set
- the validation set.
- k-fold CV
- an alternative to LOOCV that involves randomly dividing the set of observations into k groups, or folds, of approximately equal size. Each fold is treated in turn as the validation set while the method is fit on the remaining k - 1 folds, and the k resulting error estimates are averaged.
Rethinking Test, Train, and Validation Sets
The definitions I took away from this course were confused, so I want to clarify things a bit here. In the ML world, we do a three-way split as follows.
- training set - used to learn the parameters.
- test set - used to evaluate the model’s capacity to generalize to new data.
- validation set - used to tune the hyperparameters of the model.
Once we are finished tuning the hyperparameters, we train the model on all the data using the best hyperparameters and use that model in production.
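Below is a minimal sketch of this workflow with scikit-learn; the synthetic data, the 60/20/20 proportions, and the choice of ridge regression are illustrative assumptions rather than anything from the chapter.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                       # toy features
y = X @ rng.normal(size=5) + rng.normal(size=1000)   # toy response

# Split off a 20% test set, then carve a validation set (25% of the
# remainder, i.e. 20% overall) out of what is left.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Tune the hyperparameter on the validation set...
best_alpha = min([0.01, 0.1, 1.0, 10.0],
                 key=lambda a: mean_squared_error(
                     y_val, Ridge(alpha=a).fit(X_train, y_train).predict(X_val)))

# ...report generalization error on the untouched test set...
test_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print(mean_squared_error(y_test, test_model.predict(X_test)))

# ...then refit on all the data with the chosen hyperparameter for production.
production_model = Ridge(alpha=best_alpha).fit(X, y)
```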
In statistical learning, we learned about a number of approaches to the same end: the validation set approach, LOOCV, and k-fold CV.
The validation set approach is the simplest, but it can suffer if the random split underrepresents some classes in the data.
LOOCV is computationally expensive for all but the smallest datasets, unless you use a clever trick to estimate it from a single training run. Such tricks are quite common in work with Bayesian hierarchical models. However, as the chapter suggests, there is also the problem of correlation in LOOCV, which can lead to high variance in the test error estimate.
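For least squares, the single-run trick is exact rather than approximate: ISLR’s equation (5.2) computes LOOCV from one fit as $CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2$, where $h_i$ is the leverage of observation $i$. A sketch on simulated data (the data-generating model is an assumption), with a brute-force check:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # assumed true model

X = np.column_stack([np.ones(n), x])                # design with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)       # leverage values h_i
loocv_fast = np.mean((resid / (1.0 - h)) ** 2)      # ISLR eq. (5.2)

# Brute-force check: refit n times, leaving one observation out each time.
errs = []
for i in range(n):
    mask = np.arange(n) != i
    b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ b) ** 2)
assert np.isclose(loocv_fast, np.mean(errs))
print(f"LOOCV MSE = {loocv_fast:.4f}")
```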
k-fold CV is the most common approach and is implemented in most ML libraries. It strikes a good balance between computational cost and accuracy. However, it is still worth considering how to pick the folds so that they faithfully represent all the classes in the data.
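The standard tool for this in classification problems is stratified k-fold CV, which keeps each class’s proportion roughly constant across folds. A sketch with scikit-learn on a synthetic imbalanced problem (the dataset is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic problem with a 90/10 class imbalance.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Stratified folds keep the 90/10 ratio roughly intact in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean())   # mean accuracy; the misclassification rate is 1 minus this
```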
Outline of Resampling Methods
In this chapter we focus on resampling methods, specifically cross-validation and the bootstrap.
Central Theme: Resampling methods offer robust techniques to estimate the performance of statistical learning methods and quantify uncertainty associated with estimations. This is achieved by repeatedly drawing samples from the training data and refitting models on these samples.
Key Concepts and Methods
- Validation Set Approach: This approach involves splitting the dataset into a training set and a validation set. The model is trained on the training set, and its performance (e.g., using MSE for regression problems) is evaluated on the validation set.
- Drawbacks:
- High variability in test error estimates depending on the random split.
- Reduced effective sample size, since the model is fit on only the subset of observations in the training set.
- Example: Using the Auto dataset, the sources demonstrate that a quadratic fit performs better than a linear fit for predicting ‘mpg’ based on ‘horsepower’ (sketched below).
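A sketch of that comparison, assuming a local Auto.csv with ‘mpg’ and ‘horsepower’ columns (the raw file codes missing horsepower values as ‘?’):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Drop the rows where horsepower is coded as '?'.
auto = pd.read_csv("Auto.csv").replace("?", np.nan).dropna()
X = auto[["horsepower"]].to_numpy(dtype=float)
y = auto["mpg"].to_numpy(dtype=float)

# Single random 50/50 split, mirroring the chapter's 196/392 split.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree}: validation MSE = {mse:.2f}")
```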
- Cross-Validation: addresses the limitations of the validation set approach. It involves dividing the data into ‘K’ folds and iteratively using one fold for validation and the rest for training. This provides more stable performance estimates.
- Leave-One-Out Cross-Validation (LOOCV): A special case where K equals the number of observations. Each observation is held out once for validation.
- Benefits: Less bias in test error estimates.
- Drawbacks: High variance due to high correlation between the training sets.
- k-Fold Cross-Validation: A more general approach where K is typically 5 or 10, balancing bias and variance.
- Example: Using the Auto dataset, k-fold cross-validation confirms the superiority of the quadratic fit over the linear fit (sketched below).
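Continuing the validation-set sketch above, a 10-fold CV comparison of polynomial degrees on the same X and y:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# 10-fold estimates of the test MSE for polynomial degrees 1-5.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: 10-fold CV MSE = {mse:.2f}")
```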
- The Bootstrap: used to quantify uncertainty in an estimator. It involves generating multiple ‘bootstrap’ datasets by sampling with replacement from the original data. The statistic of interest is calculated for each bootstrap dataset, creating a distribution from which standard errors and confidence intervals can be derived.
- Example: Simulating investment returns for assets X and Y, the sources illustrate how the bootstrap can be used to estimate the standard error of the optimal investment allocation (α); a sketch follows.
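A sketch of that example: the variance-minimizing allocation is $\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$. The covariance parameters below follow the book’s simulation, while the rest of the implementation is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 100, 1000

# Simulated returns with var(X)=1, var(Y)=1.25, cov(X,Y)=0.5, the
# book's parameters, for which the true alpha is 0.6.
cov = [[1.0, 0.5], [0.5, 1.25]]
returns = rng.multivariate_normal([0.0, 0.0], cov, size=n)

def alpha(sample):
    """Variance-minimizing weight on X in the portfolio aX + (1-a)Y."""
    c = np.cov(sample, rowvar=False)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2.0 * c[0, 1])

# Resample rows with replacement B times and recompute alpha each time.
boot = np.array([alpha(returns[rng.integers(0, n, size=n)]) for _ in range(B)])
print(f"alpha = {alpha(returns):.3f}, bootstrap SE = {boot.std(ddof=1):.3f}")
```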
- Bootstrap Applications:
- Estimating Standard Errors: The bootstrap provides an alternative way to calculate standard errors for complex estimators where analytical formulas may not be available.
- Confidence Intervals: Approximate confidence intervals can be derived from the distribution of bootstrap estimates.
- Prediction Error Estimation: While the bootstrap is mainly used for standard error and confidence interval calculations, it’s not ideal for directly estimating prediction error due to the overlap in training data between bootstrap samples.
Important Considerations
- Cross-validation for classification: When dealing with classification problems, the metric used for evaluating performance is typically the misclassification rate instead of MSE.
- Choosing K: The choice of K in k-fold cross-validation involves a bias-variance trade-off. K = 5 or K = 10 generally provides a good balance.
- Pre-validation: This technique is specifically designed for comparing adaptively derived predictors with fixed predictors by creating a ‘fairer’ version of the adaptive predictor that hasn’t ‘seen’ the response variable.
- Bootstrap Sampling: The way bootstrap samples are generated needs careful consideration depending on the data structure. For instance, in time series data, blocks of consecutive observations are sampled instead of individual observations to preserve the temporal correlation (a minimal sketch follows).
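A minimal sketch of a moving-block bootstrap; the series, the block length, and the use of the mean as the statistic of interest are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, block_len, B = 200, 10, 1000
series = np.cumsum(rng.normal(size=n))   # toy autocorrelated series (random walk)

def block_bootstrap(x, block_len, rng):
    """Resample contiguous blocks with replacement, preserving local correlation."""
    n_blocks = len(x) // block_len
    starts = rng.integers(0, len(x) - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])

# Bootstrap SE of the series mean from B block resamples.
boot_means = [block_bootstrap(series, block_len, rng).mean() for _ in range(B)]
print(f"block-bootstrap SE of the mean: {np.std(boot_means, ddof=1):.3f}")
```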
Conclusion
Resampling methods are indispensable tools for data scientists. Cross-validation and the bootstrap empower us to evaluate model performance, estimate uncertainties, and ultimately make more informed decisions based on our statistical learning models.