Hypothesis Testing
Given a hypothesis, test if it is true with a particular confidence
• Hypothesis: errorD(h1) > errorD(h2)
• What % of the probability mass is associated with errorD(h1) − errorD(h2) > 0?
• Example
• Let the error rates measured for the two hypotheses h1 and h2 on a sample of size 100 be 0.3 and 0.2 respectively
• The standard deviation of the normal distribution defined on d' = errorS1(h1) − errorS2(h2) is σ ≈ √(0.3·(1 − 0.3)/100 + 0.2·(1 − 0.2)/100) ≈ 0.061
• The observed difference d' = 0.3 − 0.2 = 0.1 ≈ 1.64 · 0.061, i.e. it lies about 1.64 standard deviations above zero
• 1.64 standard deviations corresponds to the two-sided 90% confidence interval, and hence the % mass of the probability distribution > 0 is 95%
• Result: Accept the hypothesis with 95% confidence that h2 is a more accurate hypothesis than h1 on D (the underlying population)
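A minimal sketch of this calculation in Python, using the normal approximation from the example above (the function name is illustrative):

```python
import math

def prob_h2_better(err1, err2, n1, n2):
    """One-sided confidence that h2 has lower true error than h1,
    using the normal approximation for the difference of two error rates."""
    d_hat = err1 - err2                                   # observed difference, 0.1 in the example
    sigma = math.sqrt(err1 * (1 - err1) / n1 + err2 * (1 - err2) / n2)  # ~0.061
    z = d_hat / sigma                                     # ~1.64 standard deviations
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))         # standard normal CDF at z

print(prob_h2_better(0.3, 0.2, 100, 100))                 # ~0.95
```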
Comparing Two Algorithms
Given two learning algorithms, L1 and L2, which one is better, on average, at learning a particular target function
• Estimating the relative performance
• Calculate the Expected Value of the difference in errors
• For all samples of size n, consider the relative performance of L1 and L2
• Estimate the error difference over all training samples S drawn from D: ES⊂D[errorD(L1(S)) − errorD(L2(S))]
• where L1 (S) and L2 (S) are the hypotheses generated by the learning algorithms from the sample S
• Can be estimated by
• Holding back part of the training data for testing
• Using multiple samples of the training data D
• Cross-validation
• Bootstrapping
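A hedged holdout-based sketch of this estimate; the dataset, learners, and number of repetitions below are illustrative assumptions (scikit-learn and NumPy assumed available), not prescribed by the notes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)   # stand-in for D

diffs = []
for i in range(30):                                           # repeated samples S of D
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=i)
    h1 = DecisionTreeClassifier(random_state=i).fit(X_tr, y_tr)   # L1(S)
    h2 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)        # L2(S)
    err1 = np.mean(h1.predict(X_te) != y_te)
    err2 = np.mean(h2.predict(X_te) != y_te)
    diffs.append(err1 - err2)

print("estimated E[error(L1) - error(L2)]:", np.mean(diffs))
```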
Cross-validation:
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
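A minimal sketch of the k-fold procedure, assuming NumPy and scikit-learn are available; the dataset and learner are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
k = 5
indices = np.random.RandomState(0).permutation(len(X))
folds = np.array_split(indices, k)                        # k roughly equal groups

errors = []
for i in range(k):
    test_idx = folds[i]                                    # one group held out for testing
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))

print("k-fold error estimate:", np.mean(errors))
```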
Bootstrapping
Sample, with replacement, the training data (of size N) and use the sample (of size N) to learn the model
• The same instance in the training data can appear multiple times in the data used for learning the model
• An instance within the training data may not appear at all in the data used for learning
• Use remaining data as test data
• Repeat multiple times to obtain an error estimate and a confidence interval.
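A minimal bootstrapping sketch under the same assumptions (NumPy and scikit-learn available; the dataset, learner, and number of repetitions are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.RandomState(0)
N = len(X)

errors = []
for b in range(200):                                       # repeat to build an error distribution
    boot = rng.randint(0, N, size=N)                       # sample N indices with replacement
    oob = np.setdiff1d(np.arange(N), boot)                 # instances never drawn: the test data
    model = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
    errors.append(np.mean(model.predict(X[oob]) != y[oob]))

print("bootstrap error estimate:", np.mean(errors))
print("90% interval:", np.percentile(errors, [5, 95]))
```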
Confusion Matrix
• Contingency table containing the counts of actual versus predicted class labels
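A small illustrative example of building such a table for binary labels (rows = actual, columns = predicted; the label vectors are made up):

```python
import numpy as np

actual    = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # true class labels
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])   # model outputs

matrix = np.zeros((2, 2), dtype=int)
for a, p in zip(actual, predicted):
    matrix[a, p] += 1                                    # rows: actual, columns: predicted

print(matrix)
# [[4 1]    actual 0: 4 predicted as 0, 1 predicted as 1
#  [1 4]]   actual 1: 1 predicted as 0, 4 predicted as 1
```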