Model Selection and Evaluation (Part-2)

 Heuristic Search: It is used when the hypothesis space is infinite, so the hypotheses cannot all be tried one by one. Hypothesis testing is then used to evaluate the model, and it can be done by two methods:

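  • Chi-Square Method: Here, χ² = Σ (Oij - Eij)² / Eij, where χ² is the test statistic, Oij is the observed value and Eij is the expected value in cell (i, j). The degrees of freedom decide which chi-square distribution the statistic is compared against.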
  • t-test : Here, we find the confidence interval.
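
The chi-square statistic above can be sketched in a few lines. This is a minimal illustration, not a full test: it assumes the observed counts form a contingency table and that the expected counts come from the usual independence assumption (row total × column total / grand total), which the post does not spell out. The function name and the sample table are made up for the example.

```python
def chi_square_statistic(observed):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Assumed: expected count under independence of rows and columns.
            e_ij = row_totals[i] * col_totals[j] / total
            chi2 += (o_ij - e_ij) ** 2 / e_ij

    # Degrees of freedom for an r x c table.
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table of observed counts.
table = [[10, 20], [20, 10]]
chi2, dof = chi_square_statistic(table)
print(chi2, dof)
```

The statistic would then be compared with the critical value of the chi-square distribution for `dof` degrees of freedom.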
A machine learning model is a representation of an estimate of the population.
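
Because the model is only an estimate, its measured accuracy is an estimate too, and the t-test idea gives a confidence interval around it. The sketch below assumes we have accuracies from 5 evaluation folds (the numbers are hypothetical) and uses the standard two-sided 95% critical value of the t distribution for 4 degrees of freedom, 2.776, hard-coded to keep the example to the standard library.

```python
import math
import statistics


def mean_confidence_interval(samples, t_critical):
    """Confidence interval for the mean: mean +/- t * (s / sqrt(n))."""
    n = len(samples)
    mean = statistics.mean(samples)
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator).
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin


# Hypothetical accuracies from 5 folds, for illustration only.
fold_accuracies = [0.81, 0.79, 0.84, 0.80, 0.82]
T_CRIT_95_DF4 = 2.776  # two-sided 95% critical value, n - 1 = 4 degrees of freedom

low, high = mean_confidence_interval(fold_accuracies, T_CRIT_95_DF4)
print(low, high)
```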

SMOTE: It creates synthetic samples. For an instance of the class that is under-represented in the data, it draws a line to the closest neighbor instance of the same class and places a new sample on that line.
Here, the model learns a decision boundary over the region connecting the two instances, and this gives better generalization. Transformations can be applied after SMOTE.
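
The interpolation step can be sketched as follows. This is a simplified illustration of the idea, not a reference implementation: the function name, the parameter `k` (how many nearest neighbors to choose from) and the sample points are assumptions made for the example.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point on the line between a minority
    instance and one of its k nearest minority neighbors."""
    base = rng.choice(minority)
    # Nearest neighbors of `base` within the minority class only.
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    # Random position along the segment: 0 gives base, 1 gives neighbor.
    gap = rng.random()
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


rng = random.Random(0)
# Hypothetical minority-class points with two features.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Every synthetic point lies between two real minority instances, so it is a new point rather than a copy.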

Oversampling and under-sampling must only be done on training data, never on test data. On training data, resampling helps us learn a function that predicts the low-frequency class. On test data, we estimate or assess the accuracy of the model on the population, and the population has a skewed distribution.
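
A minimal sketch of that order of operations: split first, then resample only the training part. The helper names, the 90/10 class ratio and the 20% test fraction are assumptions for illustration; the oversampling here is plain copying with replacement, not SMOTE.

```python
import random


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and cut off the first part as the test set."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def oversample_minority(train, rng):
    """Copy instances (sampling with replacement) until all classes are equal in size."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rng = random.Random(42)
# Hypothetical skewed data: 90 instances of class 0, 10 of class 1.
rows = [((float(i),), 0) for i in range(90)] + [((float(i),), 1) for i in range(10)]

train_rows, test_rows = train_test_split(rows, test_fraction=0.2, rng=rng)
balanced_train = oversample_minority(train_rows, rng)  # only the training data is resampled
# test_rows is left untouched, so it keeps the skewed distribution of the population.
print(len(balanced_train), len(test_rows))
```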

Synthetic data forces generalization. In plain oversampling, we sample the minority class with replacement, so we only make copies of existing instances, whereas SMOTE creates new instances between them. Therefore, SMOTE is usually the better approach to use.
