Introduction to Machine Learning (Part-9)

 Regularization : Regularization helps avoid overfitting by adding a penalty term to the cost function based on the model parameters.

Cost function:  \theta^* = \arg\max_\theta \log P(S;\theta) = \sum_{j=1}^{m} \log P(y_j \mid x_j;\theta) - \lambda \sum_{i=1}^{n} \beta_i^2

λ is a constant (a hyper-parameter) that determines the strength of the penalty term.

For Linear Regression : 

                 \theta^* = \arg\min_\theta \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{n} \beta_j^2

One way to minimize this expression is to minimize its individual terms; the penalty term by itself is minimized when \beta_j = 0 for all j, so the penalty pulls the coefficients toward zero. In linear regression, using the L2 penalty term \sum_{j=1}^{n} \beta_j^2 results in Ridge regression, and using the L1 penalty term \sum_{j=1}^{n} |\beta_j| results in Lasso regression. In linear regression it also helps to remove correlated independent variables.
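Below is a minimal sketch of how the two penalty terms behave, using scikit-learn's Ridge and Lasso on made-up data; the alpha argument plays the role of λ here, and the toy dataset is purely illustrative.

```python
# Minimal sketch: L2 (Ridge) vs L1 (Lasso) penalties on illustrative data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # 100 samples, 5 features
true_beta = np.array([3.0, 0.0, 0.0, 1.5, 0.0])  # only two features matter
y = X @ true_beta + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # adds a multiple of Σ β_j^2 to the squared error
lasso = Lasso(alpha=0.1).fit(X, y)   # penalizes Σ |β_j| instead

print("Ridge coefficients:", ridge.coef_)   # shrunk toward zero, rarely exactly zero
print("Lasso coefficients:", lasso.coef_)   # some coefficients driven exactly to zero
```

The L1 penalty tends to drive some coefficients exactly to zero, which acts as a built-in form of feature selection, while the L2 penalty only shrinks them.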

Overfitting : 



Keep the model simple by damping the β coefficients; this avoids unnecessary complexity and reduces overfitting.

Samples and Estimation : 

A sample is a subset of the population; the training data is such a sample and is used to estimate the parameters of the model.

      L(\theta) = P(D \mid \theta) = \prod_{x_i \in D} P(x_i \mid \theta)

     \hat{\theta} = \arg\max_\theta \log L(\theta)

        = \sum_{x_i \in D} \log P(x_i \mid \theta)

Taking the logarithm turns the product into a sum of log-probabilities, which is the form of the likelihood that is actually maximized to obtain the Maximum Likelihood Estimate.
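As a rough illustration, the sketch below maximizes the log-likelihood (by minimizing its negative) for data assumed to come from a Gaussian; the data, the Gaussian model, and the optimizer are illustrative assumptions, not something fixed by the post.

```python
# Sketch of maximum likelihood estimation: maximize Σ log P(x_i | θ)
# over an illustrative Gaussian sample D.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, scale=1.5, size=200)    # observed sample (made up)

def neg_log_likelihood(params):
    mu, log_sigma = params                      # optimize log(σ) so σ stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(D, loc=mu, scale=sigma))

result = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)    # close to the sample mean and standard deviation
```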

Beta Distribution : The Beta distribution has two shape parameters, α and β, where α corresponds to successes and β corresponds to failures.

f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \, x^{\alpha-1} (1-x)^{\beta-1}

It makes the expected value for the data easy to calculate: E[X] = \alpha / (\alpha + \beta).
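A quick numerical check using scipy.stats, assuming illustrative shape parameters α = 3 and β = 2:

```python
# Expected value and density of a Beta distribution with illustrative parameters.
from scipy import stats

alpha, beta = 3, 2
dist = stats.beta(alpha, beta)

print(dist.mean())             # 0.6
print(alpha / (alpha + beta))  # closed-form expected value, also 0.6
print(dist.pdf(0.5))           # density f(x; α, β) at x = 0.5
```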

Trade-off Between Bias and Variance

As the hypothesis set grows, the variance of the maximum likelihood fit increases while its bias decreases, and this can lead to overfitting. Overfitting shows up as a large gap between performance on the training data and on the test data; that gap is the variance part of the error. Bias is the error that arises when the chosen functional form is far from the actual functional form, and it remains even after we choose the optimal parameters for that form. Variance is the error that comes from estimating those parameters from a limited training sample, and it grows as the hypothesis set becomes larger and more flexible.
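The sketch below illustrates the trade-off with a toy polynomial-regression example; the sine-wave data and the chosen degrees are illustrative assumptions. As the degree (a stand-in for the size of the hypothesis set) grows, training error keeps falling while test error eventually rises.

```python
# Sketch of the bias-variance trade-off: fit polynomials of growing degree
# to illustrative noisy sine-wave data and compare training vs test error.
import numpy as np

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares (maximum likelihood) fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_err, 3), round(test_err, 3))
# degree 1: high bias (underfits); degree 9: low training error but typically a
# much larger test error, and that train/test gap is the variance part of the error.
```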


