Introduction to Machine Learning (Part-9)

 Regularization : Regularization helps avoid overfitting by adding a penalty term to the cost function based on the model parameters.

Cost function:  \theta^* = \arg\max_\theta \log P(S;\theta) = \sum_{j=1}^{m} \log P(y_j \mid x_j;\theta) - \lambda \sum_{i=1}^{n} \beta_i^2

λ is a constant (a hyper-parameter) that determines the strength of the penalty term.

For Linear Regression : 

                 \theta^* = \arg\min_\theta \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{n} \beta_j^2

One way to minimize this expression is to minimize its individual terms; the penalty term by itself is minimized when \beta_j = 0 for all j, so the penalty pulls the coefficients toward zero. In linear regression, using the L2 penalty term \sum_{j=1}^{n} \beta_j^2 results in Ridge regression, and using the L1 penalty term \sum_{j=1}^{n} |\beta_j| results in Lasso regression. In linear regression it also helps to remove correlated independent variables.
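Below is a minimal sketch of how the two penalty terms behave, using scikit-learn's Ridge and Lasso on made-up data; the alpha argument plays the role of λ here, and the toy dataset is purely illustrative.

```python
# Minimal sketch: L2 (Ridge) vs L1 (Lasso) penalties on illustrative data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # 100 samples, 5 features
true_beta = np.array([3.0, 0.0, 0.0, 1.5, 0.0])  # only two features matter
y = X @ true_beta + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # adds a multiple of Σ β_j^2 to the squared error
lasso = Lasso(alpha=0.1).fit(X, y)   # penalizes Σ |β_j| instead

print("Ridge coefficients:", ridge.coef_)   # shrunk toward zero, rarely exactly zero
print("Lasso coefficients:", lasso.coef_)   # some coefficients driven exactly to zero
```

The L1 penalty tends to drive some coefficients exactly to zero, which acts as a built-in form of feature selection, while the L2 penalty only shrinks them.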

Overfitting : 



Keep the model simple by damping the β coefficients; this avoids unnecessary complexity and reduces overfitting.

Samples and Estimation : 

A sample is a subset of the population; the training data is such a sample and is used to estimate the parameters of the model.

      L(\theta) = P(D \mid \theta) = \prod_{x_i \in D} P(x_i \mid \theta)

     \hat{\theta} = \arg\max_\theta \log L(\theta)

        = \sum_{x_i \in D} \log P(x_i \mid \theta)

Taking the logarithm turns the product into a sum of log-probabilities, which is the form of the likelihood that is actually maximized to obtain the Maximum Likelihood Estimate.
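As a rough illustration, the sketch below maximizes the log-likelihood (by minimizing its negative) for data assumed to come from a Gaussian; the data, the Gaussian model, and the optimizer are illustrative assumptions, not something fixed by the post.

```python
# Sketch of maximum likelihood estimation: maximize Σ log P(x_i | θ)
# over an illustrative Gaussian sample D.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, scale=1.5, size=200)    # observed sample (made up)

def neg_log_likelihood(params):
    mu, log_sigma = params                      # optimize log(σ) so σ stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(D, loc=mu, scale=sigma))

result = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)    # close to the sample mean and standard deviation
```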

Beta Distribution : The Beta distribution has two shape parameters, α and β, where α corresponds to successes and β corresponds to failures.

f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \, x^{\alpha-1} (1-x)^{\beta-1}

It makes the expected value for the data easy to calculate: E[X] = \alpha / (\alpha + \beta).
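A quick numerical check using scipy.stats, assuming illustrative shape parameters α = 3 and β = 2:

```python
# Expected value and density of a Beta distribution with illustrative parameters.
from scipy import stats

alpha, beta = 3, 2
dist = stats.beta(alpha, beta)

print(dist.mean())             # 0.6
print(alpha / (alpha + beta))  # closed-form expected value, also 0.6
print(dist.pdf(0.5))           # density f(x; α, β) at x = 0.5
```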

Trade-off Between Bias and Variance

As the hypothesis set grows, the variance of the maximum likelihood fit increases while its bias decreases, and this can lead to overfitting. Overfitting shows up as a large gap between performance on the training data and on the test data; that gap is the variance part of the error. Bias is the error that arises when the chosen functional form is far from the actual functional form, and it remains even after we choose the optimal parameters for that form. Variance is the error that comes from estimating those parameters from a limited training sample, and it grows as the hypothesis set becomes larger and more flexible.
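The sketch below illustrates the trade-off with a toy polynomial-regression example; the sine-wave data and the chosen degrees are illustrative assumptions. As the degree (a stand-in for the size of the hypothesis set) grows, training error keeps falling while test error eventually rises.

```python
# Sketch of the bias-variance trade-off: fit polynomials of growing degree
# to illustrative noisy sine-wave data and compare training vs test error.
import numpy as np

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares (maximum likelihood) fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_err, 3), round(test_err, 3))
# degree 1: high bias (underfits); degree 9: low training error but typically a
# much larger test error, and that train/test gap is the variance part of the error.
```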


