Introduction to Machine Learning (Part-7)

Linear Regression : In linear regression, we are restricted to a straight-line model, the regression line. The value of y can be the same for different values of x, so we treat y given x as a random variable. This can be represented as follows.

Assuming Gaussian noise, the conditional density of y given x = x₁ is:
 
P(y | x = x₁) = (1 / (σ√(2π))) · e^(−(y − μ)² / (2σ²)),   where μ is the mean of y at x₁
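As a quick sanity check, here is a minimal Python sketch of this density (the function name gaussian_pdf and the sample values are our own, not from the original post):

import numpy as np

def gaussian_pdf(y, mu, sigma):
    # Gaussian density: (1 / (sigma * sqrt(2*pi))) * exp(-(y - mu)^2 / (2*sigma^2))
    return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(gaussian_pdf(0.0, 0.0, 1.0))  # ~0.3989, the peak of the standard normal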

Through the linear regression equation we define the prediction ŷᵢ = β₁xᵢ + β₀.

The observed value of y is then yᵢ = β₁xᵢ + β₀ + eᵢ, where eᵢ is the error or noise.

Let us assume σ is the same for every value of x, and that β₁ = 0. Then the best intercept minimizes the sum of squared errors:

                              β₀* = argmin over β₀ of C(β₀) = Σᵢ₌₁ⁿ (yᵢ − β₀)²

                                Differentiating with respect to β₀ and setting the derivative to zero:

                               dC/dβ₀ = Σᵢ₌₁ⁿ (2β₀ − 2yᵢ) = 0

                                     2Σᵢ₌₁ⁿ β₀ − 2Σᵢ₌₁ⁿ yᵢ = 0

                                         nβ₀ − Σᵢ₌₁ⁿ yᵢ = 0

                                         β₀ = (Σᵢ₌₁ⁿ yᵢ) / n

So the best value for β₀ turns out to be the mean of the y values observed in the data.
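To see this numerically, here is a short sketch (the sample y values are made up for illustration): we evaluate the cost C on a grid of candidate intercepts and check that the minimizer matches the mean.

import numpy as np

y = np.array([2.0, 3.5, 4.0, 5.5, 6.0])  # hypothetical observed y values

# Evaluate C(b0) = sum_i (y_i - b0)^2 over a grid of candidate intercepts.
candidates = np.linspace(y.min(), y.max(), 2001)
costs = np.array([np.sum((y - b0) ** 2) for b0 in candidates])

print(candidates[np.argmin(costs)])  # ~4.2
print(y.mean())                      # 4.2 -> the minimizer equals the mean, as derived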

Now comes the question: how can we determine the goodness of fit of our linear regression model?

We can check the goodness of fit of the linear regression model with the R² metric (the coefficient of determination), which generalizes to linear regression with any number of variables:

                       R² = 1 − Σᵢ₌₁ⁿ (yᵢ − (β₀ + Σⱼ βⱼxᵢⱼ))² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²

where the xⱼ are the independent variables and ȳ is the mean of the observed y values.
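A minimal sketch of this computation in Python (the function and the sample values are our own, for illustration):

import numpy as np

def r_squared(y, y_hat):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical observations
y_hat = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical model predictions
print(r_squared(y, y_hat))               # 0.98 -> close to 1, a good fit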

The goodness of fit is judged by this metric: the closer the value of R² is to 1, the more the independent variables explain the variation in the predicted output. We also don't want the β coefficients to depend on the range or domain of each variable, so we normalize the variables beforehand. Normalization can be done using Min-Max Scaling, for which we define a variable Z.

                                   Z = (x − xₘᵢₙ) / (xₘₐₓ − xₘᵢₙ)
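In Python, min-max scaling is a one-liner (the sample values below are illustrative):

import numpy as np

def min_max_scale(x):
    # Z = (x - x_min) / (x_max - x_min), maps x into [0, 1]
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 20.0, 35.0, 50.0])
print(min_max_scale(x))  # [0.    0.25  0.625 1.   ]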

Now, for the actual value of yᵢ:

                                        yᵢ = β₀ + Σⱼ₌₁ᵐ βⱼxᵢⱼ + eᵢ

                                  Therefore  yᵢ = ŷᵢ + eᵢ

                                Cost function: C = Σᵢ₌₁ⁿ (yᵢ − (β₀ + Σⱼ βⱼxᵢⱼ))²

Each term inside the sum is the residual yᵢ − (β₀ + Σⱼ βⱼxᵢⱼ) = eᵢ, so minimizing C amounts to minimizing the total squared noise Σᵢ₌₁ⁿ eᵢ².
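Putting the pieces together, here is a sketch that computes the predictions ŷᵢ, the residuals eᵢ, and the cost C for some made-up data and coefficients (all values below are hypothetical):

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])        # n = 3 samples, m = 2 independent variables
beta = np.array([0.8, -0.3])      # hypothetical beta_1 .. beta_m
beta0 = 1.0                       # hypothetical intercept
y = np.array([2.1, 2.4, 3.0])     # hypothetical observed outputs

y_hat = beta0 + X @ beta          # predictions: y_hat_i = beta0 + sum_j beta_j * x_ij
e = y - y_hat                     # residuals e_i
C = np.sum(e ** 2)                # cost = sum of squared residuals
print(C)                          # 0.815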


