A sample is a representative of the underlying population, so probabilities estimated from the sample are a good reflection of the probabilities in the population.
The actual probability distribution is hidden, which is why we try to estimate it from the sample by making some assumptions. Let the sample size be m; as m → ∞, the estimated probability P' converges to the actual probability P for an unbiased sample, so P' ≈ P.
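As a rough illustration (not from the original notes), the following sketch simulates samples from a hypothetical biased coin and shows the estimate P' approaching the true P as m grows:

```python
import random

# A minimal sketch: the "population" is a hypothetical biased coin with
# P(heads) = 0.3, and P' is the frequentist estimate count / m.
true_p = 0.3

for m in (10, 100, 10_000, 1_000_000):
    sample = [random.random() < true_p for _ in range(m)]
    p_hat = sum(sample) / m          # estimated probability P'
    print(f"m = {m:>9}: P' = {p_hat:.4f} (true P = {true_p})")
```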
Variables : Variables can be categorical (nominal) or numeric (continuous). A categorical variable can take k unique values, with probabilities P(X = x_1) = θ_1, P(X = x_2) = θ_2, ..., P(X = x_k) = θ_k,
where θ_i = count(x_i)/m, count(x_i) being the number of times the value x_i occurs and m the size of the dataset. This is also called the frequentist approximation.
If there are k parameters and k − 1 of them are known, the last one is fixed by the constraint that the probabilities sum to 1:
θ_k = 1 − Σ_{i=1}^{k−1} θ_i
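A minimal sketch of the frequentist approximation on hypothetical categorical data; the value names and counts are made up for illustration:

```python
from collections import Counter

# Frequentist approximation for a categorical variable: theta_i = count(x_i) / m.
sample = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
m = len(sample)

theta = {value: count / m for value, count in Counter(sample).items()}
print(theta)                      # {'red': 0.5, 'blue': 0.25, 'green': 0.25}

# The estimates sum to 1, so the last parameter is determined by the others:
# theta_k = 1 - sum of the first k-1 thetas.
print(sum(theta.values()))        # 1.0
```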
For continuous variables we do not use the counting approach: a real-valued variable can take decimal values, and any particular value may appear rarely or never even in a large dataset, so x/m would assign it a probability of (nearly) zero. To avoid this we assume a parametric form such as the normal, or Gaussian, distribution.
It is called parametric because the whole distribution is described by a small set of parameters.
Here, P(x) = (1 / (σ·√(2π))) · exp(−(x − μ)² / (2σ²))
This is the hypothesis-space equation of the normal distribution. By assuming this form we reduce the learning task from estimating an arbitrary joint probability distribution to estimating the parameters of this one: the mean μ and the standard deviation σ. To find μ and σ we have to search the hypothesis space.
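The following is a small sketch of evaluating this density for a particular choice of μ and σ; the numbers are only illustrative:

```python
import math

# Gaussian density P(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2*sigma**2))
# for one hypothesis, i.e. one particular choice of mu and sigma.
def gaussian_pdf(x, mu, sigma):
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Each (mu, sigma) pair is one point ("dot") in the hypothesis space.
print(gaussian_pdf(1.0, mu=0.0, sigma=1.0))   # ~0.2420
print(gaussian_pdf(1.0, mu=1.0, sigma=0.5))   # ~0.7979
```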
Each dot in the hypothesis space is a model with different values of the mean and standard deviation. To search the hypothesis space we use an algorithm known as gradient descent.
Gradient descent starts at some random point in the hypothesis space and evaluates how good that model is. Error = observed value − predicted value.
The cost function is the mean of the squared errors between the actual and predicted values:
Cost(β_0, β_1) = (1/m) · Σ_{i=1}^{m} (y_i − (β_1·x_i + β_0))²
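A minimal sketch of gradient descent minimizing this cost for simple linear regression, using made-up data and an assumed learning rate:

```python
# Gradient descent on Cost(b0, b1) = (1/m) * sum_i (y_i - (b1*x_i + b0))**2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]           # roughly y = 2x
m = len(xs)

b0, b1 = 0.0, 0.0                        # starting point in hypothesis space
lr = 0.01                                # learning rate (assumed)

for _ in range(5000):
    # Gradients of the cost with respect to b0 and b1.
    errors = [(b1 * x + b0) - y for x, y in zip(xs, ys)]
    grad_b0 = (2 / m) * sum(errors)
    grad_b1 = (2 / m) * sum(e * x for e, x in zip(errors, xs))
    b0 -= lr * grad_b0                   # step downhill
    b1 -= lr * grad_b1

print(b0, b1)                            # b1 close to 2, b0 close to 0
```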
Feature Engineering : It transforms the original data x into some feature space (for example a feature f_1) so that the relationship with y becomes linear and the model can fit it.
For example, x can be transformed into √x, ∛x, x², x³, x⁴, ... while y is kept as it is.
Linear regression : y = β_0 + Σ_{i=1}^{n} β_i·x_i, where the x_i are the (possibly transformed) features.
Here, "linear" in linear regression does not mean the relationship with the original x is linear; it means the model is linear in the parameters β, which is what lets even nonlinear transformed features be fit.
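A small sketch of this idea on hypothetical data: y depends on x², so transforming the feature to z = x² lets a model that is linear in the parameters fit it (ordinary least squares for a single feature is used here for brevity):

```python
# Feature engineering: y is not linear in x, but it is linear in z = x**2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 9.2, 19.0, 33.2, 50.8]        # roughly y = 2*x**2 + 1

zs = [x ** 2 for x in xs]                # transform to feature space: z = x^2

# Ordinary least squares for one feature: beta1 = cov(z, y) / var(z).
mz = sum(zs) / len(zs)
my = sum(ys) / len(ys)
beta1 = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
beta0 = my - beta1 * mz

print(beta0, beta1)                      # close to 1 and 2
```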
A complicated distribution can be modelled as a mixture of Gaussian components.
In a mixture of Gaussians, the density is a weighted combination of several Gaussian components, p(x) = Σ_k w_k · N(x | μ_k, σ_k), where the weights w_k sum to 1. A continuous distribution can be turned into a Gaussian mixture distribution; in fact, any probability density function can be approximated by a Gaussian mixture model.
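A minimal sketch of a Gaussian mixture density with assumed weights and components:

```python
import math

# Gaussian mixture density: p(x) = sum_k w_k * N(x | mu_k, sigma_k), weights sum to 1.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

components = [
    (0.6, 0.0, 1.0),    # (weight, mean, standard deviation)
    (0.4, 5.0, 2.0),
]

def mixture_pdf(x):
    return sum(w * gaussian_pdf(x, mu, sigma) for w, mu, sigma in components)

print(mixture_pdf(0.0))   # dominated by the first component
print(mixture_pdf(5.0))   # dominated by the second component
```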
Maximum Likelihood Principle : Given a sample S of size m, choose the parameters that maximize the probability of generating that sample S.
P(Data) = Π_{i=1}^{m} P(x_i)
where Π means product (the data points are assumed to be generated independently).
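A small sketch of the principle for a Gaussian model with hypothetical data; the log of the product is used, as is common in practice, to avoid numerical underflow:

```python
import math

# Maximum likelihood: P(Data) = prod_i P(x_i); maximize sum_i log P(x_i) instead.
data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.3, 2.2, 1.8]

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(mu, sigma):
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)

# Candidate hypotheses (mu, sigma): the sample mean and standard deviation
# give the highest probability of generating this sample.
for mu, sigma in [(0.0, 1.0), (2.0, 1.0), (2.04, 0.25)]:
    print(f"mu={mu}, sigma={sigma}: log-likelihood = {log_likelihood(mu, sigma):.2f}")
```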