A sample is a representative of the underlying population, so probabilities estimated from the sample are a good reflection of the probabilities in the population.
The actual probability distribution is hidden, which is why we try to estimate it from the sample by making some assumptions. Let the sample size be m; as m → ∞, the estimated probability P' converges to the actual probability P for an unbiased sample, so P' ≈ P.
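As a rough illustration (not from the original notes), the following sketch simulates samples from a hypothetical biased coin and shows the estimate P' approaching the true P as m grows:

```python
import random

# A minimal sketch: the "population" is a hypothetical biased coin with
# P(heads) = 0.3, and P' is the frequentist estimate count / m.
true_p = 0.3

for m in (10, 100, 10_000, 1_000_000):
    sample = [random.random() < true_p for _ in range(m)]
    p_hat = sum(sample) / m          # estimated probability P'
    print(f"m = {m:>9}: P' = {p_hat:.4f} (true P = {true_p})")
```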
Variables : Variables can be categorical (nominal) or numeric (continuous). A categorical variable can take k unique values, with probabilities P(X = x_1) = θ_1, P(X = x_2) = θ_2, ..., P(X = x_k) = θ_k,
where θ_i = count(x_i)/m, count(x_i) being the number of times the value x_i occurs and m the size of the dataset. This is also called the frequentist approximation.
If there are k parameters and k − 1 of them are known, the last one is fixed by the constraint that the probabilities sum to 1:
θ_k = 1 − Σ_{i=1}^{k−1} θ_i
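A minimal sketch of the frequentist approximation on hypothetical categorical data; the value names and counts are made up for illustration:

```python
from collections import Counter

# Frequentist approximation for a categorical variable: theta_i = count(x_i) / m.
sample = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
m = len(sample)

theta = {value: count / m for value, count in Counter(sample).items()}
print(theta)                      # {'red': 0.5, 'blue': 0.25, 'green': 0.25}

# The estimates sum to 1, so the last parameter is determined by the others:
# theta_k = 1 - sum of the first k-1 thetas.
print(sum(theta.values()))        # 1.0
```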
For continuous variables we do not use the counting approach: a real-valued variable can take decimal values, and any particular value may appear rarely or never even in a large dataset, so x/m would assign it a probability of (nearly) zero. To avoid this we assume a parametric form such as the normal, or Gaussian, distribution.
It is called parametric because the whole distribution is described by a small set of parameters.
Here, P(x) = (1 / (σ·√(2π))) · exp(−(x − μ)² / (2σ²))
This is the hypothesis-space equation of the normal distribution. By assuming this form we reduce the learning task from estimating an arbitrary joint probability distribution to estimating the parameters of this one: the mean μ and the standard deviation σ. To find μ and σ we have to search the hypothesis space.
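The following is a small sketch of evaluating this density for a particular choice of μ and σ; the numbers are only illustrative:

```python
import math

# Gaussian density P(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2*sigma**2))
# for one hypothesis, i.e. one particular choice of mu and sigma.
def gaussian_pdf(x, mu, sigma):
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Each (mu, sigma) pair is one point ("dot") in the hypothesis space.
print(gaussian_pdf(1.0, mu=0.0, sigma=1.0))   # ~0.2420
print(gaussian_pdf(1.0, mu=1.0, sigma=0.5))   # ~0.7979
```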
Each dot in the hypothesis space is a model with different values of the mean and standard deviation. To search the hypothesis space we use an algorithm known as gradient descent.
Gradient descent starts at some random point in the hypothesis space and evaluates how good that model is. Error = observed value − predicted value.
The cost function is the mean of the squared errors between the actual and predicted values:
Cost(β_0, β_1) = (1/m) · Σ_{i=1}^{m} (y_i − (β_1·x_i + β_0))²
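A minimal sketch of gradient descent minimizing this cost for simple linear regression, using made-up data and an assumed learning rate:

```python
# Gradient descent on Cost(b0, b1) = (1/m) * sum_i (y_i - (b1*x_i + b0))**2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]           # roughly y = 2x
m = len(xs)

b0, b1 = 0.0, 0.0                        # starting point in hypothesis space
lr = 0.01                                # learning rate (assumed)

for _ in range(5000):
    # Gradients of the cost with respect to b0 and b1.
    errors = [(b1 * x + b0) - y for x, y in zip(xs, ys)]
    grad_b0 = (2 / m) * sum(errors)
    grad_b1 = (2 / m) * sum(e * x for e, x in zip(errors, xs))
    b0 -= lr * grad_b0                   # step downhill
    b1 -= lr * grad_b1

print(b0, b1)                            # b1 close to 2, b0 close to 0
```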
Feature Engineering : It transforms the original data x into some feature space (for example a feature f_1) so that the relationship with y becomes linear and the model can fit it.
For example, x can be transformed into √x, ∛x, x², x³, x⁴, ... while y is kept as it is.
Linear regression : y = β_0 + Σ_{i=1}^{n} β_i·x_i, where the x_i are the (possibly transformed) features.
Here, "linear" in linear regression does not mean the relationship with the original x is linear; it means the model is linear in the parameters β, which is what lets even nonlinear transformed features be fit.
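A small sketch of this idea on hypothetical data: y depends on x², so transforming the feature to z = x² lets a model that is linear in the parameters fit it (ordinary least squares for a single feature is used here for brevity):

```python
# Feature engineering: y is not linear in x, but it is linear in z = x**2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 9.2, 19.0, 33.2, 50.8]        # roughly y = 2*x**2 + 1

zs = [x ** 2 for x in xs]                # transform to feature space: z = x^2

# Ordinary least squares for one feature: beta1 = cov(z, y) / var(z).
mz = sum(zs) / len(zs)
my = sum(ys) / len(ys)
beta1 = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
beta0 = my - beta1 * mz

print(beta0, beta1)                      # close to 1 and 2
```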
A complicated distribution can be modelled as a mixture of Gaussian components.
In a mixture of Gaussians, the density is a weighted combination of several Gaussian components, p(x) = Σ_k w_k · N(x | μ_k, σ_k), where the weights w_k sum to 1. A continuous distribution can be turned into a Gaussian mixture distribution; in fact, any probability density function can be approximated by a Gaussian mixture model.
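A minimal sketch of a Gaussian mixture density with assumed weights and components:

```python
import math

# Gaussian mixture density: p(x) = sum_k w_k * N(x | mu_k, sigma_k), weights sum to 1.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

components = [
    (0.6, 0.0, 1.0),    # (weight, mean, standard deviation)
    (0.4, 5.0, 2.0),
]

def mixture_pdf(x):
    return sum(w * gaussian_pdf(x, mu, sigma) for w, mu, sigma in components)

print(mixture_pdf(0.0))   # dominated by the first component
print(mixture_pdf(5.0))   # dominated by the second component
```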
Maximum Likelihood Principle : Given a sample S of size m, choose the parameters that maximize the probability of generating that sample S.
P(Data) = Π_{i=1}^{m} P(x_i)
where Π means product (the data points are assumed to be generated independently).
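A small sketch of the principle for a Gaussian model with hypothetical data; the log of the product is used, as is common in practice, to avoid numerical underflow:

```python
import math

# Maximum likelihood: P(Data) = prod_i P(x_i); maximize sum_i log P(x_i) instead.
data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.3, 2.2, 1.8]

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(mu, sigma):
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)

# Candidate hypotheses (mu, sigma): the sample mean and standard deviation
# give the highest probability of generating this sample.
for mu, sigma in [(0.0, 1.0), (2.0, 1.0), (2.04, 0.25)]:
    print(f"mu={mu}, sigma={sigma}: log-likelihood = {log_likelihood(mu, sigma):.2f}")
```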