Introduction to Machine Learning (Part-6)

Data from a real-world process come from a population, and a sample is a subset of that population. Models learned from a sample can be of two types:

  • Generative : A generative model learns the joint probability distribution of the variables. If there are n variables x1, x2, ..., xn, it models P(x) = P(x1, x2, ..., xn), the probability of all n variables occurring together.
  • Discriminative : A discriminative model learns the conditional probability. If f : x (a vector) maps to y, it models P(y | x) directly, which lets it learn with fewer parameters. In the hypothesis set, each Hi differs from each Hj because each has different parameters. The best hypothesis is h* = argmin C(y, y'(x)), where y is the actual value, y' is the predicted value, and C is the cost function. A toy contrast of the two views is sketched just below.
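
As a rough illustration of the difference, this minimal Python sketch (with made-up binary data) estimates the joint table P(x, y) by counting, the generative view, and the conditional table P(y | x) by normalizing within each value of x, the discriminative view:

from collections import Counter

# Toy dataset of (x, y) pairs, both binary (values are made up for illustration)
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]
n = len(data)

# Generative view: estimate the joint distribution P(x, y) by counting pairs
joint = Counter(data)
p_joint = {xy: c / n for xy, c in joint.items()}

# Discriminative view: estimate P(y | x) by normalizing within each value of x
x_counts = Counter(x for x, _ in data)
p_cond = {xy: c / x_counts[xy[0]] for xy, c in joint.items()}

print("P(x, y):", p_joint)
print("P(y | x):", p_cond)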
The ability to learn is constrained by:
  • Hypothesis set chosen
  • Search algorithm
The number of parameters grows exponentially with the number of variables (see the sketch after this list):
  • For binary variables : 2^n − 1
  • For k-valued variables : k^n − 1
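
A minimal sketch of these counts, assuming n variables that each take k values; the linear count under a full independence assumption (mentioned in the next paragraph) is shown for contrast:

# Free parameters in a full joint distribution over n variables, each
# taking k values: k**n outcomes minus 1 (the probabilities sum to 1).
def full_joint_params(n, k=2):
    return k ** n - 1

# Under a full independence assumption each variable needs only k - 1
# parameters, so the count grows linearly instead of exponentially.
def independent_params(n, k=2):
    return n * (k - 1)

for n in (5, 10, 20):
    print(n, full_joint_params(n), independent_params(n))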
Logistic regression is used when y is categorical, whereas linear regression is used when y is numeric. We can reduce the number of parameters through independence assumptions or parametric models.

In linear regression, the measurements usually contain noise:
         y = mx + c + e
where m is the slope, c is the intercept, and e is the unexplained variance (noise).
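
A minimal fitting sketch, assuming made-up data generated with m = 2 and c = 1 plus Gaussian noise, and recovering the parameters with ordinary least squares:

import numpy as np

rng = np.random.default_rng(0)
m_true, c_true = 2.0, 1.0
x = np.linspace(0, 10, 100)
e = rng.normal(0, 1.0, size=x.shape)   # unexplained variance (noise)
y = m_true * x + c_true + e

# Ordinary least squares: solve for [m, c] in y ≈ m*x + c
A = np.column_stack([x, np.ones_like(x)])
(m_hat, c_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"estimated slope m ≈ {m_hat:.2f}, intercept c ≈ {c_hat:.2f}")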

Bernoulli Experiment : Here each outcome takes only two values: 0 or 1, yes or no, etc. A single trial has the probability mass function

         P(x) = θ^x (1 − θ)^(1 − x)

so P(x = 1) = θ, and the expected value of x is also θ; this expected value is what we predict. For a dataset D of n independent trials:

                          P(D) = ∏(i = 1 to n) P(xi)
                               = θ^#(x = 1) (1 − θ)^#(x = 0)
                               = θ^r (1 − θ)^(n − r)

where r is the number of trials with x = 1.
  
Here θ represents the accuracy (the probability of a correct prediction).

Taking the log on both sides:

                log P(D) = r log θ + (n − r) log(1 − θ)

Maximizing this log-likelihood with respect to θ gives the maximum likelihood estimate θ = r/n, which the sketch below checks numerically.
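
A small numeric check, assuming r = 7 successes in n = 10 trials; the log-likelihood peaks at θ = r/n = 0.7:

import math

# log P(D) = r*log(θ) + (n - r)*log(1 - θ), maximized at θ = r/n
def log_likelihood(theta, r, n):
    return r * math.log(theta) + (n - r) * math.log(1 - theta)

r, n = 7, 10
theta_mle = r / n
for theta in (0.5, 0.6, theta_mle, 0.8):
    print(f"θ={theta:.2f}  log P(D) = {log_likelihood(theta, r, n):.4f}")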

Binomial Distribution : This gives the probability of obtaining a given number of successes in n trials; here we are interested in a set of outcomes rather than a single outcome. The probability that n Bernoulli trials lead to exactly k successes is

      P(y = k | θ) = C(n, k) θ^k (1 − θ)^(n − k)

where C(n, k) = n! / (k! (n − k)!) counts the ways to choose which k trials succeed.

Mean = nθ
Variance = nθ(1 − θ)
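
A short sketch of the pmf, mean, and variance, using illustrative values n = 10 and θ = 0.7:

from math import comb

# Binomial pmf: P(y = k | θ) = C(n, k) * θ^k * (1 - θ)^(n - k)
def binom_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta = 10, 0.7
print(binom_pmf(7, n, theta))                 # probability of exactly 7 successes
print("mean =", n * theta)                    # nθ
print("variance =", n * theta * (1 - theta))  # nθ(1 - θ)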

Multinomial Distribution : Here each trial is categorical with k possible outcomes. If yi is the number of the n trials that fall in category i, the probability is

 P(y1, y2, ..., yk | θ) = (n! / (y1! y2! ... yk!)) ∏(i = 1 to k) θi^yi

where n = Σ(i = 1 to k) yi.
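
A minimal sketch of this pmf, with made-up counts and category probabilities for k = 3:

from math import factorial, prod

# Multinomial pmf: n!/(y1!...yk!) * Π θ_i^{y_i}, with n = Σ y_i
def multinomial_pmf(counts, thetas):
    n = sum(counts)
    coef = factorial(n) // prod(factorial(y) for y in counts)
    return coef * prod(t**y for t, y in zip(thetas, counts))

# e.g. counts [2, 3, 5] over three categories with illustrative θ values
print(multinomial_pmf([2, 3, 5], [0.2, 0.3, 0.5]))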


