Introduction to Machine Learning (Part-6)

 A real-world process generates data from a population, and a sample is a subset of that population. Models learned from a sample can be of two types:

  • Generative : It models the joint probability distribution of the variables. If there are n variables x1, x2, ..., xn, the model learns P(x) = P(x1, x2, ..., xn).
  • Discriminative : It models the conditional probability directly. If f maps a vector x to y, the model learns P(y | x), which typically needs fewer parameters. In the hypothesis set, each hypothesis hi differs from hj because their parameter values differ, and the learner picks h* = argmin_h C(y, y'(x)), where y is the actual value and y' is the predicted value. A small counting sketch contrasting the two views follows this list.
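As a rough illustration of the two views (the toy dataset below is hypothetical, not from the post), a generative model estimates the joint P(x, y) by counting pairs, while a discriminative model estimates P(y | x) by counting y only within each value of x:

from collections import Counter

# Hypothetical toy data: (x, y) pairs with binary x and y.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]
n = len(data)

# Generative view: estimate the joint distribution P(x, y) by counting pairs.
joint = Counter(data)
p_joint = {pair: count / n for pair, count in joint.items()}

# Discriminative view: estimate P(y | x) directly, counting y within each x.
x_counts = Counter(x for x, _ in data)
p_cond = {(x, y): joint[(x, y)] / x_counts[x] for (x, y) in joint}

print("P(x=1, y=1) =", p_joint[(1, 1)])    # joint probability
print("P(y=1 | x=1) =", p_cond[(1, 1)])    # conditional probability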
The constraints from which we get the ability to learn are:
  • Hypothesis set chosen
  • Search algorithm
The number of parameters grows exponentially with the number of variables (see the sketch after this list):
  • For binary variables : 2^n - 1
  • For k-valued (non-binary) variables : k^n - 1
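A quick check of these counts (a small sketch, assuming a fully specified joint distribution over n variables, each taking k values): the probability table has k^n entries, and since they must sum to 1, only k^n - 1 of them are free parameters.

def num_free_parameters(n_vars: int, k: int = 2) -> int:
    """Free parameters of a full joint distribution over n_vars variables,
    each with k values: k**n_vars table entries minus 1 (they sum to 1)."""
    return k ** n_vars - 1

print(num_free_parameters(3))        # binary: 2^3 - 1 = 7
print(num_free_parameters(3, k=4))   # 4-valued: 4^3 - 1 = 63
print(num_free_parameters(20))       # binary: 2^20 - 1 = 1048575, grows fast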
Logistic regression is used when y is categorical, whereas linear regression is used when y is numeric. We can keep learning tractable by reducing the number of parameters, either through independence assumptions or through parametric models.
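A minimal sketch of that choice (scikit-learn is my assumption; the post does not name a library): fit LinearRegression when y is numeric and LogisticRegression when y is categorical.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Numeric target -> linear regression.
y_numeric = np.array([2.1, 4.2, 5.9, 8.1, 9.8, 12.2])
lin = LinearRegression().fit(X, y_numeric)
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

# Categorical (binary) target -> logistic regression.
y_categorical = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_categorical)
print("P(y=1 | x=4.5):", clf.predict_proba([[4.5]])[0, 1])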

In linear regression, we usually get noise in the measurement.
         y = mx + c + e
where m is the slope, c is the intercept, and e is the noise (the unexplained variance).
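A small simulation of that model (the values of m, c, and the noise scale below are made up for illustration): generate y = mx + c + e with Gaussian noise, recover the slope and intercept with a least-squares fit, and look at the residuals as the unexplained part.

import numpy as np

rng = np.random.default_rng(0)
m_true, c_true = 2.0, 1.0             # assumed "true" slope and intercept
x = np.linspace(0, 10, 50)
e = rng.normal(0, 1.5, size=x.shape)  # measurement noise
y = m_true * x + c_true + e

# Least-squares fit of a degree-1 polynomial gives estimates of m and c.
m_hat, c_hat = np.polyfit(x, y, 1)
residuals = y - (m_hat * x + c_hat)

print("estimated slope:", m_hat)
print("estimated intercept:", c_hat)
print("residual (unexplained) variance:", residuals.var())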

Bernoulli Experiment : Each trial has only two possible outcomes, e.g. 0 or 1, yes or no. If there are n trials in an experiment, we predict the outcome using its expected value. The probability of a single outcome x is:
                P(x) = θ^x (1-θ)^(1-x),  so P(x=1) = θ and the expected value of x is θ.
    P(D) for a dataset D of n independent trials is:
                          P(D) = ∏_{i=1}^{n} P(xi)
                                = θ^{#x} (1-θ)^{#(1-x)}
                                = θ^r (1-θ)^(n-r)
    where r = #x is the number of trials with x = 1.
  
Here θ represents the accuracy (the probability of success).

                  If we take the log on both sides:
                log(P(D)) = r log θ + (n-r) log(1-θ)
Maximizing this log likelihood (maximum likelihood estimation) gives the estimate θ = r/n.
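A small numerical check of that claim (the values of n and r below are made up): evaluate r log θ + (n-r) log(1-θ) on a grid of θ values and confirm the maximum sits at θ = r/n.

import numpy as np

n, r = 20, 14                        # hypothetical: 14 successes in 20 trials

def log_likelihood(theta):
    return r * np.log(theta) + (n - r) * np.log(1 - theta)

thetas = np.linspace(0.01, 0.99, 99)
best = thetas[np.argmax(log_likelihood(thetas))]

print("grid maximiser:", best)       # close to 0.7
print("closed form r/n:", r / n)     # 0.7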

Binomial Distribution : It gives the probability of a particular number of successes out of n trials. Here we are interested in a set of outcomes rather than a single outcome: the probability that n Bernoulli trials lead to k successes is
      P(y=k|θ) = (n! / (k! (n-k)!)) θ^k (1-θ)^(n-k)

Mean = nθ
Variance = nθ(1-θ)
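A quick sketch of the binomial pmf and its moments (the values of n, k, and θ below are made up), using the formula with the n-choose-k factor:

from math import comb

def binom_pmf(k, n, theta):
    """P(y = k | theta) for n Bernoulli trials with success probability theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta = 10, 0.7
print("P(y=7):", binom_pmf(7, n, theta))
print("mean n*theta:", n * theta)                               # 7.0
print("variance n*theta*(1-theta):", n * theta * (1 - theta))   # 2.1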

Multinomial Distribution : Here each trial is categorical with k possible outcomes. If yi counts how many of the n trials produce outcome i, the probability is
 
P(y1, y2, ..., yk | θ) = (n! / (y1! y2! ... yk!)) ∏_{i=1}^{k} θi^{yi}

where n = Σ_{i=1}^{k} yi
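A direct transcription of that formula (the counts and probabilities below are made-up values):

from math import factorial, prod

def multinomial_pmf(counts, thetas):
    """P(y1, ..., yk | theta) = n!/(y1!...yk!) * prod(theta_i ** y_i)."""
    n = sum(counts)
    coeff = factorial(n)
    for y in counts:
        coeff //= factorial(y)        # multinomial coefficient, exact at each step
    return coeff * prod(t**y for t, y in zip(thetas, counts))

# Hypothetical: 10 trials over 3 categories with probabilities 0.5, 0.3, 0.2.
print(multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2]))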


