Supervised Learning (Part-2)

Support Vector Machine


A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. 
In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
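To make this concrete, here is a minimal sketch using scikit-learn's SVC; the toy points and labels are invented purely for illustration:

# Minimal sketch: fitting a linear SVM with scikit-learn.
# The toy data here is made up purely for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])  # 2-D points
y = np.array([0, 0, 0, 1, 1, 1])                                # two classes

clf = SVC(kernel='linear')
clf.fit(X, y)

# In two dimensions the separating hyperplane is the line w·x + b = 0
print('w =', clf.coef_[0], ' b =', clf.intercept_[0])
print('prediction for [5, 5]:', clf.predict([[5, 5]]))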

Suppose we are given a plot of two labeled classes on a graph, as shown in the image.


You might have come up with something similar to the following image (image B). It fairly separates the two classes. Any point to the left of the line falls into the black circle class, and any point to the right falls into the blue square class.

In real-world applications, finding a perfect separation for millions of training examples takes a lot of time; how much misclassification we tolerate in exchange for speed and generality is controlled by the regularization parameter.
The regularization parameter and gamma are the two main tuning parameters of an SVM classifier. By varying them we can achieve a considerably non-linear classification line with more accuracy in a reasonable amount of time.

One more parameter is the kernel. It defines whether we want a linear or non-linear separation.

Tuning parameters: Kernel, Regularization, Gamma and Margin.
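One common way to tune these together is a simple grid search. The sketch below reuses the toy X and y from the earlier example; the grid values are arbitrary choices for illustration, not recommendations:

# Sketch: tuning kernel, C (regularization) and gamma with a grid search.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'C': [0.1, 1, 10, 100],
    'gamma': [0.01, 0.1, 1],
}
# cv=3 because the toy set above has only three points per class
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)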

Kernel: The learning of the hyperplane in a linear SVM is done by transforming the problem using some linear algebra. This is where the kernel plays its role.

For the linear kernel, the prediction for a new input is calculated using the dot product between the input (x) and each support vector (xi), as follows:

f(x) = B0 + sum(ai * (x · xi))

This is an equation that involves calculating the inner products of a new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each input) must be estimated from the training data by the learning algorithm.
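To see the equation at work, the sketch below reproduces scikit-learn's decision function by hand for the linear SVC fitted earlier; B0 corresponds to intercept_ and the ai (with the class sign folded in) to dual_coef_:

# Sketch: computing f(x) = B0 + sum(ai * (x · xi)) by hand and
# checking it against the library's decision_function.
x_new = np.array([5.0, 5.0])
f = clf.intercept_[0] + np.sum(clf.dual_coef_[0] * (clf.support_vectors_ @ x_new))
print(f, clf.decision_function([x_new])[0])  # the two values should match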

The polynomial kernel can be written as

K(x, xi) = (1 + sum(x * xi))^d

and the exponential kernel as

K(x, xi) = exp(-gamma * sum((x - xi)^2))


Polynomial and exponential kernels calculate the separation line in a higher dimension. This is called the kernel trick.
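Written out with numpy, the two kernels above look like this; d and gamma are free parameters you choose (scikit-learn selects these kernels with kernel='poly' and kernel='rbf'):

# Sketch: the polynomial and exponential (RBF) kernels from the formulas above.
import numpy as np

def polynomial_kernel(x, xi, d=3):
    return (1 + np.dot(x, xi)) ** d

def exponential_kernel(x, xi, gamma=0.1):
    return np.exp(-gamma * np.sum((x - xi) ** 2))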

Regularization

The regularization parameter (often termed the C parameter in Python's sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training example.

For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. 
Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.

The images below are examples of two different regularization parameters. The left one shows some misclassification due to a lower regularization value; a higher value leads to results like the one on the right.
Low Regularization Value

High Regularization Value
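A quick way to see this trade-off in code is to fit the same data with different C values and count the training misclassifications; a sketch, reusing the toy X and y from before:

# Sketch: the effect of C on training misclassifications.
from sklearn.svm import SVC

for C in (0.01, 1, 100):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    print('C =', C, '->', errors, 'misclassified,', clf.n_support_.sum(), 'support vectors')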


Gamma

The gamma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. In other words, with low gamma, points far away from the plausible separation line are considered in the calculation of the separation line, whereas high gamma means only the points close to the plausible line are considered.
High Gamma

Low Gamma
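The same kind of comparison works for gamma with an RBF kernel; a sketch, again on the toy data:

# Sketch: the effect of gamma with an RBF kernel. Low gamma lets distant
# points influence the boundary (smoother line); high gamma makes the
# boundary hug nearby points and can overfit.
from sklearn.svm import SVC

for gamma in (0.01, 1, 100):
    clf = SVC(kernel='rbf', gamma=gamma).fit(X, y)
    print('gamma =', gamma, '-> training accuracy', clf.score(X, y))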


Margin

And finally, the last but very important characteristic of the SVM classifier: at its core, SVM tries to achieve a good margin.

A margin is the separation between the line and the closest points of each class.

A good margin is one where this separation is large for both classes. The images below give a visual example of a good and a bad margin. A good margin allows the points to stay in their respective classes without crossing over to the other class.
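For a linear SVM the margin can be read off the fitted weights: the distance from the hyperplane to the closest points on each side is 1/||w||, so the full margin width is 2/||w||. A sketch, on the toy data from before:

# Sketch: computing the margin width of a fitted linear SVC.
import numpy as np
from sklearn.svm import SVC

clf = SVC(kernel='linear', C=1).fit(X, y)
w = clf.coef_[0]
print('margin width:', 2 / np.linalg.norm(w))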


