Posts

Showing posts from September, 2020

Supervised Learning (Part-6)

K-Nearest Neighbors (KNN) - Classification

K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s.

Algorithm

A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function. If K = 1, the case is simply assigned to the class of its single nearest neighbor. Note that the standard distance measures (Euclidean, Manhattan, Minkowski) are only valid for continuous variables; for categorical variables, the Hamming distance must be used instead. When the dataset mixes numerical and categorical variables, this also raises the issue of standardizing the numerical variables to the range 0 to 1. Choosing the optimal value for K is…
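The majority-vote step described above can be sketched in a few lines of plain Python; this is a minimal illustration (the function and variable names are my own, not from any library), using Euclidean distance and assuming continuous features.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=3):
    # Sort training points by distance to the query, take the k nearest,
    # and return the majority class among them.
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
classes = ["circle", "circle", "circle", "square", "square", "square"]
print(knn_predict(points, classes, (0.5, 0.5)))  # a point near the first cluster
```

With k=1 the call reduces to the nearest-neighbor rule mentioned above.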

Supervised Learning (Part-5)

Decision Trees (Continued)

Attribute Selection Measures

If the dataset consists of N attributes, deciding which attribute to place at the root, or at the different levels of the tree as internal nodes, is a complicated step. Randomly selecting a node to be the root does not solve the problem; a random approach may give bad results with low accuracy. To solve this attribute selection problem, researchers devised criteria such as: Entropy, Information gain, Gini index, Gain ratio, Reduction in variance, Chi-square. These criteria compute a value for every attribute. The values are sorted, and attributes are placed in the tree in that order, i.e., the attribute with the highest value (in the case of information gain) is placed at the root. When using information gain as the criterion, attributes are assumed to be categorical; for the Gini index, attributes are assumed to be continuous.

Entropy

Entropy is a measure…
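As a concrete sketch of two of the criteria listed above, the functions below (my own illustrative names, not a library API) compute entropy from class proportions and information gain as the drop in entropy after a split:

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions in `labels`.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    # Gain = entropy(parent) - weighted average entropy of the child groups.
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_groups)
    return entropy(parent_labels) - weighted

# A perfectly pure split of a balanced two-class node gains a full bit.
print(information_gain(["yes", "yes", "no", "no"], [["yes", "yes"], ["no", "no"]]))
```

The attribute whose split yields the highest gain would be placed at the root.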

Supervised Learning (Part-4)

Decision Trees

In the machine learning world, Decision Trees are a kind of non-parametric model that can be used for both classification and regression. This means that decision trees are flexible models that don't increase their number of parameters as we add more features (if we build them correctly), and they can output either a categorical prediction (like whether a plant is of a certain kind or not) or a numerical prediction (like the price of a house). They are constructed from two kinds of elements: nodes and branches. At each node, one of the features of our data is evaluated, either to split the observations during training or to make a specific data point follow a certain path when making a prediction. Decision trees are built by recursively evaluating different features and using, at each node, the feature that best splits the data. This will be explained in detail later. Probably the best way to start the explanation…
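The nodes-and-branches structure described above can be sketched as nested dictionaries. The tree, feature names, and thresholds below are all hypothetical, chosen only to show how a prediction follows one path from root to leaf:

```python
# A toy tree: each internal node tests one feature against a threshold;
# leaves hold the categorical prediction.
tree = {
    "feature": "petal_len", "threshold": 2.5,
    "left": {"leaf": "setosa"},
    "right": {
        "feature": "petal_wid", "threshold": 1.7,
        "left": {"leaf": "versicolor"},
        "right": {"leaf": "virginica"},
    },
}

def predict(node, sample):
    # Follow branches until a leaf is reached: go left when the feature
    # value is at or below the threshold, right otherwise.
    while "leaf" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

print(predict(tree, {"petal_len": 1.0, "petal_wid": 0.2}))
```

Training would build such a structure recursively, choosing at each node the split that best separates the data.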

Supervised Learning (Part-3)

Naive Bayes

A classifier is a machine learning model that is used to discriminate between different objects based on certain features.

Principle of the Naive Bayes Classifier: A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The crux of the classifier is Bayes' theorem.

Bayes' Theorem: Using Bayes' theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent; that is, the presence of one particular feature does not affect another. Hence it is called "naive".

Example: Consider the problem of playing golf. We classify whether a day is suitable for playing golf, given the features of the day. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature…
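A minimal count-based sketch of the idea, on a made-up two-feature dataset (the golf table itself is not reproduced here, so the rows below are hypothetical stand-ins): class and per-feature conditional probabilities are estimated from counts, then multiplied under the naive independence assumption.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    # Estimate P(class) and P(feature value | class) from raw counts.
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # key: (feature index, class)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feat_counts[(i, y)][value] += 1
    return class_counts, feat_counts

def predict_nb(model, row):
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for y, count in class_counts.items():
        p = count / total  # prior P(y)
        for i, value in enumerate(row):
            # Naive assumption: multiply the per-feature conditionals.
            p *= feat_counts[(i, y)][value] / count
        if p > best_p:
            best, best_p = y, p
    return best

rows = [("rainy", "hot"), ("rainy", "cool"), ("sunny", "hot"), ("sunny", "cool")]
play = ["no", "no", "yes", "yes"]
model = train_nb(rows, play)
print(predict_nb(model, ("sunny", "hot")))
```

A real implementation would also smooth the counts (e.g., Laplace smoothing) so an unseen feature value does not zero out the whole product.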

Supervised Learning (Part-2)

Support Vector Machine

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side. Suppose we are given a plot of two labeled classes, as shown in the image. You might have come up with something similar to the following image (image B): it fairly separates the two classes. Any point to the left of the line falls into the black-circle class, and any point to the right falls into the blue-square class. In real-world applications, finding a perfect separation for millions of training examples takes a lot of time, so some misclassification is tolerated; how much is controlled by the regularization parameter. The regularization parameter and gamma are the two tuning parameters of an SVM classifier. By varying them we can achieve a considerably non-linear classification line with more accuracy in a reasonable amount of…
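Once the hyperplane w·x + b = 0 is known, classifying a new point only requires checking which side of it the point falls on. The sketch below (hypothetical weights, not a trained model) shows that decision rule; learning w and b is the part the SVM optimization, with its regularization parameter, actually does.

```python
def classify(w, b, x):
    # Sign of the signed distance proxy w·x + b decides the side of the
    # hyperplane: +1 for one class, -1 for the other.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# A hypothetical separating line x1 + x2 - 1.5 = 0 in 2-D.
w, b = (1.0, 1.0), -1.5
print(classify(w, b, (0.0, 0.0)))  # below the line
print(classify(w, b, (1.0, 1.0)))  # above the line
```

Non-linear boundaries come from applying the same rule after a kernel transformation of x, which is where the gamma parameter enters.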

Supervised Learning (Part-1)

Supervised Learning

Logistic Regression

Logistic regression is used when the dependent variable (target) is categorical. For example: to predict whether an email is spam (1) or not (0), or whether a tumor is malignant (1) or not (0).

Simple Logistic Regression Model: Output = 0 or 1. Hypothesis: Z = WX + B, hΘ(x) = sigmoid(Z).

Sigmoid Function: if Z goes to infinity, Y(predicted) becomes 1, and if Z goes to negative infinity, Y(predicted) becomes 0.

Types of Logistic Regression: 1. Binary Logistic Regression: the categorical response has only two possible outcomes. Example: spam or not. 2. Multinomial Logistic Regression: three or more categories without ordering. Example: predicting which food is preferred (Veg, Non-Veg, Vegan). 3. Ordinal Logistic Regression: three or more categories with ordering. Example: a movie rating from 1 to 5.

Decision Boundary: to predict which class a data point belongs to, a threshold can be set. Based on this threshold, the estimated probability is classified…
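The hypothesis and decision boundary above can be sketched directly: compute Z = WX + B, squash it through the sigmoid, and compare the resulting probability to a threshold. The weights below are hypothetical, purely to demonstrate the mechanics:

```python
import math

def sigmoid(z):
    # Maps any real z into (0, 1); sigmoid(0) = 0.5.
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x, threshold=0.5):
    # Hypothesis: z = w·x + b, probability = sigmoid(z),
    # then classify by comparing against the threshold.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) >= threshold else 0

w, b = (2.0,), 0.0  # hypothetical learned parameters for one feature
print(predict(w, b, (5.0,)))   # large positive z, probability near 1
print(predict(w, b, (-5.0,)))  # large negative z, probability near 0
```

Raising the threshold above 0.5 makes the classifier more conservative about predicting class 1, which is the knob the Decision Boundary paragraph refers to.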