Posts

Showing posts from July, 2020

Model Evaluation and Selection (Part-3)

Hypothesis Testing: given a hypothesis, test whether it is true with a particular confidence.
• Hypothesis: error_D(h1) > error_D(h2)
• What fraction of the probability mass is associated with error_D(h1) − error_D(h2) > 0?
• Example: let the error rates measured for the two hypotheses h1 and h2 on samples of size 100 be 0.3 and 0.2 respectively, so the observed difference is d̂ = error_S1(h1) − error_S2(h2) = 0.1
• The standard deviation of the normal distribution defined on d̂ is σ = √(0.3·0.7/100 + 0.2·0.8/100) ≈ 0.061, and 1.64 × 0.061 ≈ 0.1, so the observed difference is about 1.64 standard deviations above zero
• 1.64 standard deviations corresponds to the two-sided 90% confidence interval, hence the probability mass above zero is 95%
• Result: accept with 95% confidence that h2 is a more accurate hypothesis than h1 on D (the underlying population)
Comparing Two Algorithms: given two learning algorithms, L1 and L2, which one is better, on average, at learning a particular target function?
• Estimating the relative performance: calculate the expected value of the difference in errors
• For all samples of …
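A minimal Python sketch of the arithmetic in this example (the σ formula for the difference of two measured error rates and the variable names are my reconstruction, not spelled out in the post):

```python
import math

# Values from the example: error rates of h1 and h2 on samples of size 100.
e1, n1 = 0.3, 100
e2, n2 = 0.2, 100

d_hat = e1 - e2                                             # observed difference ~0.1
sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)  # ~0.061
z = d_hat / sigma                                           # ~1.64 standard deviations

print(f"d_hat={d_hat:.3f}, sigma={sigma:.3f}, z={z:.2f}")
```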

Model Evaluation and Selection

Model Process: in the decision-making process we model a function f : x⃗ → y, and the real-world process (RWP) corresponds to the population. After building a model, we need to know how accurate the model is expected to be on the population. For that, here comes the Confusion Matrix. Here we are getting 88 out of 100 correct, therefore
Accuracy = 88/100 = 0.88
We need to check whether this accuracy will carry over to the population, and for that we check the Expected Accuracy:
E[X] = Σₓ x · P(x)
The model predicts h(x⃗) = y. The goal to be achieved is h(x⃗) = f(x⃗) for each x⃗ belonging to the population, so we look at E[h(x⃗) = f(x⃗)], which says how much correctness we can expect on unseen data. When we have data, we split it into two categories: Training Data and Test Data. As we don't know the distribution, functional form and …
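As a quick sketch, accuracy can be read off a confusion matrix like this (the 50/38 split of the 88 correct predictions is hypothetical, only the 88/100 total comes from the post):

```python
import numpy as np

# Hypothetical 2x2 confusion matrix with 88/100 correct, matching the example.
# Rows are actual classes, columns are predicted classes.
cm = np.array([[50,  4],    # actual class 0: 50 correct, 4 misclassified
               [ 8, 38]])   # actual class 1: 38 correct, 8 misclassified

accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
print(accuracy)                      # 0.88
```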

Model Selection and Evaluation (Part-2)

Heuristic Search: since the hypothesis space is infinite, it must be searched heuristically. Hypothesis testing leads to the evaluation of the model, which can be done by two methods. Chi-Square Test: here χ² = Σ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ, where Oᵢⱼ is the observed value and Eᵢⱼ is the expected value; the statistic is compared against a chi-square distribution with the appropriate degrees of freedom. t-test: here we find the confidence interval. A machine learning model is a representation of an estimate of the population. SMOTE: it creates synthetic samples. Draw a line from a particular instance to its closest neighbor within the class that is under-represented in the data, and generate a new instance on that line; see the sketch below. This makes the model learn a decision boundary connecting the two instances and gives better generalization. Transformation can be done after SMOTE. Oversampling and under-sampling can only be done on training data, never on test data, because on training data we learn the function that predicts the low-frequency class, while on test data we estimate or assess the accuracy of the model …
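A minimal sketch of SMOTE's interpolation idea, under the assumption of a single neighbor; a full implementation (e.g. imbalanced-learn's SMOTE) picks randomly among the k nearest minority-class neighbors:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(x, neighbor):
    """One synthetic point: a random point on the line segment between a
    minority-class instance and one of its nearest same-class neighbors."""
    lam = rng.random()                 # interpolation factor in [0, 1)
    return x + lam * (neighbor - x)

# Hypothetical minority-class instance and its closest minority-class neighbor.
x = np.array([1.0, 2.0])
neighbor = np.array([1.5, 2.6])
print(smote_sample(x, neighbor))       # new synthetic instance on the segment
```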

Introduction to Machine Learning (Part-9)

Regularization: regularization avoids overfitting in the model. It adds a penalty term to the cost function based on the parameters:
θ* = arg maxθ log P(S; θ) = Σⱼ₌₁ᵐ log P(yⱼ | x⃗ⱼ; θ) − λ Σᵢ₌₁ⁿ βᵢ²
λ is a constant (hyper-parameter) that determines the strength of the penalty term. For linear regression:
θ* = arg minθ Σᵢ₌₁ᵐ (yᵢ − yᵢ′)² + λ Σⱼ₌₁ⁿ βⱼ²
One way to minimize this is to minimize the individual terms; the penalty term alone is minimized when βⱼ = 0 for all j. In linear regression, using the L2 penalty term Σⱼ₌₁ⁿ βⱼ² results in Ridge regression, and using the L1 penalty term Σⱼ₌₁ⁿ |βⱼ| results in Lasso regression; a sketch of both penalties follows below. In linear regression, remove correlated independent variables. Overfitting: try to keep the model simple by shrinking β to avoid complexity. Samples and Estimation: a sample is a subset of the population; training data is used to estimate the parameters of the model.
L(θ) = P(D|θ) = Π_{xᵢ∈D} P(xᵢ|θ) …
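A minimal sketch of the two penalized costs named above, with made-up data and parameter values (the function names and the example numbers are mine):

```python
import numpy as np

def ridge_cost(beta, X, y, lam):
    """Sum of squared errors plus the L2 penalty lam * sum(beta_j ** 2)."""
    residuals = y - X @ beta
    return residuals @ residuals + lam * (beta @ beta)

def lasso_penalty(beta, lam):
    """The L1 penalty used by Lasso: lam * sum(|beta_j|)."""
    return lam * np.abs(beta).sum()

# Hypothetical data: 5 samples, 2 features.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0], [5.0, 2.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
beta = np.array([0.9, 0.1])

print(ridge_cost(beta, X, y, lam=0.5))
print(lasso_penalty(beta, lam=0.5))
```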

Probability (Part-3)

Probabilistic Inference: the computation, from observed evidence, of posterior probabilities for query propositions. It builds on the joint probability distribution. In Bayes' rule
P(A|B) = P(B|A) · P(A) / P(B)
P(B|A) is the likelihood, P(A) is the prior probability, P(A|B) is the posterior probability, and P(B) is the evidence (a normalizing constant). General Inference Procedure: let X be the query variable, let E be the evidence variables with e their observed values, and let Y be the unobserved variables. Then
P(X|e) = α P(X, e) = α Σ_y P(X, e, y)
where α is the normalization constant; a sketch of this enumeration follows below. Bayesian Belief Network: it is a probabilistic graphical model. It follows causality, meaning there is a reason behind what we observe. A way to reduce its parameters is to exploit independence among some of the variables. Product Rule: applicable when there are two variables present:
P(A₁A₂) = P(A₂|A₁) P(A₁)
Chain Rule: it is the generalized form of the product rule …
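A minimal sketch of the P(X|e) = α Σ_y P(X, e, y) enumeration over a toy joint distribution (the table of probabilities is made up for illustration; it sums to 1):

```python
from itertools import product

# Toy joint distribution P(X, E, Y) over three binary variables.
vals = [0, 1]
probs = [0.05, 0.10, 0.05, 0.20, 0.10, 0.15, 0.05, 0.30]
P = {assign: p for assign, p in zip(product(vals, vals, vals), probs)}

def posterior_x(e_obs):
    """P(X | e) = alpha * sum_y P(X, e, y): sum out Y, then normalize."""
    unnorm = [sum(P[(x, e_obs, y)] for y in vals) for x in vals]
    alpha = 1.0 / sum(unnorm)          # the normalization constant
    return [alpha * u for u in unnorm]

print(posterior_x(e_obs=1))            # posterior distribution over X
```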

Introduction to Machine Learning (Part-7)

Linear Regression: in linear regression we are bound to a straight line, which is our decision boundary. The value of y can be the same for different values of x. According to the Gaussian:
P(y | x = x₁) = (1 / (σ√(2π))) · e^(−(y − μ)² / (2σ²))
Through the linear regression equation we define yᵢ′ = β₁xᵢ + β₀, and the actual value is modeled as yᵢ = β₁xᵢ + β₀ + e, where e is the error or noise. Let us assume σ is the same for every value of x and that β₁ = 0. Then
β₀* = arg min_{β₀} C, where C = Σᵢ₌₁ⁿ (yᵢ − β₀)²
Differentiating and setting the derivative to zero:
dC/dβ₀ = Σᵢ₌₁ⁿ (2β₀ − 2yᵢ) = 0
2nβ₀ − 2Σᵢ₌₁ⁿ yᵢ = 0
β₀ = Σᵢ yᵢ / n
So the best value for β₀ comes out to be the mean of the observed y values …
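A quick numerical check of this derivation, on hypothetical observations of my choosing: the grid minimum of C lands on the sample mean.

```python
import numpy as np

# Check that beta_0 = mean(y) minimizes C = sum((y_i - beta_0)^2).
y = np.array([2.0, 3.0, 5.0, 10.0])             # hypothetical observations

candidates = np.linspace(0.0, 12.0, 1201)       # grid of beta_0 values
costs = ((y[:, None] - candidates[None, :]) ** 2).sum(axis=0)

print(candidates[costs.argmin()], y.mean())     # both ~5.0
```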

Introduction to Machine Learning (Part-6)

A real-world process is generated from a population, and a sample is a subset of that population. Models can be of two types. Generative: if there are n variables x₁, x₂, …, xₙ and we need the probability of these n variables, we apply the joint probability distribution: P(x⃗) = P(x₁, x₂, …, xₙ). Discriminative: it learns the conditional probability, defined as P(y|x⃗) for f : x⃗ → y, and it learns with fewer parameters. In the hypothesis set, each Hᵢ differs from each Hⱼ because each has different parameters. Here h* = arg min C(y, y′(x⃗)), where y is the actual value and y′ is the predicted value. The constraints from which we get the ability to learn are: the hypothesis set chosen and the search algorithm. Parameters grow exponentially with the number of variables (see the sketch below): for binary variables, 2ⁿ − 1; for non-binary, kⁿ − 1. Logistic regression is used when y …
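A tiny sketch of the parameter counts quoted above (the helper name is mine):

```python
# Parameters needed to specify a full joint distribution over n variables:
# k**n outcomes, minus 1 because the probabilities must sum to 1.
def joint_params(n, k=2):
    return k ** n - 1

print(joint_params(3))         # 3 binary variables     -> 2**3 - 1 = 7
print(joint_params(3, k=4))    # 3 four-valued variables -> 4**3 - 1 = 63
```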

Introduction to Machine Learning (Part-5)

Data Mining: defined as a process used to extract usable data from a larger set of raw data; it implies analyzing data patterns in large batches of data using one or more pieces of software. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. The aim is to develop a tool- and application-neutral process for conducting data mining, and to define its tasks, the outputs from those tasks, terminology, and a characterization of mining problem types. It has four levels of abstraction:
• Phases — Example: Data Preparation
• Generic Tasks — a stable, general and complete set of tasks — Example: Data Cleaning
• Specialized Tasks — how the generic task is carried out — Example: Missing Value Handling
• Process Instances — Example: filling in the mean value for numeric attributes and the mode for categorical attributes …
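A minimal sketch of that last process instance in pandas, on a made-up table (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical table with missing values in a numeric and a categorical column.
df = pd.DataFrame({
    "income": [40_000, None, 55_000, 61_000],    # numeric attribute
    "city":   ["Pune", "Delhi", None, "Delhi"],  # categorical attribute
})

# One process instance of "Missing Value Handling":
# the mean for numeric attributes, the mode for categorical ones.
df["income"] = df["income"].fillna(df["income"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])
print(df)
```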

Introduction to Machine Learning (Part-4)

Training data is a subset of the population, where the population can have n variables x₁, x₂, …, xₙ. Let us discuss it with an example: assume we are taking two factors to differentiate individuals, eye color and hair color. Eye color can be blue, green or brown; hair color can be black, red, blond or grey. So the total number of combinations we will be working with is 3 × 4 = 12. Here we can make predictions of the function via the joint probability distribution. We assume that eye color and hair color alone do not tell us much about the individuals. Let us add an attribute, milk allergy, which has binary values Yes (Y) or No (N); now the number of combinations we can work with is 3 × 4 × 2 = 24. If we add another attribute, income, which is continuous, then the number of combinations to learn is 3 × 4 × 2 × ∞ = ∞. So there are two ways by which we can handle this problem (a bucketing sketch follows below). Discretize / Bucket: divide the income attribute into intervals like [1, 100k], [100k, 500k], … Probability Density Function: here we …
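A minimal sketch of the discretize/bucket option, with hypothetical incomes and bin edges of my choosing:

```python
import pandas as pd

# Hypothetical continuous incomes, bucketed into intervals so the attribute
# becomes discrete and the joint distribution stays finite.
income = pd.Series([45_000, 120_000, 80_000, 650_000])
buckets = pd.cut(income,
                 bins=[0, 100_000, 500_000, 10_000_000],
                 labels=["low", "mid", "high"])
print(buckets.tolist())        # ['low', 'mid', 'low', 'high']

# Combinations after bucketing income into 3 intervals:
print(3 * 4 * 2 * 3)           # eye * hair * allergy * income = 72
```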

Introduction to Machine Learning (Part-3)

Probability is a way to model uncertain events, and machine learning tries to build a process similar to the real-world process. This is a basic model using machine learning: if we have some missing data, then through the learned function we can recover the missing data. Example: applying for a bank loan is a real-world process; it can be based on age, income, education, marital status, whether the applicant has defaulted, etc. Here we have to apply the joint probability distribution, and from the function we predict a probability that approximates the actual probability. Principal Components: these are the factors of the process, on which the process is based. Logistic Regression: it is used in the scenario where the input is a continuous variable and the output is a categorical variable. It can be represented as:
P(Y=1|X) = 1 / (1 + e^(−Σᵢ₌₁ⁿ βᵢxᵢ))
where the βᵢ are the parameters. We have to learn the function which minimizes the error.
P(Y=0|X) = 1 − P(Y=1|X)
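A minimal sketch of that sigmoid, using the standard sign convention above (the parameter and input values are made up; some texts fold the minus sign into the βᵢ instead):

```python
import numpy as np

def p_y1(x, beta):
    """P(Y=1 | x) using the standard sigmoid 1 / (1 + e^(-beta . x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x)))

# Hypothetical learned parameters for two continuous inputs.
beta = np.array([0.8, -0.3])
x = np.array([1.2, 0.5])

p1 = p_y1(x, beta)
print(p1, 1.0 - p1)            # P(Y=1|X) and P(Y=0|X) = 1 - P(Y=1|X)
```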