Data Science

Posts

Showing posts from October, 2020

Graph Analysis(Part-1)

October 30, 2020

Graph Mining

Graph mining is the set of tools and techniques used to (a) analyze the properties of real-world graphs, (b) predict how the structure and properties of a given graph might affect some application, and (c) develop models that can generate realistic graphs matching the patterns found in real-world graphs of interest.

Important Terms:

1. Co-authorship network: Co-authorship is a form of association in which two or more researchers jointly report their research results on some topic. Co-authorship networks can therefore be viewed as social networks of researchers that reflect the collaboration among them. Researchers are represented by nodes, and an edge links two researchers who have co-authored a publication.
2. Citation network: a directed graph in which each vertex represents a document and each edge represents a citation from one publication to another.
3. Polarization- 4....
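The two network types defined above can be sketched with plain Python data structures. This is only an illustration: the paper and author names are made up, and a real project would more likely use a graph library. The citation network is stored as a directed adjacency list, and the co-authorship network as a set of undirected edges.

```python
from itertools import combinations

# Hypothetical citation data: each key cites every paper in its list
# (a directed edge from the citing paper to the cited paper).
citations = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_C"],
    "paper_C": [],
    "paper_D": ["paper_A", "paper_C"],
}


def citation_counts(graph):
    """Return the in-degree (number of times cited) of every vertex."""
    counts = {paper: 0 for paper in graph}
    for cited_list in graph.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


counts = citation_counts(citations)
most_cited = max(counts, key=counts.get)

# Hypothetical author lists, one per publication. Every pair of authors
# on the same publication gets one undirected co-authorship edge.
papers = [["Ana", "Bo"], ["Ana", "Bo", "Cy"], ["Cy", "Di"]]
coauthor_edges = set()
for authors in papers:
    for a, b in combinations(sorted(authors), 2):
        coauthor_edges.add((a, b))

print(counts)
print(most_cited)
print(sorted(coauthor_edges))
```

Sorting the author names before pairing them keeps each undirected edge in a single canonical form, so repeated collaborations are not counted twice.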

MFCC for Audio Extraction

October 27, 2020

Audio Features

MFCC: Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a type of cepstral representation of the audio clip.

Mel Spectrogram: A mel spectrogram is a spectrogram whose frequencies are converted to the mel scale.

Audio Data Domain

There are two different domains:

1. Time domain: The major parts of the time domain are sampling and quantization. Sampling means measuring the instantaneous values of a continuous-time signal at discrete points in time. An audio wave is a continuous signal, so we first have to choose a sampling frequency (Fs, the number of data points stored per second of audio).
2. Frequency domain: The frequency domain refers to the analytic space in which mathematical functions or signals are conveyed in terms of frequency, rather than time. For example, where a time-domain graph may display changes over ...
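The sampling and quantization steps, and the mel-scale conversion, can be sketched with the standard library alone. Several things here are assumptions for illustration and not taken from the post: the 440 Hz test tone, the 8000 Hz sampling frequency, the 8-bit depth, and the particular mel formula (one widely used variant; other variants exist).

```python
import math


def sample_sine(freq_hz, fs, duration_s):
    """Sample a sine wave: fs values are taken per second of signal."""
    n_samples = round(fs * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n_samples)]


def quantize(samples, bits):
    """Map each sample in [-1, 1] to one of 2**bits integer levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    codes = []
    for s in samples:
        code = round((s + 1.0) / step)
        codes.append(min(levels - 1, max(0, code)))  # clamp against rounding
    return codes


def hz_to_mel(freq_hz):
    """Convert Hz to mel using one common variant of the formula (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


samples = sample_sine(440.0, 8000, 0.01)  # 10 ms of a 440 Hz tone
codes = quantize(samples, 8)              # 8-bit quantization
print(len(samples), min(codes), max(codes))
print(hz_to_mel(1000.0))
```

A higher sampling frequency stores more points per second, and more bits give finer amplitude levels. Both increase fidelity at the cost of storage.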

Deploying ML Model using Flask

October 13, 2020

Why Flask?

Flask is a Python-based microframework used for developing small-scale websites. It makes building RESTful APIs in Python straightforward: it is easy to use, ships with a built-in development server and debugger, has integrated unit testing support and RESTful request dispatching, and is extensively documented.

Project Structure

This project has four parts:

model.py: contains the code for the machine learning model that predicts sales in the third month based on the sales in the first two months.
app.py: contains the Flask APIs that receive sales details through GUI or API calls, compute the predicted value based on our model, and return it.
request.py: uses the requests module to call the APIs defined in app.py and displays the returned value.
HTML/CSS: contains the HTML template and CSS styling that let the user enter sales details and that display the predicted sales in the third month.

Serializing/De-Serializing

In simple words serializing is a way to write a python object on the disk...
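The serialize/de-serialize round trip can be sketched with Python's built-in pickle module. The "model" below is a hypothetical stand-in (a dictionary of made-up coefficients), not the post's actual model, and the file name model.pkl is likewise an assumption.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: coefficients of a linear rule.
model = {"coef": [0.5, 0.5], "intercept": 10.0}


def predict(model, month1_sales, month2_sales):
    """Predict third-month sales from the first two months (illustrative)."""
    w1, w2 = model["coef"]
    return model["intercept"] + w1 * month1_sales + w2 * month2_sales


# Serialize: write the Python object to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# De-serialize: read it back, as a serving script would do at startup.
with open(path, "rb") as f:
    restored = pickle.load(f)

prediction = predict(restored, 100.0, 200.0)
print(prediction)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.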

Supervised Learning(Part-10)

October 11, 2020

Genetic Algorithm

Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms. They are based on the ideas of natural selection and genetics, and they make intelligent use of random search, guided by historical data, to direct the search into regions of better performance in the solution space. They are commonly used to generate high-quality solutions for optimization and search problems.

Genetic algorithms simulate the process of natural selection, in which the species that can adapt to changes in their environment survive, reproduce, and pass on to the next generation. In simple words, they simulate “survival of the fittest” among the individuals of consecutive generations to solve a problem. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of character/inte...
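A minimal sketch of these ideas follows, assuming a toy problem the post does not mention: maximizing the number of 1s in a bit string. The choices of truncation selection, single-point crossover, bit-flip mutation, elitism, and every parameter value are illustrative assumptions, not the post's method.

```python
import random


def fitness(individual):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(individual)


def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Each individual is a point in the search space, encoded as a bit list.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]          # elitism: keep the best as-is
        parents = population[: pop_size // 2]  # fitter half may reproduce
        while len(next_gen) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)   # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in history can never decrease.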

Supervised Learning(Part-9)

October 07, 2020

Lazy Learning

In machine learning, lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries.

The primary motivation for employing lazy learning, as in the K-nearest neighbors algorithm used by online recommendation systems ("people who viewed/purchased/listened to this movie/item/tune also ..."), is that the data set is continuously updated with new entries (e.g., new items for sale at Amazon, new movies to view at Netflix, new clips at YouTube, new music at Spotify or Pandora). Because of the continuous update, the "training data" would be rendered obsolete in a relatively short time, especially in areas like books and movies, where new best-sellers or hit movies/music are published/released continuously. Therefore, one cannot really talk ...
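The "delay until query" behaviour can be sketched with a small K-nearest neighbors classifier. The class name LazyKNN, the 2-D feature points, and the "liked"/"skipped" labels are all hypothetical. Adding data only stores it, and all the work happens inside predict.

```python
import math
from collections import Counter


class LazyKNN:
    """Illustrative lazy learner: no training step, only stored examples."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def add(self, features, label):
        # "Training" is just storing the example; nothing is generalized yet.
        self.points.append((features, label))

    def predict(self, query):
        # Generalization happens here, at query time.
        nearest = sorted(self.points,
                         key=lambda p: math.dist(p[0], query))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = LazyKNN(k=3)
for point in [(1, 1), (1, 2), (2, 1)]:
    knn.add(point, "liked")
for point in [(8, 8), (9, 8), (8, 9)]:
    knn.add(point, "skipped")

before = knn.predict((5, 9))

# New entries arrive; the very next query uses them without any retraining.
for point in [(5, 9), (5, 8), (4, 9)]:
    knn.add(point, "liked")

after = knn.predict((5, 9))
print(before, after)
```

The trade-off is that every query scans the stored data, so prediction cost grows with the size of the data set.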

Supervised Learning(Part-8)

October 04, 2020

Ensemble Model

Ensemble models in machine learning combine the decisions from multiple models to improve the overall performance.

1. Introduction to Ensemble Learning

Let’s understand the concept of ensemble learning with an example. Suppose you are a movie director and you have created a short movie on a very important and interesting topic. Now, you want to take preliminary feedback (ratings) on the movie before making it public. What are the possible ways by which you can do that?

A: You may ask one of your friends to rate the movie for you. Now it’s entirely possible that the person you have chosen loves you very much and doesn’t want to break your heart by providing a 1-star rating to the horrible work you have created.

B: Another way could be by asking 5 colleagues of yours to rate the movie. This should provide a better idea of the movie. This method may provide honest ratings for your movie. But a problem still exists. These 5 people may n...
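Combining several opinions can be sketched in two common ways: averaging numeric ratings and taking a majority vote over class labels. The ratings and labels below are invented for illustration.

```python
from collections import Counter
from statistics import mean


def average_rating(ratings):
    """Combine numeric opinions by averaging them."""
    return mean(ratings)


def majority_vote(predictions):
    """Combine class predictions by picking the most common label."""
    return Counter(predictions).most_common(1)[0][0]


# Option A: one (possibly biased) friend. Option B: five colleagues.
single_friend = [5]
five_colleagues = [2, 3, 2, 4, 3]

rating_a = average_rating(single_friend)
rating_b = average_rating(five_colleagues)

# The same idea applied to three hypothetical classifiers' outputs.
combined_label = majority_vote(["good", "bad", "good"])
print(rating_a, rating_b, combined_label)
```

Averaging suits numeric outputs such as ratings or regression predictions, while voting suits categorical outputs.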

Supervised Learning(Part-7)

October 01, 2020

Random Forest

The Random Forest Classifier

Random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction, and the class with the most votes becomes our model’s prediction (see figure below).

[Figure: Visualization of a Random Forest Model Making a Prediction]

The fundamental concept behind random forest is a simple but powerful one: the wisdom of crowds. In data science speak, the reason that the random forest model works so well is:

A large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.

The low correlation between models is the key. Just like how investments with low correlations (like stocks and bonds) come together to form a portfolio that is greater than the sum of its parts, uncorrelated models can produce ensemble predictions that are more accurate than any of the...
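The "wisdom of crowds" claim can be checked numerically under an idealized assumption: every model is fully independent of the others and has the same accuracy, better than chance, on a two-class problem. Real trees in a forest are only partly uncorrelated, so this is an upper-bound style sketch and does not model an actual random forest. The function name and the 0.6 accuracy are illustrative.

```python
from math import comb


def committee_accuracy(n_models, p_correct):
    """Probability that a strict majority of independent models is correct.

    Assumes n_models is odd, so a vote can never end in a tie.
    """
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct ** k * (1 - p_correct) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )


single = committee_accuracy(1, 0.6)
small_forest = committee_accuracy(11, 0.6)
large_forest = committee_accuracy(101, 0.6)
print(single, small_forest, large_forest)
```

Under these assumptions the committee's accuracy rises with the number of models. If the models' errors were strongly correlated, adding more of them would help far less.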