Posts

Showing posts from December, 2020

Auto Encoder

Auto-Encoder vs. Principal Component Analysis: Principal Component Analysis (PCA) is used to reduce dimensionality; it tries to find the set of important features that still gives a good decision boundary. An auto-encoder does the same, but it learns the features itself, and when the data is too complex an auto-encoder is preferred over PCA. An auto-encoder takes raw data, such as an image, as input and creates a vector representation of it using real numbers. What are the usual properties of this vector?
1. It should represent important features/attributes of the raw data with numerical values; that is, every element of the vector captures one or more significant attributes of the raw data.
2. It should be created automatically by a neural network, which can be a simple Multi-Layer Perceptron for simple data, a Convolutional Neural Network for image data, or a Recurrent Neural Network for audio data.
3. Auto-encoder vector a
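As a rough illustration of the idea above (a sketch, not code from the post): a tiny fully connected auto-encoder in tf.keras that compresses a flattened image into a short vector and reconstructs it. The sizes input_dim and encode_dim are assumed values.

import tensorflow as tf

input_dim = 784    # e.g. a flattened 28 x 28 image (assumed size)
encode_dim = 32    # length of the learned vector representation (assumed)

inputs = tf.keras.Input(shape=(input_dim,))
encoded = tf.keras.layers.Dense(encode_dim, activation="relu")(inputs)     # encoder: raw data -> vector
decoded = tf.keras.layers.Dense(input_dim, activation="sigmoid")(encoded)  # decoder: vector -> reconstruction

autoencoder = tf.keras.Model(inputs, decoded)
encoder = tf.keras.Model(inputs, encoded)   # exposes just the vector representation

autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)   # trained to reproduce its own input
# codes = encoder.predict(x_batch)         # each row is the auto-encoder vector for one sample

Because the network is trained to reconstruct its own input, the bottleneck vector is forced to keep only the attributes that matter for rebuilding the data.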

Attention Model (Part-2)

Decoder for all the time steps. For one particular time step in the decoder.
Steps of the Attention Model:
Step 1: Compute a score for each encoder state.
Step 2: Compute the attention weights.
Step 3: Compute the context vector.
Step 4: Concatenate the context vector with the output of the previous time step.
Attention Model Decoder:
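A minimal NumPy sketch of those four steps, assuming simple dot-product scoring (the post does not say which scoring function it uses); encoder_states (T x H), decoder_state (length H) and prev_output are hypothetical names.

import numpy as np

def attention_step(encoder_states, decoder_state, prev_output):
    # Step 1: score each encoder state against the current decoder state (dot-product scoring assumed)
    scores = encoder_states @ decoder_state            # shape (T,)
    # Step 2: attention weights = softmax over the scores
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                  # shape (T,), sums to 1
    # Step 3: context vector = weighted sum of the encoder states
    context = weights @ encoder_states                 # shape (H,)
    # Step 4: concatenate the context vector with the previous time step's output
    return np.concatenate([context, prev_output])

# e.g. attention_step(np.random.rand(6, 8), np.random.rand(8), np.random.rand(8))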

Attention Model (Part-1)

Encoder-Decoder Model, Seq2Seq Model, Seq2Seq Embedding, Seq2Seq Teacher Forcing. We can feed either the output of the previous step or the actual value as the input to the next step. Teacher forcing uses the ground truth (actual value) of the previous step as the input of the current step and is used in the training phase; in the testing phase the output of the previous step is used as the input to the current step. Limitation of Seq2Seq modelling: because only the final encoder state is used for every decoder prediction, we do not exploit the alignment between individual elements of the input sequence and the elements of the output sequence.
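A sketch of the training-versus-testing difference described above, using a single GRUCell decoder; the vocabulary size, layer widths, start-token id and the helper name decode are illustrative assumptions, not the post's code.

import tensorflow as tf

vocab_size, emb_dim, hidden = 100, 16, 32                 # assumed sizes
embedding = tf.keras.layers.Embedding(vocab_size, emb_dim)
gru_cell = tf.keras.layers.GRUCell(hidden)
project = tf.keras.layers.Dense(vocab_size)

def decode(target_ids, enc_state, teacher_forcing=True):
    # target_ids: (batch, T) ground-truth tokens; enc_state: (batch, hidden) final encoder state
    batch, T = target_ids.shape
    state = enc_state
    prev = tf.zeros((batch,), dtype=tf.int32)             # assumed <start> token id 0
    step_logits = []
    for t in range(T):
        out, new_states = gru_cell(embedding(prev), [state])
        state = new_states[0]
        logits = project(out)
        step_logits.append(logits)
        if teacher_forcing:
            prev = target_ids[:, t]                       # training: feed the ground truth
        else:
            prev = tf.argmax(logits, axis=-1, output_type=tf.int32)   # testing: feed own prediction
    return tf.stack(step_logits, axis=1)                  # (batch, T, vocab_size)

# e.g. decode(tf.constant([[3, 7, 2]]), tf.zeros((1, hidden)), teacher_forcing=True)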

Recurrent Neural Networks (RNN Part-3)

Implementation in TensorFlow
Class RNN Model / CNN Model / Model / Encoder: the code that implements the neural net; application independent; inherits from tf.keras.Model.
Class with an application-specific name (application specific):
__init__(): an object of the encoder class is declared here
train(): load the training data, run batches of training
train_step(): application independent; optimizes the neural network for one batch (forward pass and backward pass)
load_model(): restore previous weights (from a pre-trained model)
Load all data (application specific): loads all data from file, collects metadata, loads the data and creates matrices: create X (input matrix) and Y (output matrix), create the one-hot encoding.
Load all batches: create a generator, process the batch data, return padded X and one-hot encoded Y.
Example:
import tensorflow as tf
import numpy as np

class RNN_Encoder(tf.keras.Model):
    def __init__(self, nodes):
        super(RNN_Encoder, self).__init__()
        self.rnn_1 = tf.keras.layers.GRU(nodes[0], return_sequences=
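The excerpt cuts off mid-definition; a minimal completion of such an encoder might look like the sketch below, where the second GRU layer, the layer widths and the call() method are assumptions rather than the post's actual code.

import tensorflow as tf
import numpy as np

class RNN_Encoder(tf.keras.Model):
    def __init__(self, nodes):
        super(RNN_Encoder, self).__init__()
        # nodes is assumed to be a list of layer widths, e.g. [64, 32]
        self.rnn_1 = tf.keras.layers.GRU(nodes[0], return_sequences=True)   # per-timestep outputs
        self.rnn_2 = tf.keras.layers.GRU(nodes[1], return_sequences=False)  # final state only

    def call(self, x):
        # x: (batch, timesteps, features)
        return self.rnn_2(self.rnn_1(x))

encoder = RNN_Encoder([64, 32])
out = encoder(np.random.rand(8, 10, 5).astype("float32"))   # shape (8, 32)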

Recurrent Neural Network (RNN Part-2)

Different RNN Models
Many to one: we take only the LAST output of the RNN. Examples: sentence classification (toxic comment classification), image classification, anomaly detection.
Many to many: we take all the outputs of the RNN. Examples: POS tagging of a sentence, video frame classification, next word prediction.
RNN Example: Task: C-class POS tagging. Input: 1 x T x F, where T = 4 timesteps and F = number of features. Target: 1 x T x C, where C = number of classes.
RNN Limitations: Vanishing / Exploding Gradient Problem. In the forward pass the output of one timestep (t) is fed to the next timestep (t+1). At every timestep the trainable weights are the same, but the input is different. In the backward pass (backpropagation through time) we move from (t+1) to (t), and due to the chain rule the gradients are multiplied. If the gradient at one timestep is very small we get a vanishing gradient; if it is very large we get an exploding gradient.
Long Short Term Memory (LSTM): Input gate: contribution
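A small tf.keras sketch of the many-to-one versus many-to-many distinction above; the layer sizes and the values of T, F and C are assumed for illustration.

import tensorflow as tf

T, F, C = 4, 8, 5   # timesteps, features, classes (assumed values)

# Many to one: only the LAST output of the RNN is kept (return_sequences=False).
many_to_one = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, return_sequences=False, input_shape=(T, F)),
    tf.keras.layers.Dense(C, activation="softmax"),   # one prediction per sequence
])

# Many to many: every timestep's output is kept (return_sequences=True), e.g. POS tagging.
many_to_many = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, return_sequences=True, input_shape=(T, F)),
    tf.keras.layers.Dense(C, activation="softmax"),   # one prediction per timestep
])

print(many_to_one.output_shape)    # (None, 5)    -> 1 x C per sample
print(many_to_many.output_shape)   # (None, 4, 5) -> 1 x T x C per sample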

Recurrent Neural Network (RNN Part-1)

Sequential Data: In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called elements, or terms), and the number of elements (possibly infinite) is called the length of the sequence. Unlike a set, the same element can appear multiple times at different positions, and the order does matter.
Sequential Data - Representation: Every element of a sequence can be a D-dimensional vector; in that case a sample is a T x D array. For scalar elements a sample is a T x 1 array. Every sample of sequential data therefore has 2 dimensions (T and D).
Sequence Modeling Problem: the input is a sequence of length T and the output is another sequence of length Q. T = Q: POS tagging. T > Q: speech to text. T < Q: translation (this can be T >= Q as well).
MLP Limitations: For any layer with input x, trainable weight W and bias b we can write the transformation e
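To make the T x D representation and the per-element MLP transformation concrete, here is a small NumPy sketch; the shapes and the 5-unit output layer are assumed values, and y = Wx + b is the standard linear transformation the truncated sentence above refers to.

import numpy as np

T, D = 4, 3                            # timesteps and element dimension (assumed)
sample = np.random.rand(T, D)          # one sequential sample: a T x D array
scalar_sample = np.random.rand(T, 1)   # scalar elements: a T x 1 array
batch = np.random.rand(8, T, D)        # a batch of samples adds a leading dimension

# MLP-style transformation for a single element x: y = W x + b
x = sample[0]                          # one D-dimensional element
W = np.random.rand(5, D)               # trainable weights (5 output units assumed)
b = np.random.rand(5)                  # bias
y = W @ x + b                          # applied independently per element, so an MLP
                                       # ignores the ordering across the T timesteps
print(sample.shape, batch.shape, y.shape)   # (4, 3) (8, 4, 3) (5,)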