Graph Analysis (Part-2)

 

Graph Partitioning

A graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph.

Important terms

Cliques - fully connected subgraphs of the original graph G.

Maximal Cliques - cliques to which no node from the parent graph can be added such that the resulting subgraph is still a clique.

Betweenness - the number of shortest paths that include an edge. The Girvan–Newman (GN) algorithm computes the betweenness of every edge from the shortest paths between all pairs of nodes, and repeatedly removes the edge with the highest betweenness to partition the graph.
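As a concrete illustration, the terms above can be sketched in plain Python. The toy graph below (two triangles joined by a single bridge edge) is illustrative, not from the text:

```python
# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge (2, 3).
from collections import deque
from itertools import combinations

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def is_clique(s):
    return all(b in adj[a] for a, b in combinations(s, 2))

# Maximal cliques by brute force (fine for a toy graph): cliques that
# are not contained in any larger clique.
cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in combinations(nodes, r) if is_clique(s)]
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(sorted(maximal, key=sorted))  # [{0, 1, 2}, {2, 3}, {3, 4, 5}]

def shortest_path(src, dst):
    prev = {src: None}  # BFS predecessors
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in prev:
                prev[w] = u
                q.append(w)
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

# Edge betweenness: count shortest paths crossing each edge. (In this
# toy graph every node pair has a unique shortest path, so one BFS
# path per pair is exact.)
betweenness = {e: 0 for e in edges}
for s, t in combinations(nodes, 2):
    p = shortest_path(s, t)
    for a, b in zip(p, p[1:]):
        betweenness[(a, b) if (a, b) in betweenness else (b, a)] += 1

print(max(betweenness, key=betweenness.get))  # the bridge: (2, 3)
```

The bridge edge carries every shortest path between the two triangles, so it has the highest betweenness and is the first edge GN would remove.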

Cutting Graphs

In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut.

Quality of Cluster (Cut)

The number of edges (or the sum of edge weights) that connect a node inside the cluster to a node outside it.
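A minimal sketch of this measure in Python (the weighted graph and the chosen cluster are illustrative):

```python
# Cut quality: total weight of edges with exactly one endpoint in the
# cluster. The example graph below is made up for illustration.
weighted_edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 0.5, (3, 0): 1.5, (1, 3): 1.0}

def cut_weight(cluster, edges):
    cluster = set(cluster)
    # An edge crosses the cut iff exactly one endpoint lies in the cluster.
    return sum(w for (a, b), w in edges.items()
               if (a in cluster) != (b in cluster))

print(cut_weight({0, 1}, weighted_edges))  # 3.5
```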

Spectral Clustering



It treats each data point as a graph-node and thus transforms the clustering problem into a graph-partitioning problem.

Let there be a graph given as G = <V, E>. Then there exists a diagonal degree matrix D with dii = degree(ni) and an adjacency matrix A with aij = 1 if (ni, nj) ∈ E, else 0.

The three main steps followed in Spectral Clustering are:-

1. Building a Similarity Graph
This step builds the Similarity Graph in the form of an adjacency matrix which is represented by A. The adjacency matrix can be built using Epsilon-neighborhood Graph, K-Nearest Neighbors and Fully-Connected Graph.
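As a sketch (assuming numpy is available; the sample points are made up), two of the constructions above look like this:

```python
import numpy as np

# Four sample points forming two far-apart pairs (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances

# Epsilon-neighborhood graph: connect points closer than eps.
eps = 1.0
A_eps = ((D < eps) & (D > 0)).astype(float)

# Fully connected graph: Gaussian (RBF) similarity between all pairs.
sigma = 1.0
A_rbf = np.exp(-D**2 / (2 * sigma**2))
np.fill_diagonal(A_rbf, 0.0)

print(A_eps.astype(int))  # two disconnected pairs
```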

2. Projecting data onto lower dimensional space
This step is done to account for the possibility that members of the same cluster may be far apart in the given dimensional space. The dimensionality is therefore reduced so that those points become closer in the reduced space and can be clustered together by a traditional clustering algorithm. This is done by computing the Graph Laplacian Matrix. To compute it, though, the degree of a node first needs to be defined. The degree of the ith node is given by:-

d_i = Σ_j w_ij

Note that w_ij is the weight of the edge between the nodes i and j as defined in the adjacency matrix above.
The degree matrix is defined as the diagonal matrix:-

D = diag(d_1, d_2, …, d_n)

Thus, the Laplacian Matrix is defined as L = D - A.
This matrix is then normalized for mathematical efficiency. To reduce the dimensions, first, the eigenvalues and the respective eigenvectors are calculated. If the number of clusters is k, then the k smallest eigenvalues and their eigenvectors are taken and stacked into a matrix such that the eigenvectors are the columns.
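This step can be sketched with numpy (assumed available) on a small toy graph, here using the unnormalized Laplacian L = D - A:

```python
import numpy as np

# Toy graph: two triangles joined by one bridge edge (illustrative).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # unnormalized graph Laplacian

# For a symmetric L, eigh returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(L)

k = 2
U = vecs[:, :k]  # n x k matrix, eigenvectors as columns
print(U.shape)   # each node now has a k-dimensional representation
```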

3. Clustering the data
This step clusters the reduced data using any traditional clustering technique – typically K-Means Clustering. First, each node is assigned the corresponding row of the eigenvector matrix obtained from the normalized Graph Laplacian. Then this data is clustered using the chosen technique. The node identifier is retained throughout, so the resulting cluster labels can be mapped back to the nodes of the original graph.
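A minimal sketch of this final step, assuming numpy: for k = 2, taking the sign of the second eigenvector (the Fiedler vector) is a simple stand-in for running K-Means on the rows of the eigenvector matrix:

```python
import numpy as np

# Same toy graph: two triangles joined by a bridge (illustrative).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)  # node i -> cluster labels[i]
print(labels)                       # the two triangles separate
```

The sign split recovers exactly the two triangles; for k > 2 one would run K-Means on the rows of the n x k eigenvector matrix instead.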

Introduction to Neural Networks

Every neuron is connected to other neurons through connection links. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for a neuron when solving a particular problem, because the weight usually excites or inhibits the signal being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, which are produced by combining the input signals and the activation rule, may be sent on to other units.

In the general model of an artificial neural network, input signals x1, x2, …, xn reach a processing unit through connection weights w1, w2, …, wn. For this model, the net input can be calculated as follows −

y_in = x1·w1 + x2·w2 + … + xn·wn = Σ xi·wi

The output is calculated by applying the activation function over the net input:

Y = F(y_in)
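A minimal numeric sketch of this model (the inputs, weights, and the choice of a sigmoid activation are illustrative):

```python
import math

x = [0.5, -1.0, 2.0]  # input signals
w = [0.4, 0.3, 0.1]   # connection weights

y_in = sum(xi * wi for xi, wi in zip(x, w))  # net input: sum of xi * wi
y = 1 / (1 + math.exp(-y_in))                # sigmoid activation over net input
print(round(y_in, 2), round(y, 3))           # 0.1 0.525
```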


Activation Functions

It may be defined as the function applied over the net input to obtain the neuron's exact output.
Following are some activation functions of interest −

Linear Activation Function

It is also called the identity function, as it leaves the input unchanged. It can be defined as:-

F(x) = x

Sigmoid Activation Function

It is of two types:-

1. Binary Sigmoid function

This activation function squashes the input to values between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the higher the input, the higher the output. It can be defined as:-

F(x) = 1 / (1 + e^(-x))

2. Bipolar Sigmoid Function

This activation function squashes the input to values between -1 and 1, so its output can be positive or negative. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be defined as:-

F(x) = (1 - e^(-x)) / (1 + e^(-x))
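The three activation functions above can be written directly in Python (the steepness parameter k is an optional generalization, not from the text):

```python
import math

def linear(x):
    return x  # identity: output equals input

def binary_sigmoid(x, k=1.0):
    # Bounded to (0, 1), strictly increasing.
    return 1 / (1 + math.exp(-k * x))

def bipolar_sigmoid(x, k=1.0):
    # Bounded to (-1, 1), strictly increasing.
    return (1 - math.exp(-k * x)) / (1 + math.exp(-k * x))

print(round(binary_sigmoid(0.0), 3), round(bipolar_sigmoid(0.0), 3))  # 0.5 0.0
```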
