Graph Analysis(Part-1)

 

Graph Mining 

Graph Mining is the set of tools and techniques used to 

(a) analyze the properties of real-world graphs, 

(b) predict how the structure and properties of a given graph might affect some application,

(c) develop models that can generate realistic graphs that match the patterns found in real-world graphs of interest.

Important Terms:

1. Co-authorship networks- Co-authorship is a form of association in which two or more researchers jointly report their research results on some topic. Therefore, co-authorship networks can be viewed as social networks encompassing researchers that reflect collaboration among them. Researchers are represented by nodes in co-authorship networks.



2. Citation network- directed graph in which each vertex represents a document and in which each edge represents a citation from the current publication to another.



3. Polarization- 


4. Knowledge Graphs- The knowledge graph represents a collection of interlinked descriptions of entities – objects, events or concepts. Knowledge graphs put data in context via linking and semantic metadata and this way provide a framework for data integration, unification, analytics and sharing.

5. Dead End-  when a complete column or row of the matrix converges to have a value of 0.



6. Spider Trap- a set of nodes with no out edges, all the information flows in.

Examples of Graphs 

Graphs are used to represent many real-life applications: Graphs are used to represent networks. The networks may include paths in a city or telephone network or circuit network. Graphs are also used in social networks like linkedIn, Facebook. 

1. Recommendation systems
2. Community detection
3. Marketing
4. Fraud detection
5. Social networks
6. Optimal Routing

A graph is represented as G = <V, E>, where V is the set of vertices present in the graph and E = { (vivj): viv∈ V }

Types of Graphs


Directed Graphs
If (vivj ∈ E ⇏ (vjvi∈ E , then its a Directed graph.

Undirected Graphs
If (vi, vj ∈ E ⇒ (vjvi∈ E , then its a Undirected graph.

K- partite graph
k-partite graph is a graph whose graph vertices can be partitioned into k disjoint sets so that no two vertices within the same set are adjacent. 



Fully Connected graph
It is a connected graph where a unique edge connects each pair of vertices. In other words, for every two vertices of a whole or a fully connected graph, there is a distinct edge.

Simple Graph
A simple graph, also called a strict graph is an unweighted, undirected graph containing no graph loops or multiple edges.

Adjacency Matrix

It is a type of Graph representation technique.


Some of the graphs with their adjacency matrices are given below:

For a simple graph with no self-loops, the adjacency matrix must have 0 on the diagonal. For an undirected graph, the adjacency matrix is symmetric.

Page Rank

Google's algorithm to rank the documents on web. These can be modeled as a Directed graph.


Flow of Information

Start with the nodes having same weight.

















Eigen Vectors

Eigen vector of a matrix A is a vector represented by a matrix X such that when X is multiplied with matrix A, then the direction of the resultant matrix remains same as vector X.

Mathematically, above statement can be represented as:

AX = λX

where A is any arbitrary matrix, λ are eigen values and X is an eigen vector corresponding to each eigen value.

Power iteration method is used to find the eigen vectors.


Bow Tie Structure of Web


Teleporting

Random Surfers
Random surfer teleport to any node with same probability.

Well Studied Model (Markov's chain)
In this, state at a particular point depends on the state prior at time.

A unique distribution where transition matrix is Stochastic, irreducible (needs to be fully connected) and aperiodic (break the spider web) is called Stationary Distribution.

Network Worth

It is used to understand the worth of the network and how much impact it has.

Important metrics

Between Centrality = (no. of shortest path that go through node) / (total no. of shortest paths)
Degree Centrality = (no, of neighbors) / (total no. of possible neighbors)

NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In NetworkX, nodes can be any hashable object e.g., a text string, an image, an XML object, another Graph, a customized node object, etc.

Generative model for Graphs with Community

Affiliation Graph Model

It is a generative model in which each node belongs to some community. It can express a variety of community structures. It generates overlapping communities.

Vertices V, Communities C, Membership M and Probability pc.


Discovering Communities
Given a graph, learn the membership strength of each node of a community

So the parameters to learn become |V| X |C|. 

Comments

Popular posts from this blog

Supervised Learning(Part-5)

Supervised Learning(Part-2)

Text Analysis (Part - 4)