Graph Mining
Graph Mining is the set of tools and techniques used to
(a) analyze the properties of real-world graphs,
(b) predict how the structure and properties of a given graph might affect some application,
(c) develop models that can generate realistic graphs that match the patterns found in real-world graphs of interest.
Important Terms:
1. Co-authorship networks- Co-authorship is a form of association in which two or more researchers jointly report their research results on some topic. Therefore, co-authorship networks can be viewed as social networks encompassing researchers that reflect collaboration among them. Researchers are represented by nodes in co-authorship networks.
2. Citation network- a directed graph in which each vertex represents a document and in which each edge represents a citation from the current publication to another.
3. Polarization-
4. Knowledge Graphs- The knowledge graph represents a collection of interlinked descriptions of entities – objects, events or concepts. Knowledge graphs put data in context via linking and semantic metadata and this way provide a framework for data integration, unification, analytics and sharing.
5. Dead End- when a complete column or row of the matrix converges to have a value of 0.
6. Spider Trap- a set of nodes with no out edges, all the information flows in.
Examples of Graphs
Graphs are used to represent many real-life applications: Graphs are used to represent networks. The networks may include paths in a city or telephone network or circuit network. Graphs are also used in social networks like linkedIn, Facebook.
1. Recommendation systems
2. Community detection
3. Marketing
4. Fraud detection
5. Social networks
6. Optimal Routing
A graph is represented as G = <V, E>, where V is the set of vertices present in the graph and E = { (vi, vj): vi, vj ∈ V }
Types of Graphs
Directed Graphs
If (vi, vj ∈ E ⇏ (vj, vi) ∈ E , then its a Directed graph.
Undirected GraphsIf (vi, vj ∈ E ⇒ (vj, vi) ∈ E , then its a Undirected graph.
K- partite graph
It is a connected graph where a unique edge connects each pair of vertices. In other words, for every two vertices of a whole or a fully connected graph, there is a distinct edge.
Simple Graph
A simple graph, also called a strict graph is an unweighted, undirected graph containing no graph loops or multiple edges.
Adjacency Matrix
It is a type of Graph representation technique.
Some of the graphs with their adjacency matrices are given below:
For a simple graph with no self-loops, the adjacency matrix must have 0 on the diagonal. For an
undirected graph, the adjacency matrix is
symmetric.
Page Rank
Google's algorithm to rank the documents on web. These can be modeled as a Directed graph.
Start with the nodes having same weight.
Eigen Vectors
Eigen vector of a matrix A is a vector represented by a matrix X such that when X is multiplied with matrix A, then the direction of the resultant matrix remains same as vector X.
Mathematically, above statement can be represented as:
AX = λX
where A is any arbitrary matrix, λ are eigen values and X is an eigen vector corresponding to each eigen value.
Power iteration method is used to find the eigen vectors.
Bow Tie Structure of Web
Teleporting
Random Surfers
Random surfer teleport to any node with same probability.
Well Studied Model (Markov's chain)
In this, state at a particular point depends on the state prior at time.
A unique distribution where transition matrix is Stochastic, irreducible (needs to be fully connected) and aperiodic (break the spider web) is called Stationary Distribution.
Network Worth
It is used to understand the worth of the network and how much impact it has.
Important metrics
Between Centrality = (no. of shortest path that go through node) / (total no. of shortest paths)
Degree Centrality = (no, of neighbors) / (total no. of possible neighbors)
NetworkX
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In NetworkX, nodes can be any hashable object e.g., a text string, an image, an XML object, another Graph, a customized node object, etc.
Generative model for Graphs with Community
Affiliation Graph Model
It is a generative model in which each node belongs to some community. It can express a variety of community structures. It generates overlapping communities.
Vertices V, Communities C, Membership M and Probability pc.
Discovering Communities
Given a graph, learn the membership strength of each node of a community
So the parameters to learn become |V| X |C|.
Comments
Post a Comment