Connected components

Another important property of graphs is that of connectivity. A graph is said to be connected if there is a path of edges connecting any two vertices we choose, regardless of the edge directions. So, for directed graphs, we completely neglect the directions for this definition. What can be a stricter definition of connectivity used for directed graphs? A graph is said to be strongly connected if any two vertices are connected by a directed chain of edges. Note that strong connectivity is a very strong assumption to impose on a directed graph. In particular, any strongly connected graph is cyclic. These definitions allow us to define the closely related concept of (strongly) connected components. Every graph can be decomposed into connected components. If it is connected, there is precisely one such component. If it is not, there are at least two. Formally defined, a connected component is the largest subgraph of a given graph that is still connected. The same rationale holds for strongly connected components. Connectivity is an important measure, as it allows us to cluster the vertices of a graph into groups that naturally belong together.

For instance, one might be interested in the number of connected components in a social graph indicating friendship. In a small graph, there may be many separate components. However, the larger the graph, one might suspect that it is more likely to have just a single connected component, following the commonly accepted rationale that everyone is connected to each other by around six connections.

We will see how to compute connected components with GraphX in the next section; for now, let's just inspect one simple example. In the following diagram, we see a directed graph with twelve vertices:

Figure 6: Connected and strongly connected components can easily be read off in small graphs, but this becomes increasingly difficult for larger graphs.

We can immediately see that it has three connected components, namely the three sets of vertices {1, 2, 3}, {4, 5}, and {6, 7, 8, 9, 10, 11, 12}. As for strongly connected components, that requires a little more effort than a quick visual inspection. We can see that {4, 5} forms a strongly connected component, and so does {8, 9, 10, 11}. All the other six vertices form their own strongly connected components, that is, they are isolated. This example goes on to show that for a massive graph with millions of vertices, with the right visualization tool, we may be lucky to find roughly connected components, but strongly connected components are a little more complicated to compute, and this is just one use case where Spark GraphX comes in handy.

Table of Contents for Connected components

Create new playlist

Sign In

Sign Up

Table of Contents for
Connected components