Clustering with the Leiden Algorithm on Bipartite Graphs

The Leiden R package supports calling built-in methods for Bipartite graphs. This vignette assumes you already have the 'leiden' package installed. See the other vignettes for details.

Set up

First we import the functions required in the package.

library("leiden")

We also require a example bipartite graph. Here we import a dataset from the 'networkdata' package (bear in mind this is a large package not available on CRAN).

remotes::install_github("schochastics/networkdata")
bipartite_graph <- networkdata::southern_women
bipartite_graph

#> IGRAPH 39d6f6e UN-B 32 93 -- 
#> + attr: type (v/l), name (v/c)
#> + edges from 39d6f6e (vertex names):
#>  [1] EVELYN   --E1  EVELYN   --E2  EVELYN   --E3  EVELYN   --E4  EVELYN   --E5  EVELYN   --E6  EVELYN   --E8 
#>  [8] EVELYN   --E9  LAURA    --E1  LAURA    --E2  LAURA    --E3  LAURA    --E5  LAURA    --E6  LAURA    --E7 
#> [15] LAURA    --E8  THERESA  --E2  THERESA  --E3  THERESA  --E4  THERESA  --E5  THERESA  --E6  THERESA  --E7 
#> [22] THERESA  --E8  THERESA  --E9  BRENDA   --E1  BRENDA   --E3  BRENDA   --E4  BRENDA   --E5  BRENDA   --E6 
#> [29] BRENDA   --E7  BRENDA   --E8  CHARLOTTE--E3  CHARLOTTE--E4  CHARLOTTE--E5  CHARLOTTE--E7  FRANCES  --E3 
#> [36] FRANCES  --E5  FRANCES  --E6  FRANCES  --E8  ELEANOR  --E5  ELEANOR  --E6  ELEANOR  --E7  ELEANOR  --E8 
#> [43] PEARL    --E6  PEARL    --E8  PEARL    --E9  RUTH     --E5  RUTH     --E7  RUTH     --E8  RUTH     --E9 
#> [50] VERNE    --E7  VERNE    --E8  VERNE    --E9  VERNE    --E12 MYRA     --E8  MYRA     --E9  MYRA     --E10
#> + ... omitted several edges

Usage

Bipartite graph objects

Now we have a bipartite graph structure. This structure has a “type” vertex attribute defining 2 distinct sets of vertices that do not have edges within them

table(as.numeric(V(bipartite_graph)$type))
#> 
#>  0  1 
#> 18 14

Here we import a plotting function to display these 2 groups.

library("graphsim")
node.cols <- c("palevioletred", "lightblue")[as.integer(V(bipartite_graph)$type)+1]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)

This data can also be represented by an adjacency matrix derived from a graph object.

library("igraph")
bipartite_matrix <- igraph::as_adjacency_matrix(bipartite_graph)
bipartite_matrix 
#> 32 x 32 sparse Matrix of class "dgCMatrix"
#>    [[ suppressing 32 column names 'EVELYN', 'LAURA', 'THERESA' ... ]]
#>                                                                          
#> EVELYN    . . . . . . . . . . . . . . . . . . 1 1 1 1 1 1 . 1 1 . . . . .
#> LAURA     . . . . . . . . . . . . . . . . . . 1 1 1 . 1 1 1 1 . . . . . .
#> THERESA   . . . . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 . . . . .
#> BRENDA    . . . . . . . . . . . . . . . . . . 1 . 1 1 1 1 1 1 . . . . . .
#> CHARLOTTE . . . . . . . . . . . . . . . . . . . . 1 1 1 . 1 . . . . . . .
#> FRANCES   . . . . . . . . . . . . . . . . . . . . 1 . 1 1 . 1 . . . . . .
#> ELEANOR   . . . . . . . . . . . . . . . . . . . . . . 1 1 1 1 . . . . . .
#> PEARL     . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 1 . . . . .
#> RUTH      . . . . . . . . . . . . . . . . . . . . . . 1 . 1 1 1 . . . . .
#> VERNE     . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . . 1 . .
#> MYRA      . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . 1 . .
#> KATHERINE . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . 1 1 1
#> SYLVIA    . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1
#> NORA      . . . . . . . . . . . . . . . . . . . . . . . 1 1 . 1 1 1 1 1 1
#> HELEN     . . . . . . . . . . . . . . . . . . . . . . . . 1 1 . 1 1 1 1 1
#> DOROTHY   . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . 1 . .
#> OLIVIA    . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . .
#> FLORA     . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . .
#> E1        1 1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> E2        1 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> E3        1 1 1 1 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . .
#> E4        1 . 1 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> E5        1 1 1 1 1 1 1 . 1 . . . . . . . . . . . . . . . . . . . . . . .
#> E6        1 1 1 1 . 1 1 1 . . . . . 1 . . . . . . . . . . . . . . . . . .
#> E7        . 1 1 1 1 . 1 . 1 1 . . 1 1 1 . . . . . . . . . . . . . . . . .
#> E8        1 1 1 1 . 1 1 1 1 1 1 1 1 . 1 1 . . . . . . . . . . . . . . . .
#> E9        1 . 1 . . . . 1 1 1 1 1 1 1 . 1 1 1 . . . . . . . . . . . . . .
#> E10       . . . . . . . . . . 1 1 1 1 1 1 . . . . . . . . . . . . . . . .
#> E11       . . . . . . . . . . . . . 1 1 . 1 1 . . . . . . . . . . . . . .
#> E12       . . . . . . . . . 1 1 1 1 1 1 1 . . . . . . . . . . . . . . . .
#> E13       . . . . . . . . . . . 1 1 1 1 . . . . . . . . . . . . . . . . .
#> E14       . . . . . . . . . . . 1 1 1 1 . . . . . . . . . . . . . . . . .

Running the Leiden algorithm in R

Then the Leiden algorithm can be run on the adjacency matrix using a partition type for Bipartite graphs. Here the types are computed automatically.

partition <- leiden(bipartite_matrix, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.2, seed = 42)
#> Warning in paste(sig, collapse = "#"): NAs introduced by coercion to integer range

#> Warning in paste(sig, collapse = "#"): NAs introduced by coercion to integer range

#> Warning in paste(sig, collapse = "#"): NAs introduced by coercion to integer range
#> Warning in deparse(substitute(arg)): NAs introduced by coercion to integer range
#> Warning in paste(sig, collapse = "#"): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in paste(el[, 1], el[, 2], sep = "|"): NAs introduced by coercion to integer range

#> Warning in paste(el[, 1], el[, 2], sep = "|"): NAs introduced by coercion to integer range
#> Warning in deparse(substitute(arg)): NAs introduced by coercion to integer range
#> Warning in paste0("igraph::", x): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in deparse(x[[1L]]): NAs introduced by coercion to integer range
#> Warning in deparse(expr, width.cutoff, ...): NAs introduced by coercion to integer range
#> Warning in paste(deparse(expr, width.cutoff, ...), collapse = collapse): NAs introduced by coercion to integer range
#> Warning in paste(args, collapse = ""): NAs introduced by coercion to integer range
#> Warning in paste0(msg, "\n"): NAs introduced by coercion to integer range
#> computing bipartite partitions

table(partition)
#> partition
#>  1 
#> 32

The Leiden algorithm also supports Bipartite graphs in igraph objects. Here the type attributes are passed from igraph.

partition <- leiden(bipartite_graph, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.2, seed = 42)

table(partition)
#> partition
#>  1  2 
#> 17 15

Here we can see partitions in the plotted results. The nodes that are more interconnected have been partitioned into separate clusters. Note that these do not correspond to Bipartite “types” as these are accounted for.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)

Bipartite cost functions

The Modularity Vertex Partition is also supported.

partition <- leiden(bipartite_graph, partition_type = "ModularityVertexPartition.Bipartite", resolution_parameter = 0.02, seed = 42)

table(partition)
#> partition
#> 1 2 3 4 5 6 7 
#> 9 6 3 4 4 4 2

Here we can see partitions in the plotted results are different to as those computed above.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)

Resolution Parameter

Reducing the resolution parameter gives similar results to previously with the CPM vertex partition. Note that very low resolution values are typical for this cost function.

partition <- leiden(bipartite_graph, partition_type = "ModularityVertexPartition.Bipartite", resolution_parameter = 0.005, seed = 42)

table(partition)
#> partition
#>  1  2 
#> 17 15

Here we can see partitions in the plotted results are the same as those computed above.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)

Maximum Community Size Parameter

The resolutionmax_comm_size parameter applies on bipartite graphs to fine-tuning the size of clusters detected.

partition <- leiden(bipartite_graph, partition_type = "ModularityVertexPartition.Bipartite", max_comm_size = 6, resolution_parameter = 0.025, seed = 42)

table(partition)
#> partition
#>  1  2  3  4  5  6  7  8  9 10 
#>  6  5  4  4  4  3  3  1  1  1

Here we can see partitions in the plotted results are contain many smaller clusters.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(11, partition)),"Pastel1")[partition]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)

Equivalence of cost functions

The CPM vertex partition when degree as node size is TRUE is equivalent to the Modularity cost function.

partition <- leiden(bipartite_graph, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.02, seed = 42,
                    degree_as_node_size = TRUE)

table(partition)
#> partition
#> 1 2 3 4 5 6 7 
#> 9 6 3 4 4 4 2

Here we can see partitions in the plotted results are the same as those computed above.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(bipartite_graph, vertex.color = node.cols, layout = layout.kamada.kawai)