
Clusters the works in a corpus using the Leiden community detection algorithm via igraph::cluster_leiden(). If no network is provided, a semantic similarity network is built automatically from the corpus embeddings.

Usage

sm_cluster_leiden(
  corpus,
  network = NULL,
  resolution = 1,
  call = rlang::caller_env()
)

Arguments

corpus

An sm_corpus object.

network

A tidygraph::tbl_graph or igraph::igraph object, or NULL. If NULL (default), a semantic similarity network is built via sm_network_semantic().

resolution

Numeric; resolution parameter for the Leiden algorithm. Higher values produce more (smaller) clusters. Defaults to 1.0.

call

Caller environment for error reporting.

Value

The input corpus with a cluster_id column added to corpus$works.

Details

The Leiden algorithm (Traag et al., 2019) is a refinement of the Louvain algorithm that guarantees well-connected communities. It operates on the edge weights of the network.
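To make the role of edge weights and the resolution parameter concrete, here is a minimal sketch that calls igraph::cluster_leiden() directly on a toy graph. The graph, the choice of the modularity objective, and the two resolution values are illustrative assumptions; this page does not state which objective function sm_cluster_leiden() passes to igraph.

```r
library(igraph)

# Toy weighted graph (hypothetical data): two triangles, a-b-c and d-e-f,
# joined by a single weak edge c-d.
edges <- data.frame(
  from   = c("a", "a", "b", "d", "d", "e", "c"),
  to     = c("b", "c", "c", "e", "f", "f", "d"),
  weight = c(1, 1, 1, 1, 1, 1, 0.1)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Leiden is randomized, so fix the seed for repeatable runs.
set.seed(42)

# The argument is called `resolution_parameter` in igraph releases up to 2.0;
# later releases may prefer `resolution` and emit a deprecation warning here.
low <- cluster_leiden(
  g,
  objective_function   = "modularity",  # assumption, see the note above
  weights              = E(g)$weight,
  resolution_parameter = 0.5
)
high <- cluster_leiden(
  g,
  objective_function   = "modularity",
  weights              = E(g)$weight,
  resolution_parameter = 2
)

# Compare the number of communities and their sizes at each resolution.
length(low)
length(high)
sizes(high)
```

Because the algorithm is randomized, exact memberships can vary between runs and igraph versions, so no expected output is shown.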

When network = NULL, embeddings must be present in the corpus so that sm_network_semantic() can build a k-NN graph.

Examples

# \donttest{
corpus <- sm_example_corpus(with_embeddings = TRUE)
corpus <- sm_cluster_leiden(corpus, resolution = 1.0)
#>  Leiden clustering complete.
#>  5 communities found.
table(corpus$works$cluster_id)
#> 
#>  1  2  3  4  5 
#> 49 39 45 40 27 
# }
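A further sketch, using only the functions documented on this page, shows one way to check how resolution affects the number of clusters. The resolution values are arbitrary choices for illustration, and the counts depend on the example corpus, so none are shown.

```r
# Assumes the package providing sm_example_corpus() and sm_cluster_leiden()
# is already attached.
corpus <- sm_example_corpus(with_embeddings = TRUE)

# Illustrative resolution values; pick ones that suit your corpus.
resolutions <- c(0.5, 1, 2)

# Cluster once per resolution and count the distinct cluster ids.
n_clusters <- vapply(
  resolutions,
  function(res) {
    clustered <- sm_cluster_leiden(corpus, resolution = res)
    length(unique(clustered$works$cluster_id))
  },
  integer(1)
)

data.frame(resolution = resolutions, n_clusters = n_clusters)
```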