Clusters the works in a corpus using the Leiden community detection algorithm via
igraph::cluster_leiden(). If no network is provided, a semantic
similarity network is built automatically from the corpus embeddings.
Usage
sm_cluster_leiden(
corpus,
network = NULL,
resolution = 1,
call = rlang::caller_env()
)
Arguments
- corpus
An sm_corpus object.
- network
A tidygraph::tbl_graph or igraph::igraph object, or NULL. If NULL (the default), a semantic similarity network is built via sm_network_semantic().
- resolution
Numeric; resolution parameter for the Leiden algorithm. Higher values produce more (smaller) clusters. Defaults to 1.0.
- call
Caller environment for error reporting.
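Why a higher resolution yields more, smaller clusters can be seen from the generalised modularity that Leiden can optimise. This assumes the modularity objective: igraph::cluster_leiden() also supports the Constant Potts Model, and this page does not state which objective sm_cluster_leiden() uses. With $A_{ij}$ the weight of the edge between nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ the weighted degree of node $i$, $m$ the total edge weight, $c_i$ the community of node $i$, and $\gamma$ the resolution:

$$
Q_\gamma = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
$$

Setting $\gamma = 1$ gives standard modularity. Increasing $\gamma$ increases the penalty for placing two nodes in the same community, so the optimum shifts towards more communities with fewer members each.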
Details
The Leiden algorithm (Traag et al., 2019) is a refinement of the Louvain algorithm that guarantees well-connected communities. It operates on the edge weights of the network.
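As a standalone illustration that the partition is driven by edge weights, here is a minimal sketch that calls igraph::cluster_leiden() directly (not sm_cluster_leiden()) on a toy 4-node ring. The modularity objective and the weights are assumptions chosen for illustration; without weights, the two possible splits of the ring would be equally good.

```r
library(igraph)

# Toy 4-cycle with edges 1-2, 2-3, 3-4, 4-1 (in that order).
g <- make_ring(4)
# Heavy weights on 1-2 and 3-4, light weights on 2-3 and 4-1.
E(g)$weight <- c(10, 1, 10, 1)

# Assumption: modularity objective; sm_cluster_leiden() may use other settings.
set.seed(42)  # Leiden visits nodes in random order
res <- cluster_leiden(g, objective_function = "modularity",
                      weights = E(g)$weight)
membership(res)
```

The heavy edges pull nodes 1 and 2 together and nodes 3 and 4 together, giving two communities.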
When network = NULL, embeddings must be present in the corpus so that
sm_network_semantic() can build a k-NN graph.
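The k-NN step can be sketched in plain R and igraph. This is a minimal sketch under stated assumptions, not the implementation of sm_network_semantic(): it uses cosine similarity, k = 2 neighbours, merges duplicate edges by keeping the larger weight, and uses the modularity objective. The toy embeddings and the names `emb`, `sim` and `k` are hypothetical.

```r
library(igraph)

# Hypothetical toy embeddings: 8 works x 3 dimensions, two well-separated groups.
set.seed(1)
emb <- rbind(
  matrix(rnorm(12, mean = 5), nrow = 4),
  matrix(rnorm(12, mean = -5), nrow = 4)
)

# Cosine similarity between all pairs of works (rows).
unit <- emb / sqrt(rowSums(emb^2))
sim <- tcrossprod(unit)
diag(sim) <- -Inf  # a work is never its own neighbour

# Assumption: keep each work's k most similar works as weighted edges.
k <- 2
edges <- do.call(rbind, lapply(seq_len(nrow(sim)), function(i) {
  nn <- order(sim[i, ], decreasing = TRUE)[seq_len(k)]
  data.frame(from = i, to = nn, weight = sim[i, nn])
}))

g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = seq_len(nrow(emb))))
# Mutual neighbours produce duplicate edges; merge them, keeping the max weight.
g <- simplify(g, edge.attr.comb = list(weight = "max"))

set.seed(42)
res <- cluster_leiden(g, objective_function = "modularity",
                      weights = E(g)$weight)
membership(res)
```

Here works 1 to 4 end up in one community and works 5 to 8 in the other. The real graph builder may choose a different similarity measure, value of k, or pruning rule.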
See also
Other clustering:
sm_cluster_evolution(),
sm_cluster_hdbscan(),
sm_cluster_kmeans(),
sm_cluster_label()
Examples
corpus <- sm_example_corpus(with_embeddings = TRUE)
corpus <- sm_cluster_leiden(corpus, resolution = 1.0)
#> ✔ Leiden clustering complete.
#> ℹ 5 communities found.
table(corpus$works$cluster_id)
#>
#>  1  2  3  4  5
#> 49 39 45 40 27