Hierarchical Clustering of Enriched Terms

hierarchical_term_clustering(
  kappa_mat,
  enrichment_res,
  num_clusters = NULL,
  use_description = FALSE,
  clu_method = "average",
  plot_hmap = FALSE,
  plot_dend = TRUE
)

Arguments

kappa_mat

matrix of kappa statistics (output of create_kappa_matrix)

enrichment_res

data frame of pathfindR enrichment results. Required columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. If kappa_mat was created with use_active_snw_genes = TRUE, the 'non_Signif_Snw_Genes' column must also be provided.

num_clusters

number of clusters to be formed (default = NULL). If NULL, the optimal number of clusters is determined as the number which yields the highest average silhouette width.

use_description

boolean to indicate whether term descriptions (in the 'Term_Description' column) should be used instead of term IDs (default = FALSE)

clu_method

the agglomeration method to be used (default = 'average', see hclust)

plot_hmap

boolean to indicate whether to plot the kappa statistics clustering heatmap or not (default = FALSE)

plot_dend

boolean to indicate whether to plot the clustering dendrogram partitioned into the optimal number of clusters (default = TRUE)

Value

a vector of cluster assignments, one for each enriched term in the enrichment results.
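
As a rough illustration of how such a vector might be summarised, the sketch below uses a hand-made stand-in for the return value. It assumes the vector is named by term ID (or by description when use_description = TRUE), which this page does not state explicitly; the names term_A to term_E and the labels are hypothetical.

```r
# Hypothetical stand-in for the returned vector: integer cluster labels
# named by term ID. The names and labels are made up for illustration.
clusters <- c(term_A = 1L, term_B = 1L, term_C = 2L, term_D = 1L, term_E = 2L)

# Number of terms assigned to each cluster.
cluster_sizes <- table(clusters)

# Term names grouped by cluster label (a list with one element per cluster).
terms_by_cluster <- split(names(clusters), clusters)
```

If the real return value turns out to be unnamed, the terms would have to be matched by position against enrichment_res instead.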

Details

The function first performs hierarchical clustering of the enriched terms in enrichment_res using the kappa statistics, with the distance between two terms defined as 1 - kappa_statistic. If num_clusters is specified, the clustering dendrogram is cut into that many clusters. Otherwise, the dendrogram is cut into k = 2, 3, ..., n - 1 clusters (where n is the number of terms), and the optimal number of clusters is chosen as the k value that yields the highest average silhouette width.
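
The procedure described above can be sketched on a toy matrix. This is a minimal illustration and not the package's actual implementation: the kappa values are made up, and the use of cluster::silhouette to compute silhouette widths is an assumption (the cluster package must be installed).

```r
# Toy symmetric kappa matrix for five hypothetical terms (values are made up).
terms <- paste0("term", 1:5)
kappa_toy <- matrix(
  c(1.0, 0.8, 0.7, 0.1, 0.0,
    0.8, 1.0, 0.9, 0.2, 0.1,
    0.7, 0.9, 1.0, 0.1, 0.0,
    0.1, 0.2, 0.1, 1.0, 0.8,
    0.0, 0.1, 0.0, 0.8, 1.0),
  nrow = 5, byrow = TRUE, dimnames = list(terms, terms)
)

# Distance between two terms is defined as 1 - kappa.
d <- stats::as.dist(1 - kappa_toy)

# Agglomerative clustering with average linkage (the default clu_method).
hc <- stats::hclust(d, method = "average")

# Cut the dendrogram into k = 2, ..., n - 1 clusters and record the
# average silhouette width for each k.
n <- nrow(kappa_toy)
k_values <- 2:(n - 1)
avg_sil <- sapply(k_values, function(k) {
  sil <- cluster::silhouette(stats::cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})

# Keep the k with the highest average silhouette width.
best_k <- k_values[which.max(avg_sil)]
clusters <- stats::cutree(hc, k = best_k)
```

The range stops at n - 1 because the silhouette width is not informative when every term forms its own cluster, and it starts at 2 because a single cluster has no neighbouring cluster to compare against.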

Examples

if (FALSE) {
hierarchical_term_clustering(kappa_mat, enrichment_res)
hierarchical_term_clustering(kappa_mat, enrichment_res, clu_method = 'complete')
}