Calculate Agglomerated Scores of Enriched Terms for Each Subject

score_terms(
  enrichment_table,
  exp_mat,
  cases = NULL,
  use_description = FALSE,
  plot_hmap = TRUE,
  ...
)

Arguments

enrichment_table

a data frame that must contain the columns below:

Term_Description: Description of the enriched term (necessary if use_description = TRUE)
ID: ID of the enriched term (necessary if use_description = FALSE)
Up_regulated: the up-regulated genes in the input involved in the given term's gene set, comma-separated
Down_regulated: the down-regulated genes in the input involved in the given term's gene set, comma-separated
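As a rough illustration of the layout described above, the sketch below builds a small table with these columns and splits the comma-separated gene lists into one vector per term. All IDs, descriptions and gene symbols are placeholders, and the helper split_genes is hypothetical (it is not part of the package); the separator pattern is also an assumption.

```r
# Hypothetical enrichment table; IDs, descriptions and gene symbols are placeholders
enrichment_table <- data.frame(
  ID = c("term_1", "term_2"),
  Term_Description = c("First example term", "Second example term"),
  Up_regulated = c("GENE_A, GENE_B", "GENE_C"),
  Down_regulated = c("GENE_D", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split comma-separated symbols (assumes "," plus optional spaces)
split_genes <- function(x) strsplit(x, ",\\s*")

# Combine up- and down-regulated genes into one character vector per term
term_genes <- Map(
  c,
  split_genes(enrichment_table$Up_regulated),
  split_genes(enrichment_table$Down_regulated)
)
names(term_genes) <- enrichment_table$ID
```

An empty string in either gene column simply contributes no genes for that term.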

exp_mat

the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.

cases

(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)
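A minimal sketch of inputs in the shape described for exp_mat and cases is given below. The values, gene symbols and sample names are made up for illustration, and the checks are only a suggestion for validating your own data before calling the function; they are not claimed to mirror the package's internal checks.

```r
# Hypothetical experiment matrix: rows are genes, columns are samples
exp_mat <- matrix(
  c(5.1, 4.8, 7.2, 6.9,
    2.3, 2.9, 1.1, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("GENE_A", "GENE_B"),                      # row names: gene symbols
    c("ctrl_1", "ctrl_2", "case_1", "case_2")   # column names: sample names
  )
)

# Hypothetical vector of case samples; every entry should be a column of exp_mat
cases <- c("case_1", "case_2")

# Basic shape checks on the inputs
stopifnot(
  is.matrix(exp_mat),
  is.numeric(exp_mat),
  !is.null(rownames(exp_mat)),
  !is.null(colnames(exp_mat)),
  all(cases %in% colnames(exp_mat))
)
```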

use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

plot_hmap

Boolean value to indicate whether or not to draw the heatmap plot of the scores. (default = TRUE)

...

Additional arguments passed to plot_scores to control the aesthetics of the heatmap plot

Value

Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. Optionally, displays a heatmap of this matrix.

Conceptual Background

For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples, we denote:

E as a matrix of size m x n
G as the set of all genes in the experiment G = E_i., i ∈ [1, m]
S as the set of all samples in the experiment S = E_.j, j ∈ [1, n]

We next define the gene score matrix GS (the standardized experiment matrix, also of size m x n) as:

GS_gs = (E_gs - ē_g) / s_g

where g ∈ G, s ∈ S, ē_g is the mean of all values for gene g and s_g is the standard deviation of all values for gene g.

We next denote T to be a set of terms (where each t ∈ T is a set of term-related genes, i.e., t = {g_x, ..., g_y} ⊂ G) and finally define the agglomerated term scores matrix TS (where rows correspond to terms and columns correspond to samples, so that the matrix has size |T| x n) as:

TS_ts = (1/|t|) ∑_{g ∈ t} GS_gs, where t ∈ T and s ∈ S.
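The two definitions above can be traced on toy data with the sketch below. It is a direct transcription of the formulas under stated assumptions, not the package's implementation: the matrix values and the term-to-gene sets are made up, and the standard deviation is taken to be the sample standard deviation as computed by sd().

```r
# Toy experiment matrix E: 4 genes (rows) x 3 samples (columns); values are made up
E <- matrix(
  c(1, 2, 3,
    2, 4, 6,
    5, 5, 8,
    0, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2", "S3"))
)

# Gene score matrix GS: standardize each gene (row) across samples.
# The length-m vectors recycle down the columns, so row i uses the mean and sd of gene i.
GS <- (E - rowMeans(E)) / apply(E, 1, sd)

# Hypothetical set of terms T, each a set of genes from G
terms <- list(term_A = c("G1", "G2"), term_B = c("G3", "G4"))

# Term score matrix TS (|T| x n): mean gene score over the genes of each term, per sample
TS <- t(sapply(terms, function(genes) colMeans(GS[genes, , drop = FALSE])))
```

Each row of GS has mean zero by construction, so a term's score in a given sample reflects how far, on average, its genes deviate from their own means in that sample.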

Examples

score_matrix <- score_terms(
  example_pathfindR_output,
  example_experiment_matrix,
  plot_hmap = FALSE
)