Calculate Agglomerated Scores of Enriched Terms for Each Subject

score_terms(
  enrichment_table,
  exp_mat,
  cases = NULL,
  use_description = FALSE,
  plot_hmap = TRUE,
  ...
)

Arguments

enrichment_table

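a data frame that must contain an identifier column (ID or Term_Description, depending on use_description) together with the Up_regulated and Down_regulated columns, as described below: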

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

ID

ID of the enriched term (necessary if use_description = FALSE)

Up_regulated

the up-regulated genes in the input involved in the given term's gene set, comma-separated

Down_regulated

the down-regulated genes in the input involved in the given term's gene set, comma-separated

exp_mat

the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.

cases

(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)

use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

plot_hmap

Boolean value to indicate whether or not to draw the heatmap plot of the scores. (default = TRUE)

...

Additional arguments for plot_scores for aesthetics of the heatmap plot

Value

Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. Optionally, displays a heatmap of this matrix.

Conceptual Background

For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples, we denote:

  • E as a matrix of size m x n

  • G as the set of all genes in the experiment, G = E_{i.}, i ∈ [1, m] (the rows of E)

  • S as the set of all samples in the experiment, S = E_{.j}, j ∈ [1, n] (the columns of E)

We next define the gene score matrix GS (the standardized experiment matrix, also of size m x n) as:

GS_{gs} = (E_{gs} - ē_g) / s_g

where g ∈ G, s ∈ S, ē_g is the mean of all values for gene g, and s_g is the standard deviation of all values for gene g.
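
The standardization above can be sketched in base R as follows. This is a minimal illustration on a made-up toy matrix, not the package's internal code; the object names (E, gene_means, gene_sds, GS) are chosen here for illustration, and the use of the sample standard deviation (as computed by sd()) is an assumption.

```r
# Toy experiment matrix: 3 genes (rows) x 4 samples (columns).
# All values are made up for illustration.
E <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    5, 5, 6, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:4))
)

# Per-gene (row-wise) mean and standard deviation.
# Assumption: the sample standard deviation (n - 1 denominator) is used.
gene_means <- rowMeans(E)
gene_sds <- apply(E, 1, sd)

# A vector of length nrow(E) is recycled down the columns, so each row
# is centered by its own mean and scaled by its own standard deviation.
GS <- (E - gene_means) / gene_sds
```

After this step every row of GS has mean 0 and standard deviation 1, so genes measured on different scales contribute comparably to the term scores.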

We next denote T to be a set of terms (where each t ∈ T is a set of term-related genes, i.e., t = {g_x, ..., g_y} ⊂ G) and finally define the agglomerated term scores matrix TS (where rows correspond to terms and columns correspond to samples, so that the matrix has size |T| x n) as:

TS_{ts} = (1/|t|) ∑_{g ∈ t} GS_{gs}, where t ∈ T and s ∈ S.
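
As a rough sketch of this definition, the term score for a term and a sample is the mean of the gene scores of the term's genes in that sample. The example below uses made-up values standing in for GS and a hypothetical list of term gene sets (terms); it illustrates the formula only and is not the implementation used by score_terms.

```r
# Made-up gene score matrix standing in for GS: 4 genes x 3 samples.
GS <- matrix(
  c( 1.0,  0.0, -1.0,
    -1.0,  0.0,  1.0,
     0.5,  0.5, -1.0,
     2.0, -1.0, -1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("S", 1:3))
)

# Hypothetical terms, each given as the set of its genes.
terms <- list(
  term1 = c("g1", "g3"),
  term2 = c("g2", "g3", "g4")
)

# For each term, average the gene scores of its genes within each sample.
# sapply() returns samples x terms, so transpose to get terms x samples.
TS <- t(sapply(terms, function(genes) {
  colMeans(GS[genes, , drop = FALSE])
}))
```

The resulting TS has one row per term and one column per sample, matching the |T| x n shape described above.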

Examples

score_matrix <- score_terms(
  example_pathfindR_output,
  example_experiment_matrix,
  plot_hmap = FALSE
)