Perform Active Subnetwork Search

active_snw_search(
  input_for_search,
  pin_name_path = "Biogrid",
  snws_file = "active_snws",
  dir_for_parallel_run = NULL,
  score_quan_thr = 0.8,
  sig_gene_thr = 0.02,
  search_method = "GR",
  silent_option = TRUE,
  use_all_positives = FALSE,
  geneInitProbs = 0.1,
  saTemp0 = 1,
  saTemp1 = 0.01,
  saIter = 10000,
  gaPop = 400,
  gaIter = 10000,
  gaThread = 5,
  gaCrossover = 1,
  gaMut = 0,
  grMaxDepth = 1,
  grSearchDepth = 1,
  grOverlap = 0.5,
  grSubNum = 1000
)

Arguments

input_for_search

The input data that active subnetwork search uses. The input must be a data frame containing at least these 2 columns:

GENE

Gene Symbol

P_VALUE

p value obtained through a test, e.g. differential expression/methylation
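A minimal sketch of an input in the required shape. The gene symbols and p values are invented for illustration, and `example_input` is a hypothetical name, not data shipped with the package:

```r
# Hypothetical input: gene symbols and p values are invented for illustration
example_input <- data.frame(
  GENE = c("TP53", "BRCA1", "EGFR", "MYC"),
  P_VALUE = c(0.001, 0.02, 0.0005, 0.04),
  stringsAsFactors = FALSE
)

# Check the two required columns before passing the data frame on
stopifnot(all(c("GENE", "P_VALUE") %in% colnames(example_input)))
# p values should lie in [0, 1]
stopifnot(all(example_input$P_VALUE >= 0 & example_input$P_VALUE <= 1))
```

Extra columns are allowed, since only these two are required.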

pin_name_path

Name of the chosen PIN or path/to/PIN.sif. If PIN name, must be one of c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING"). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = "Biogrid")

snws_file

name for active subnetwork search output data without file extension (default = "active_snws")

dir_for_parallel_run

(previously created) directory for a parallel run iteration. Used in the wrapper function (see ?run_pathfindR) (Default = NULL)

score_quan_thr

Active subnetwork score quantile threshold. Must be between 0 and 1, or -1 to disable filtering. (Default = 0.8)
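The following sketch illustrates what a quantile threshold on subnetwork scores might look like. It is an assumption about the filtering logic, not the package's actual code; `scores`, `cutoff` and `keep` are hypothetical names and the score values are invented:

```r
# Hypothetical subnetwork scores, invented for illustration
scores <- c(1.2, 3.4, 0.7, 5.1, 2.8)
score_quan_thr <- 0.8

if (score_quan_thr == -1) {
  # -1 means no filtering: keep every subnetwork
  keep <- rep(TRUE, length(scores))
} else {
  # Keep only subnetworks scoring above the chosen quantile
  cutoff <- stats::quantile(scores, score_quan_thr)
  keep <- scores > cutoff
}

scores[keep]
```

With these made-up scores, only the highest-scoring subnetwork passes the 0.8 quantile.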

sig_gene_thr

Threshold for the minimum proportion of significant genes in the subnetwork (default = 0.02). If the number of genes to use as the threshold is calculated to be < 2 (e.g. 50 significant genes x 0.01 = 0.5), the threshold number is set to 2.
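The arithmetic in this description can be sketched as follows. This is an illustration of the stated rule only, not the package's internal code, and the variable names are hypothetical:

```r
n_sig_genes <- 50      # hypothetical number of significant input genes
sig_gene_thr <- 0.01   # proportion from the 50-gene example in the description

# Proportion times gene count, with a floor of 2 genes
min_sig_genes <- max(2, n_sig_genes * sig_gene_thr)
min_sig_genes
```

Here 50 x 0.01 = 0.5 falls below 2, so the floor of 2 applies.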

search_method

Algorithm to use when performing active subnetwork search. Options are greedy search ("GR"), simulated annealing ("SA") or genetic algorithm ("GA") (default = "GR").

silent_option

Boolean value indicating whether to print messages to the console (FALSE) or not (TRUE, in which case they are written to a temporary file) during active subnetwork search (default = TRUE). This option was added because console messages are printed out of order during parallel runs.

use_all_positives

if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)

geneInitProbs

For SA and GA, probability of adding a gene in initial solution (default = 0.1)

saTemp0

Initial temperature for SA (default = 1.0)

saTemp1

Final temperature for SA (default = 0.01)

saIter

Iteration number for SA (default = 10000)

gaPop

Population size for GA (default = 400)

gaIter

Iteration number for GA (default = 10000)

gaThread

Number of threads to be used in GA (default = 5)

gaCrossover

Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)

gaMut

For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)

grMaxDepth

Sets max depth in greedy search, 0 for no limit (default = 1)

grSearchDepth

Search depth in greedy search (default = 1)

grOverlap

Overlap threshold for results of greedy search (default = 0.5)

grSubNum

Number of subnetworks to be presented in the results (default = 1000)

Value

A list of genes in every identified active subnetwork whose score is greater than the `score_quan_thr` quantile of subnetwork scores and that contains at least the minimum proportion of significant genes given by `sig_gene_thr`.
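A minimal sketch of inspecting a result of this shape, assuming each list element is a character vector of gene symbols for one subnetwork. `mock_snws` is a hypothetical stand-in with invented genes, not real output of the function:

```r
# Hypothetical return value: each element holds the genes of one subnetwork
mock_snws <- list(
  c("TP53", "MDM2", "ATM"),
  c("EGFR", "GRB2")
)

# Number of genes in each subnetwork
snw_sizes <- lengths(mock_snws)

# All distinct genes appearing in any subnetwork
all_genes <- unique(unlist(mock_snws))
```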

Examples

processed_df <- RA_input[1:15, -2]
colnames(processed_df) <- c("GENE", "P_VALUE")
GR_snws <- active_snw_search(input_for_search = processed_df,
                             pin_name_path = "KEGG",
                             search_method = "GR",
                             score_quan_thr = 0.8)
#> Found 2 active subnetworks
#> 
# clean-up
unlink("active_snw_search", recursive = TRUE)