Process Input

input_processing(
  input,
  p_val_threshold = 0.05,
  pin_name_path = "Biogrid",
  convert2alias = TRUE
)

Arguments

input

the input data that pathfindR uses. The input must be a data frame with three columns:

  1. Gene symbol

  2. Change value, e.g. log(fold change) (OPTIONAL)

  3. p value, e.g. adjusted p value associated with differential expression
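
As a minimal sketch, an input data frame with this layout could be built as shown here. The gene symbols, column names, and values are made up for illustration; the description above specifies only the order and content of the columns.

```r
# Hypothetical toy input: column names and values are illustrative only.
toy_input <- data.frame(
  Gene.symbol = c("GENE_A", "GENE_B", "GENE_C"),  # 1. gene symbol
  logFC       = c(1.8, -2.1, 0.4),                # 2. change value
  adj.P.Val   = c(0.001, 0.03, 0.20),             # 3. p value
  stringsAsFactors = FALSE
)

# Basic sanity checks mirroring the requirements listed above
stopifnot(
  is.data.frame(toy_input),
  ncol(toy_input) == 3,
  is.character(toy_input[[1]]),
  is.numeric(toy_input[[3]]),
  all(toy_input[[3]] >= 0 & toy_input[[3]] <= 1)
)
```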

p_val_threshold

the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)

pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING"). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = "Biogrid")

convert2alias

boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE). IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.

Value

This function first filters the input so that all p values are less than or equal to the threshold. Next, gene symbols that are not found in the PIN are identified. If aliases of these gene symbols are found in the PIN, the symbols are converted to the corresponding aliases. The resulting data frame containing the original gene symbols, the updated symbols, change values and p values is then returned.
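
The steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy symbols, the `pin_genes` vector, the `alias_map` lookup, and the output column names are all hypothetical. Dropping genes with no match in the PIN is inferred from the messages in the example output ("Could not find any interactions for ...").

```r
# Toy data: symbols, PIN gene set, and alias table are all hypothetical.
toy_input <- data.frame(
  GENE    = c("GENE_A", "OLD_B", "GENE_C", "GENE_D"),
  CHANGE  = c(1.8, -2.1, 0.4, 1.1),
  P_VALUE = c(0.001, 0.03, 0.20, 0.04),
  stringsAsFactors = FALSE
)
pin_genes <- c("GENE_A", "GENE_B", "GENE_C")  # genes present in the PIN
alias_map <- c(OLD_B = "GENE_B")              # input symbol -> alias in the PIN
p_val_threshold <- 0.05

# Step 1: keep rows whose p value is at or below the threshold
filtered <- toy_input[toy_input$P_VALUE <= p_val_threshold, ]

# Step 2: identify symbols that are not found in the PIN
missing <- !(filtered$GENE %in% pin_genes)

# Step 3: swap in an alias where one exists (NA where no alias is known)
new_symbol <- filtered$GENE
new_symbol[missing] <- unname(alias_map[filtered$GENE[missing]])

# Step 4: assemble original symbols, updated symbols, change values, p values
result <- data.frame(
  old_GENE = filtered$GENE,
  GENE     = new_symbol,
  CHANGE   = filtered$CHANGE,
  P_VALUE  = filtered$P_VALUE,
  stringsAsFactors = FALSE
)

# Assumed final step: drop genes that still have no match in the PIN
result <- result[!is.na(result$GENE) & result$GENE %in% pin_genes, ]
print(result)
```

In this toy run, `GENE_C` is removed by the p value filter, `OLD_B` is converted to `GENE_B`, and `GENE_D` is dropped because neither it nor an alias is in the PIN.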

See also

See run_pathfindR for the wrapper function of the pathfindR workflow

Examples

processed_df <- input_processing(input = RA_input[1:5, ],
                                 pin_name_path = "KEGG")
#> Number of genes provided in input: 5
#> Number of genes in input after p-value filtering: 5
#> 
#> Could not find any interactions for 2 (40%) genes in the PIN
#> Final number of genes in input: 3
processed_df <- input_processing(input = RA_input[1:10, ],
                                 pin_name_path = "KEGG",
                                 convert2alias = FALSE)
#> Number of genes provided in input: 10
#> Number of genes in input after p-value filtering: 10
#> Could not find any interactions for 6 (60%) genes in the PIN
#> Final number of genes in input: 4