Process Input
input_processing(
  input,
  p_val_threshold = 0.05,
  pin_name_path = "Biogrid",
  convert2alias = TRUE
)
input: the input data that pathfindR uses. The input must be a data frame with three columns:
Gene Symbol
Change value, e.g. log(fold change) (OPTIONAL)
p value, e.g. adjusted p value associated with differential expression
p_val_threshold: the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (Default = 0.05)
pin_name_path: name of the chosen PIN or absolute/path/to/PIN.sif. If a PIN name, it must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
convert2alias: boolean indicating whether to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN. (Default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.
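As an illustration of the input layout described above, the following base R sketch builds a small data frame with a gene symbol column, a change value column and a p value column. The column names and all numeric values are made up for this example and are not taken from any real dataset. The second object shows the variant without the optional change column.

```r
# Hypothetical input: the change values and p values are invented for
# illustration only, and the column names are placeholders.
example_input <- data.frame(
  Gene.symbol = c("TP53", "BRCA1", "EGFR", "MYC"),
  logFC       = c(1.8, -2.1, 0.9, 1.2),
  adj.P.Val   = c(0.001, 0.020, 0.300, 0.049),
  stringsAsFactors = FALSE
)

# Variant without the optional change value column
example_input_no_change <- example_input[, c(1, 3)]

# Basic sanity checks on the structure described in the text
stopifnot(
  is.data.frame(example_input),
  ncol(example_input) == 3,
  is.character(example_input[[1]]),
  is.numeric(example_input[[3]]),
  all(example_input[[3]] >= 0 & example_input[[3]] <= 1)
)
```

A data frame shaped like this could then be passed as the input argument, with p_val_threshold deciding which rows survive the filtering step.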
This function first filters the input so that all p values are less than or equal to the threshold. Next, gene symbols that are not found in the PIN are identified. If aliases of these gene symbols are found in the PIN, the symbols are converted to the corresponding aliases. The resulting data frame containing the original gene symbols, the updated symbols, change values and p values is then returned.
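The steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not pathfindR's implementation: the function sketch_processing, the toy gene names, the set of PIN genes, the alias lookup table and the output column names are all hypothetical.

```r
# Toy stand-ins (all hypothetical): a tiny input, a set of PIN genes,
# and an alias lookup (names: input symbol, values: alias used in the PIN).
toy_input <- data.frame(
  GENE    = c("GENE_A", "GENE_B_OLD", "GENE_C", "GENE_D"),
  CHANGE  = c(1.5, -0.7, 2.0, 0.3),
  P_VALUE = c(0.01, 0.04, 0.20, 0.03),
  stringsAsFactors = FALSE
)
toy_pin_genes <- c("GENE_A", "GENE_B")
toy_aliases <- c(GENE_B_OLD = "GENE_B")

sketch_processing <- function(input, pin_genes, aliases,
                              p_val_threshold = 0.05) {
  # Step 1: keep rows whose p value is at or below the threshold
  kept <- input[input$P_VALUE <= p_val_threshold, , drop = FALSE]

  # Step 2: identify symbols that are not found in the PIN
  not_in_pin <- !(kept$GENE %in% pin_genes)

  # Step 3: swap in an alias when one exists and is present in the PIN
  candidate <- unname(aliases[kept$GENE])  # NA where no alias is known
  convertible <- not_in_pin & !is.na(candidate) & candidate %in% pin_genes
  updated <- kept$GENE
  updated[convertible] <- candidate[convertible]

  # Step 4: return original symbols, updated symbols, change and p values
  data.frame(
    original_symbol = kept$GENE,
    updated_symbol  = updated,
    CHANGE          = kept$CHANGE,
    P_VALUE         = kept$P_VALUE,
    stringsAsFactors = FALSE
  )
}

result <- sketch_processing(toy_input, toy_pin_genes, toy_aliases)
```

In this toy run, GENE_C is removed by the p value filter, GENE_B_OLD is converted to GENE_B, and GENE_D is kept unchanged because no alias is known for it. The sketch leaves such unmatched genes in the result, whereas the messages printed in the examples suggest that the real function excludes genes without interactions in the PIN from the final count.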
See run_pathfindR for the wrapper function of the pathfindR workflow.
processed_df <- input_processing(
  input = example_pathfindR_input[1:5, ],
  pin_name_path = 'KEGG'
)
#> ## Processing input. Converting gene symbols,
#> if necessary (and if human gene symbols provided)
#> Number of genes provided in input: 5
#> Number of genes in input after p-value filtering: 5
#>
#> Could not find any interactions for 2 (40%) genes in the PIN
#> Final number of genes in input: 3
processed_df <- input_processing(
  input = example_pathfindR_input[1:10, ],
  pin_name_path = 'KEGG',
  convert2alias = FALSE
)
#> ## Processing input. Converting gene symbols,
#> if necessary (and if human gene symbols provided)
#> Number of genes provided in input: 10
#> Number of genes in input after p-value filtering: 10
#> Could not find any interactions for 6 (60%) genes in the PIN
#> Final number of genes in input: 4