Prioritize Cancer Driver Genes

prioritize_driver_genes(features_df, cancer_type)

Arguments

features_df

the features data frame for all genes, containing the following columns:

gene_symbol

HGNC gene symbol

metaprediction_score

the maximum metapredictor (coding) impact score for the gene

noncoding_score

the maximum non-coding PHRED-scaled CADD score for the gene

scna_score

SCNA (somatic copy number alteration) proxy score: the SCNA density (SCNAs/Mb) of the minimal common region (MCR) in which the gene is located

hotspot_double_hit

boolean indicating whether the gene is a hotspot gene (an indication of an oncogene) or is subject to a double hit (an indication of a tumor-suppressor gene)

phenolyzer_score

'phenolyzer' score for the gene

hsa03320

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04010

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04020

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04024

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04060

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04066

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04110

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04115

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04150

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04151

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04210

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04310

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04330

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04340

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04350

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04370

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04510

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04512

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04520

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04630

boolean indicating whether or not the gene takes part in this KEGG pathway

hsa04915

boolean indicating whether or not the gene takes part in this KEGG pathway

cancer_type

short name of the cancer type (e.g. "LUAD"). All available cancer types are listed in MTL_submodel_descriptions.
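Because features_df is expected to carry a fixed set of columns, it can help to check a table before passing it on. The following is a minimal base R sketch, not part of the package: the helper name missing_feature_cols and the toy data frame are assumptions made for illustration, while the column names are the ones listed above.

```r
# Column names taken from the features_df description above.
kegg_cols <- c("hsa03320", "hsa04010", "hsa04020", "hsa04024", "hsa04060",
               "hsa04066", "hsa04110", "hsa04115", "hsa04150", "hsa04151",
               "hsa04210", "hsa04310", "hsa04330", "hsa04340", "hsa04350",
               "hsa04370", "hsa04510", "hsa04512", "hsa04520", "hsa04630",
               "hsa04915")
expected_cols <- c("gene_symbol", "metaprediction_score", "noncoding_score",
                   "scna_score", "hotspot_double_hit", "phenolyzer_score",
                   kegg_cols)

# Hypothetical helper (not provided by the package): returns the expected
# column names that are absent from `df`.
missing_feature_cols <- function(df) {
  setdiff(expected_cols, colnames(df))
}

# Toy data frame (invented values) that has only two of the expected columns.
toy_df <- data.frame(gene_symbol = "GENE_A", scna_score = 0)
missing_cols <- missing_feature_cols(toy_df)
print(missing_cols)
```

An empty result from the helper would suggest the table has every column named in this section; the helper does not check column types or values.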

Value

data frame with 3 columns:

gene_symbol

HGNC gene symbol

driverness_prob

estimated probability of each gene in features_df being a cancer driver. The probabilities are calculated using the sub-model of the cancer type selected via cancer_type.

prediction

prediction based on the cancer-type-specific threshold (either "driver" or "non-driver")
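As a rough illustration of how the returned table might be used, the sketch below keeps the genes predicted as drivers and ranks them by probability. The data frame here is a hypothetical stand-in with the three columns described above; the gene names and probabilities are invented and are not output of the function.

```r
# Hypothetical stand-in for the returned data frame; gene names and
# probabilities are invented for illustration only.
drivers_df <- data.frame(
  gene_symbol     = c("GENE_A", "GENE_B", "GENE_C"),
  driverness_prob = c(0.12, 0.91, 0.57),
  prediction      = c("non-driver", "driver", "driver"),
  stringsAsFactors = FALSE
)

# Keep only the genes predicted as drivers ...
predicted <- drivers_df[drivers_df$prediction == "driver", ]
# ... and rank them by decreasing estimated probability.
predicted <- predicted[order(predicted$driverness_prob, decreasing = TRUE), ]
print(predicted$gene_symbol)
```

The same subsetting and ordering should apply to a real result, assuming its columns are named as documented in this section.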

See also

create_features_df for creating the features table. TCGA_MTL_fit for details on the MTL model used for prediction.

Examples

drivers_df <- prioritize_driver_genes(example_features_table, "LUAD")