View source: R/core_functions.R
create_features_df | R Documentation |
Create Data Frame of Features for Driver Gene Prioritization
create_features_df(
annovar_csv_path,
scna_df,
phenolyzer_annotated_gene_list_path,
batch_analysis = FALSE,
prep_phenolyzer_input = FALSE,
build = "GRCh37",
log2_ratio_threshold = 0.25,
gene_overlap_threshold = 25,
MCR_overlap_threshold = 25,
hotspot_threshold = 5L,
log2_hom_loss_threshold = -1,
verbose = TRUE,
na.string = "."
)
annovar_csv_path |
path to 'ANNOVAR' csv output file |
scna_df |
the SCNA segments data frame. Must contain:
|
phenolyzer_annotated_gene_list_path |
path to 'phenolyzer' "annotated_gene_list" file |
batch_analysis |
boolean to indicate whether to perform batch analysis
( |
prep_phenolyzer_input |
boolean to indicate whether or not to create
a vector of genes for use as the input of 'phenolyzer' (default = FALSE) |
build |
genome build for the SCNA segments data frame (default = "GRCh37") |
log2_ratio_threshold |
the log2 ratio threshold for keeping high-confidence SCNA events (default = 0.25) |
gene_overlap_threshold |
the percentage threshold for the overlap between a segment and a transcript (default = 25). A transcript is assigned a segment's SCNA event only if the segment overlaps the transcript by more than this threshold. |
MCR_overlap_threshold |
the percentage threshold for the overlap between a gene and an MCR (default = 25). A gene is assigned the SCNA density of an MCR only if the gene overlaps the MCR by more than this threshold. |
hotspot_threshold |
to determine hotspot genes, the (integer) threshold for the minimum number of cases with a given mutation in COSMIC (default = 5) |
log2_hom_loss_threshold |
to determine double-hit events, the log2 threshold for identifying homozygous loss events (default = -1). |
verbose |
boolean controlling verbosity (default = TRUE) |
na.string |
string that was used to indicate when a score is not available during annotation with ANNOVAR (default = ".") |
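How the threshold arguments might act on SCNA segments can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the column names (`start`, `end`, `log2ratio`), the toy values, the helper `overlap_percent()`, the use of the absolute log2 ratio, and the choice of transcript length as the denominator of the overlap percentage are all hypothetical.

```r
# Hypothetical toy segments; column names and values are assumed for illustration
segments <- data.frame(
  start = c(100, 5000, 9000),
  end = c(2000, 8000, 9500),
  log2ratio = c(0.8, -0.1, -1.4)
)

# Defaults taken from the usage signature
log2_ratio_threshold <- 0.25
log2_hom_loss_threshold <- -1
gene_overlap_threshold <- 25

# Keep high-confidence events (assumption: the absolute log2 ratio is compared)
kept <- segments[abs(segments$log2ratio) > log2_ratio_threshold, ]

# Flag candidate homozygous losses (assumption: strictly below the threshold)
kept$hom_loss <- kept$log2ratio < log2_hom_loss_threshold

# Percentage of a transcript covered by a segment
# (assumption: the denominator is the transcript length)
overlap_percent <- function(seg_start, seg_end, tx_start, tx_end) {
  overlap <- max(0, min(seg_end, tx_end) - max(seg_start, tx_start) + 1)
  100 * overlap / (tx_end - tx_start + 1)
}

# Hypothetical transcript spanning positions 1501..2500, checked against the first segment
pct <- overlap_percent(100, 2000, 1501, 2500)
assigned <- pct > gene_overlap_threshold
```

In this toy case the second segment is dropped as low-confidence, the third is flagged as a candidate homozygous loss, and the transcript would inherit the first segment's event because half of it is covered.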
If prep_phenolyzer_input=FALSE (default), a data frame of features for prioritizing cancer driver genes (gene_symbol as the first column and 26 other columns containing features). If prep_phenolyzer_input=TRUE, the function returns a vector of gene symbols (union of all gene symbols for which scores are available) to be used as the input for performing 'phenolyzer' analysis.
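When the function is run with prep_phenolyzer_input=TRUE, the returned vector of gene symbols typically needs to be saved to a file before it can be given to 'phenolyzer'. A minimal sketch follows; the gene symbols are stand-in values rather than real output, and the one-symbol-per-line text format is an assumption about what 'phenolyzer' expects.

```r
# Stand-in for the character vector returned when prep_phenolyzer_input = TRUE
# (hypothetical values, not real output)
gene_vec <- c("TP53", "KRAS", "EGFR")

# Write one gene symbol per line to a temporary text file
# (assumed input format; check the 'phenolyzer' documentation)
out_file <- tempfile(fileext = ".txt")
writeLines(gene_vec, out_file)

# Read the file back to confirm its contents
written <- readLines(out_file)
```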
The features data frame contains the following columns:
HGNC gene symbol
the maximum metapredictor (coding) impact score for the gene
the maximum non-coding PHRED-scaled CADD score for the gene
SCNA proxy score. SCNA density (SCNA/Mb) of the minimal common region (MCR) in which the gene is located
boolean indicating whether the gene is a hotspot gene (indication of oncogenes) or subject to double-hit (indication of tumor-suppressor genes)
'phenolyzer' score for the gene
21 columns, one per KEGG pathway: boolean indicating whether or not the gene takes part in that KEGG pathway
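The SCNA proxy score in the list above is described as a density (SCNA/Mb) over a minimal common region. One plausible reading of that formula is sketched below; the helper name `scna_density()`, its arguments, and the definition (number of SCNA events divided by the region length in megabases) are assumptions for illustration and may differ from what the package computes.

```r
# Hypothetical helper: SCNA events per megabase within a region
# (assumption: density = event count / region length in Mb)
scna_density <- function(n_scna, region_start, region_end) {
  region_mb <- (region_end - region_start + 1) / 1e6
  n_scna / region_mb
}

# 12 events over a 2 Mb region give a density of 6 SCNA/Mb
density <- scna_density(12, 1000001, 3000000)
```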
prioritize_driver_genes
for prioritizing cancer driver genes
path2annovar_csv <- system.file("extdata/example.hg19_multianno.csv",
                                package = "driveR")
path2phenolyzer_out <- system.file("extdata/example.annotated_gene_list",
                                   package = "driveR")
features_df <- create_features_df(annovar_csv_path = path2annovar_csv,
                                  scna_df = example_scna_table,
                                  phenolyzer_annotated_gene_list_path = path2phenolyzer_out)