View source: R/create-gene-binary.R
create_gene_binary | R Documentation |
Creates a binary matrix from mutation, fusion, or CNA data for a predefined list of samples (rows are samples and columns are genes).
create_gene_binary(
  samples = NULL,
  mutation = NULL,
  mut_type = c("omit_germline", "somatic_only", "germline_only", "all"),
  snp_only = FALSE,
  include_silent = FALSE,
  fusion = NULL,
  cna = NULL,
  high_level_cna_only = FALSE,
  specify_panel = "no",
  recode_aliases = "impact"
)
samples |
A character vector specifying which samples should be included in the resulting data frame. Default is NULL, in which case all samples with an alteration in the mutation, cna or fusion data will be used. If the vector includes samples that are not in any of the passed genomic data frames, 0s (or NAs where appropriate if specifying a panel) will be returned for every column of those sample rows. |
mutation |
A data frame of mutations in MAF file format. |
mut_type |
The mutation type to be used. Options are "omit_germline", "somatic_only", "germline_only" or "all". Note that "all" will keep all mutations regardless of status (not recommended). Default is "omit_germline", which includes all somatic mutations as well as any mutations of unknown type (most of which are usually somatic). |
snp_only |
Boolean indicating whether to keep only SNPs (insertions and deletions will be removed). Default is FALSE. |
include_silent |
Boolean to keep or remove all silent mutations. TRUE keeps, FALSE removes. Default is FALSE. |
fusion |
A data frame of fusions. If not NULL the outcome will be added to the matrix with columns ending in ".fus". Default is NULL. |
cna |
A data frame of copy number alterations. If provided, the outcome will be added to the matrix with columns ending in ".del" and ".amp". Default is NULL. |
high_level_cna_only |
If TRUE, only deep deletions (-2, -1.5) or high-level amplifications (2) will be counted as events in the binary matrix. Gains (1) and losses (-1) will be ignored. Default is FALSE. |
specify_panel |
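A panel id, "impact", "no", or a data frame with columns sample_id and panel_id (see the specify_panel argument section). Default is "no". |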
recode_aliases |
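One of "impact", "genie" or "no", or a data frame of custom aliases with columns hugo_symbol and alias (see the recode_aliases argument section). Default is "impact". |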
A data frame with a sample_id column and binary alteration columns with values of 0/1.
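To make the shape of the returned data concrete, here is a minimal base R sketch of the samples-by-genes idea. It is only an illustration under assumed toy data (the `toy_mut` table and the sample names `s1` to `s3` are hypothetical), not the function's actual implementation. Note how a requested sample with no alterations still gets a row of 0s.

```r
# Toy long-format mutation table (hypothetical data, one row per mutation).
toy_mut <- data.frame(
  sample_id = c("s1", "s1", "s2"),
  hugo_symbol = c("TP53", "KRAS", "TP53"),
  stringsAsFactors = FALSE
)

# Requested samples; s3 has no alterations in toy_mut.
all_samples <- c("s1", "s2", "s3")

# One column per gene: 1 if the sample has at least one mutation in it, else 0.
genes <- sort(unique(toy_mut$hugo_symbol))
mat <- sapply(genes, function(g) {
  as.integer(all_samples %in% toy_mut$sample_id[toy_mut$hugo_symbol == g])
})

# Rows are samples, columns are genes, plus a sample_id column.
binary <- data.frame(sample_id = all_samples, mat, stringsAsFactors = FALSE)
print(binary)
```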
specify_panel argument

If specify_panel = "no" is passed (default), data will be returned as is without any additional NA annotations.
If a single panel id is passed (e.g. specify_panel = "IMPACT468"), all genes in your data that are not tested on that panel will be set to NA in results for all samples (see gnomeR::gene_panels to see which genes are on each supported panel).
If specify_panel = "impact" is passed, the IMPACT panel version will be inferred from each sample_id (based on IMX nomenclature) and NAs will be annotated accordingly for each sample/panel pair.
If you wish to specify different panels for each sample, pass a data frame (with all samples included) with columns sample_id and panel_id. Each sample will be annotated with NAs according to that specific panel. If a sample in your data is missing from the sample_id column of the specify_panel data frame, it will be returned with no annotation (equivalent to setting it to "no").
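The per-sample panel data frame described above could be assembled along these lines. This is a sketch under assumptions: the sample IDs are made up, and the panel ids other than "IMPACT468" are assumed to be valid entries in gnomeR::gene_panels, so check that table before relying on them. The call itself is left as a comment because `my_mutations` is a hypothetical placeholder for your own data.

```r
# Hypothetical sample IDs; replace with the IDs present in your own data.
sample_ids <- c("P-0000001-T01-IM3", "P-0000002-T01-IM5", "P-0000003-T01-IM6")

# One row per sample with the two required columns: sample_id and panel_id.
# "IMPACT341" and "IMPACT410" are assumed panel ids; verify them against
# gnomeR::gene_panels.
panel_map <- data.frame(
  sample_id = sample_ids,
  panel_id = c("IMPACT341", "IMPACT410", "IMPACT468"),
  stringsAsFactors = FALSE
)

# Samples missing from panel_map would be returned without NA annotation,
# so confirm that every sample is covered before passing the data frame.
missing_samples <- setdiff(sample_ids, panel_map$sample_id)
stopifnot(length(missing_samples) == 0)

# The data frame would then be passed as:
# create_gene_binary(mutation = my_mutations, specify_panel = panel_map)
```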
recode_aliases argument

If recode_aliases = "impact" is passed (default), the function will use gnomeR::impact_alias_table to find and replace any non-standard hugo symbol names with their more common (or more recent) accepted gene name.
If recode_aliases = "genie" is passed, the function will use gnomeR::genie_alias_table to find and replace any non-standard hugo symbol names with their more common (or more recent) accepted gene name.
If recode_aliases = "no" is passed, data will be returned as is without any alias replacements.
If you have a custom table of vetted aliases you wish to use, you can pass a data frame with columns hugo_symbol and alias. Each row should have one gene in the hugo_symbol column indicating the accepted gene name, and one gene in the alias column indicating an alias you want to check for and replace. If a gene has multiple aliases to check for, each should be represented in its own separate row. See gnomeR::impact_alias_table for an example of accepted data formatting.
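A custom alias table in the format described above might look like the following sketch. The gene/alias pairs are illustrative examples chosen here, and the lookup at the end only demonstrates the find-and-replace idea in base R. It is not the function's internal implementation.

```r
# Custom alias table: one row per alias, accepted name in hugo_symbol.
# NSD2 has two aliases, so it appears in two separate rows.
custom_aliases <- data.frame(
  hugo_symbol = c("KMT2D", "KMT2A", "NSD2", "NSD2"),
  alias = c("MLL2", "MLL", "WHSC1", "MMSET"),
  stringsAsFactors = FALSE
)

# Illustration only: replace any alias found in a vector of gene names
# with its accepted name, leaving other names untouched.
genes <- c("MLL2", "TP53", "WHSC1", "MMSET")
idx <- match(genes, custom_aliases$alias)
recoded <- ifelse(is.na(idx), genes, custom_aliases$hugo_symbol[idx])
print(recoded)

# The table would then be passed as:
# create_gene_binary(mutation = my_mutations, recode_aliases = custom_aliases)
```

Here `my_mutations` is a hypothetical placeholder for your own mutation data frame.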
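# Binary matrix from mutations only, using all samples in the data
mut.only <- create_gene_binary(mutation = gnomeR::mutations)

# Binary matrix restricted to an explicit vector of samples
samples <- gnomeR::mutations$sampleId
bin.mut <- create_gene_binary(
  samples = samples, mutation = gnomeR::mutations,
  mut_type = "omit_germline", snp_only = FALSE,
  include_silent = FALSE
)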