View source: R/RNAseqCNV_wrappper.R
RNAseqCNV_wrapper (R Documentation)
Wrapper for generating figures and tables for CNV estimation from RNA-seq
RNAseqCNV_wrapper(
  config,
  metadata,
  snv_format,
  adjust = TRUE,
  arm_lvl = TRUE,
  estimate_lab = TRUE,
  genome_version = "hg38",
  gene_annotation = NULL,
  SNP_to_keep = NULL,
  par_regions = NULL,
  centromeric_regions = NULL,
  weight_tab = weight_table,
  generate_weights = FALSE,
  model_gend = model_gender,
  model_dip = model_dipl,
  model_alter = model_alt,
  model_alter_noSNV = model_noSNV,
  batch = FALSE,
  standard_samples = NULL,
  CNV_matrix = FALSE,
  scale_cols = scaleCols,
  dpRatioChromEdge = dpRatioChrEdge,
  minDepth = 20,
  mafRange = c(0.05, 0.9),
  minReadCnt = 3,
  samp_prop = 0.8,
  weight_samp_prop = 1
)
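The config and metadata arguments both point to files that must exist before the wrapper is called. The sketch below writes a minimal config script (an R script assigning count_dir, snv_dir and out_dir) and a headerless three-column metadata table to temporary files. All paths, sample names and file names are hypothetical placeholders, and the tab separator for the metadata table is an assumption; the package README describes the exact expected format.

```r
# Config: a plain R script that assigns the three directory variables.
# The paths are hypothetical placeholders.
config_file <- tempfile(fileext = ".R")
writeLines(c(
  'count_dir <- "/path/to/count_files"',
  'snv_dir <- "/path/to/snv_files"',
  'out_dir <- "/path/to/output"'
), config_file)

# Sanity check: sourcing the script defines the expected variables.
cfg <- new.env()
sys.source(config_file, envir = cfg)
stopifnot(all(c("count_dir", "snv_dir", "out_dir") %in% ls(cfg)))

# Metadata: sample name, count file name, SNV file name; no header.
# Names are hypothetical; the tab separator is an assumption.
metadata_df <- data.frame(
  sample     = c("sample_1", "sample_2"),
  count_file = c("sample_1.count", "sample_2.count"),
  snv_file   = c("sample_1.vcf", "sample_2.vcf")
)
metadata_file <- tempfile(fileext = ".tsv")
write.table(metadata_df, metadata_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# The actual call needs the RNAseqCNV package and real input files:
# library(RNAseqCNV)
# RNAseqCNV_wrapper(config = config_file, metadata = metadata_file,
#                   snv_format = "vcf")
```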
config
path to an R script that assigns the paths to the required directories to variables: 1. count_dir, path to a directory with count files; 2. snv_dir, path to a directory with files containing SNV information (either VCF or custom tabular data); 3. out_dir, path to an output directory. A more detailed description can be found in the package README file.
metadata
path to a metadata table with three columns and no header. First column: sample names; second column: file names of the count files; third column: file names of the SNV files. More information is included in the package README file.
snv_format
character string, either "vcf" or "custom". Use "vcf" when the VCF files with SNV information were generated with GATK. Otherwise, use "custom" when the SNV input has four required columns: chromosome, locus of the SNV, overall sequencing depth at the locus, and MAF. MAF is the alternative-allele sequencing depth divided by the overall sequencing depth at the locus.
adjust
logical value. If TRUE, expression is centered according to the diploid chromosomes estimated by the random forest model. Default = TRUE.
arm_lvl
logical value. If TRUE, arm-level figures will be printed (this increases run time significantly). Default = TRUE.
estimate_lab
logical value. If TRUE, CNV estimation labels will be included in the final figure. Default = TRUE.
genome_version
character string, either "hg19" or "hg38" (default). The gene annotation, kept SNPs, pseudoautosomal regions and centromeric regions are selected according to the chosen version. If any of this information is supplied by the user through the arguments gene_annotation, SNP_to_keep, par_regions or centromeric_regions, the corresponding internal data will be replaced.
gene_annotation
table, reference data for gene annotation with Ensembl IDs.
SNP_to_keep
vector of reliable SNPs to keep for the MAF graphs.
par_regions
table with pseudoautosomal regions. These regions will be filtered out.
centromeric_regions
table with chromosomal centromere locations.
weight_tab
table with per-gene weights for calculating weighted quantiles for the box plots in the main figure.
generate_weights
logical value. If TRUE, weights for calculating weighted quantiles will be constructed from the variance and depth of the analyzed cohort of samples. If batch is TRUE, the weights are computed from the batch of input samples; if FALSE, they are generated from the joined diploid standard and the analyzed sample.
model_gend
random forest model for estimating gender based on the expression of certain genes on chromosome Y.
model_dip
random forest model for estimating whether a chromosome arm is diploid.
model_alter
random forest model for estimating CNVs on a chromosome arm.
model_alter_noSNV
random forest model for estimating arm-level CNVs when not enough SNV information is available to construct the MAF density curve.
batch
logical value. If TRUE, the samples will be normalized together as a batch, and the gene expression median will also be calculated from these samples.
standard_samples
character vector with the names of the samples to be used as a standard for vst and log2 fold change centering. The sample names must be included in the metadata table, and batch must not be TRUE. If NULL (default), built-in standard samples will be used.
CNV_matrix
logical value. If TRUE, an additional matrix of called CNVs for all analyzed samples will be output.
scale_cols
colour scaling for box plots according to the median of each box plot.
minDepth
minimal depth of an SNV for it to be kept (default 20).
mafRange
numeric vector of two values specifying the range of MAFs of SNVs to be kept (default c(0.05, 0.9)).
minReadCnt
numeric value used for filtering genes with low expression according to the rule: at least samp_prop*100 percent of samples must have more reads than minReadCnt (default 3).
samp_prop
proportion of samples required to have at least minReadCnt reads for a gene. The samples include the diploid reference (from the standard_samples parameter) and the analyzed sample (default 0.8).
weight_samp_prop
proportion of samples with the highest weight to be kept (default 1).
dpRatioChromEdge
table with chromosome start and end base positions.
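For snv_format = "custom", each SNV file needs the four columns listed under snv_format. The sketch below builds such a table from hypothetical per-locus depths, derives MAF as alternative-allele depth over overall depth, and then applies a minDepth/mafRange filter with the default values. The column names, the helper filter_snv() and the use of inclusive bounds are assumptions for illustration, not the package's internal code.

```r
# Hypothetical custom SNV input; column names are illustrative assumptions.
alt_depth <- c(12, 6, 44, 5, 1)          # alternative-allele depth per locus
snv <- data.frame(
  chromosome = c("1", "1", "2", "3", "X"),
  locus      = c(1000L, 2500L, 7300L, 9100L, 500L),
  depth      = c(30, 60, 45, 15, 80)     # overall depth at the locus
)
# MAF = alternative-allele depth / overall depth at the locus
snv$maf <- alt_depth / snv$depth

# Hypothetical helper: keep SNVs with depth >= minDepth and MAF inside
# mafRange. Inclusive bounds are an assumption.
filter_snv <- function(snv, minDepth = 20, mafRange = c(0.05, 0.9)) {
  keep <- snv$depth >= minDepth &
    snv$maf >= mafRange[1] & snv$maf <= mafRange[2]
  snv[keep, , drop = FALSE]
}

filtered <- filter_snv(snv)
print(filtered)
```

Only the first two loci survive: the third has a MAF above 0.9, the fourth has a depth below 20, and the fifth has a MAF below 0.05.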
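The minReadCnt and samp_prop rule can be read as: keep a gene if the proportion of samples with more than minReadCnt reads is at least samp_prop. A minimal sketch on a hypothetical count matrix follows. filter_low_expr() is a hypothetical helper, and the strict > comparison follows the minReadCnt description; the samp_prop description says "at least", so the package may use >= instead.

```r
# Hypothetical helper: keep genes for which the proportion of samples with
# more than minReadCnt reads is at least samp_prop.
filter_low_expr <- function(counts, minReadCnt = 3, samp_prop = 0.8) {
  # Strict ">" follows the minReadCnt description; the package may use ">=".
  prop_expressed <- rowMeans(counts > minReadCnt)
  counts[prop_expressed >= samp_prop, , drop = FALSE]
}

# Hypothetical genes x samples count matrix; in the package the samples
# would be the diploid reference plus the analyzed sample.
counts <- matrix(
  c(10, 12,  9, 11, 15,   # gene_a: > 3 reads in 5 of 5 samples
     0,  1,  5,  2,  0,   # gene_b: > 3 reads in 1 of 5 samples
     4,  5,  6,  1,  0,   # gene_c: > 3 reads in 3 of 5 samples
    20, 25, 30, 22, 18),  # gene_d: > 3 reads in 5 of 5 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene_", c("a", "b", "c", "d")), paste0("s", 1:5))
)

kept <- filter_low_expr(counts)
rownames(kept)
```

With the defaults, gene_a and gene_d are kept, while gene_b (0.2) and gene_c (0.6) fall below samp_prop = 0.8.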