RNAseqCNV_wrapper: RNAseqCNV_wrapper

View source: R/RNAseqCNV_wrappper.R

RNAseqCNV_wrapper    R Documentation

RNAseqCNV_wrapper

Description

Wrapper for generating figures and tables for CNV estimation from RNA-seq

Usage

RNAseqCNV_wrapper(
  config,
  metadata,
  snv_format,
  adjust = TRUE,
  arm_lvl = TRUE,
  estimate_lab = TRUE,
  genome_version = "hg38",
  gene_annotation = NULL,
  SNP_to_keep = NULL,
  par_regions = NULL,
  centromeric_regions = NULL,
  weight_tab = weight_table,
  generate_weights = FALSE,
  model_gend = model_gender,
  model_dip = model_dipl,
  model_alter = model_alt,
  model_alter_noSNV = model_noSNV,
  batch = FALSE,
  standard_samples = NULL,
  CNV_matrix = FALSE,
  scale_cols = scaleCols,
  dpRatioChromEdge = dpRatioChrEdge,
  minDepth = 20,
  mafRange = c(0.05, 0.9),
  minReadCnt = 3,
  samp_prop = 0.8,
  weight_samp_prop = 1
)
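A minimal sketch of a call that sets only the three arguments without defaults. Both file paths are hypothetical placeholders, and the guard is an assumption added so the snippet is safe to run when the package or the files are missing.

```r
# Hypothetical paths; replace them with your own config script and metadata table.
config_path   <- file.path("project", "config.R")
metadata_path <- file.path("project", "metadata.txt")

# Run only if the package is installed and both input files exist.
can_run <- requireNamespace("RNAseqCNV", quietly = TRUE) &&
  file.exists(config_path) && file.exists(metadata_path)

if (can_run) {
  RNAseqCNV::RNAseqCNV_wrapper(
    config     = config_path,
    metadata   = metadata_path,
    snv_format = "vcf"   # or "custom", see Arguments
  )
}
```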

Arguments

config

path to an R script that assigns the paths of the required directories to variables: 1. count_dir - path to a directory with count files, 2. snv_dir - path to a directory with files containing SNV information (either vcf or custom tabular data), 3. out_dir - path to an output directory. A more detailed description can be found in the package README file.
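A config script of this kind might look like the sketch below. Only the three variable names come from the description above; the directory locations are hypothetical placeholders.

```r
# config.R: assigns the three directory paths described above.
# All paths are hypothetical placeholders.
count_dir <- file.path("project", "counts")  # directory with count files
snv_dir   <- file.path("project", "snv")     # directory with vcf or custom SNV files
out_dir   <- file.path("project", "output")  # directory for the generated output
```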

metadata

path to a metadata table with three columns. First column: sample names, second column: file names of count files, third column: file names of SNV files. There should be no header. More information is included in the package README file.
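As an illustration, a three-column table without a header can be written from R as sketched below. The sample names, file names and the tab separator are assumptions; check the package README for the expected delimiter.

```r
# Hypothetical sample and file names, for illustration only.
metadata <- data.frame(
  sample     = c("sample_1", "sample_2"),
  count_file = c("sample_1.count", "sample_2.count"),
  snv_file   = c("sample_1.vcf", "sample_2.vcf")
)

metadata_path <- file.path(tempdir(), "metadata.txt")

# Write without header and without row names (tab separator is an assumption).
write.table(metadata, metadata_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the three-column, header-less layout.
check <- read.table(metadata_path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)
```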

snv_format

character string, either "vcf" or "custom". Use "vcf" when the vcf files with SNV information were generated with GATK. Otherwise, use "custom" when the input with SNV information has 4 required columns: chromosome, locus of the SNV, overall sequencing depth of the locus and MAF. MAF is the ratio of the alternative allele sequencing depth to the overall sequencing depth of the locus.
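The MAF definition above can be sketched as follows for a "custom" input table. The column names and read depths are hypothetical; the README should be consulted for the exact column names and file layout the package expects.

```r
# Hypothetical per-locus read depths, for illustration only.
snv <- data.frame(
  chr       = c("chr1", "chr1", "chr2"),
  start     = c(10500, 20750, 33100),
  depth     = c(40, 25, 100),  # overall sequencing depth of the locus
  alt_depth = c(20, 5, 90)     # reads supporting the alternative allele
)

# MAF = alternative allele depth / overall depth of the locus
snv$maf <- snv$alt_depth / snv$depth

# Keep the four required columns: chromosome, locus, depth, MAF.
snv_custom <- snv[, c("chr", "start", "depth", "maf")]
```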

adjust

logical value, If TRUE, expression is centered according to the random forest estimated diploid chromosomes. Default = TRUE.

arm_lvl

logical value, If TRUE, arm-level figures will be printed (increases run-time significantly). Default = TRUE.

estimate_lab

logical value, If TRUE, CNV estimation labels will be included in the final figure. Default = TRUE.

genome_version

character string, either "hg19" or "hg38" (default). The gene annotation, kept SNPs, pseudoautosomal regions and centromeric regions will be selected according to the chosen version. If the user supplies any of this information through the arguments gene_annotation, SNP_to_keep, par_regions or centromeric_regions, the internal data will be overridden.

gene_annotation

table, reference data for gene annotation with Ensembl IDs.

SNP_to_keep

vector of reliable SNPs to keep for the MAF graphs.

par_regions

table with pseudoautosomal regions. These regions will be filtered out.

centromeric_regions

table with chromosomal centromeric locations.

weight_tab

table with per-gene weights for calculating weighted quantiles for the boxplots in the main figure.

generate_weights

logical value, if TRUE, weights for calculating weighted quantiles will be constructed from the variance and depth of the analyzed cohort of samples. If batch is TRUE, the weights will be generated from the batch of input samples; if FALSE, the weights will be generated from the diploid standard samples joined with the analyzed sample.

model_gend

random forest model for estimating gender based on the expression of certain genes on chromosome Y.

model_dip

random forest model for estimating whether a chromosome arm is diploid.

model_alter

random forest model for estimating the CNVs on a chromosome arm.

model_alter_noSNV

random forest model for estimating CNVs on the chromosome arm level when not enough SNV information is available to construct a MAF density curve.

batch

logical value, if TRUE, the samples will be normalized together as a batch and the gene expression median will be calculated from these samples.

standard_samples

character vector with the names of samples which should be used as a standard for vst and log2 fold centering. The sample names must be included in the metadata table and batch cannot be TRUE. If NULL (default), built-in standard samples will be used.
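The two constraints above can be checked before running the analysis. The helper below is a hypothetical sketch and is not part of RNAseqCNV; it only restates the documented rules.

```r
# Hypothetical helper, not part of RNAseqCNV: checks the documented constraints.
check_standard_samples <- function(standard_samples, sample_names, batch = FALSE) {
  # NULL means the built-in standard samples are used, so nothing to check.
  if (is.null(standard_samples)) return(invisible(TRUE))
  if (isTRUE(batch)) {
    stop("standard_samples cannot be combined with batch = TRUE")
  }
  not_found <- setdiff(standard_samples, sample_names)
  if (length(not_found) > 0) {
    stop("samples missing from the metadata table: ",
         paste(not_found, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical sample names taken from the first metadata column.
sample_names <- c("sample_1", "sample_2", "sample_3")
ok <- check_standard_samples(c("sample_1", "sample_2"), sample_names)
```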

CNV_matrix

logical value, if TRUE, an additional matrix of called CNVs for all analyzed samples will be output.

scale_cols

colour scaling for box plots according to the median of a boxplot.

minDepth

minimal depth of an SNV to be kept (default 20).

mafRange

numeric vector of two values specifying the range of MAFs of SNVs to be kept (default c(0.05, 0.9)).
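The combined effect of minDepth and mafRange can be sketched as a simple filter. The data and column names are hypothetical, and treating both thresholds as inclusive is an assumption, since the description does not say how boundary values are handled internally.

```r
minDepth <- 20
mafRange <- c(0.05, 0.9)

# Hypothetical SNV table, for illustration only.
snv <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr3"),
  start = c(10500, 20750, 33100, 4800),
  depth = c(40, 15, 100, 60),
  maf   = c(0.50, 0.40, 0.95, 0.10)
)

# Keep SNVs with sufficient depth and a MAF inside the range
# (inclusive bounds are an assumption).
keep <- snv$depth >= minDepth &
  snv$maf >= mafRange[1] & snv$maf <= mafRange[2]

snv_kept <- snv[keep, ]
```

In this toy table the second SNV is dropped for low depth and the third for a MAF above the upper bound.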

minReadCnt

numeric value used for filtering genes with low expression according to the formula: at least samp_prop*100 percent of samples have more reads than minReadCnt (default 3).

samp_prop

sample proportion which is required to have at least minReadCnt reads for a gene. The samples include the diploid reference (from the standard_samples parameter) and the analyzed sample (default 0.8).
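The filtering rule given for minReadCnt and samp_prop can be sketched on a toy count matrix. The matrix is hypothetical, and the strict comparison follows the "more reads than minReadCnt" wording; the package's internal implementation may differ.

```r
minReadCnt <- 3
samp_prop  <- 0.8

# Hypothetical count matrix: genes in rows, samples in columns.
counts <- matrix(
  c(10, 12, 9, 15, 11,   # gene_A: 5 of 5 samples above minReadCnt
     0,  1, 5,  2,  0,   # gene_B: 1 of 5 samples above minReadCnt
     4,  8, 6,  2,  3),  # gene_C: 3 of 5 samples above minReadCnt
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_A", "gene_B", "gene_C"), paste0("s", 1:5))
)

# Proportion of samples with more reads than minReadCnt, per gene.
prop_above <- rowMeans(counts > minReadCnt)

# Keep genes where that proportion reaches samp_prop.
keep_gene <- prop_above >= samp_prop
filtered  <- counts[keep_gene, , drop = FALSE]
```

Only gene_A passes here: gene_C reaches 60 percent of samples, below the 80 percent required by samp_prop.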

weight_samp_prop

proportion of samples with the highest weight to be kept (default 1).

dpRatioChromEdge

table with chromosome start and end base positions.


honzee/RNAseqCNV documentation built on Jan. 30, 2024, 7:07 p.m.