PlotTransBiasGeneExpToPdf: Plot transcription strand bias with respect to gene...


View source: R/strandbias_functions.R

Description

Plot transcription strand bias as a function of gene expression values and write the plot to a PDF file.

Usage

PlotTransBiasGeneExpToPdf(
  annotated.SBS.vcf,
  file,
  expression.data,
  Ensembl.gene.ID.col,
  expression.value.col,
  num.of.bins,
  plot.type = c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G"),
  damaged.base = NULL
)

Arguments

annotated.SBS.vcf

An SBS VCF annotated by AnnotateSBSVCF. It must have transcript range information added.

file

The name of the output PDF file.

expression.data

A data.table which contains the expression values of genes.
See GeneExpressionData for more details.

Ensembl.gene.ID.col

Name of column which has the Ensembl gene ID information in expression.data.

expression.value.col

Name of column which has the gene expression values in expression.data.

num.of.bins

The number of bins that will be plotted on the graph.

plot.type

A character vector indicating the mutation types to be plotted. It can be one or more of "C>A", "C>G", "C>T", "T>A", "T>C", "T>G". The default is to plot all six mutation types.

damaged.base

One of NULL, "purine" or "pyrimidine". This function allocates approximately equal numbers of mutations from damaged.base into each of the num.of.bins bins by expression level. E.g. if damaged.base is "purine", then mutations from A and G will be allocated in approximately equal numbers to each expression-level bin. The rationale for the name damaged.base is that the direction of strand bias depends on whether the damage occurs on a purine or a pyrimidine. If NULL, the function attempts to infer damaged.base from the mutation counts.
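The idea of allocating approximately equal numbers of mutations to each expression-level bin can be sketched in base R as follows. This is only an illustration on simulated values, not the ICAMS implementation: the variable names, the simulated data and the rank-based binning rule are all assumptions made for this example.

```r
# Illustrative sketch only: simulated data, not the ICAMS internals.
set.seed(2)

# Hypothetical expression value (e.g. TPM) of the gene hit by each mutation
expression.value <- rexp(103, rate = 1 / 20)
num.of.bins <- 4

# Rank the mutations by expression level, then split the ranks into
# num.of.bins groups whose sizes differ by at most one mutation
mutation.rank <- rank(expression.value, ties.method = "first")
bin <- ceiling(mutation.rank * num.of.bins / length(expression.value))

# Number of mutations allocated to each expression-level bin
bin.counts <- table(bin)
print(bin.counts)
```

Because the bins are defined by rank rather than by fixed expression cut-offs, each bin receives roughly the same number of mutations even when the expression values are highly skewed.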

Value

A list whose first element is a logical value indicating whether the plot was produced successfully. The second element is a named numeric vector containing the p-values printed on the plot.

Note

The p-values are calculated by logistic regression using the function glm. The dependent variable is coded as 1 if the mutation from annotated.SBS.vcf falls on the untranscribed strand and 0 if it falls on the transcribed strand. The independent variable is the binary logarithm of the gene expression value from expression.data plus one, i.e. log2(x + 1), where x is the gene expression value.
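A minimal sketch of this kind of regression, run on simulated data, is shown below. The variable names and simulated values are assumptions made for illustration, and which coefficient's p-value the function reports is also an assumption; this example takes the p-value of the slope on log2(x + 1).

```r
# Illustrative sketch only: simulated data, not the ICAMS internals.
set.seed(1)
n <- 500

# Hypothetical gene expression values (x) and the predictor log2(x + 1)
expression.value <- rexp(n, rate = 1 / 20)
log2.exp <- log2(expression.value + 1)

# Dependent variable: 1 = untranscribed strand, 0 = transcribed strand.
# The simulated strand bias increases with expression level.
prob.untranscribed <- plogis(-0.5 + 0.3 * log2.exp)
on.untranscribed <- rbinom(n, size = 1, prob = prob.untranscribed)

# Logistic regression of strand on log2-transformed expression
fit <- glm(on.untranscribed ~ log2.exp, family = binomial)

# p-value for the slope on log2(x + 1)
p.value <- summary(fit)$coefficients["log2.exp", "Pr(>|z|)"]
print(p.value)
```

A small p-value here would indicate that the probability of a mutation falling on the untranscribed strand changes with expression level.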

Examples

file <- c(system.file("extdata/Strelka-SBS-vcf/",
                      "Strelka.SBS.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadAndSplitStrelkaSBSVCFs(file)
SBS.vcf <- list.of.vcfs$SBS.vcfs[[1]]             
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  annotated.SBS.vcf <- AnnotateSBSVCF(SBS.vcf, ref.genome = "hg19",
                                      trans.ranges = trans.ranges.GRCh37)
  PlotTransBiasGeneExpToPdf(annotated.SBS.vcf = annotated.SBS.vcf, 
                            expression.data = gene.expression.data.HepG2, 
                            Ensembl.gene.ID.col = "Ensembl.gene.ID", 
                            expression.value.col = "TPM", 
                            num.of.bins = 4, 
                            plot.type = c("C>A","C>G","C>T","T>A","T>C"), 
                            file = file.path(tempdir(), "test.pdf"))
}

ICAMS documentation built on April 3, 2021, 5:07 p.m.