previewDipDistribution: Plots the distribution of the gene set's dip values and dip...


Description

Plots the distributions of the dataset's dip values and dip p-values, to help choose where to place region break lines and how many regions to use.

Usage

previewDipDistribution(RNAdata, rawRNAdata, minimumCounts, barLine, adjustVal)

Arguments

RNAdata

This argument specifies the RNAseq dataset to be analyzed. It may be either raw or normalized data. If the analysis is to include a 'minimumCounts' screen to filter out low-expression genes, then 'RNAdata' should be the normalized expression data. The rows must be genes (with gene names as row names) and the columns must be the samples (ideally with the sample names as the column names, although the function will still run without them). Columns with non-numerical data (or with data that does not relate to a sample) should be removed before any analysis is attempted.

rawRNAdata

This is an optional argument used only when a 'minimumCounts' filter is to be applied. Each gene's highest expression level is extracted from 'rawRNAdata'. If that maximum does not exceed the value supplied by 'minimumCounts', the gene is excluded from the analysis of the normalized counts. Note that this is not the dataset to be analyzed; it serves only as part of an optional filter. The rows must be genes (with gene names as row names) and the columns must be the samples, and both must correspond directly to the rows and columns of the normalized data supplied as 'RNAdata'. Columns with non-numerical data (or with data that does not relate to a sample) should be removed before any analysis is attempted.

minimumCounts

If 'rawRNAdata' is supplied, 'minimumCounts' is the threshold that each gene's maximum raw expression value must exceed to remain in the normalized RNA data for the analysis.
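The filter described above can be sketched in base R. This is only an illustration of the stated rule (keep a gene when its maximum raw count exceeds the threshold), not the package's actual implementation; the toy data frames 'raws' and 'norm' and the log2 stand-in for normalization are hypothetical.

```r
# Hypothetical toy data: 4 genes x 3 samples, genes as row names
raws <- data.frame(
  s1 = c(0, 120, 3, 60),
  s2 = c(1,  95, 0, 10),
  s3 = c(0, 200, 2, 55),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
norm <- log2(raws + 1)  # stand-in for a normalized dataset (assumption)

minimumCounts <- 50

# Highest raw count observed for each gene across all samples
maxRaw <- apply(as.matrix(raws), 1, max)

# Genes whose maximum raw count exceeds the threshold
keep <- names(maxRaw)[maxRaw > minimumCounts]

# Restrict the normalized data to the surviving genes
filtered <- norm[rownames(norm) %in% keep, , drop = FALSE]
print(rownames(filtered))
```

Matching on row names rather than row positions is a deliberate choice in this sketch, so the result stays correct even if the two tables list genes in a different order.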

barLine

This sets the x-intercept of the bar (a vertical line) that can be overlaid on the graph. It defaults to 0 if no value is supplied.

adjustVal

ggplot2 value for adjusting the sharpness of the resulting plots: ranges from 0 to 1. The default is 1.
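In ggplot2, geom_density() takes an 'adjust' argument that multiplies the kernel bandwidth. Assuming 'adjustVal' is forwarded to that argument (this page does not confirm it), the effect can be previewed with base R's density(), which scales its bandwidth by 'adjust' in the same way. The simulated 'dipValues' vector below is hypothetical.

```r
set.seed(1)
# Hypothetical dip values drawn from two clusters
dipValues <- c(rnorm(200, mean = 0.02, sd = 0.005),
               rnorm(100, mean = 0.06, sd = 0.010))

smoothFit <- density(dipValues, adjust = 1)     # default-style smoothing
sharpFit  <- density(dipValues, adjust = 0.25)  # narrower bandwidth, sharper curve

# The bandwidth actually used is the default bandwidth times 'adjust'
print(c(smooth = smoothFit$bw, sharp = sharpFit$bw))
```

Smaller values give a more detailed curve that follows local peaks more closely; values near 1 give a smoother curve.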

Value

This function returns a data frame (DF) with each gene's dip value and dip p-value, a density plot of the genes' dip values (DipPlot), and a density plot of the genes' dip p-values (PvalPlot).
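As a rough illustration of what a per-gene table of dip values and dip p-values contains, the sketch below applies dip.test() from the diptest package (cited in the References) to each row of a small simulated matrix. It is not the package's own code; the matrix 'expr', the gene names, and the column names 'dip' and 'pvalue' are hypothetical, and the block is skipped if diptest is not installed.

```r
if (requireNamespace("diptest", quietly = TRUE)) {
  set.seed(42)
  # Hypothetical expression matrix: genes in rows, 40 samples in columns
  expr <- rbind(
    unimodalGene = rnorm(40, mean = 5),
    bimodalGene  = c(rnorm(20, mean = 0), rnorm(20, mean = 8))
  )

  # One dip statistic and one p-value per gene (row)
  dipTable <- data.frame(
    dip    = apply(expr, 1, function(v) unname(diptest::dip.test(v)$statistic)),
    pvalue = apply(expr, 1, function(v) diptest::dip.test(v)$p.value)
  )
  print(dipTable)
}
```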

Author(s)

Software authors: Jeremy Sieker, Sohyon Lee, Kristin Baldwin

References

Martin Maechler (2016). diptest: Hartigan's Dip Test Statistic for Unimodality - Corrected. R package version 0.75-7. https://CRAN.R-project.org/package=diptest

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

Examples

library(DipEx)

#build the URL of the example dataset
x <- paste0("https://www.ebi.ac.uk/gxa/experiments-content",
  "/E-GEOD-70484/resources/BaselineProfilesWriterService.RnaSeq/tsv")
ss <- read.table(url(x), sep = '\t', header = TRUE)
#keep only the odd-numbered rows, cutting the dataset in half to speed up the examples
ss <- ss[(as.numeric(row.names(ss)) %% 2 == 1), ]
#if the previous line fails, make sure the modulo operator is written as %% with no escaping slashes
row.names(ss) <- ss$Gene.ID #use the gene IDs as row names
ss <- ss[, -c(1:2)] #removing everything that is not expression data

mod <- previewDipDistribution(RNAdata = ss)


#In many datasets, genes with extremely low dip values (or high dip p-values)
#will appear in their expression plots as normal distributions with a mean around zero.
#These are typically just genes that have no registered counts in any of the samples.
#To remove these genes, there are a few options.
#One can supply both normalized and raw counts (i.e., both the RNAdata
#and rawRNAdata arguments) and employ the 'minimumCounts' filter.
#Alternatively, pre-filter your data to remove genes that do not pass
#your desired expression threshold, then simply
#use that data for your RNAdata argument and leave the rawRNAdata and
#minimumCounts arguments blank.

#Due to the difficulty of finding publicly available datasets that
#have paired raw and normalized counts, the filter will not be
#demonstrated in this example. However, if you had a raw counts set
#called raws and a normalized counts set called ss, the
#code would be along the lines of

#mod1 <- previewDipDistribution(RNAdata = ss, rawRNAdata = raws, minimumCounts = 50)

jsieker/DipEx documentation built on May 17, 2019, 2:10 p.m.