All functions in the bigPint package require an input parameter called data, which should be a data frame containing the full dataset of interest. If a researcher is using the package to visualize RNA-seq data, then the data object should be a count table that contains the read counts for all genes of interest.

The data object must have the same data frame format for all bigPint functions. The data frame should have $n$ rows, where $n$ is the number of genes, and $p + 1$ columns, where $p$ is the number of samples. The first column contains the gene names, and the remaining $p$ columns contain the read counts for all samples of interest. An example of this format is shown below:
library(bigPint)
data("soybean_ir_sub")
head(soybean_ir_sub)
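If the example dataset is not at hand, the same layout can be built by hand. The following is a minimal sketch with $n = 3$ genes and $p = 4$ samples, which gives $p + 1 = 5$ columns. Everything in it is invented for illustration: the object name toy_data, the gene IDs, the treatment group labels A and B, and the counts. The first column is named ID, matching the column selected in the soybean_cn_sub example later in this section.

```r
# Hypothetical toy count table in the layout described above:
# n = 3 genes (rows) and p = 4 samples, so p + 1 = 5 columns.
toy_data <- data.frame(
  ID  = c("gene1", "gene2", "gene3"),  # first column: gene names (character)
  A.1 = c(10L, 0L, 250L),              # remaining columns: integer read counts,
  A.2 = c(12L, 3L, 231L),              # named <group>.<replicate>
  B.1 = c(40L, 1L, 198L),
  B.2 = c(35L, 2L, 210L),
  stringsAsFactors = FALSE             # keep ID as character in R < 4.0
)

str(toy_data)
```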
We can also examine the structure of an example data object as follows:
str(soybean_ir_sub, strict.width = "wrap")
This example dataset contains 5,604 genes and six samples [@soybeanIR]. There are two treatment groups, N and P. Each treatment group contains three replicates.
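The treatment groups and the number of replicates per group can be read directly from the column names. The base-R sketch below uses a toy stand-in called toy_ir (a hypothetical name). Its columns are assumed to be N.1 to N.3 and P.1 to P.3, which is what the naming rule described next implies for soybean_ir_sub. The counts are placeholders, because only the column names are used.

```r
# Toy stand-in with the column layout assumed for soybean_ir_sub:
# an ID column plus three replicates each of groups N and P.
# The counts are placeholders (recycled to 2 rows); only names matter here.
toy_ir <- data.frame(
  ID = c("gene1", "gene2"),
  N.1 = 0L, N.2 = 0L, N.3 = 0L,
  P.1 = 0L, P.2 = 0L, P.3 = 0L,
  stringsAsFactors = FALSE
)

sample_names <- names(toy_ir)[-1]               # drop the ID column
groups <- sub("\\..*$", "", sample_names)       # text before the period
replicates <- sub("^.*\\.", "", sample_names)   # digits after the period

n_genes <- nrow(toy_ir)          # n: number of rows (genes)
p_samples <- length(sample_names) # p: number of sample columns
table(groups)                    # replicates per group: N = 3, P = 3
```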
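As demonstrated above, the data object must meet the following conditions:

- It must be of class data.frame.
- Its first column, which holds the gene names, must be of class character.
- Its remaining columns, which hold the read counts, must be of class integer or numeric.
- The names of its remaining columns must match the regular expression ^[a-zA-Z0-9]+\\.[0-9]+, where the alphanumeric characters before the period give the treatment group (for example, N) and the digits after the period give the replicate number (for example, 1).

It is important that the names of all columns except the first follow this three-part format (treatment group, period, replicate number). All functions in the bigPint package require this format to produce plots successfully. If your data object does not fit this format, bigPint will likely throw an informative error explaining why the format was not recognized.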
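Before calling bigPint, you can check these conditions yourself. The sketch below uses a hypothetical helper, check_bigpint_format(), which is not part of bigPint and does not reproduce the package's own error messages. It tests each condition listed above and uses the same regular expression, unanchored at the end as given.

```r
# Hypothetical helper (not part of bigPint): returns TRUE if `x` meets the
# conditions listed above, otherwise stops with a message naming the problem.
check_bigpint_format <- function(x) {
  if (!is.data.frame(x)) stop("data must be a data.frame")
  if (ncol(x) < 2) stop("data needs a gene name column plus sample columns")
  if (!is.character(x[[1]])) stop("first column must be of class character")
  # is.numeric() is TRUE for both integer and double columns
  counts_ok <- vapply(x[-1], is.numeric, logical(1))
  if (!all(counts_ok)) stop("remaining columns must be integer or numeric")
  sample_names <- names(x)[-1]
  bad <- !grepl("^[a-zA-Z0-9]+\\.[0-9]+", sample_names)
  if (any(bad)) {
    stop("column names not in <group>.<replicate> form: ",
         paste(sample_names[bad], collapse = ", "))
  }
  TRUE
}

ok <- data.frame(ID = c("g1", "g2"), N.1 = c(5L, 0L), P.1 = c(7, 2),
                 stringsAsFactors = FALSE)
check_bigpint_format(ok)
```

For example, renaming N.1 to N1 (no period) makes the helper stop and report the offending column name.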
Note that the data object can contain more than two treatment groups. In this case, the bigPint software will automatically create plots for all pairs of treatment groups. An example of this type of dataset is provided in the bigPint package and can be accessed as follows:
data(soybean_cn_sub)
str(soybean_cn_sub, strict.width = "wrap")
This example dataset contains 7,332 genes and nine samples [@brown2015developmental]. There are three treatment groups, S1, S2, and S3. Each treatment group contains three replicates. When the data object contains more than two treatment groups, all functions in the bigPint package (except plotSMApp()) will automatically produce a plot for each pairwise combination of treatment groups.

For example, bigPint functions will produce plots for S1 versus S2, S1 versus S3, and S2 versus S3 in this case. The same result could be achieved, although less efficiently, by splitting the dataset into three smaller datasets (one per pair of treatment groups) and running the bigPint function of interest on each of them individually:
library(dplyr)
# Keep the ID column plus the replicates of each pair of treatment groups
soybean_cn_sub_S1S2 <- soybean_cn_sub %>% select("ID", contains("S1"), contains("S2"))
soybean_cn_sub_S1S3 <- soybean_cn_sub %>% select("ID", contains("S1"), contains("S3"))
soybean_cn_sub_S2S3 <- soybean_cn_sub %>% select("ID", contains("S2"), contains("S3"))
head(soybean_cn_sub_S1S2, 3)
head(soybean_cn_sub_S1S3, 3)
head(soybean_cn_sub_S2S3, 3)
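The three select() calls above can be generalized to any number of treatment groups with base R's combn(). The sketch below is illustrative only: split_pairwise() and toy_cn are hypothetical names, and bigPint's internal pairing logic may differ. One design choice differs from the dplyr version. Columns are matched on the exact group label before the period rather than with contains(), so a group named S1 would not also pick up a hypothetical group S10.

```r
# Hypothetical stand-in for soybean_cn_sub: groups S1, S2, S3, 2 replicates each.
toy_cn <- data.frame(
  ID = c("gene1", "gene2"),
  S1.1 = 1L, S1.2 = 2L, S2.1 = 3L, S2.2 = 4L, S3.1 = 5L, S3.2 = 6L,
  stringsAsFactors = FALSE
)

# Split a count table into one data frame per pair of treatment groups.
split_pairwise <- function(x) {
  sample_names <- names(x)[-1]
  groups <- sub("\\..*$", "", sample_names)   # treatment group of each column
  pairs <- combn(unique(groups), 2, simplify = FALSE)
  out <- lapply(pairs, function(pair) {
    keep <- groups %in% pair                  # exact group match, not substring
    x[, c(names(x)[1], sample_names[keep])]
  })
  names(out) <- vapply(pairs, paste, character(1), collapse = "_vs_")
  out
}

pairwise <- split_pairwise(toy_cn)
names(pairwise)  # "S1_vs_S2" "S1_vs_S3" "S2_vs_S3"
```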
Some popular RNA-seq analysis packages (such as edgeR [@robinson2010edger], DESeq2 [@love2014moderated], and limma [@ritchie2015limma]) advise researchers to perform certain preprocessing steps on their data before visualization, such as filtering the genes, normalizing their read counts, and standardizing their read counts. In bigPint, the data object can be set to a dataset whether or not it has been filtered, normalized, or standardized. If they wish, researchers can use bigPint plots to investigate how their dataset changes after filtering, normalization, and standardization.
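Any such preprocessing should preserve the layout described earlier so that the result can still be passed as the data object. The sketch below is a generic illustration, not the procedure recommended by edgeR, DESeq2, or limma. It skips library-size normalization entirely. The count cutoff of 10, the log2(count + 1) transform, and per-gene z-score standardization are all arbitrary choices made here for illustration.

```r
# Generic illustration (not the edgeR, DESeq2, or limma procedures):
# filter, transform, and standardize a toy count table while keeping the
# ID column and <group>.<replicate> column names described above.
toy_data <- data.frame(
  ID  = c("gene1", "gene2", "gene3"),
  A.1 = c(10L, 0L, 250L), A.2 = c(12L, 3L, 231L),
  B.1 = c(40L, 1L, 198L), B.2 = c(35L, 2L, 210L),
  stringsAsFactors = FALSE
)

counts <- as.matrix(toy_data[-1])

# 1. Filter: drop genes whose total count is below an arbitrary cutoff.
keep <- rowSums(counts) >= 10
counts <- counts[keep, , drop = FALSE]

# 2. Transform: log2(count + 1) to dampen very large counts.
log_counts <- log2(counts + 1)

# 3. Standardize each gene (row) to mean 0 and standard deviation 1;
#    scale() works on columns, hence the transposes.
std_counts <- t(scale(t(log_counts)))

# Reassemble in the same layout as the input data object.
std_data <- data.frame(ID = toy_data$ID[keep], std_counts,
                       check.names = FALSE, stringsAsFactors = FALSE)
```

Plotting toy_data and std_data with the same bigPint function is one way to see how these steps change the picture.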