knitr::opts_chunk$set(echo = TRUE)
options(rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(fig.width = 8)
knitr::opts_chunk$set(fig.height = 6)
The pmartR package is designed around omicsData objects. These are S3 object classes defined for explicit use within this package. The omicsData classes currently supported are:
pepData: for unlabeled peptide data, typically generated via LC-MS/MS
isobaricpepData: for labeled peptide data, generated via iTRAQ or TMT labeling
proData: protein level data, often created from within a pmartR workflow by "rolling up" the peptide level data
metabData: metabolite data, often generated by GC-MS or HILIC
nmrData: metabolite data generated by NMR
lipidData: lipid data, often generated via LC-MS
seqData: sequence data, such as RNAseq
These objects are structured as lists with two required components and one optional component, each of which is a data frame. As pmartR functions are called on a data object, attributes are added to it and used behind the scenes to help ensure proper order of operations and appropriate use of methods. The components of an omicsData object are as follows:
e_data
: $p \times (n + 1)$ data frame of expression data, where $p$ is the number of biomolecules observed and $n$ is the number of samples. The additional column holds the biomolecule identifiers/names and may appear anywhere in the data frame. Each row holds the data for one biomolecule.
f_data
: data frame with $n$ rows. Each row corresponds to a sample. One column gives the unique sample identifiers found in the e_data column names, and the other columns provide qualitative and/or quantitative traits of each sample.
e_meta
: optional data frame with at least $p$ rows. Each row corresponds to a biomolecule. One column gives the biomolecule names (it must have the same name as the identifier column in e_data), and the other columns give meta information (e.g. mappings of peptides to proteins or lipids to lipid classes).
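To make the shapes of the three components concrete, here is a minimal base R sketch. The data frames and every name in them (toy_edata, toy_fdata, toy_emeta, the Lipid, SampleID, Group, and LipidClass columns) are invented for illustration and are not part of pmartR or pmartRdata.

```r
# Toy data, invented for illustration only (not from pmartRdata).
# e_data: p = 3 biomolecules, n = 2 samples, plus one identifier column.
toy_edata <- data.frame(
  Lipid = c("L1", "L2", "L3"),
  S1 = c(10.2, NA, 7.5),
  S2 = c(11.0, 3.3, 8.1),
  stringsAsFactors = FALSE
)

# f_data: one row per sample; SampleID matches the sample columns of e_data.
toy_fdata <- data.frame(
  SampleID = c("S1", "S2"),
  Group = c("control", "treated"),
  stringsAsFactors = FALSE
)

# e_meta (optional): at least p rows, sharing the identifier column name.
toy_emeta <- data.frame(
  Lipid = c("L1", "L2", "L3"),
  LipidClass = c("PC", "PE", "PC"),
  stringsAsFactors = FALSE
)

p <- nrow(toy_edata)  # number of biomolecules
n <- nrow(toy_fdata)  # number of samples

# e_data should have n + 1 columns; e_meta should have at least p rows.
dims_ok <- ncol(toy_edata) == n + 1 && nrow(toy_emeta) >= p
dims_ok
```

A check like dims_ok only confirms the dimensions; whether the identifiers themselves line up is a separate question.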
The first step when analyzing data with pmartR is to load the library and create an omicsData object of the appropriate type.
library(pmartR)
The pmartRdata package is a companion package to pmartR that contains a number of example datasets used throughout the pmartR documentation.
This vignette will utilize the lipid example data from negative ionization mode. An example of creating an isobaricpepData object is included in the "Typical Processing Workflow" vignette.
# install the pmartRdata package, if needed
# devtools::install_github("pmartR/pmartRdata")

# load the pmartRdata package
library(pmartRdata)

# load example e_data, f_data, e_meta for the lipid negative ionization mode dataset
edata <- lipid_neg_edata
fdata <- lipid_neg_fdata
emeta <- lipid_neg_emeta
The e_data data frame is required and contains the measurements for each sample and biomolecule. The first column often contains the biomolecule names, although any column can hold this information. The remaining columns correspond to the samples, and their column names must match the sample names in f_data. Each row corresponds to a biomolecule.
For our example negative ionization mode lipid data, the e_data data frame looks like this:
head(edata)
The f_data data frame is required and contains information about each sample. One column must contain the sample names, which are identical to the column names in e_data that correspond to the samples. Other information that may be useful to store in f_data includes:
Experimental group(s) to which the sample belongs
Other phenotypic information (e.g. age or sex, time of sampling if relevant to the experimental design) or properties of the samples (e.g. sample weight or concentration)
Run order for the samples
Batch number, if samples were run in multiple batches
The type and amount of information to include in f_data depends on the experiment and the researcher. It is okay to include extra information that does not get used in the pmartR analysis pipeline.
For our example negative ionization mode lipid data, the f_data data frame contains the sample identifier, virus strain, replicate, and donor.
head(fdata)
Note that the entries in the SampleID column can be mapped one-to-one to the column names of e_data (excluding the biomolecule identifier column). The entries do not have to be in the same order.
all(fdata$SampleID %in% names(edata)[-which(names(edata) == "Lipid")])
all(names(edata)[-which(names(edata) == "Lipid")] %in% fdata$SampleID)
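When one of these checks returns FALSE, it helps to see which names fail to match. The following base R sketch shows one way to do that with setdiff(). The toy data frames and the variable names (toy_edata, toy_fdata, sample_cols, missing_in_fdata, missing_in_edata) are hypothetical stand-ins rather than the pmartRdata objects.

```r
# Hypothetical toy inputs; in the vignette these would be edata and fdata.
toy_edata <- data.frame(
  Lipid = c("L1", "L2"),
  S1 = c(1, 2),
  S2 = c(3, 4),
  S3 = c(5, 6),
  stringsAsFactors = FALSE
)
toy_fdata <- data.frame(
  SampleID = c("S3", "S1", "S4"),
  stringsAsFactors = FALSE
)

# Sample columns are all e_data columns except the identifier column.
sample_cols <- setdiff(names(toy_edata), "Lipid")

# Samples with measurements but no f_data row, and the reverse.
missing_in_fdata <- setdiff(sample_cols, toy_fdata$SampleID)
missing_in_edata <- setdiff(toy_fdata$SampleID, sample_cols)

missing_in_fdata
missing_in_edata

# TRUE only when both sets of names agree, regardless of order.
setequal(sample_cols, toy_fdata$SampleID)

# Duplicated identifiers would also break a one-to-one mapping.
anyDuplicated(toy_fdata$SampleID) == 0
```

Note that setequal() compares sets and ignores duplicates, which is why the sketch checks for duplicated sample identifiers separately.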
The e_meta data frame is optional and can contain any metadata associated with the biomolecules. One column must contain the same biomolecule identifiers as the identifier column in e_data, and it must have the same column name. For peptide data that will be rolled up to the protein level, e_meta should be present and contain the peptide-to-protein mapping. For lipid data it can be useful to include the mapping of lipids to various lipid classes. Metabolites could be mapped to other identifiers (KEGG, InChI key, etc.). Each row corresponds to a biomolecule.
For our example negative ionization mode lipid data, the e_meta data frame looks like this:
head(emeta)
Note that all of the entries in the e_data biomolecule column are found in the e_meta biomolecule column. Here we are not mapping to lipid classes, but are recording the Row and Retention Time values for each lipid.
all(edata$Lipid %in% emeta$Lipid)
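If this check fails, the biomolecules lacking metadata can be listed directly. This is a small base R sketch on invented data. The names toy_edata, toy_emeta, unmapped, and extra_meta are hypothetical, and the Row values are made up.

```r
# Hypothetical toy inputs; in the vignette these would be edata and emeta.
toy_edata <- data.frame(
  Lipid = c("L1", "L2", "L3"),
  S1 = c(1, 2, 3),
  stringsAsFactors = FALSE
)
toy_emeta <- data.frame(
  Lipid = c("L1", "L3", "L4"),
  Row = c(12, 40, 57),
  stringsAsFactors = FALSE
)

# Biomolecules measured in e_data that have no row in e_meta.
unmapped <- setdiff(toy_edata$Lipid, toy_emeta$Lipid)
unmapped

# Rows of e_meta that describe biomolecules absent from e_data.
extra_meta <- toy_emeta[!toy_emeta$Lipid %in% toy_edata$Lipid, ]
extra_meta
```

In this toy case the coverage check would fail because one measured lipid has no metadata row.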
To create the lipidData object with our example data, we need the 3 data frames and some additional information about the data contained therein:
edata_cname: column name for biomolecule identifier column in e_data data frame
fdata_cname: column name for sample identifier column in f_data data frame
emeta_cname: column name for biomolecule identifier column in e_meta data frame (can be the same as the edata_cname, if we are not trying to roll peptides up to protein level)
data_scale: the current scale of the data, i.e. whether it is on the abundance scale or has already been log2, log10, or natural log transformed; for seqData objects this is "count"
data_types: optional argument for additional information about the data type; often used if there are datasets for both negative and positive ionization modes on an instrument
mylipid <- as.lipidData(
  e_data = edata,
  f_data = fdata,
  e_meta = emeta,
  edata_cname = "Lipid",
  fdata_cname = "SampleID",
  emeta_cname = "Lipid",
  data_scale = "abundance",
  data_types = "Negative Ion"
)
Now we have an object of class lipidData. Built-in summary and plot methods now operate on this object and provide additional information.
class(mylipid)
summary(mylipid)
plot(edata_transform(mylipid, data_scale = "log2"))
See "Quality_Control_with_pmartR" vignette for next steps.