knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette provides an introduction to the MetaboSet class, along with a summary of many of the functions for accessing elements of MetaboSet objects and other utility functions.
MetaboSet objects are the primary data structure of this package. MetaboSet is built upon the ExpressionSet class from the Biobase package by Bioconductor. ExpressionSet is used to record gene expression data, but the structure of the class is easily adaptable to LC-MS data. For more information, read the ExpressionSet documentation. MetaboSet objects consist of three main parts, each a matrix or a data frame:
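pData(object): sample information (pheno data), a data frame with one row per sample
fData(object): feature data, a data frame with one row per feature
exprs(object): feature abundances, a matrix with features as rows and samples as columns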
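The way the three parts line up can be illustrated with plain base R objects. This is only a sketch with made-up values: the object names (pheno, features, abundances) and the identifier formats are assumptions for illustration, not objects created by the package.

```r
# Toy stand-ins for the three parts (all values are invented for illustration)
pheno <- data.frame(
  Sample_ID = c("ID1", "ID2", "ID3"),
  Injection_order = 1:3,
  Group = c("A", "B", "QC"),
  stringsAsFactors = FALSE
)
features <- data.frame(
  Feature_ID = c("HILIC_pos_100_1", "HILIC_pos_200_2"),  # hypothetical ID format
  Split = "HILIC_pos",
  stringsAsFactors = FALSE
)
# Abundances: one row per feature, one column per sample
abundances <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = nrow(features), ncol = nrow(pheno),
  dimnames = list(features$Feature_ID, pheno$Sample_ID)
)
dim(abundances)
```

The row names of the abundance matrix match the feature identifiers and the column names match the sample identifiers, which is what keeps the three parts in sync.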
In addition to these, a MetaboSet can store the names of special columns in pData that store group labels, time points or subject identifiers. These columns are used as defaults in many of the functions of the package.
group_col holds the group column name
time_col holds the time point column name
subject_col holds the subject ID column name
Let's look at the three main parts in more detail:
The sample information data frame, or pData, has many special column names that are created when data is read from the Excel spreadsheet.
Sample_ID: holds sample identifiers. It must be present, but it can be created automatically. It is often used to label samples in visualizations, and as the column names of exprs.
Injection_order: holds the injection order of samples and must be present. It is used in drift correction and commonly in visualizations for quality control.
QC: tells whether a sample is a QC sample ("QC") or a biological sample ("Sample"). This column can be created automatically, as long as there is any column (usually a group column) that has the value "QC" for QC samples and something else for the biological samples. This column is used by many quality control functions.
In addition to these three columns, pheno data often holds at least one of the group, time and subject ID columns that are defined separately. They are used as defaults by many functions for visualization and quality control.
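The automatic creation of the QC and Sample_ID columns can be sketched in base R. The rules below are assumptions for illustration (a "QC" value in a group column marks QC samples, and identifiers are formed by pasting a prefix to the injection order); the package's own logic may differ.

```r
pheno <- data.frame(
  Injection_order = c(1, 2, 3, 4),
  Group = c("QC", "A", "B", "QC"),
  stringsAsFactors = FALSE
)

# Assumed rule: the value "QC" in the group column marks a QC sample
pheno$QC <- ifelse(pheno$Group == "QC", "QC", "Sample")

# Assumed ID scheme: prefix followed by the injection order
id_prefix <- "ID"
pheno$Sample_ID <- paste0(id_prefix, pheno$Injection_order)

pheno
```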
All the information about individual molecular features is stored in the feature data, or fData. This includes information given by the peak picking software, and information added by functions in this package, such as quality metrics and results from statistical tests. The feature data part usually has many columns that are created by the peak picking software, but for the sake of this package, the most important are:
In addition, this package will automatically create new columns:
Split: used to separate different parts of the dataset, usually made by combining column and ionization mode
Feature_ID: a unique identifier for each feature, made by combining Split, mass and retention time. Feature_ID is used as the row names of exprs and is also present in results
Flag: used to, well, flag low-quality features. Read more below.
The Flag column is used to flag features that are deemed low-quality for some reason (see ?flag_detection and ?flag_quality). Many functions have an all_features argument that controls whether all features or only the good-quality features should be used by the function. By default, all_features is set to FALSE, which means that all flagged features (features with a non-NA value in the Flag column) are ignored.
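The effect of the all_features switch can be sketched in base R. The helper select_features below is hypothetical and only mimics the described behaviour: flagged features are those with a non-NA value in Flag, and they are dropped unless all_features is TRUE.

```r
features <- data.frame(
  Feature_ID = c("F1", "F2", "F3"),
  Flag = c(NA, "Low_quality", NA),  # made-up flag value
  stringsAsFactors = FALSE
)
abundances <- matrix(
  1:6, nrow = 3,
  dimnames = list(features$Feature_ID, c("S1", "S2"))
)

# Hypothetical helper mimicking an all_features switch
select_features <- function(abundances, features, all_features = FALSE) {
  if (all_features) {
    return(abundances)
  }
  # Keep only unflagged features, i.e. those with NA in Flag
  keep <- is.na(features$Flag)
  abundances[keep, , drop = FALSE]
}

rownames(select_features(abundances, features))
```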
Naturally, the abundance part, exprs, is used by almost all the functions as it actually holds the data. Not much more to say here.
knitr::include_graphics("Data_input.png")
To construct a MetaboSet object, you need to have the data read into R. This can be achieved with the read_from_excel function, which reads Excel spreadsheets in the format shown in the figure above. The first parameters include the file name, sheet number, and coordinates for the corner ("Ion Mode" in the above example) in which the three parts of the dataset come together. The row must be numeric, but the column can be given either as a number or as a letter (or a combination of two letters), as that is how it is displayed in Excel.
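Accepting the corner column either as a number or as letters amounts to a small conversion step. The helper below, excel_col_to_number, is a hypothetical sketch of such a conversion and is not taken from the package.

```r
# Hypothetical helper: convert an Excel column label ("A", "Z", "AA") to its index
excel_col_to_number <- function(col) {
  # Numeric input is already a column index
  if (is.numeric(col)) {
    return(as.integer(col))
  }
  chars <- strsplit(toupper(col), "")[[1]]
  digits <- match(chars, LETTERS)
  if (anyNA(digits)) {
    stop("Column must consist of letters A-Z")
  }
  # Treat the label as a base-26 number with digits 1..26
  Reduce(function(acc, d) acc * 26 + d, digits, 0)
}

excel_col_to_number("A")
excel_col_to_number("AA")
```

With this scheme, "A" maps to 1, "Z" to 26 and "AA" to 27, matching the column order shown in Excel.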
Some fields in sample information and feature data have special purposes.
There are a few obligatory fields:
Additionally, there are a few special cases:
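If Sample_ID is created automatically, the identifiers are built using the prefix given by id_prefix. The default prefix is "ID".
If the file contains several parts of the dataset, the columns that separate them should be given with the split_by parameter. Usually these columns are the LC column and Ionization mode. A new field "Split" will be added that contains the combination of the columns in the given order.
If the file contains only one part, for example "HILIC_pos", its name can be given with the name argument. In this case, the "Split" field will equal "HILIC_pos" for all the features.
The function returns a list holding the three parts of the data: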
exprs: feature abundances across the samples
pheno_data: sample information
feature_data: feature information
MetaboSet objects are constructed with the construct_metabosets function. The function's parameters include all the main parts of a MetaboSet object. The special column names can also be set for this function. Note that the function returns a named list of MetaboSet objects, where the feature data and abundances are split by the Split column in feature data (most commonly this means the four modes are returned separately). The sample information and special column names are identical for each object.
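Splitting the feature data and abundances by the Split column can be sketched in base R. The column names Column, Ion_mode, Mass and RT and the exact Feature_ID format are assumptions for illustration; the package may name and format these differently.

```r
features <- data.frame(
  Column = c("HILIC", "HILIC", "RP"),
  Ion_mode = c("pos", "neg", "pos"),
  Mass = c(100.05, 200.10, 300.15),
  RT = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Split: combination of the separating columns, in the given order
features$Split <- paste(features$Column, features$Ion_mode, sep = "_")
# Feature_ID: assumed format combining Split, mass and retention time
features$Feature_ID <- paste(features$Split, features$Mass, features$RT, sep = "_")

abundances <- matrix(
  1:6, nrow = 3,
  dimnames = list(features$Feature_ID, c("S1", "S2"))
)

# One list element per mode; abundances follow the same split
feature_parts <- split(features, features$Split)
abundance_parts <- lapply(feature_parts, function(fd) {
  abundances[fd$Feature_ID, , drop = FALSE]
})

names(abundance_parts)
```

Each list element holds the features and abundances of one mode, while the sample columns stay the same across elements.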
These functions from Biobase might be useful:
fData: access feature data
pData: access pheno data/sample information
exprs: access feature abundances
In addition, MetaboSets can be subset using the syntax for ExpressionSets, namely:
Subsetting by the rows (features) and columns (samples) of the exprs part holding the abundances can be done with simple square brackets. For example, example_set[1:5, 1:3] would get the first 5 features and the first 3 samples out of the example_set object.
Columns of pData can be accessed via a shortcut, for example example_set$Group. This can be used to subset samples:
example_set[, example_set$Group != "B"]
Utility functions for MetaboSet in particular:
group_col, time_col and subject_col for the special column names
quality for the quality metrics of signals
combined_data for a data frame with sample information and feature abundances as columns (one row per sample)
write_to_excel writes the whole object to an Excel spreadsheet
In addition, there are many functions that modify MetaboSets:
mark_nas can be used to mark missing values as NA (peak picking software tends to report them as 0 or 1)
mark_qcs marks QC samples in pData columns where they have NA values
drop_qcs removes the QC samples
drop_flagged removes all features that have been flagged (low-quality features)
merge_metabosets is used to merge MetaboSets from multiple modes together
join_fData can be used to merge a data frame to the fData part of an object. This is mainly used to add results from statistical tests.
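Two of these modifications can be illustrated on plain base R objects. This is a sketch of the ideas behind mark_nas and join_fData, not the package's implementation; the values, the placeholder codes and the P_value column are invented for illustration.

```r
abundances <- matrix(
  c(0, 12.5, 1, 8.3), nrow = 2,
  dimnames = list(c("F1", "F2"), c("S1", "S2"))
)

# Mark placeholder values as missing (here both 0 and 1 are treated as placeholders)
marked <- abundances
marked[marked %in% c(0, 1)] <- NA

# Add results to feature data by matching on Feature_ID
features <- data.frame(
  Feature_ID = c("F1", "F2"),
  Split = "HILIC_pos",
  stringsAsFactors = FALSE
)
results <- data.frame(
  Feature_ID = c("F2", "F1"),
  P_value = c(0.03, 0.5),  # made-up test results
  stringsAsFactors = FALSE
)
features$P_value <- results$P_value[match(features$Feature_ID, results$Feature_ID)]

marked
features
```

Matching on Feature_ID rather than relying on row order keeps the results aligned with the right features even when the two data frames are sorted differently.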