Allows the user to set and examine a variety of RnBeads global options. They affect the way in which the package computes and displays its results.
Option names as
Option name in the form of a
Called with no arguments, rnb.options() returns a list with the current values of all options. To access the
value of a single option, use, e.g.,
rnb.getOption("filtering.greedycut") rather than
rnb.options("filtering.greedycut"), which returns a list of length one. Only a limited set of options
is available (see below). Attempting to get or set the value of a nonexistent option results in an error.
rnb.getOption, the current value for
rnb.options(), a list of all
RnBeads options and their current values. If option names are given, a list of all requested options
and their values. If option values are set,
rnb.options returns the previous values of the modified options.
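As a brief illustration of the calling conventions described above, the following sketch reads, sets and restores an option. It assumes that RnBeads is installed and attached. The value 0.5 is arbitrary, and restoring the old setting through do.call is one possible idiom rather than a documented requirement.

```r
library(RnBeads)

# All options and their current values, as a named list
all.options <- rnb.options()
length(all.options)

# A single value versus a list of length one
rnb.getOption("filtering.greedycut")
rnb.options("filtering.greedycut")

# Setting a value returns the previous value(s) of the modified option(s)
previous <- rnb.options(filtering.missing.value.quantile = 0.5)
rnb.getOption("filtering.missing.value.quantile")

# Restore the earlier setting from the returned list (assumed idiom)
do.call(rnb.options, previous)
```

Keeping the returned list makes it straightforward to undo temporary changes at the end of a script.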
Options used in RnBeads
character vector storing a short title of the analysis. If specified, this name appears in the page title of every report.
Flag indicating if logging functionality is enabled in the automatic runs of the pipeline.
Email address associated with the analyses.
Genome assembly to be used. Currently only important for bisulfite mode. The supported genomes returned by the function
Flag indicating if analysis on site or probe level is to be conducted. Note that the preprocessing module always operates on the site level (only), regardless of the value of this option.
Region types to carry out analysis on, in the form of a
NULL (default value) signifies that all available region annotations (as returned by
rnb.region.types) are summarized upon loading and normalization, and the other modules analyze all regions summarized in the dataset. If this option is set to an empty vector, analysis on the region level is skipped.
Aggregation function to apply when calculating the methylation value for a region based on the values of the CpGs associated with that region. Accepted values for this function are
"coverage.weighted". The last method is applicable only for sequencing-based methylation datasets. It computes the weighted average of the values of the associated CpGs, whereby weights are calculated based on the coverages of the respective sites.
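The coverage-weighted average can be sketched in plain R as follows. The values are made up for illustration, and the sketch assumes that each site's weight is simply its read coverage. The exact weighting used inside RnBeads may differ. The unweighted mean and median are computed only for comparison.

```r
# Hypothetical methylation values and read coverages of three CpGs in one region
beta     <- c(0.90, 0.80, 0.10)
coverage <- c(30, 10, 60)

# Unweighted summaries, shown only for comparison
region.mean   <- mean(beta)
region.median <- median(beta)

# Coverage-weighted average: each site contributes in proportion to its coverage
# (assumption: weight = read coverage of the site)
region.weighted <- sum(beta * coverage) / sum(coverage)

print(c(mean = region.mean, median = region.median,
        coverage.weighted = region.weighted))
```

In this toy example the deeply covered, lowly methylated site pulls the weighted value well below the unweighted mean.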
If a number larger than 1 is specified, RnBeads will subdivide each region specified in the
region.types option into subsegments containing on average
region.subsegments sites per subsegment. This is done by clustering the sites within each region according to their genomic coordinates. These subsegments are then used for subsequent analysis. Use with caution, as this significantly increases the runtime of the pipeline.
The region types to which subsegmentation will be applied. Defaults to
region.types when set to
Column name or index in the table of phenotypic information to be used when plotting sample identifiers. If this option is
NULL, or if it points to a nonexistent column or a column that does not list IDs, the default identifiers are used. These are the row names of the sample phenotype table (and the column names of the beta value matrix).
character vector of length 2 or more giving the color scheme for displaying categorical trait values in plots. RnBeads denotes missing values (
NA) by grey; it is therefore not recommended to include shades of grey in this vector. The default value of this option is the result of the
"Dark2" palette of RColorBrewer with 8 values.
character vector of length 2 or more giving the color scheme for displaying continuous (gradient) trait values in plots. RnBeads interpolates between the color values.
Minimum number of samples in each subgroup defined by a trait, in order for this trait to be considered in the methylation profiles and in the differential methylation modules. This must be a positive
Maximum number of subgroups defined by a trait, in order for this trait to be considered in the methylation profiles and in the differential methylation modules. This must be an
2 or more. As a special case, a value of
NULL (default) indicates that the maximum number of subgroups is the number of samples in an analysis minus
1, i.e. traits with all unique values will be ignored.
Column name in the sample annotation table that indicates sample replicates. Replicates are expected to contain the same value. Samples without replicates should contain unique or missing values. If this option is
NULL (default), replicate handling is disabled.
Flag indicating whether large output files should be compressed (in
Flag controlling whether a data import report should be generated. This option should be set to
FALSE only when the provided data source is an object of type RnBSet, i.e. the data has been previously loaded by RnBeads.
Type of data assumed to be supplied by default (Infinium 450k microarray). For sequencing data, set this to
bs.bed.dir and save the options. See
rnb.execute.import for further details.
Separator used in the plain text data tables. See
Preset for bed-like formats.
"BisSNP", "Encode", "EPP", "bismarkCytosine", "bismarkCov" are currently supported. See the RnBeads vignette and the FAQ section on the website for more details.
Column indices in the supplied BED file with DNA methylation information. These are represented by a named
integer vector, in which the names are:
"t". These names correspond to the columns for chromosome, start position, end position, strand, methylation degree, read coverage, number of reads with C and number of reads with T, respectively. Methylation degree and/or read coverage, if not specified, are inferred from the values in the columns
"t". Further details and examples of BED files can be found in Section 4.1 of the RnBeads vignette.
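How methylation degree and read coverage might be inferred from read counts can be sketched in plain R. The table below is hypothetical, including its column names, and the sketch assumes the usual definitions: coverage is the number of C reads plus the number of T reads, and methylation degree is the fraction of C reads. It does not reproduce the RnBeads import code.

```r
# Hypothetical per-site read counts; column names are illustrative only
bed <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 400L),
  c     = c(8L, 0L, 15L),   # reads supporting a methylated cytosine
  t     = c(2L, 5L, 5L)     # reads supporting an unmethylated cytosine
)

# Assumed definitions: coverage = C + T, methylation degree = C / (C + T)
bed$coverage <- bed$c + bed$t
bed$meth     <- bed$c / bed$coverage

print(bed)
```

Sites with zero total coverage would yield NaN under this definition and would need separate handling.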
Singleton of type
integer specifying the frame shift between the coordinates in the input BED file and the corresponding genomic reference. This (
integer) value is added to the coordinates from the BED file before matching the methylation sites to the annotated ones.
Perform a small loading test by reading 1000 rows from each BED file, after which normal loading is performed. See the RnBeads vignette and the FAQ section on the website for more details.
Perform only the small loading test, and skip loading all the data.
Skip the check of the loaded RnBSet object after loading. Helps with keeping the memory profile down.
Flag indicating if gender prediction is to be performed. Gender prediction is only supported for Infinium 450k datasets with signal intensity information. The value of this option is ignored for other datasets.
Flag controlling whether the data should be preprocessed (whether quality filtering and, in the case of Infinium microarray data, normalization should be applied).
Flag controlling whether the data should be normalized and a normalization report generated. Setting this to
NULL (default) enables this step for analysis on Infinium datasets, but disables it in case of sequencing-based datasets. Note that normalization is never applied to sequencing datasets; enabling this flag for such data leads to a warning message.
Normalization method to be applied, or
"none". Multiple normalization methods are supported:
"illumina" - methylumi-implemented Illumina scaling normalization;
"swan" (default) - SWAN-normalization by Gordon et al., as implemented in minfi;
"bmiq" - beta-mixture quantile normalization method by Teschendorff et al.; as well as
"wm.swan" - all normalization methods implemented in the wateRmelon package. When setting this option to a specific algorithm, make sure its dedicated package is installed.
A character singleton specifying which background subtraction is to be performed during normalization. The methylumi background correction methods are supported. The following values are accepted:
Flag indicating if the report on normalization should include plots of shifts (degrees of beta value correction).
Flag indicating if the quality control module is to be executed.
[Infinium 450k] Add boxplots for all types of quality control probes to the quality control report. The boxplots give signal distribution across samples.
[Infinium 450k] Add barplots for each quality control probe to the quality control report.
[Infinium 450k] Add boxplot of negative control probe intensities for all samples.
[Infinium 450k] Flag indicating if intersample distances based on the beta values of SNP probes are to be displayed. This can help identify or validate genetically similar or identical samples.
[Infinium 450k] Add boxplot of beta-values for the SNP-calling probes. Can be useful for detection of sample mix-ups.
[Infinium 450k] Add bar plots of beta-values for the SNP-calling probes in each profiled sample.
[Infinium 450k] Maximal number of samples included in a single quality control barplot and negative control boxplot.
[Bisulfite sequencing] Add genome-wide sequencing coverage plot for each sample.
[Bisulfite sequencing] Values for coverage cutoffs to be shown in a coverage thresholds plot. This must be an
integer vector of positive values. Setting this to an empty vector disables the coverage thresholds plot.
[Bisulfite sequencing] Add sequencing coverage histogram for each sample.
[Bisulfite sequencing] Add sequencing coverage violin plot for each sample.
Name of a file specifying site or probe identifiers to be whitelisted. Every line in this file must contain exactly one identifier. The whitelisted sites are always retained in the analysed datasets, even if filtering criteria or blacklisting requires their removal. For Infinium studies, the file must contain Infinium probe identifiers. For bisulfite sequencing studies, the file must contain CpG positions in the form "chromosome:coordinate" (1-based coordinate of the cytosine), e.g.
chr2:48607772. Unknown identifiers are silently ignored.
Name of a file specifying site or probe identifiers to be blacklisted. Every line in this file must contain exactly one identifier. The blacklisted sites are removed from the analysed datasets as a first step in the preprocessing module. For Infinium studies, the file must contain Infinium probe identifiers. For bisulfite sequencing studies, the file must contain CpG positions in the form "chromosome:coordinate" (1-based coordinate of the cytosine), e.g.
chr2:48607772. Unknown identifiers are silently ignored.
character vector giving the list of probe context types to be removed as a filtering step. Possible context values are
"Other". Probes in the second context measure CpG methylation; the last context denotes probes dedicated to SNP detection. Setting this option to
NULL or an empty vector effectively disables the step of context-specific probe removal.
Removal of sites or probes based on overlap with SNPs. The accepted values for this option are:
no SNP-based filtering;
filter out a probe when the last 3 bases in its target sequence overlap with SNP;
filter out a probe when the last 5 bases in its target sequence overlap with SNP;
filter out a CpG site or probe when any base in its target sequence overlaps with SNP.
Bisulfite sequencing datasets operate on sites instead of probes, therefore, the values
"5" are treated as
Flag indicating if the removal of potentially cross-reactive probes should be performed as a filtering step in the preprocessing module. A probe whose sequence maps to multiple genomic locations (allowing up to 3 mismatches) is considered cross-reactive.
Flag indicating if the Greedycut procedure should be run as a filtering step in the preprocessing module.
Threshold for the detection p-value to be used in Greedycut. This is a value between 0 and 1. This option has effect only when
Indicator of what the behaviour of Greedycut should be in case of ties between the scores of rows (probes) and columns (samples). The value of this option must be one of
"any"; the last one indicating random choice. This option has effect only when
Flag indicating if the removal of probes located on sex chromosomes should be performed as a filtering step.
Number between 0 and 1, indicating the fraction of allowed missing values per site. A site is filtered out when its methylation beta values are
NAs in a larger fraction of samples than this threshold. Setting this option to 1 (default) retains all sites, and thus effectively disables the missing value filtering step in the preprocessing module. If this is set to 0, all sites that contain missing values are filtered out.
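The effect of this threshold can be sketched in plain R on a small, made-up beta value matrix with sites in rows and samples in columns. The sketch only mirrors the rule stated above and is not the RnBeads implementation.

```r
# Hypothetical beta values: 3 sites (rows) x 4 samples (columns)
betas <- matrix(
  c(0.1,  NA, 0.3, 0.4,
     NA,  NA,  NA, 0.8,
    0.5, 0.6, 0.7, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), paste0("sample", 1:4))
)

# Fraction of samples with a missing value, per site
na.fraction <- rowMeans(is.na(betas))

# A site is removed when its NA fraction exceeds the threshold
threshold <- 0.5
keep <- na.fraction <= threshold
betas.filtered <- betas[keep, , drop = FALSE]

print(na.fraction)
print(rownames(betas.filtered))
```

With a threshold of 1 every site passes, and with a threshold of 0 only sites without any missing value remain, matching the two special cases described above.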
Threshold for minimal acceptable coverage. This must be a non-negative value. Setting this option to 0 (zero) effectively treats any known or unknown read coverage as sufficiently deep.
Flag indicating whether methylation values for low coverage sites should be set to missing. In combination with
filtering.missing.value.quantile this can lead to the removal of sites.
(Bisulfite sequencing mode) Flag indicating whether methylation sites with a coverage of more than 10 times the 95th percentile of coverage should be removed.
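The outlier rule can be illustrated in plain R on made-up coverage values. The sketch assumes that the percentile is computed with R's default quantile definition over all values in one vector. How RnBeads pools coverages across sites and samples is not specified here.

```r
# Hypothetical read coverages: mostly moderate, with one extreme value
coverage <- c(rep(10, 95), 20, 25, 30, 40, 5000)

# Cutoff: 10 times the 95th percentile of coverage
# (assumption: default quantile type of stats::quantile)
cutoff <- 10 * quantile(coverage, probs = 0.95, names = FALSE)

# Sites whose coverage exceeds the cutoff are flagged for removal
is.outlier <- coverage > cutoff

print(cutoff)
print(which(is.outlier))
```

Only extreme values are affected, because the cutoff lies an order of magnitude above the bulk of the distribution.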
Threshold used to filter probes based on the variability of their assigned beta values. This must be a real value between 0 and 1, denoting the minimum standard deviation of the beta values in one site across all samples. Any sites that have a standard deviation lower than this threshold are filtered out. Note that sites with undetermined variability, that is, sites for which there are no measurements (all beta values are
NAs), are retained. Setting this option to 0 (default) disables filtering based on methylation variability.
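A minimal sketch of variability-based filtering in plain R follows. The data are made up, and the explicit handling of all-NA sites is written out to match the rule that such sites are retained. This is an illustration, not the RnBeads code.

```r
# Hypothetical beta values: 3 sites (rows) x 3 samples (columns)
betas <- matrix(
  c(0.10, 0.12, 0.11,
    0.20, 0.60, 0.90,
      NA,   NA,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), paste0("sample", 1:3))
)

# Standard deviation per site; undetermined (NA) when there are no measurements
site.sd <- apply(betas, 1, function(x) {
  if (all(is.na(x))) NA_real_ else sd(x, na.rm = TRUE)
})

# Remove sites whose standard deviation is below the threshold;
# sites with undetermined variability are kept
threshold <- 0.05
keep <- is.na(site.sd) | site.sd >= threshold

print(site.sd)
print(names(keep)[keep])
```

Setting the threshold to 0 keeps every site, since no standard deviation is negative.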
Flag indicating if the covariate inference analysis module is to be executed.
Column names in the sample annotation table for which surrogate variable analysis (SVA) should be conducted. An empty vector (default) means that SVA is skipped.
Column name in the sample annotation table giving the assignment of samples to reference methylomes. The target samples should have
NA values in this column.
Number of most variable CpGs which are tested for association with the reference cell types. Setting this option to
NULL forces the algorithm to use all available sites in the dataset, and may greatly increase the running time for cell type composition estimation.
Number of top cell type markers used for determining cell type contributions to the target DNA methylation profiles using the projection method of Houseman et al.
Name of the method to be used for estimating the number of surrogate variables. Must be either 'leek' or 'be'. See the
sva function for details.
Flag indicating if the exploratory analysis module is to be executed.
Traits, given as column names or indices in the sample annotation table, to be used in the exploratory analysis. These traits are used in multiple steps in the module: they are visualized using point types and colors in the dimension reduction plots; tested for strong correlations and associations with principal components in a methylation space; used to define groups when plotting beta distributions and/or inter-sample methylation variability. The default value of this parameter -
NULL - indicates that columns should be automatically selected; see
rnb.sample.groups for how this is done.
Number of most variable probes, sites or regions to select prior to performing dimension reduction techniques and tests for associations. Preselection can significantly reduce the running time and memory usage in the exploratory analysis module. Setting this number to zero (default) disables preselection.
Maximum number of principal components to be tested for associations with other factors, such as control probe states and sample traits. This must be an
10. Setting this option to
0 disables such tests.
Significance threshold for a p-value resulting from applying a test for association. This is a value between 0 and 1.
Number of permutations in tests performed to check for associations between traits, and between control probe intensities and coordinates in the principal component space. This must be a non-negative
integer. Setting this option to
0 disables permutation tests.
[Infinium 450k] Flag indicating if quality-associated batch effects should be studied. This amounts to testing for associations between intensities of quality control probes and principal components. This option has effect only when
Flag indicating whether beta value distributions for sample groups and probe or site categories should be computed.
Flag indicating if methylation variability in sample groups should be computed as part of the exploratory analysis module.
Flag indicating if the inter-sample methylation variability step in the exploratory analysis module should include deviation plots. Deviation plots show intra-group methylation variability at the covered sites and regions. Setting this option to
NULL (default) enables deviation plots on Infinium datasets, but disables them in case of sequencing-based datasets, because their generation can be very computationally intensive. This option has effect only when
Sites to be used by clustering algorithms in the exploratory analysis module. RnBeads runs several algorithms that cluster the samples in the dataset. If this option is set to
"all" (default), clustering is performed using all sites; a value of
"top" indicates that only the most variable sites are used (see the option
Number of most variable sites to use when visualizing heatmaps. This must be a non-empty
integer vector containing positive values. This option is ignored when
Flag indicating if the generated methylation value heatmaps in the clustering section of the exploratory analysis module should be saved as PDF files. Enabling this option is not recommended for large values of
exploratory.clustering.top.sites (more than 200), because heatmaps might generate very large PDF files.
Region types for generating regional methylation profiles. If
NULL (default), regional methylation profiles are created only for the region types that are available for the targeted assembly and summarized in the dataset of interest. Setting this option to an empty vector disables the region profiles step in the exploratory analysis module.
A list of gene symbols to be used for custom locus profiling. Locus views will be generated for these genes.
Path to a bed file containing custom genomic regions. Locus views will be generated for these regions.
Flag indicating if the differential methylation module is to be executed.
Method to be used for calculating p-values on the site level. Currently supported options are "ttest" for a (paired) t-test and "limma" for a linear modeling approach implemented in the
limma package for differential expression in microarrays.
Number of permutation tests performed to compute the p-value of rank permutation tests in the differential methylation analysis. This must be a non-negative
integer. Setting this option to
0 (default) disables permutation tests for rank permutations. Note that p-values for differential methylation are computed and also considered for the ranking in any case.
Column names or indices in the sample annotation table to be used for group definition in the differential methylation analysis. The default value -
NULL - indicates that columns should be automatically selected. See
rnb.sample.groups for how this is done. By default, the comparisons are done in a one vs. all manner if there are multiple groups defined in a column.
Column names or indices in the sample annotation table to be used for group definition in the differential methylation analysis in which all pairwise comparisons between groups should be conducted (the default is one vs. all if multiple groups are specified in a column). Caution: for large numbers of sample groups this can lead to combinatorial explosion and thus to very long runtimes. A value of
NULL (default) indicates that no column is explicitly selected for all pairwise comparisons. If specified, the selected columns must be a subset of the columns that will be selected according to the
Column names or indices in the table of phenotypic information to be used for confounder adjustment in the differential methylation analysis. Currently this is only supported for
A named vector specifying, for each column for which paired analysis should be performed (say columnA), the name or index of another column (say columnB) in which identical values indicate the same pairing. In this vector, columnA is the name of the element whose value is columnB. For more details see
Flag indicating if the differential methylation analysis should account for Surrogate Variables. If
TRUE, RnBeads looks for overlaps between the
inference.targets.sva options and includes the surrogate variables as confounding factors only for these columns. In other words, it will only have an effect if the corresponding inference option (see
inference.targets.sva option for details) is enabled. Currently this is only supported for
Flag indicating if the differential methylation analysis should account for cell type using the reference-based Houseman method. It will only have an effect if the corresponding inference option is enabled (see
inference.reference.methylome.column option for details). Currently this is only supported for
Flag indicating whether Gene Ontology (GO)-enrichment analysis is to be conducted on the identified differentially methylated regions.
Flag indicating whether a section corresponding to differential site methylation should be added to the report. Has no effect on the actual analysis, just the report. To disable differential site methylation analysis entirely use the
Flag indicating whether the data should be exported to bed files.
character vector specifying which data types should be exported to Track hub directories. Possible values in the vector are
"bigWig". When this option is set to
NULL, track hub export is disabled. Note that if
"bigBed" is contained in this option, bed files are created automatically.
Flag indicating whether methylation value matrices are to be exported to comma-separated value (CSV) files.
Flag indicating whether methylation values and differential methylation analysis settings should be exported to a format compatible with FaST-LMM-EWASher, a tool for adjusting for cell-type compositions. See Zou, J., et al., Nature Methods, 2014 for further details on the tool.
character vector of sites and region names to be exported. If
NULL, no region methylation values are exported.
Flag indicating whether big tables should be stored on disk rather than in main memory in order to keep memory requirements down. May slow down analysis!
Flag indicating if the active R session should be terminated when an error is encountered during execution.
When plotting methylation value distributions, this threshold specifies the number of observations drawn per group. Distributions are estimated and plotted based on these random subsamples. This approach can significantly reduce the memory requirements of the preprocessing and exploratory analysis modules, where methylation value distributions are plotted. Setting this to
0 disables subsampling. More information is presented in the Details section of
Flag indicating whether memory management should be actively enforced in some places of the code in order to achieve a better memory profile, i.e. garbage collection and variable removal are performed actively. May slow down analysis.
Flag indicating whether disk-dumped big matrices (see
disk.dump.big.matrices option) should actively be deleted when RnBSets are modified. You should switch it to
TRUE when big matrices are dumped to disk and the amount of hard drive space is also limited.