RnBeads Options

Description

Allows the user to set and examine a variety of RnBeads global options. They affect the way in which the package computes and displays its results.

Usage

rnb.options(...)

rnb.getOption(x)

Arguments

...

Option names as characters, or new option values given in the form name = value.

x

Option name in the form of a character vector of length 1.

Details

Invoking rnb.options() with no arguments returns a list with the current values of the options. To access the value of a single option, one should use, e.g., rnb.getOption("filtering.greedycut") rather than rnb.options("filtering.greedycut"), which returns a list of length one. Only a limited set of options is available (see below). Attempting to get or set the value of a non-existing option results in an error.
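The difference between the two accessors, and the save-and-restore pattern made possible by the invisible return value, can be sketched as follows. This is a minimal sketch that assumes RnBeads is installed; restoring through do.call is one possible idiom rather than something the package prescribes.

```r
library(RnBeads)

# Single option value (a flag for this particular option)
rnb.getOption("filtering.greedycut")

# The same option requested through rnb.options(): a list of length one
str(rnb.options("filtering.greedycut"))

# Setting options returns the previous values invisibly, so capture them
old <- rnb.options(filtering.greedycut = FALSE, qc.snp.boxplot = TRUE)

# ... run an analysis with the modified settings ...

# Restore the previous values by passing them back as name = value pairs
do.call(rnb.options, old)
```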

Value

For rnb.getOption, the current value for x. For rnb.options(), a list of all RnBeads options and their current values. If option names are given, a list of all requested options and their values. If option values are set, rnb.options returns the previous values of the modified options, invisibly.

Options used in RnBeads

analysis.name = NULL

One-element character vector storing a short title of the analysis. If specified, this name appears at the page title of every report.

logging = TRUE

Flag indicating if logging functionality is enabled in the automatic runs of the pipeline.

email = NULL

Email address associated with the analyses.

assembly = "hg19"

Genome assembly to be used. Currently only important for bisulfite mode. The supported genomes are returned by the function rnb.get.assemblies.

analyze.sites = TRUE

Flag indicating if analysis on site or probe level is to be conducted. Note that the preprocessing module always operates on the site level (only), regardless of the value of this option.

region.types = NULL

Region types to carry out analysis on, in the form of a character vector. NULL (default value) signifies that all available region annotations (as returned by rnb.region.types) are summarized upon loading and normalization, and the other modules analyze all regions summarized in the dataset. If this option is set to an empty vector, analysis on the region level is skipped.

region.aggregation = "mean"

Aggregation function to apply when calculating the methylation value for a region based on the values of the CpGs associated with that region. Accepted values for this function are "min", "max", "mean" (default), "median", "sum", "coverage.weighted". The last method is applicable only for sequencing-based methylation datasets. It computes the weighted average of the values of the associated CpGs, whereby weights are calculated based on the coverages of the respective sites.
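The effect of "coverage.weighted" compared with "mean" can be illustrated in base R. This sketch assumes that the weights are directly proportional to read coverage; the text above only says the weights are based on coverage, so the exact weighting used by RnBeads may differ. The beta values and coverages are made up.

```r
# Hypothetical CpGs in one region: methylation values and read coverages
beta     <- c(0.90, 0.10, 0.50)
coverage <- c(30, 10, 10)

# "mean": every CpG contributes equally
plain_mean <- mean(beta)

# "coverage.weighted" (assumed form): CpGs with more reads contribute more
cw_mean <- weighted.mean(beta, w = coverage)
```

Here the deeply covered first CpG pulls the weighted value above the plain mean.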

region.subsegments = 0

If a number larger than 1 is specified, RnBeads will subdivide each region specified in the region.types option into subsegments containing on average region.subsegments sites per subsegment. This is done by clustering the sites within each region according to their genomic coordinates. These subsegments are then used for subsequent analysis. Use cautiously, as this will significantly increase the runtime of the pipeline.

region.subsegments.types = NULL

The region types to which subsegmentation will be applied. Defaults to region.types when set to NULL.

identifiers.column = NULL

Column name or index in the table of phenotypic information to be used when plotting sample identifiers. If this option is NULL, or if it points to a non-existing column or a column that does not list IDs, the default identifiers are used. These are the row names of the sample phenotype table (and the column names of the beta value matrix).

colors.category = c("#1B9E77","#D95F02",...)

character vector of length 2 or more giving the color scheme for displaying categorical trait values in plots. RnBeads denotes missing values (NA) by grey, therefore, it is not recommended to include shades of grey in this vector. The default value of this option is the result of the "Dark2" palette of RColorBrewer with 8 values.

colors.gradient = c("#132B43","#56B1F7")

character vector of length 2 or more giving the color scheme for displaying continuous (gradient) trait values in plots. RnBeads interpolates between the color values.

min.group.size = 2

Minimum number of samples in each subgroup defined by a trait, in order for this trait to be considered in the methylation profiles and in the differential methylation modules. This must be a positive integer.

max.group.count = NULL

Maximum number of subgroups defined by a trait, in order for this trait to be considered in the methylation profiles and in the differential methylation modules. This must be an integer of value 2 or more. As a special case, a value of NULL (default) indicates that the maximum number of subgroups is the number of samples in an analysis minus 1, i.e. traits with all unique values will be ignored.
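How min.group.size and max.group.count jointly decide whether a trait is considered can be sketched in base R. The helper trait_is_usable is hypothetical and is not part of RnBeads. The requirement of at least two groups is an added assumption, and the sketch is not the actual selection logic of rnb.sample.groups.

```r
# Hypothetical helper: does a trait define acceptable sample subgroups?
trait_is_usable <- function(trait, min.group.size = 2L, max.group.count = NULL) {
  sizes <- table(trait)  # group sizes; table() drops missing values
  if (is.null(max.group.count)) {
    # Special case from the text: number of samples minus 1
    max.group.count <- length(trait) - 1L
  }
  length(sizes) >= 2L &&                 # assumption: at least two groups
    length(sizes) <= max.group.count &&  # not too many groups
    all(sizes >= min.group.size)         # every group is large enough
}

balanced   <- trait_is_usable(c("a", "a", "b", "b"))
all_unique <- trait_is_usable(c("a", "b", "c", "d"))
small_grp  <- trait_is_usable(c("a", "a", "a", "b"))
relaxed    <- trait_is_usable(c("a", "a", "a", "b"), min.group.size = 1L)
```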

replicate.id.column = NULL

Column name in the sample annotation table that indicates sample replicates. Replicates are expected to contain the same value. Samples without replicates should contain unique or missing values. If this option is NULL (default), replicate handling is disabled.

gz.large.files = FALSE

Flag indicating whether large output files should be compressed (in .gz format).

import = TRUE

Flag controlling whether a data import report should be generated. This option should be set to FALSE only when the provided data source is an object of type RnBSet, i.e. the data has been previously loaded by RnBeads.

import.default.data.type = "infinium.idat.dir"

Type of data assumed to be supplied by default (Infinium 450k microarray). For sequencing data, set this to "bs.bed.dir" and save the options. See rnb.execute.import for further details.

import.table.separator = ","

Separator used in the plain text data tables. See rnb.execute.import for details.

import.bed.style = "BisSNP"

Preset for bed-like formats. "BisSNP", "Encode", "EPP", "bismarkCytosine" and "bismarkCov" are currently supported. See the RnBeads vignette and the FAQ section on the website for more details.

import.bed.columns

Column indices in the supplied BED file with DNA methylation information. These are represented by a named integer vector, in which the names are: "chr", "start", "end", "strand", "meth", "coverage", "c" and "t". These names correspond to the columns for chromosome, start position, end position, strand, methylation degree, read coverage, number of reads with C and number of reads with T, respectively. Methylation degree and/or read coverage, if not specified, are inferred from the values in the columns "c" and "t". Further details and examples of BED files can be found in Section 4.1 of the RnBeads vignette.
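The inference of methylation degree and coverage from the "c" and "t" columns can be sketched in base R. The formulas below (coverage as the sum of both counts, methylation degree as the fraction of C reads) are the conventional ones and are an assumption here, since the text above does not spell them out. The read counts are made up.

```r
# Hypothetical per-site read counts, as they might appear in the "c" and "t" columns
reads <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(10468L, 10470L, 48607771L),
  c     = c(8L, 0L, 15L),   # reads supporting a methylated cytosine
  t     = c(2L, 5L, 5L)     # reads supporting an unmethylated cytosine
)

# Assumed inference: total coverage and fraction of methylated reads
reads$coverage <- reads$c + reads$t
reads$meth     <- reads$c / reads$coverage
```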

import.bed.frame.shift = 1

Singleton of type integer specifying the frame shift between the coordinates in the input BED file and the corresponding genomic reference. This (integer) value is added to the coordinates from the BED file before matching the methylation sites to the annotated ones.

import.bed.test = TRUE

Perform a small loading test by reading 1000 rows from each BED file, after which normal loading is performed. See the RnBeads vignette and the FAQ section on the website for more details.

import.bed.test.only = FALSE

Perform only the small loading test, and skip loading all the data.

import.skip.object.check = FALSE

Skip the check of the loaded RnBSet object after loading. Helps with keeping the memory profile down.

import.gender.prediction = TRUE

Flag indicating if gender prediction is to be performed. Gender prediction is only supported for Infinium 450k datasets with signal intensity information. The value of this option is ignored for other datasets.

preprocessing = TRUE

Flag controlling whether the data should be preprocessed (whether quality filtering and, in the case of Infinium microarray data, normalization should be applied).

normalization = NULL

Flag controlling whether the data should be normalized and normalization report generated. Setting this to NULL (default) enables this step for analysis on Infinium datasets, but disables it in case of sequencing-based datasets. Note that normalization is never applied in sequencing datasets; if this flag is enabled, it will lead to a warning message.

normalization.method = "swan"

Normalization method to be applied, or "none". Multiple normalization methods are supported: "illumina" - methylumi-implemented Illumina scaling normalization; "swan" (default) - SWAN-normalization by Gordon et al., as implemented in minfi; "bmiq" - beta-mixture quantile normalization method by Teschendorff et al.; as well as "wm.dasen", "wm.nasen", "wm.betaqn", "wm.naten", "wm.nanet", "wm.nanes", "wm.danes", "wm.danet", "wm.danen", "wm.daten1", "wm.daten2", "wm.tost", "wm.fuks" and "wm.swan" - all normalization methods implemented in the wateRmelon package. When setting this option to a specific algorithm, make sure its dedicated package is installed.

normalization.background.method = "methylumi.noob"

A character singleton specifying which background subtraction is to be performed during normalization. The methylumi background correction methods are supported. The following values are accepted: "none", "methylumi.noob", "methylumi.goob" and "methylumi.lumi".

normalization.plot.shifts = TRUE

Flag indicating if the report on normalization should include plots of shifts (degrees of beta value correction).

qc = TRUE

Flag indicating if the quality control module is to be executed.

qc.boxplots = TRUE

[Infinium 450k] Add boxplots for all types of quality control probes to the quality control report. The boxplots give signal distribution across samples.

qc.barplots = TRUE

[Infinium 450k] Add barplots for each quality control probe to the quality control report.

qc.negative.boxplot = TRUE

[Infinium 450k] Add boxplot of negative control probe intensities for all samples.

qc.snp.distances = TRUE

[Infinium 450k] Flag indicating if intersample distances based on the beta values of SNP probes are to be displayed. This can help identify or validate genetically similar or identical samples.

qc.snp.boxplot = FALSE

[Infinium 450k] Add boxplot of beta-values for the SNP-calling probes. Can be useful for detection of sample mix-ups.

qc.snp.barplot = FALSE

[Infinium 450k] Add bar plots of beta-values for the SNP-calling probes in each profiled sample.

qc.sample.batch.size = 50

[Infinium 450k] Maximal number of samples included in a single quality control barplot and negative control boxplot.

qc.coverage.plots = FALSE

[Bisulfite sequencing] Add genome-wide sequencing coverage plot for each sample.

qc.coverage.threshold.plot = 1:10

[Bisulfite sequencing] Values for coverage cutoffs to be shown in a coverage thresholds plot. This must be an integer vector of positive values. Setting this to an empty vector disables the coverage thresholds plot.

qc.coverage.histograms = FALSE

[Bisulfite sequencing] Add sequencing coverage histogram for each sample.

qc.coverage.violins = FALSE

[Bisulfite sequencing] Add sequencing coverage violin plot for each sample.

filtering.whitelist = NULL

Name of a file specifying site or probe identifiers to be whitelisted. Every line in this file must contain exactly one identifier. The whitelisted sites are always retained in the analysed datasets, even if filtering criteria or blacklisting requires their removal. For Infinium studies, the file must contain Infinium probe identifiers. For bisulfite sequencing studies, the file must contain CpG positions in the form "chromosome:coordinate" (1-based coordinate of the cytosine), e.g. chr2:48607772. Unknown identifiers are silently ignored.

filtering.blacklist = NULL

Name of a file specifying site or probe identifiers to be blacklisted. Every line in this file must contain exactly one identifier. The blacklisted sites are removed from the analysed datasets as a first step in the preprocessing module. For Infinium studies, the file must contain Infinium probe identifiers. For bisulfite sequencing studies, the file must contain CpG positions in the form "chromosome:coordinate" (1-based coordinate of the cytosine), e.g. chr2:48607772. Unknown identifiers are silently ignored.

filtering.context.removal = c("CC","CAG",...)

character vector giving the list of probe context types to be removed as a filtering step. Possible context values are "CC", "CG", "CAG", "CAH", "CTG", "CTH" and "Other". Probes in the second context measure CpG methylation; the last context denotes probes dedicated to SNP detection. Setting this option to NULL or an empty vector effectively disables the step of context-specific probe removal.

filtering.snp = "3"

Removal of sites or probes based on overlap with SNPs. The accepted values for this option are:

"no"

no SNP-based filtering;

"3"

filter out a probe when the last 3 bases in its target sequence overlap with SNP;

"5"

filter out a probe when the last 5 bases in its target sequence overlap with SNP;

"any" or "yes"

filter out a CpG site or probe when any base in its target sequence overlaps with SNP.

Bisulfite sequencing datasets operate on sites instead of probes, therefore, the values "3" and "5" are treated as "yes".

filtering.cross.reactive = FALSE

Flag indicating if the removal of potentially cross-reactive probes should be performed as a filtering step in the preprocessing module. A probe whose sequence maps to multiple genomic locations (allowing up to 3 mismatches) is cross-reactive.

filtering.greedycut = TRUE

Flag indicating if the Greedycut procedure should be run as a filtering step in the preprocessing module.

filtering.greedycut.pvalue.threshold = 0.05

Threshold for the detection p-value to be used in Greedycut. This is a value between 0 and 1. This option has effect only when filtering.greedycut is TRUE.

filtering.greedycut.rc.ties = "row"

Indicator of what the behaviour of Greedycut should be in case of ties between the scores of rows (probes) and columns (samples). The value of this option must be one of "row", "column" or "any"; the last one indicating random choice. This option has effect only when filtering.greedycut is TRUE.

filtering.sex.chromosomes.removal = FALSE

Flag indicating if the removal of probes located on sex chromosomes should be performed as a filtering step.

filtering.missing.value.quantile = 1

Number between 0 and 1, indicating the fraction of allowed missing values per site. A site is filtered out when its methylation beta values are NAs in a larger fraction of samples than this threshold. Setting this option to 1 (default) retains all sites, and thus effectively disables the missing value filtering step in the preprocessing module. If this is set to 0, all sites that contain missing values are filtered out.
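The rule described above can be illustrated on a toy matrix in base R. The helper keep_by_missing is hypothetical; it only mirrors the stated criterion (a site is removed when its fraction of NAs exceeds the threshold) and is not the RnBeads implementation.

```r
# Toy beta value matrix: 3 sites (rows) measured in 4 samples (columns)
beta <- matrix(
  c(0.1, NA,  0.3, 0.4,
    NA,  NA,  NA,  0.8,
    0.5, 0.6, 0.7, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"), paste0("s", 1:4))
)

# Hypothetical helper: TRUE for sites that survive the filter
keep_by_missing <- function(beta, threshold) {
  na_fraction <- rowMeans(is.na(beta))  # fraction of samples with NA per site
  na_fraction <= threshold              # removed only if the fraction is larger
}

keep_default <- keep_by_missing(beta, 1)    # default: nothing is removed
keep_half    <- keep_by_missing(beta, 0.5)  # site2 (75% missing) is removed
keep_strict  <- keep_by_missing(beta, 0)    # only complete sites remain
```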

filtering.coverage.threshold = 5

Threshold for minimal acceptable coverage. This must be a non-negative value. Setting this option to 0 (zero) effectively treats any known or unknown read coverage as sufficiently deep.

filtering.low.coverage.masking = FALSE

Flag indicating whether methylation values for low coverage sites should be set to missing. In combination with filtering.missing.value.quantile this can lead to the removal of sites.

filtering.high.coverage.outliers = FALSE

(Bisulfite sequencing mode) Flag indicating whether methylation sites with a coverage of more than 10 times the 95-percentile of coverage should be removed.

filtering.deviation.threshold = 0

Threshold used to filter probes based on the variability of their assigned beta values. This must be a real value between 0 and 1, denoting minimum standard deviation of the beta values in one site across all samples. Any sites that have standard deviation lower than this threshold are filtered out. Note that sites with undetermined variability, that is, sites for which there are no measurements (all beta values are NAs), are retained. Setting this option to 0 (default) disables filtering based on methylation variability.
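A base R sketch of this variability filter follows. The helper keep_by_deviation is hypothetical and only mirrors the rule as stated, including the retention of sites without any measurement; it is not the RnBeads implementation.

```r
# Toy beta values: a nearly constant site, a variable site, an unmeasured site
beta <- rbind(
  stable     = c(0.50, 0.51, 0.49, 0.50),
  variable   = c(0.10, 0.90, 0.20, 0.80),
  unmeasured = c(NA, NA, NA, NA)
)

# Standard deviation per site, ignoring missing values (NA if nothing is measured)
site_sd <- apply(beta, 1, sd, na.rm = TRUE)

# Hypothetical helper: keep sites with undetermined or sufficiently high variability
keep_by_deviation <- function(site_sd, threshold) {
  is.na(site_sd) | site_sd >= threshold
}

keep_default  <- keep_by_deviation(site_sd, 0)     # default: filter disabled
keep_filtered <- keep_by_deviation(site_sd, 0.05)  # drops the "stable" site
```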

inference = FALSE

Flag indicating if the covariate inference analysis module is to be executed.

inference.targets.sva = character()

Column names in the sample annotation table for which surrogate variable analysis (SVA) should be conducted. An empty vector (default) means that SVA is skipped.

inference.reference.methylome.column = character()

Column name in the sample annotation table giving the assignment of samples to reference methylomes. The target samples should have NA values in this column.

inference.max.cell.type.markers = 50000

Number of most variable CpGs which are tested for association with the reference cell types. Setting this option to NULL forces the algorithm to use all available sites in the dataset, and may greatly increase the running time for cell type composition estimation.

inference.top.cell.type.markers = 500

Number of top cell type markers used for determining cell type contributions to the target DNA methylation profiles using the projection method of Houseman et al.

inference.sva.num.method = "leek"

Name of the method to be used for estimating the number of surrogate variables. Must be either "leek" or "be". See the sva function for details.

exploratory = TRUE

Flag indicating if the exploratory analysis module is to be executed.

exploratory.columns = NULL

Traits, given as column names or indices in the sample annotation table, to be used in the exploratory analysis. These traits are used in multiple steps in the module: they are visualized using point types and colors in the dimension reduction plots; tested for strong correlations and associations with principal components in a methylation space; used to define groups when plotting beta distributions and/or inter-sample methylation variability. The default value of this parameter - NULL - indicates that columns should be automatically selected; see rnb.sample.groups for how this is done.

exploratory.top.dimensions = 0

Number of most variable probes, sites or regions to select prior to performing dimension reduction techniques and tests for associations. Preselection can significantly reduce the running time and memory usage in the exploratory analysis module. Setting this number to zero (default) disables preselection.

exploratory.principal.components = 8

Maximum number of principal components to be tested for associations with other factors, such as control probe states and sample traits. This must be an integer value between 0 and 10. Setting this option to 0 disables such tests.

exploratory.correlation.pvalue.threshold = 0.01

Significance threshold for a p-value resulting from applying a test for association. This is a value between 0 and 1.

exploratory.correlation.permutations = 10000

Number of permutations in tests performed to check for associations between traits, and between control probe intensities and coordinates in the principal component space. This must be a non-negative integer. Setting this option to 0 disables permutation tests.

exploratory.correlation.qc = TRUE

[Infinium 450k] Flag indicating if quality-associated batch effects should be studied. This amounts to testing for associations between intensities of quality control probes and principal components. This option has effect only when exploratory.principal.components is non-zero.

exploratory.beta.distribution = TRUE

Flag indicating whether beta value distributions for sample groups and probe or site categories should be computed.

exploratory.intersample = TRUE

Flag indicating if methylation variability in sample groups should be computed as part of the exploratory analysis module.

exploratory.deviation.plots = NULL

Flag indicating if the inter-sample methylation variability step in the exploratory analysis module should include deviation plots. Deviation plots show intra-group methylation variability at the covered sites and regions. Setting this option to NULL (default) enables deviation plots on Infinium datasets, but disables them in case of sequencing-based datasets, because their generation can be very computationally intensive. This option has effect only when exploratory.intersample is TRUE.

exploratory.clustering = "all"

Which sites should be used by clustering algorithms in the exploratory analysis module. RnBeads performs several algorithms that cluster the samples in the dataset. If this option is set to "all" (default), clustering is performed using all sites; a value of "top" indicates that only the most variable sites are used (see the option exploratory.clustering.top.sites); and "none" disables clustering.

exploratory.clustering.top.sites = 1000

Number of most variable sites to use when visualizing heatmaps. This must be a non-empty integer vector containing positive values. This option is ignored when exploratory.clustering is "none".

exploratory.clustering.heatmaps.pdf = FALSE

Flag indicating if the generated methylation value heatmaps in the clustering section of the exploratory analysis module should be saved as PDF files. Enabling this option is not recommended for large values of exploratory.clustering.top.sites (more than 200), because heatmaps might generate very large PDF files.

exploratory.region.profiles = NULL

Region types for generating regional methylation profiles. If NULL (default), regional methylation profiles are created only for the region types that are available for the targeted assembly and summarized in the dataset of interest. Setting this option to an empty vector disables the region profiles step in the exploratory analysis module.

exploratory.gene.symbols = NULL

A list of gene symbols to be used for custom locus profiling. Locus views will be generated for these genes.

exploratory.custom.loci.bed = NULL

Path to a bed file containing custom genomic regions. Locus views will be generated for these regions.

differential = TRUE

Flag indicating if the differential methylation module is to be executed.

differential.site.test.method = "limma"

Method to be used for calculating p-values on the site level. Currently supported options are "ttest" for a (paired) t-test and "limma" for a linear modeling approach implemented in the limma package for differential expression in microarrays.

differential.permutations = 0

Number of permutation tests performed to compute the p-value of rank permutation tests in the differential methylation analysis. This must be a non-negative integer. Setting this option to 0 (default) disables permutation tests for rank permutations. Note that p-values for differential methylation are computed and also considered for the ranking in any case.

differential.comparison.columns = NULL

Column names or indices in the sample annotation table to be used for group definition in the differential methylation analysis. The default value - NULL - indicates that columns should be automatically selected. See rnb.sample.groups for how this is done. By default, the comparisons are done in a one vs. all manner if there are multiple groups defined in a column.

differential.comparison.columns.all.pairwise = NULL

Column names or indices in the sample annotation table to be used for group definition in the differential methylation analysis in which all pairwise comparisons between groups should be conducted (the default is one vs. all if multiple groups are specified in a column). Caution: for large numbers of sample groups this can lead to combinatorial explosion and thus to very long runtimes. A value of NULL (default) indicates that no column is explicitly selected for all pairwise comparisons. If specified, the selected columns must be a subset of the columns that will be selected according to the differential.comparison.columns option.

covariate.adjustment.columns = NULL

Column names or indices in the table of phenotypic information to be used for confounder adjustment in the differential methylation analysis. Currently this is only supported for differential.site.test.method=="limma".

columns.pairing = NULL

A named vector specifying paired analyses. For each column for which a paired analysis should be performed (say columnA), the vector contains the name or index of another column (say columnB) in which identical values indicate the same pair. columnA must be the name of the element whose value is columnB. For more details, see rnb.sample.groups.
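As an illustration of the naming convention, the following sketch requests a paired comparison. It assumes RnBeads is installed. The column names "treatment" and "patient" are hypothetical and stand for columns of the sample annotation table.

```r
library(RnBeads)

# "treatment" plays the role of columnA (the comparison column);
# "patient" plays the role of columnB: samples sharing a value there form a pair
rnb.options(
  differential.comparison.columns = "treatment",
  columns.pairing = c("treatment" = "patient")
)
```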

differential.adjustment.sva = TRUE

Flag indicating if the differential methylation analysis should account for surrogate variables. If TRUE, RnBeads looks for overlaps between the differential.comparison.columns and inference.targets.sva options and includes the surrogate variables as confounding factors only for these columns. In other words, it will only have an effect if the corresponding inference option (see inference.targets.sva option for details) is enabled. Currently this is only supported for differential.site.test.method=="limma".

differential.adjustment.celltype = TRUE

Flag indicating if the differential methylation analysis should account for cell type using the reference-based Houseman method. It will only have an effect if the corresponding inference option is enabled (see inference.reference.methylome.column option for details). Currently this is only supported for differential.site.test.method=="limma".

differential.enrichment = FALSE

Flag indicating whether Gene Ontology (GO)-enrichment analysis is to be conducted on the identified differentially methylated regions.

differential.report.sites = TRUE

Flag indicating whether a section corresponding to differential site methylation should be added to the report. Has no effect on the actual analysis, just the report. To disable differential site methylation analysis entirely use the analyze.sites option.

export.to.bed = TRUE

Flag indicating whether the data should be exported to bed files.

export.to.trackhub = c("bigBed","bigWig")

character vector specifying which data types should be exported to Track hub directories. Possible values in the vector are "bigBed" and "bigWig". When this option is set to NULL, track hub export is disabled. Note that if "bigBed" is contained in this option, bed files are created automatically.

export.to.csv = FALSE

Flag indicating whether methylation value matrices are to be exported to comma-separated value (CSV) files.

export.to.ewasher = FALSE

Flag indicating whether methylation values and differential methylation analysis settings should be exported to a format compatible with FaST-LMM-EWASher, a tool for adjusting for cell-type compositions. See Zou, J., et al., Nature Methods, 2014 for further details on the tool.

export.types = "sites"

character vector of sites and region names to be exported. If NULL, no region methylation values are exported.

disk.dump.big.matrices = FALSE

Flag indicating whether big tables should be stored on disk rather than in main memory in order to keep memory requirements down. May slow down analysis!

logging.exit.on.error = FALSE

Flag indicating if the active R session should be terminated when an error is encountered during execution.

distribution.subsample = 1000000

When plotting methylation value distributions, this threshold specifies the number of observations drawn per group. Distributions are estimated and plotted based on these random subsamples. This approach can significantly reduce the memory requirements of the preprocessing and exploratory analysis modules, where methylation value distributions are plotted. Setting this to 0 disables subsampling. More information is presented in the Details section of rnb.step.betadistribution.

enforce.memory.management = FALSE

Flag indicating whether memory management should be actively enforced in some places of the code in order to achieve a better memory profile, i.e. garbage collection and variable removal are conducted actively. May slow down analysis.

enforce.destroy.disk.dumps = FALSE

Flag indicating whether disk-dumped big matrices (see disk.dump.big.matrices option) should actively be deleted when RnBSets are modified. You should switch it to TRUE when disk.dump.big.matrices is TRUE and the amount of hard drive space is also limited.
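The three memory-related options are typically considered together. The sketch below shows one possible combination for a machine with limited memory and disk space, assuming RnBeads is installed; whether it pays off depends on the dataset, since each of these settings may slow down the analysis.

```r
library(RnBeads)

# Trade speed for a smaller memory and disk footprint
rnb.options(
  disk.dump.big.matrices     = TRUE,  # keep big tables on disk, not in memory
  enforce.destroy.disk.dumps = TRUE,  # delete disk dumps when RnBSets are modified
  enforce.memory.management  = TRUE   # active garbage collection and variable removal
)
```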

Author(s)

Yassen Assenov

Examples

str(rnb.options())
rnb.getOption("filtering.greedycut")
