start_decomp_pipeline: start_decomp_pipeline

View source: R/start_analysis.R

Description

Main workhorse of the DecompPipeline R package. Performs preprocessing (prepare_data or prepare_data_BS), CpG subset selection (prepare_CG_subsets), and deconvolution (start_medecom_analysis, start.refreeewas.analysis, start.edec.analysis).

Usage

start_decomp_pipeline(rnb.set, Ks, lambda.grid, work.dir = getwd(),
  factorviz.outputs = F, analysis.name = "Analysis",
  sample.selection.col = NA, sample.selection.grep = NA,
  pheno.cols = NA, id.column = rnb.getOption("identifiers.column"),
  normalization = "none", ref.ct.column = NA, ref.rnb.set = NULL,
  ref.rnb.ct.column = NA, prepare.true.proportions = F,
  true.A.token = NA, houseman.A.token = NA,
  estimate.houseman.prop = F,
  filter.beads = !is.null(rnb.set@covg.sites), min.n.beads = 3,
  filter.intensity = inherits(rnb.set, "RnBeadRawSet"),
  min.int.quant = 0.01, max.int.quant = 0.99, filter.na = TRUE,
  filter.context = TRUE, filter.cross.reactive = TRUE,
  execute.lump = FALSE, remove.ICA = FALSE, conf.fact.ICA = FALSE,
  ica.setting = NULL, filter.snp = TRUE, filter.somatic = TRUE,
  snp.list = NULL, filter.coverage = hasCovg(rnb.set),
  min.coverage = 5, min.covg.quant = 0.05, max.covg.quant = 0.95,
  marker.selection = "var", n.markers = 5000,
  remove.correlated = FALSE, cor.threshold = "quantile",
  write.files = FALSE, n.prin.comp = 10, range.diff = 0.05,
  custom.marker.file = "", store.heatmaps = F,
  heatmap.sample.col = NULL, sample.subset = NULL, k.fixed = NULL,
  K.prior = NULL, opt.method = "MeDeCom.cppTAfact", startT = NULL,
  startA = NULL, folds = 10, cores = 1, itermax = 1000,
  ninit = 100, cluster.submit = FALSE, cluster.Rdir = NA,
  cluster.hostlist = "*", cluster.memlimit = "5G", cleanup = FALSE)
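
A call might be assembled as in the following minimal sketch. The argument names come from the signature above; the values are illustrative only, and `my.rnb.set` is a hypothetical RnBSet object that you would have to load yourself. The call is skipped when the package or the object is not available.

```r
# Argument names are taken from the Usage block; the values are illustrative only.
pipeline.args <- list(
  Ks = 2:5,                        # numbers of components to try
  lambda.grid = c(0, 10^(-5:-2)),  # regularization values to try
  analysis.name = "example_run",
  marker.selection = "var",
  n.markers = 5000,
  folds = 5,
  cores = 2,
  itermax = 300,
  ninit = 10
)

# 'my.rnb.set' is a hypothetical RnBSet object, not something the package provides.
if (requireNamespace("DecompPipeline", quietly = TRUE) && exists("my.rnb.set")) {
  res <- do.call(
    DecompPipeline::start_decomp_pipeline,
    c(list(rnb.set = my.rnb.set), pipeline.args)
  )
} else {
  message("DecompPipeline or 'my.rnb.set' not available; call skipped.")
}
```

Keeping the settings in a named list makes it easy to store them next to the results in work.dir.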

Arguments

rnb.set

An object of type RnBSet-class for which analysis is to be performed.

Ks

Vector of integers giving the numbers of components (K) to be used in MeDeCom.

lambda.grid

Vector of doubles giving the values of the regularization parameter (lambda) to be used in MeDeCom.

work.dir

A path to an existing directory in which the results are to be stored.

factorviz.outputs

Flag indicating if outputs should be stored in a format compatible with FactorViz for data exploration.

analysis.name

A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.

sample.selection.col

A column name in the phenotypic table of RNB_SET used to select the subset of samples for analysis whose entries contain the string given in SAMPLE_SELECTION_GREP.

sample.selection.grep

A string used for selecting samples in the column SAMPLE_SELECTION_COL.
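
The interplay of sample.selection.col and sample.selection.grep can be pictured with plain base R. This is a sketch under the assumption that selection works like a `grepl` match on the chosen column; the toy table and its values are made up for illustration and are not taken from the package.

```r
# Toy phenotypic table; column names and values are made up for illustration.
pheno <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4"),
  Tissue = c("blood_whole", "liver", "blood_PBMC", "brain"),
  stringsAsFactors = FALSE
)

sample.selection.col <- "Tissue"
sample.selection.grep <- "blood"

# Keep the samples whose entry in the chosen column contains the string.
keep <- grepl(sample.selection.grep, pheno[[sample.selection.col]])
selected <- pheno$Sample_ID[keep]
print(selected)
```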

pheno.cols

Vector of column names in the phenotypic table of RNB_SET that are kept and exported for further exploration.

id.column

Sample-specific ID column name in RNB_SET

normalization

Normalization method to be performed before employing MeDeCom. Can be one of "none","dasen","illumina","noob" (BeadChip only).

ref.ct.column

Column name in RNB_SET used to extract methylation information on the reference cell types.

ref.rnb.set

An object of type RnBSet-class containing methylation information on reference cell types (BeadChip only).

ref.rnb.ct.column

Column name in REF_RNB_SET used to extract methylation information on the reference cell types (BeadChip only).

prepare.true.proportions

Flag indicating if true proportions are either available in RNB_SET or to be estimated with Houseman's reference-based deconvolution approach (BeadChip only).

true.A.token

String present in the column names of RNB_SET used for selecting the true proportions of the corresponding cell types.

houseman.A.token

Similar to TRUE_A_TOKEN, but not containing the true proportions, rather the estimated proportions by Houseman's method (BeadChip only).

estimate.houseman.prop

If neither TRUE_A_TOKEN nor HOUSEMAN_A_TOKEN are given, the proportions of the reference cell type are estimated with Houseman's approach (BeadChip only).

filter.beads

Flag indicating if site filtering based on the number of beads available is to be conducted (BeadChip only).

min.n.beads

Minimum number of beads required in each sample for the site to be considered for adding to MeDeCom (BeadChip only).

filter.intensity

Flag indicating if sites should be removed according to the signal intensities (the lowest and highest quantiles given by MIN_INT_QUANT and MAX_INT_QUANT) (BeadChip only).

min.int.quant

Lower quantile of intensities which is to be removed (BeadChip only).

max.int.quant

Upper quantile of intensities which is to be removed (BeadChip only).
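
One possible reading of the intensity filter is sketched below in base R. It assumes that the quantile cutoffs are computed over all intensity values and that a site is dropped as soon as any sample falls outside them; the pipeline itself may compute the cutoffs differently (for example per sample or per channel). The toy matrix is simulated.

```r
set.seed(1)
# Toy total-intensity matrix: 1000 sites (rows) x 4 samples (columns).
intensity <- matrix(rexp(4000, rate = 1 / 5000), nrow = 1000)

min.int.quant <- 0.01
max.int.quant <- 0.99

# Quantile cutoffs computed over all intensity values (an assumption).
cutoffs <- quantile(intensity, probs = c(min.int.quant, max.int.quant))

# Drop a site if any sample falls outside the cutoffs.
outside <- intensity < cutoffs[1] | intensity > cutoffs[2]
keep.site <- rowSums(outside) == 0
filtered <- intensity[keep.site, , drop = FALSE]

cat(nrow(intensity) - nrow(filtered), "sites removed\n")
```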

filter.na

Flag indicating if sites with any missing values are to be removed or not.

filter.context

Flag indicating if only CG probes are to be kept (BeadChip only).

filter.cross.reactive

Flag indicating if sites showing cross reactivity on the array are to be removed.

execute.lump

Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.

remove.ICA

Flag indicating if independent component analysis (ICA) is to be executed to remove a potential confounding factor. If TRUE, conf.fact.ICA needs to be specified.

conf.fact.ICA

Column name in the sample annotation sheet representing a potential confounding factor.

ica.setting

Optional argument setting up ICA.

filter.snp

Flag indicating if annotated SNPs are to be removed from the list of sites according to RnBeads' SNP list. (@TODO: we could provide an additional list of SNPs, similar to RnBeads blacklist for filtering)

filter.somatic

Flag indicating if only somatic probes are to be kept.

snp.list

Path to a file containing CpG IDs of known SNPs to be removed from the analysis, if FILTER_SNP is TRUE.

filter.coverage

Flag indicating if site filtering based on coverage is to be conducted (BS only).

min.coverage

Minimum number of reads required in each sample for the site to be considered for adding to MeDeCom (BS only).

min.covg.quant

Lower quantile of coverages. Values lower than this value will be ignored for analysis (BS only).

max.covg.quant

Upper quantile of coverages. Values higher than this value will be ignored for analysis (BS only).

marker.selection

A vector of strings representing marker selection methods. Available methods are:

  • "all" Uses all sites available in the input.

  • "pheno" Selects the top N_MARKERS sites that differ between the phenotypic groups defined in data preparation or by rnb.sample.groups. Those are selected by employing limma on the methylation matrix.

  • "houseman2012" The 50k sites reported as cell-type specific in Houseman's reference-based deconvolution. See Houseman et al. 2012.

  • "houseman2014" Selects the sites said to be linked to cell type composition by RefFreeEWAS, which is similar to surrogate variable analysis. See Houseman et al. 2014.

  • "jaffe2014" The sites stated as related to cell-type composition in Jaffe et al. 2014.

  • "rowFstat" Markers are selected as those found to be associated to the reference cell types with F-statistics. If this option is selected, REF_DATA_SET and REF_PHENO_COLUMN need to be specified.

  • "random" Sites are randomly selected.

  • "pca" Sites are selected as those with most influence on the principal components.

  • "var" Selects the most variable sites.

  • "hybrid" Selects (N_MARKERS/2) most variable and (N_MARKERS/2) random sites.

  • "range" Selects the sites with the largest difference between minimum and maximum across samples.

  • "pcadapt" Uses principal component analysis as implemented in the "bigstatsr" R package to determine sites that are significantly linked to the potential cell types. This requires specifying K a priori (argument K.prior). We thank Florian Prive and Sophie Achard for providing the idea and parts of the code.

  • "edec_stage0" Employs EDec's stage 0 to infer cell-type specific markers. By default, EDec's example reference data is provided. If a specific data set is to be provided, it needs to be done through REF_DATA_SET.

  • "custom" Specifying a custom file with indices.
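
The "var" and "hybrid" options can be illustrated with base R on a toy matrix. This is only a sketch: it assumes sites are rows and samples are columns, and that the random half of "hybrid" is drawn from sites not already chosen by variance. Neither detail is confirmed by this page.

```r
set.seed(42)
# Toy methylation matrix: 200 sites (rows) x 10 samples (columns), values in [0, 1].
meth <- matrix(runif(2000), nrow = 200)
n.markers <- 20

# "var": indices of the n.markers sites with the highest variance across samples.
site.var <- apply(meth, 1, var)
by.var <- order(site.var, decreasing = TRUE)
var.idx <- by.var[seq_len(n.markers)]

# "hybrid": half most variable sites, half random sites from the remainder (assumption).
half <- n.markers / 2
top.idx <- by.var[seq_len(half)]
rand.idx <- sample(setdiff(seq_len(nrow(meth)), top.idx), half)
hybrid.idx <- c(top.idx, rand.idx)

# Subset of the matrix that would be passed on to deconvolution.
meth.var <- meth[var.idx, , drop = FALSE]
```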

n.markers

The number of sites to be selected. Defaults to 5000.

remove.correlated

Flag indicating if highly correlated features are to be removed.

cor.threshold

Numeric indicating a threshold above which sites are not to be considered in the feature selection. If "quantile", sites correlated higher than the 95th quantile are removed.

write.files

Flag indicating if the selected sites are to be stored on disk.

n.prin.comp

Optional argument determining the number of principal components used for selecting the most important sites.

range.diff

Optional argument specifying the difference between maximum and minimum required.
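
As a rough illustration of how range.diff could act together with the "range" marker selection, the sketch below keeps the sites whose max-minus-min difference across samples reaches the threshold. Treating range.diff as a hard cutoff is an assumption made for this example; the data are simulated.

```r
set.seed(7)
# Toy methylation matrix: 100 sites (rows) x 5 samples (columns).
meth <- matrix(runif(500), nrow = 100)
meth[1:10, ] <- 0.5  # ten constant sites with a range of zero

range.diff <- 0.05

# Per-site difference between the maximum and minimum across samples.
site.range <- apply(meth, 1, max) - apply(meth, 1, min)

# Keep the sites whose range reaches the required difference (assumption).
range.idx <- which(site.range >= range.diff)
```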

custom.marker.file

Optional argument containing a file that specifies the indices used for employing MeDeCom.

store.heatmaps

Flag indicating if a heatmap of the selected input sites is to be created from the input methylation matrix. The files are then stored in the 'heatmaps' folder in WD.

heatmap.sample.col

Column name in the phenotypic table of rnb.set, used for creating a color scheme in the heatmap.

sample.subset

Vector of indices of samples to be included in the analysis. If NULL, all samples are included.

k.fixed

Columns in the T matrix that should be fixed. If NULL, no columns are fixed.

K.prior

K determined from visual inspection. Only has an influence if MARKER_SELECTION="pcadapt".

opt.method

Optimization method to be used. Either MeDeCom.quadPen or MeDeCom.cppTAfact (default).

startT

Initial matrix for T.

startA

Initial matrix for A.

folds

Integer representing the number of cross-validation folds used in the analysis.

cores

Integer representing the number of cores to be used in the analysis.

itermax

Maximum number of iterations.

ninit

Number of initializations.

cluster.submit

Flag indicating if the jobs are to be submitted to a scientific compute cluster (only SGE supported).

cluster.Rdir

Path to an executable version of R.

cluster.hostlist

Regular expression, on which basis hosts are selected in the cluster environment.

cluster.memlimit

The memlimit resource value of the cluster submission.

cleanup

Flag indicating if temporary files are to be deleted.

Value

An object of type MeDeComSet containing the results of the MeDeCom experiment.

Author(s)

Michael Scherer


lutsik/DecompPipeline documentation built on Oct. 13, 2019, 1:51 a.m.