prepare.data: FUNCTIONS prepare.data
In CompEpigen/DecompPipeline: Preparation pipeline for MeDeCom

View source: R/data_preparation.R

prepare.data

R Documentation

Description

This function prepares Illumina BeadChip data for a MeDeCom/EDec/RefFreeCellMix run.

Usage

prepare.data(
  rnb.set,
  work.dir = getwd(),
  analysis.name = "analysis",
  sample.selection.col = NA,
  sample.selection.grep = NA,
  pheno.columns = NA,
  id.column = rnb.getOption("identifiers.column"),
  normalization = "none",
  ref.ct.column = NA,
  ref.rnb.set = NULL,
  ref.rnb.ct.column = NA,
  prepare.true.proportions = FALSE,
  true.A.token = NA,
  houseman.A.token = NA,
  estimate.houseman.prop = FALSE,
  filter.beads = !is.null(rnb.set@covg.sites),
  min.n.beads = 3,
  filter.intensity = inherits(rnb.set, "RnBeadRawSet"),
  min.int.quant = 0.001,
  max.int.quant = 0.999,
  filter.na = TRUE,
  filter.context = TRUE,
  filter.snp = TRUE,
  filter.sex.chromosomes = TRUE,
  filter.cross.reactive = T,
  remove.ICA = F,
  conf.fact.ICA = NULL,
  ica.setting = NULL,
  snp.list = NULL,
  execute.lump = FALSE,
  dist.snps = FALSE
)
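
The signature above can be exercised with a call along the following lines. This is a minimal sketch, not an example from the package authors: the phenotype column `"tissue"`, the pattern `"blood"`, and the analysis name are hypothetical, and the call only runs if DecompPipeline is installed and an object named `rnb.set` already exists in the session. Arguments not listed keep the defaults shown above.

```r
# Argument values for a hypothetical run. Only the argument names come from
# the usage section; the values are illustrative assumptions.
prep.args <- list(
  work.dir = tempdir(),                # an existing directory for the results
  analysis.name = "example_analysis",  # hypothetical folder name
  sample.selection.col = "tissue",     # hypothetical phenotype column
  sample.selection.grep = "blood",     # hypothetical pattern to match
  normalization = "none",
  min.n.beads = 3,
  min.int.quant = 0.001,
  max.int.quant = 0.999,
  filter.na = TRUE,
  filter.snp = TRUE,
  filter.sex.chromosomes = TRUE
)

# Run only when the package is installed and an RnBSet object named
# `rnb.set` is available; otherwise nothing is executed.
if (requireNamespace("DecompPipeline", quietly = TRUE) && exists("rnb.set")) {
  prepared <- do.call(
    DecompPipeline::prepare.data,
    c(list(rnb.set = rnb.set), prep.args)
  )
}
```

Passing the arguments as a list through `do.call` keeps the settings in one place, so the same list can be reused or logged alongside the results.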

Arguments

`rnb.set`	An object of type `RnBSet-class` for which analysis is to be performed.
`work.dir`	A path to an existing directory in which the results are to be stored.
`analysis.name`	A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.
`sample.selection.col`	A column name in the phenotypic table of `rnb.set` used to select a subset of samples for analysis: those whose entry in this column contains the string given in `sample.selection.grep`.
`sample.selection.grep`	A string used for selecting samples in the column `sample.selection.col`.
`pheno.columns`	Vector of column names in the phenotypic table of `rnb.set` that are kept and exported for further exploration.
`id.column`	Sample-specific ID column name in `rnb.set`.
`normalization`	Normalization method to be performed before employing MeDeCom. Can be one of `"none", "dasen", "illumina", "noob", "bmiq"`.
`ref.ct.column`	Column name in `rnb.set` used to extract methylation information on the reference cell types.
`ref.rnb.set`	An object of type `RnBSet-class` containing methylation information on reference cell types.
`ref.rnb.ct.column`	Column name in `ref.rnb.set` used to extract methylation information on the reference cell types.
`prepare.true.proportions`	Flag indicating if true proportions are either available in `rnb.set` or to be estimated with Houseman's reference-based deconvolution approach.
`true.A.token`	String present in the column names of `rnb.set` used for selecting the true proportions of the corresponding cell types.
`houseman.A.token`	Similar to `true.A.token`, but selects columns containing the proportions estimated with Houseman's method rather than the true proportions.
`estimate.houseman.prop`	If `TRUE` and neither `true.A.token` nor `houseman.A.token` is given, the proportions of the reference cell types are estimated with Houseman's approach.
`filter.beads`	Flag indicating if site filtering based on the number of beads available is to be conducted.
`min.n.beads`	Minimum number of beads required in each sample for a site to be retained for the MeDeCom analysis.
`filter.intensity`	Flag indicating if sites should be removed according to their signal intensities (the lowest and highest quantiles given by `min.int.quant` and `max.int.quant`). A site is removed if its value lies outside the provided quantile range in either channel in any sample.
`min.int.quant`	Lower quantile of intensities which is to be removed.
`max.int.quant`	Upper quantile of intensities which is to be removed.
`filter.na`	Flag indicating if sites with any missing values are to be removed.
`filter.context`	Flag indicating if only CG probes are to be kept.
`filter.snp`	Flag indicating if annotated SNPs are to be removed from the list of sites, either according to RnBeads' SNP list or according to the sites specified in `snp.list`.
`filter.sex.chromosomes`	Flag indicating if only autosomal probes are to be kept, i.e. probes on the sex chromosomes are removed.
`filter.cross.reactive`	Flag indicating if sites showing cross reactivity on the array are to be removed.
`remove.ICA`	Flag indicating if independent component analysis is to be executed to remove potential confounding factors. If `TRUE`, `conf.fact.ICA` needs to be specified.
`conf.fact.ICA`	A vector of column names in the sample annotation sheet representing potential confounding factors.
`ica.setting`	Named vector of settings passed to `run.rnb.ICA`. Options are `nmin`, `nmax`, `thres.sd`, `alpha.fact`, `save.report`, `alpha.feat`, `type`, `ncores`. See `run.rnb.ICA` for further details. `NULL` indicates the default setting.
`snp.list`	Path to a file containing CpG IDs of known SNPs to be removed from the analysis, if `filter.snp` is `TRUE`.
`execute.lump`	Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.
`dist.snps`	Flag indicating if SNPs are to be removed by determining whether the pairwise differences between the CpGs in the samples are trimodally distributed, as is frequently found around SNPs.
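
The sample selection and the site filters described by `sample.selection.grep`, `min.n.beads`, `filter.na`, and `min.int.quant`/`max.int.quant` can be illustrated on toy matrices in base R. This is a sketch of the filtering logic as the argument descriptions state it, not the package's implementation. It assumes that a site passes the bead filter when every sample has at least `min.n.beads` beads, that the intensity quantiles are computed over all values pooled, and that a single intensity matrix stands in for the two channels. All data values and the `tissue` column are made up.

```r
# Toy data: rows are sites, columns are samples. All values are made up.
sites <- paste0("site", 1:6)
beads <- matrix(c(5, 6, 7,
                  2, 6, 5,    # one sample has fewer than 3 beads
                  4, 4, 4,
                  9, 8, 7,
                  3, 3, 3,
                  8, 9, 10),
                ncol = 3, byrow = TRUE, dimnames = list(sites, NULL))
meth <- matrix(c(0.1, 0.2, 0.3,
                 0.5, 0.5, 0.5,
                 0.9, NA,  0.8,  # one missing value
                 0.4, 0.6, 0.5,
                 0.7, 0.7, 0.6,
                 0.2, 0.3, 0.1),
               ncol = 3, byrow = TRUE, dimnames = list(sites, NULL))
intensity <- matrix(c(1000, 1100, 1200,
                      1500, 1600, 1700,
                      2000, 2100, 2200,
                      50,   2500, 2600,   # contains the lowest value
                      3000, 3100, 90000,  # contains the highest value
                      1300, 1400, 1800),
                    ncol = 3, byrow = TRUE, dimnames = list(sites, NULL))

# Thresholds taken from the defaults in the usage section.
min.n.beads <- 3
min.int.quant <- 0.001
max.int.quant <- 0.999

# Bead filter: keep a site only if no sample falls below min.n.beads.
keep.beads <- rowSums(beads < min.n.beads) == 0

# Missing-value filter: keep a site only if it has no NA in any sample.
keep.na <- rowSums(is.na(meth)) == 0

# Intensity filter: keep a site only if all of its values lie inside the
# quantile range (quantiles pooled over all values, an assumption).
bounds <- quantile(intensity, probs = c(min.int.quant, max.int.quant))
keep.int <- rowSums(intensity < bounds[1] | intensity > bounds[2]) == 0

keep <- keep.beads & keep.na & keep.int
filtered <- meth[keep, , drop = FALSE]

# Sample selection: keep samples whose entry in a hypothetical phenotype
# column contains the given string (pattern matching via grepl is assumed).
pheno <- data.frame(tissue = c("blood", "brain", "cord_blood"))
selected <- grepl("blood", pheno$tissue)
```

In this toy data only `site1` and `site6` pass all three site filters, and the first and third samples match the pattern. Because a single out-of-range value in any sample removes the whole site, the intensity filter becomes stricter as the number of samples grows.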