prepare.data: FUNCTIONS prepare.data

View source: R/data_preparation.R

prepare.data    R Documentation

FUNCTIONS prepare.data

Description

This function prepares Illumina BeadChip data for a MeDeCom, EDec, or RefFreeCellMix run.

Usage

prepare.data(
  rnb.set,
  work.dir = getwd(),
  analysis.name = "analysis",
  sample.selection.col = NA,
  sample.selection.grep = NA,
  pheno.columns = NA,
  id.column = rnb.getOption("identifiers.column"),
  normalization = "none",
  ref.ct.column = NA,
  ref.rnb.set = NULL,
  ref.rnb.ct.column = NA,
  prepare.true.proportions = FALSE,
  true.A.token = NA,
  houseman.A.token = NA,
  estimate.houseman.prop = FALSE,
  filter.beads = !is.null(rnb.set@covg.sites),
  min.n.beads = 3,
  filter.intensity = inherits(rnb.set, "RnBeadRawSet"),
  min.int.quant = 0.001,
  max.int.quant = 0.999,
  filter.na = TRUE,
  filter.context = TRUE,
  filter.snp = TRUE,
  filter.sex.chromosomes = TRUE,
  filter.cross.reactive = T,
  remove.ICA = F,
  conf.fact.ICA = NULL,
  ica.setting = NULL,
  snp.list = NULL,
  execute.lump = FALSE,
  dist.snps = FALSE
)

Arguments

rnb.set

An object of type RnBSet-class for which analysis is to be performed.

work.dir

A path to an existing directory in which the results are to be stored.

analysis.name

A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.

sample.selection.col

A column name in the phenotypic table of rnb.set used to select a subset of samples for analysis, namely those whose entry in this column contains the string given in sample.selection.grep.

sample.selection.grep

A string used for selecting samples in the column given by sample.selection.col.
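The interplay of sample.selection.col and sample.selection.grep can be illustrated with a minimal base-R sketch. It assumes the phenotypic table behaves like a plain data.frame and that selection is a substring match via grep(); the helper select_samples() and the example table are hypothetical and not part of DecompPipeline.

```r
# Hypothetical phenotypic table standing in for the one stored in rnb.set.
pheno <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue    = c("blood_healthy", "liver", "blood_tumor", "liver"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the row indices whose value in column `col`
# contains the pattern `grep.string` (assumed substring/regex match).
select_samples <- function(pheno, col, grep.string) {
  grep(grep.string, pheno[[col]])
}

# Analogous to sample.selection.col = "tissue", sample.selection.grep = "blood".
keep <- select_samples(pheno, col = "tissue", grep.string = "blood")
pheno[keep, ]
```

Here samples S1 and S3 would be retained because their tissue entries contain "blood".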

pheno.columns

Vector of column names in the phenotypic table of rnb.set that are kept and exported for further exploration.

id.column

Name of the column in the phenotypic table of rnb.set that contains sample-specific IDs.

normalization

Normalization method to be applied before running MeDeCom. Can be one of "none", "dasen", "illumina", "noob", "bmiq".

ref.ct.column

Column name in rnb.set used to extract methylation information on the reference cell types.

ref.rnb.set

An object of type RnBSet-class containing methylation information on reference cell types.

ref.rnb.ct.column

Column name in ref.rnb.set used to extract methylation information on the reference cell types.

prepare.true.proportions

Flag indicating if true proportions are either available in rnb.set or to be estimated with Houseman's reference-based deconvolution approach.

true.A.token

String present in the column names of rnb.set used for selecting the true proportions of the corresponding cell types.

houseman.A.token

Similar to true.A.token, but the selected columns contain the proportions estimated by Houseman's method rather than the true proportions.

estimate.houseman.prop

Flag indicating if the proportions of the reference cell types are to be estimated with Houseman's approach when neither true.A.token nor houseman.A.token is given.

filter.beads

Flag indicating if site filtering based on the number of available beads is to be conducted.

min.n.beads

Minimum number of beads required in each sample for a site to be retained for the MeDeCom analysis.
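The bead-count criterion described for filter.beads and min.n.beads amounts to a row-wise check over a sites-by-samples count matrix. The following base-R sketch assumes such a matrix is available and treats missing counts as failing; the helper filter_beads() and the example counts are hypothetical, not DecompPipeline's implementation.

```r
# Hypothetical bead-count matrix: rows = CpG sites, columns = samples.
n.beads <- matrix(c(5, 4, 3,
                    2, 6, 7,
                    3, 3, 3),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("cg01", "cg02", "cg03"),
                                  c("S1", "S2", "S3")))

# Hypothetical helper: keep a site only if every sample has at least
# `min.n.beads` beads. Missing counts are treated as failing (assumption).
filter_beads <- function(n.beads, min.n.beads = 3) {
  ok <- !is.na(n.beads) & n.beads >= min.n.beads
  which(rowSums(!ok) == 0)
}

keep <- filter_beads(n.beads, min.n.beads = 3)
keep
```

Site cg02 is dropped because sample S1 has only two beads there, while cg03 passes with exactly three beads in every sample.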

filter.intensity

Flag indicating if sites should be removed according to their signal intensities (the lowest and highest quantiles given by min.int.quant and max.int.quant). Note that a site is removed if it has a value outside the provided quantile range in either channel in any of the samples.

min.int.quant

Lower quantile of the intensity distribution; sites with intensities below it are removed.

max.int.quant

Upper quantile of the intensity distribution; sites with intensities above it are removed.
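The intensity rule described for filter.intensity, min.int.quant and max.int.quant can be sketched in base R as follows. This is an illustration under assumptions: the quantile bounds are computed over all values of each channel matrix (methylated and unmethylated) with R's default quantile() type, and a site survives only if it is within bounds in both channels and in every sample. The helper filter_intensity() and the example matrices are hypothetical; DecompPipeline may compute the bounds differently.

```r
# Hypothetical helper: indices of sites within the quantile range in
# both channels and in all samples.
filter_intensity <- function(meth.int, unmeth.int,
                             min.int.quant = 0.001, max.int.quant = 0.999) {
  within_bounds <- function(m) {
    # Bounds computed over all values of one channel (assumption).
    bounds <- quantile(m, probs = c(min.int.quant, max.int.quant), na.rm = TRUE)
    m >= bounds[[1]] & m <= bounds[[2]]
  }
  ok <- within_bounds(meth.int) & within_bounds(unmeth.int)
  # A site is kept only if every sample is within range.
  which(apply(ok, 1, all))
}

# Hypothetical intensities: 5 sites (rows) x 2 samples (columns).
meth.int   <- matrix(c(100, 200, 300, 400, 5000,
                       110, 210, 310, 410, 420), ncol = 2)
unmeth.int <- matrix(rep(c(50, 60, 70, 80, 90), 2), ncol = 2)

# Wide quantiles chosen only to make the effect visible on tiny data.
keep <- filter_intensity(meth.int, unmeth.int,
                         min.int.quant = 0.1, max.int.quant = 0.9)
keep
```

With these toy values the first site (100 in sample 1) falls below the lower bound and the last site (5000 in sample 1) exceeds the upper bound, so only sites 2 to 4 remain.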

filter.na

Flag indicating if sites with any missing values are to be removed or not.
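For filter.na, removing sites with any missing value corresponds to keeping only complete rows of a sites-by-samples matrix. A minimal base-R sketch, assuming a beta-value matrix with sites as rows; the example values are hypothetical.

```r
# Hypothetical beta values: rows = sites, columns = samples.
beta <- matrix(c(0.1, 0.5, NA,
                 0.8, 0.7, 0.9,
                 0.2, NA,  0.3),
               nrow = 3, byrow = TRUE)

# Keep only sites (rows) without a missing value in any sample.
keep <- which(complete.cases(beta))
keep
```

Only the second site has values in all three samples and is kept.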

filter.context

Flag indicating if only CG probes are to be kept.
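One way to picture the filter.context step is to select probes by their Illumina ID prefix, since on the 450K and EPIC arrays CpG probes start with "cg", CpH probes with "ch" and SNP control probes with "rs". This base-R sketch assumes that prefix-based selection; RnBeads may instead use the context column of its own probe annotation. The probe IDs below are hypothetical examples.

```r
# Hypothetical probe IDs covering the three common Illumina prefixes.
probe.ids <- c("cg00000029", "ch.2.30415474F", "rs1019916", "cg00000108")

# Assumption: CpG-context probes are recognized by the "cg" prefix.
keep <- which(startsWith(probe.ids, "cg"))
probe.ids[keep]
```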

filter.snp

Flag indicating if annotated SNPs are to be removed from the list of sites according to RnBeads' SNP list or, if provided, the sites specified in snp.list.

filter.sex.chromosomes

Flag indicating if only autosomal probes are to be kept, i.e. if probes on the sex chromosomes are to be removed.
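Filtering sex chromosomes reduces to dropping sites annotated to chrX or chrY. The base-R sketch below assumes a site annotation table with a chromosome column using "chrX"/"chrY" naming; the table and column names are hypothetical.

```r
# Hypothetical site annotation with a chromosome column.
annotation <- data.frame(
  id  = c("cg01", "cg02", "cg03", "cg04"),
  chr = c("chr1", "chrX", "chr22", "chrY"),
  stringsAsFactors = FALSE
)

# Keep only autosomal sites (drop chrX and chrY).
keep <- which(!annotation$chr %in% c("chrX", "chrY"))
annotation$id[keep]
```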

filter.cross.reactive

Flag indicating if sites showing cross-reactivity on the array are to be removed.

remove.ICA

Flag indicating if independent component analysis is to be executed to remove potential confounding factors. If TRUE, conf.fact.ICA needs to be specified.

conf.fact.ICA

A vector of column names in the sample annotation sheet representing potential confounding factors.

ica.setting

Named vector of settings passed to run.rnb.ica. Options are nmin, nmax, thres.sd, alpha.fact, save.report, alpha.feat, type, ncores. See run.rnb.ICA for further details. NULL indicates the default setting.

snp.list

Path to a file containing CpG IDs of known SNPs to be removed from the analysis, if filter.snp is TRUE.
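A minimal base-R sketch of removing the sites listed in a snp.list file, assuming the file holds one CpG ID per line; the helper remove_listed_sites(), the temporary file and the site IDs are hypothetical and only illustrate the idea.

```r
# Hypothetical SNP list written to a temporary file, one CpG ID per line.
snp.file <- tempfile(fileext = ".txt")
writeLines(c("cg02", "cg04"), snp.file)

# Hypothetical helper: indices of sites not listed in the file.
remove_listed_sites <- function(site.ids, snp.list) {
  snp.ids <- readLines(snp.list)
  which(!site.ids %in% snp.ids)
}

site.ids <- c("cg01", "cg02", "cg03", "cg04")
keep <- remove_listed_sites(site.ids, snp.file)
unlink(snp.file)  # clean up the temporary file
site.ids[keep]
```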

execute.lump

Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.

dist.snps

Flag indicating if SNPs are to be removed by determining whether the pairwise differences between the CpGs in the samples are trimodally distributed, as is frequently observed around SNPs.

Value

A list with four elements:

  • quality.filter: The indices of the sites that survived quality filtering.
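Conceptually, the quality.filter indices can be thought of as the sites passing every enabled filter. The base-R sketch below assumes each filter yields a logical per-site mask and intersects them; the mask names and values are hypothetical, and this is not necessarily how DecompPipeline assembles its result.

```r
# Hypothetical per-filter masks for five sites (TRUE = site passes).
passes <- list(
  beads     = c(TRUE, TRUE,  FALSE, TRUE, TRUE),
  intensity = c(TRUE, TRUE,  TRUE,  TRUE, FALSE),
  na        = c(TRUE, FALSE, TRUE,  TRUE, TRUE)
)

# A site survives only if it passes every filter; keep its index.
quality.filter <- which(Reduce(`&`, passes))
quality.filter
```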

Author(s)

Michael Scherer, Pavlo Lutsik


CompEpigen/DecompPipeline documentation built on Nov. 3, 2023, 5:35 p.m.