prepare_data: GLOBALS FUNCTIONS

View source: R/data_preparation.R

Description

This function prepares Illumina BeadChip data for a MeDeCom run.

Usage

prepare_data(RNB_SET, WORK_DIR = getwd(), analysis.name = "analysis",
  SAMPLE_SELECTION_COL = NA, SAMPLE_SELECTION_GREP = NA,
  PHENO_COLUMNS = NA, ID_COLUMN = rnb.getOption("identifiers.column"),
  NORMALIZATION = "none", REF_CT_COLUMN = NA, REF_RNB_SET = NULL,
  REF_RNB_CT_COLUMN = NA, PREPARE_TRUE_PROPORTIONS = FALSE,
  TRUE_A_TOKEN = NA, HOUSEMAN_A_TOKEN = NA,
  ESTIMATE_HOUSEMAN_PROP = FALSE,
  FILTER_BEADS = !is.null(RNB_SET@covg.sites), MIN_N_BEADS = 3,
  FILTER_INTENSITY = inherits(RNB_SET, "RnBeadRawSet"),
  MIN_INT_QUANT = 0.01, MAX_INT_QUANT = 0.99, FILTER_NA = TRUE,
  FILTER_CONTEXT = TRUE, FILTER_SNP = TRUE, FILTER_SOMATIC = TRUE,
  FILTER_CROSS_REACTIVE = T, remove.ICA = F, conf.fact.ICA = NULL,
  ica.setting = NULL, snp.list = NULL, execute.lump = FALSE,
  dist.snps = FALSE)

Arguments

RNB_SET

An object of type RnBSet-class for which analysis is to be performed.

WORK_DIR

A path to an existing directory in which the results are to be stored.

analysis.name

A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.

SAMPLE_SELECTION_COL

A column name in the phenotypic table of RNB_SET used to select a subset of samples for analysis: only samples whose entry in this column contains the string given in SAMPLE_SELECTION_GREP are kept.

SAMPLE_SELECTION_GREP

A string used for selecting samples in the column SAMPLE_SELECTION_COL.
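The selection described by SAMPLE_SELECTION_COL and SAMPLE_SELECTION_GREP can be sketched in base R as follows. This is a minimal illustration, not the package's code: the table `pheno`, the helper `select_samples`, and the use of `grepl` (pattern matching) are assumptions made for the example.

```r
# Hypothetical phenotypic table; column names and values are illustrative only.
pheno <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue    = c("blood_tumor", "blood_normal", "liver_tumor", "blood_tumor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose entry in sel_col matches sel_grep.
select_samples <- function(pheno, sel_col, sel_grep) {
  if (is.na(sel_col) || is.na(sel_grep)) {
    return(pheno)  # no selection requested (both default to NA): keep all samples
  }
  keep <- grepl(sel_grep, pheno[[sel_col]])
  pheno[keep, , drop = FALSE]
}

selected <- select_samples(pheno, "tissue", "tumor")
print(selected$sample_id)
```

With the defaults (`NA`) no filtering takes place, which mirrors the default values shown in the Usage section.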

PHENO_COLUMNS

Vector of column names in the phenotypic table of RNB_SET that are kept and exported for further exploration.

ID_COLUMN

Name of the column in RNB_SET that holds sample-specific IDs.

NORMALIZATION

Normalization method to be applied before running MeDeCom. Can be one of "none", "dasen", "illumina" or "noob".

REF_CT_COLUMN

Column name in RNB_SET used to extract methylation information on the reference cell types.

REF_RNB_SET

An object of type RnBSet-class containing methylation information on reference cell types.

REF_RNB_CT_COLUMN

Column name in REF_RNB_SET used to extract methylation information on the reference cell types.

PREPARE_TRUE_PROPORTIONS

Flag indicating if true proportions are to be prepared, either taken from RNB_SET (if available there) or estimated with Houseman's reference-based deconvolution approach.

TRUE_A_TOKEN

String contained in the column names of RNB_SET that is used to select the columns holding the true proportions of the corresponding cell types.

HOUSEMAN_A_TOKEN

Similar to TRUE_A_TOKEN, but selects the columns holding proportions estimated with Houseman's method rather than the true proportions.
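Token-based selection as used by TRUE_A_TOKEN and HOUSEMAN_A_TOKEN amounts to picking every column whose name contains the token. The sketch below assumes a fixed substring match; the column names, token values and the helper `select_by_token` are hypothetical.

```r
# Hypothetical phenotypic table; "true_A_" and "hm_A_" are made-up tokens.
pheno <- data.frame(
  sample_id    = c("S1", "S2"),
  true_A_Bcell = c(0.2, 0.3),
  true_A_Tcell = c(0.8, 0.7),
  hm_A_Bcell   = c(0.25, 0.28),
  hm_A_Tcell   = c(0.75, 0.72)
)

# Hypothetical helper: return the columns whose names contain `token`
# (fixed substring match, no regular expressions) as a numeric matrix.
select_by_token <- function(pheno, token) {
  cols <- grep(token, colnames(pheno), fixed = TRUE, value = TRUE)
  as.matrix(pheno[, cols, drop = FALSE])
}

true_props <- select_by_token(pheno, "true_A_")
est_props  <- select_by_token(pheno, "hm_A_")
```

Each resulting matrix has one row per sample and one column per cell type.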

ESTIMATE_HOUSEMAN_PROP

Flag indicating if the proportions of the reference cell types are to be estimated with Houseman's approach when neither TRUE_A_TOKEN nor HOUSEMAN_A_TOKEN is given.
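The idea behind reference-based estimation is to express a bulk methylation profile as a mixture of reference cell-type profiles. The following is a deliberately simplified stand-in, not Houseman's method: it uses unconstrained least squares followed by clipping and rescaling, whereas Houseman's approach solves a constrained quadratic program (non-negative proportions summing to at most 1). The reference values and the helper `estimate_props` are hypothetical.

```r
# Toy reference profiles: rows = CpG sites, columns = cell types.
ref <- cbind(
  Bcell = c(0.9, 0.1, 0.8, 0.2, 0.5),
  Tcell = c(0.1, 0.9, 0.3, 0.7, 0.4)
)

# Simulate a noise-free bulk sample with known mixing proportions.
true_prop <- c(Bcell = 0.3, Tcell = 0.7)
bulk <- as.vector(ref %*% true_prop)

# Hypothetical helper: least-squares fit of bulk ~ ref, then clip to [0, 1]
# and rescale so the proportions sum to 1 (a crude substitute for the
# constrained optimization used by Houseman's method).
estimate_props <- function(ref, bulk) {
  fit <- qr.solve(ref, bulk)  # least-squares solution of the overdetermined system
  fit <- pmin(pmax(fit, 0), 1)
  fit / sum(fit)
}

est <- estimate_props(ref, bulk)
print(round(est, 3))
```

On noise-free data the fit recovers the simulated proportions; with real, noisy data the constraints matter and a proper constrained solver should be used.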

FILTER_BEADS

Flag indicating if sites are to be filtered based on the number of beads available.

MIN_N_BEADS

Minimum number of beads required in each sample for a site to be retained for the MeDeCom analysis.
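A minimal sketch of the bead filter, assuming a site-by-sample matrix of bead counts and assuming that a missing count fails the threshold. The matrix `covg` and the helper `filter_beads` are hypothetical.

```r
# Toy bead-count matrix: rows = sites, columns = samples.
covg <- matrix(
  c(5,  2, 8,
    4,  6, 7,
    NA, 9, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2", "S3"))
)

# Hypothetical helper: keep a site only if every sample has at least
# min_n_beads beads; NA counts are treated as failing (an assumption).
filter_beads <- function(covg, min_n_beads = 3) {
  ok <- !is.na(covg) & covg >= min_n_beads
  rownames(covg)[rowSums(ok) == ncol(covg)]
}

kept <- filter_beads(covg, min_n_beads = 3)
print(kept)
```

Here cg01 is dropped because one sample has only 2 beads, and cg03 because of the missing count.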

FILTER_INTENSITY

Flag indicating if sites should be removed according to their signal intensities (the lowest and highest quantiles are given by MIN_INT_QUANT and MAX_INT_QUANT). Note that a site is removed if it has a value outside the provided quantile range in either of the two channels in any of the samples.

MIN_INT_QUANT

Lower quantile of intensities; sites with intensities below this quantile are removed.

MAX_INT_QUANT

Upper quantile of intensities; sites with intensities above this quantile are removed.
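The "either channel, any sample" rule above can be sketched as follows. The source does not say how the quantiles are computed; this example assumes they are taken over all values of each channel's intensity matrix. The matrices `M` and `U` and the helper `filter_intensity` are hypothetical.

```r
# Toy methylated (M) and unmethylated (U) intensities: 10 sites x 2 samples.
M <- matrix(1:20, nrow = 10)           # site i has values i and i + 10
U <- matrix(100, nrow = 10, ncol = 2)  # constant channel: always in range

# Hypothetical helper: a site is kept only if, in both channels and in all
# samples, its intensity lies within [min_q, max_q] quantiles of that channel.
filter_intensity <- function(M, U, min_q = 0.01, max_q = 0.99) {
  in_range <- function(X) {
    lims <- quantile(X, probs = c(min_q, max_q), na.rm = TRUE)
    X >= lims[1] & X <= lims[2]
  }
  fails <- !(in_range(M) & in_range(U))  # TRUE where any check fails
  which(rowSums(fails) == 0)
}

kept <- filter_intensity(M, U)
print(kept)
```

Sites 1 and 10 are removed because they carry the smallest (1) and largest (20) intensity of the M channel.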

FILTER_NA

Flag indicating if sites with any missing values are to be removed or not.

FILTER_CONTEXT

Flag indicating if only CG probes are to be kept.
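FILTER_NA and FILTER_CONTEXT can be illustrated together on a small beta-value matrix. The sketch assumes that CpG-context probes are recognizable by the Illumina "cg" ID prefix (CpH probes start with "ch"); the probe IDs and values are made up.

```r
# Toy beta-value matrix: rows = probes, columns = samples.
beta <- matrix(
  c(0.10, 0.70,
    0.50, 0.20,
    NA,   0.90,
    0.30, 0.40),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("cg0001", "cg0002", "cg0003", "ch.1.123F"), c("S1", "S2"))
)

# FILTER_NA: drop sites with a missing value in any sample.
no_na <- beta[complete.cases(beta), , drop = FALSE]

# FILTER_CONTEXT: keep CpG probes, identified here by the "cg" prefix (assumption).
cg_only <- no_na[startsWith(rownames(no_na), "cg"), , drop = FALSE]
print(rownames(cg_only))
```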

FILTER_SNP

Flag indicating if sites affected by annotated SNPs are to be removed, either according to RnBeads' SNP annotation or, if given, according to the sites specified in snp.list.

FILTER_SOMATIC

Flag indicating if only somatic probes are to be kept.
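Assuming that "somatic" here means autosomal (i.e. probes on chrX and chrY are dropped), the filter reduces to a chromosome lookup. The annotation table below is hypothetical.

```r
# Hypothetical probe annotation.
annot <- data.frame(
  id  = c("cg01", "cg02", "cg03", "cg04"),
  chr = c("chr1", "chrX", "chr22", "chrY"),
  stringsAsFactors = FALSE
)

# Keep probes located on autosomes chr1..chr22 (assumed meaning of "somatic").
autosomes <- paste0("chr", 1:22)
somatic_ids <- annot$id[annot$chr %in% autosomes]
print(somatic_ids)
```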

FILTER_CROSS_REACTIVE

Flag indicating if sites showing cross reactivity on the array are to be removed.

remove.ICA

Flag indicating if independent component analysis (ICA) is to be executed to remove potential confounding factors. If TRUE, conf.fact.ICA needs to be specified.

conf.fact.ICA

A vector of column names in the sample annotation sheet representing potential confounding factors.

ica.setting

Named vector of settings passed to run.rnb.ica. Options are nmin, nmax, thres.sd, alpha.fact, save.report, alpha.feat, type, ncores. See run.rnb.ica for further details. NULL indicates the default settings.

snp.list

Path to a file containing CpG IDs of known SNPs to be removed from the analysis, if FILTER_SNP is TRUE.
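A sketch of removing the sites listed in such a file, assuming a plain-text file with one CpG ID per line (the exact file format expected by prepare_data is not stated here). The helper `remove_listed_sites` and the IDs are hypothetical.

```r
# Write a toy SNP list (one CpG ID per line) to a temporary file.
snp_file <- tempfile(fileext = ".txt")
writeLines(c("cg02", "cg04"), snp_file)

site_ids <- c("cg01", "cg02", "cg03", "cg04", "cg05")

# Hypothetical helper: read IDs from the file and drop them from site_ids.
remove_listed_sites <- function(site_ids, snp_list_path) {
  snp_ids <- trimws(readLines(snp_list_path))
  snp_ids <- snp_ids[nzchar(snp_ids)]  # ignore blank lines
  site_ids[!site_ids %in% snp_ids]
}

kept <- remove_listed_sites(site_ids, snp_file)
print(kept)
```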

execute.lump

Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.

dist.snps

Flag indicating if SNPs are to be removed by determining whether the pairwise differences between the CpGs in the samples are trimodally distributed, as is frequently the case around SNPs.

Value

A list with four elements:


lutsik/DecompPipeline documentation built on Oct. 13, 2019, 1:51 a.m.