prepare.data.BS: prepare.data.BS
In CompEpigen/DecompPipeline: Preparation pipeline for MeDeCom

View source: R/data_preparation.R

Description

This function prepares bisulfite sequencing data sets for a MeDeCom run.

Usage

prepare.data.BS(
  rnb.set,
  work.dir = getwd(),
  analysis.name = "analysis",
  sample.selection.col = NA,
  sample.selection.grep = NA,
  ref.ct.column = NA,
  pheno.columns = NA,
  prepare.true.proportions = FALSE,
  true.A.token = NA,
  houseman.A.token = NA,
  id.column = rnb.getOption("identifiers.column"),
  filter.coverage = hasCovg(rnb.set),
  min.coverage = 5,
  min.covg.quant = 0.05,
  max.covg.quant = 0.95,
  filter.na = TRUE,
  filter.snp = TRUE,
  snp.list = NULL,
  filter.sex.chromosomes = TRUE,
  execute.lump = FALSE
)

Arguments

`rnb.set`	An object of class `RnBiseqSet` for which the analysis is to be performed.
`work.dir`	Path to an existing directory in which the results are stored.
`analysis.name`	A string naming the dataset being analyzed. Used only to create a folder with a descriptive name for the analysis.
`sample.selection.col`	A column name in the phenotypic table of `rnb.set` used to select a subset of samples for analysis: only samples whose entry in this column contains the string given in `sample.selection.grep` are kept.
`sample.selection.grep`	A string used for selecting samples based on the column `sample.selection.col`.
`ref.ct.column`	Column name in `rnb.set` used to extract methylation information on the reference cell types.
`pheno.columns`	Vector of column names in the phenotypic table of `rnb.set` that are kept and exported for further exploration.
`prepare.true.proportions`	Flag indicating whether true cell-type proportions are to be prepared, either taken from `rnb.set` (if available) or estimated with Houseman's reference-based deconvolution approach.
`true.A.token`	String present in the column names of `rnb.set` used for selecting the true proportions of the corresponding cell types.
`houseman.A.token`	Similar to `true.A.token`, but selects the columns containing the proportions estimated by Houseman's method rather than the true proportions.
`id.column`	Name of the column in `rnb.set` that holds sample-specific IDs.
`filter.coverage`	Flag indicating whether site filtering based on coverage is to be conducted.
`min.coverage`	Minimum number of reads required in each sample for a site to be retained for the MeDeCom analysis.
`min.covg.quant`	Lower quantile of the coverage distribution. Coverage values below this quantile are excluded from the analysis.
`max.covg.quant`	Upper quantile of the coverage distribution. Coverage values above this quantile are excluded from the analysis.
`filter.na`	Flag indicating whether sites with any missing values are removed.
`filter.snp`	Flag indicating if annotated SNPs are to be removed from the list of sites according to RnBeads' SNP list.
`snp.list`	Path to a file containing positions of known SNPs to be removed from the analysis if `filter.snp` is `TRUE`. The coordinates must be provided in the same genome assembly as `rnb.set`. The file must be a tab-separated values (TSV) file with a single header line and the following columns: 1st column: chromosome, 2nd column: position of the SNP on the chromosome.
`filter.sex.chromosomes`	Flag indicating whether sites on the sex chromosomes are removed, so that only autosomal sites are kept.
`execute.lump`	Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.
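The sample-selection and site-filtering arguments can be illustrated with a minimal base-R sketch. It does not call `prepare.data.BS` or RnBeads; it assumes hypothetical toy objects: plain matrices `meth` and `covg` (sites x samples), a phenotype data frame `pheno` and a chromosome vector `chrom`. The order of the steps, and the assumption that the coverage quantiles are computed over all coverage values of the selected samples, are illustrative choices and not necessarily what the package does internally.

```r
# Hypothetical toy inputs: 6 CpG sites x 4 samples (not produced by the package).
set.seed(42)
pheno <- data.frame(
  sample_id = paste0("S", 1:4),
  tissue    = c("blood_A", "blood_B", "liver_A", "blood_C"),
  stringsAsFactors = FALSE
)
sites <- paste0("cg", 1:6)
chrom <- c("chr1", "chrX", "chr2", "chr3", "chr4", "chr5")
meth <- matrix(runif(24), nrow = 6,
               dimnames = list(sites, pheno$sample_id))
meth["cg5", "S2"] <- NA
covg <- matrix(c( 10,  12,   3,  20,
                  30,  25,  40,  35,
                   8,   9,  11,   4,
                 500, 480, 510, 490,
                  15,  14,  16,  13,
                  20,  22,  18,  21),
               nrow = 6, byrow = TRUE, dimnames = dimnames(meth))

# Values mirroring the documented arguments.
sample.selection.col  <- "tissue"
sample.selection.grep <- "blood"
min.coverage          <- 5
min.covg.quant        <- 0.05
max.covg.quant        <- 0.95

# 1. Sample selection: keep samples whose column entry contains the pattern.
keep.samples <- grepl(sample.selection.grep, pheno[[sample.selection.col]])
meth <- meth[, keep.samples, drop = FALSE]
covg <- covg[, keep.samples, drop = FALSE]

# 2. Coverage filter (assumed interpretation): every selected sample needs at
#    least min.coverage reads and a coverage inside the quantile range computed
#    over all coverage values of the selected samples. NA coverage counts as a fail.
q <- quantile(covg, probs = c(min.covg.quant, max.covg.quant), na.rm = TRUE)
covg.ok <- covg >= min.coverage & covg >= q[1] & covg <= q[2]
keep.sites <- rowSums(covg.ok, na.rm = TRUE) == ncol(covg)

# 3. NA filter: drop sites with any missing methylation value.
keep.sites <- keep.sites & rowSums(is.na(meth)) == 0

# 4. Sex-chromosome filter: keep autosomal sites only.
keep.sites <- keep.sites & !(chrom %in% c("chrX", "chrY"))

meth.filtered <- meth[keep.sites, , drop = FALSE]
print(rownames(meth.filtered))
```

Here the liver sample is dropped by the selection step; `cg3` fails `min.coverage`, `cg4` exceeds the upper coverage quantile, `cg5` has a missing value and `cg2` lies on chromosome X, so only `cg1` and `cg6` remain.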
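The file format expected by `snp.list` can be sketched in base R as follows. The site table `sites`, the column names in the header line and the exact-position matching are hypothetical choices for illustration; the package may match SNPs against CpG positions differently (for example, also checking the neighbouring base of the CpG dinucleotide).

```r
# Hypothetical site annotation (chromosome + position) for four CpG sites.
sites <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr3"),
  pos = c(10468L, 10470L, 54321L, 99999L),
  stringsAsFactors = FALSE
)

# Write a SNP list in the documented layout: tab-separated, one header line,
# column 1 = chromosome, column 2 = SNP position (same assembly as the data).
snp.file <- tempfile(fileext = ".tsv")
writeLines(c("chromosome\tposition",
             "chr1\t10470",
             "chr3\t99999",
             "chr5\t12345"), snp.file)

# header = TRUE consumes the single header line; columns are used by position.
snps <- read.delim(snp.file, header = TRUE, stringsAsFactors = FALSE)

# Build "chr:pos" keys; as.integer() avoids scientific notation in paste().
snp.keys  <- paste(snps[[1]], as.integer(snps[[2]]), sep = ":")
site.keys <- paste(sites$chr, as.integer(sites$pos), sep = ":")

# Drop sites whose exact position is listed as a SNP.
sites.filtered <- sites[!(site.keys %in% snp.keys), ]
print(sites.filtered)
unlink(snp.file)
```

Matching on a combined chromosome and position key keeps the lookup independent of column names, which matches the documented requirement that only the column order (chromosome first, position second) is fixed.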