prepare.data.BS: prepare.data.BS

View source: R/data_preparation.R

prepare.data.BS    R Documentation

prepare.data.BS

Description

This function prepares bisulfite sequencing data sets for a MeDeCom run.

Usage

prepare.data.BS(
  rnb.set,
  work.dir = getwd(),
  analysis.name = "analysis",
  sample.selection.col = NA,
  sample.selection.grep = NA,
  ref.ct.column = NA,
  pheno.columns = NA,
  prepare.true.proportions = FALSE,
  true.A.token = NA,
  houseman.A.token = NA,
  id.column = rnb.getOption("identifiers.column"),
  filter.coverage = hasCovg(rnb.set),
  min.coverage = 5,
  min.covg.quant = 0.05,
  max.covg.quant = 0.95,
  filter.na = TRUE,
  filter.snp = TRUE,
  snp.list = NULL,
  filter.sex.chromosomes = TRUE,
  execute.lump = FALSE
)

Arguments

rnb.set

An object of type RnBiseqSet-class for which analysis is to be performed.

work.dir

A path to an existing directory in which the results are to be stored.

analysis.name

A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.

sample.selection.col

A column name in the phenotypic table of rnb.set used to select a subset of samples for analysis: only samples whose entry in this column contains the string given in sample.selection.grep are kept.

sample.selection.grep

A string used for selecting samples in the column sample.selection.col.
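
The interplay of sample.selection.col and sample.selection.grep can be pictured as a pattern match on one phenotype column. The following is a minimal sketch only: the data frame pheno and its column names are hypothetical, and base R's grepl is used for illustration; the function's internal implementation may differ.

```r
# Hypothetical phenotype table; in practice this comes from rnb.set.
pheno <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue    = c("blood_whole", "liver", "blood_cord", "brain"),
  stringsAsFactors = FALSE
)

sample.selection.col  <- "tissue"  # column to search in
sample.selection.grep <- "blood"   # string the entries must contain

# Keep only samples whose entry in the chosen column contains the string.
keep <- grepl(sample.selection.grep, pheno[[sample.selection.col]])
selected <- pheno[keep, ]
print(selected$sample_id)
```

Because the match is a pattern match rather than an exact comparison, a short string such as "blood" would select every sample whose entry contains it.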

ref.ct.column

Column name in rnb.set used to extract methylation information on the reference cell types.

pheno.columns

Vector of column names in the phenotypic table of rnb.set that are kept and exported for further exploration.

prepare.true.proportions

Flag indicating if true proportions are either available in rnb.set or to be estimated with Houseman's reference-based deconvolution approach.

true.A.token

String present in the column names of rnb.set used for selecting the true proportions of the corresponding cell types.

houseman.A.token

Similar to true.A.token, but selects the proportions estimated by Houseman's method rather than the true proportions.

id.column

Sample-specific ID column name in rnb.set.

filter.coverage

Flag indicating if site filtering based on coverage is to be conducted.

min.coverage

Minimum number of reads required in each sample for the site to be considered for adding to MeDeCom.

min.covg.quant

Lower quantile of coverages. Coverage values below this quantile will be ignored for analysis.

max.covg.quant

Upper quantile of coverages. Coverage values above this quantile will be ignored for analysis.
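
To make the three coverage arguments concrete, here is a minimal base R sketch on a made-up coverage matrix. It shows one plausible reading of the descriptions above and is not taken from the package source: the assumption is that quantiles are computed over all coverage values and that a site is dropped as soon as one sample violates a threshold.

```r
# Toy coverage matrix: rows are sites, columns are samples (made-up values).
covg <- matrix(
  c(10, 12,  11,
     2, 15,  14,
    30, 28, 900,
    20, 22,  19),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:4), paste0("sample", 1:3))
)

min.coverage   <- 5
min.covg.quant <- 0.05
max.covg.quant <- 0.95

# Quantile thresholds over all coverage values (assumption, see above).
lower <- quantile(covg, min.covg.quant, na.rm = TRUE)
upper <- quantile(covg, max.covg.quant, na.rm = TRUE)

# A site passes if every sample has at least min.coverage reads and
# no coverage value lies outside the quantile range.
pass <- apply(covg, 1, function(x) {
  all(x >= min.coverage) && all(x >= lower) && all(x <= upper)
})
print(names(pass)[pass])
```

In this toy example the second site fails the read minimum and the third site fails the upper quantile because of a single extreme value.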

filter.na

Flag indicating if sites with any missing values are to be removed or not.
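
As an illustration of what removing sites with any missing value means, the following base R sketch filters a made-up methylation matrix. The matrix meth is hypothetical and the code is not the package's implementation.

```r
# Toy methylation matrix: rows are sites, columns are samples.
meth <- matrix(
  c(0.10, 0.20, 0.15,
    0.80,   NA, 0.75,
    0.50, 0.55, 0.45),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), paste0("sample", 1:3))
)

# Keep only sites without a missing value in any sample.
complete.sites <- rowSums(is.na(meth)) == 0
meth.filtered <- meth[complete.sites, , drop = FALSE]
print(rownames(meth.filtered))
```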

filter.snp

Flag indicating if annotated SNPs are to be removed from the list of sites according to RnBeads' SNP list.

snp.list

Path to a file containing positions of known SNPs to be removed from the analysis, if filter.snp is TRUE. The coordinates must be provided in the same genome assembly as rnb.set. The file must be a tab-separated value (tsv) file with exactly one header line and the following columns: 1st column: chromosome, 2nd column: position of the SNP on the chromosome.
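
A file in this layout could be produced as sketched below. This assumes the two fields (chromosome, position) are laid out as columns under a single header line; the header names, chromosome names and coordinates are made up for illustration.

```r
# Hypothetical SNP positions; names and coordinates are made up.
snps <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  position   = c(10583L, 10611L, 10180L),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with one header line and no row names.
snp.file <- file.path(tempdir(), "snp_list.tsv")
write.table(snps, file = snp.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Read the file back to check the layout.
check <- read.table(snp.file, sep = "\t", header = TRUE,
                    stringsAsFactors = FALSE)
print(check)
```

The resulting path (snp.file) is what would be passed as snp.list.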

filter.sex.chromosomes

Flag indicating if only autosomal sites are to be kept, i.e. sites on the sex chromosomes are removed.

execute.lump

Flag indicating if the LUMP algorithm is to be used for estimating the amount of immune cells in a particular sample.

Value

A list with four elements:

  • quality.filter The indices of the sites that survived quality filtering

Author(s)

Michael Scherer, Pavlo Lutsik


CompEpigen/DecompPipeline documentation built on Nov. 3, 2023, 5:35 p.m.