prepare_data_BS: prepare_data_BS

View source: R/data_preparation.R

Description

This function prepares sequencing data sets for a MeDeCom run.

Usage

prepare_data_BS(RNB_SET, WORK_DIR = getwd(),
  analysis.name = "analysis", SAMPLE_SELECTION_COL = NA,
  SAMPLE_SELECTION_GREP = NA, REF_CT_COLUMN = NA, PHENO_COLUMNS = NA,
  PREPARE_TRUE_PROPORTIONS = FALSE, TRUE_A_TOKEN = NA,
  HOUSEMAN_A_TOKEN = NA,
  ID_COLUMN = rnb.getOption("identifiers.column"),
  FILTER_COVERAGE = hasCovg(RNB_SET), MIN_COVERAGE = 5,
  MIN_COVG_QUANT = 0.05, MAX_COVG_QUANT = 0.95, FILTER_NA = TRUE,
  FILTER_SNP = TRUE, snp.list = NULL, FILTER_SOMATIC = TRUE)
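
A call might look like the following sketch. It assumes that the DecompPipeline package is installed, that prepare_data_BS is exported from it, and that an RnBiseqSet object already exists in the session under the hypothetical name rnb.set; the analysis name is likewise invented. The call is skipped if either assumption does not hold.

```r
# Hypothetical call: `rnb.set` is an assumed, user-supplied RnBiseqSet object,
# and "example_analysis" is an invented analysis name.
if (requireNamespace("DecompPipeline", quietly = TRUE) && exists("rnb.set")) {
  prepared <- DecompPipeline::prepare_data_BS(
    RNB_SET = rnb.set,
    WORK_DIR = tempdir(),            # results are written below this directory
    analysis.name = "example_analysis",
    FILTER_COVERAGE = TRUE,
    MIN_COVERAGE = 5,
    MIN_COVG_QUANT = 0.05,
    MAX_COVG_QUANT = 0.95,
    FILTER_NA = TRUE,
    FILTER_SNP = TRUE,
    FILTER_SOMATIC = TRUE
  )
  # Inspect the top-level structure of the returned list.
  str(prepared, max.level = 1)
}
```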

Arguments

RNB_SET

An object of type RnBiseqSet-class for which analysis is to be performed.

WORK_DIR

A path to an existing directory in which the results are to be stored.

analysis.name

A string representing the dataset for which analysis is to be performed. Only used to create a folder with a descriptive name of the analysis.

SAMPLE_SELECTION_COL

A column name in the phenotypic table of RNB_SET used to select the subset of samples for analysis whose entries contain the string given in SAMPLE_SELECTION_GREP.

SAMPLE_SELECTION_GREP

A string used for selecting samples in the column SAMPLE_SELECTION_COL.
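
The interplay of SAMPLE_SELECTION_COL and SAMPLE_SELECTION_GREP can be illustrated with base R. This is a minimal sketch on an invented phenotypic table, assuming the selection behaves like a grepl() pattern match on the chosen column; it is not taken from the package source.

```r
# Toy phenotypic table; column names and values are invented for illustration.
pheno <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue = c("blood_whole", "brain", "blood_PBMC", "liver"),
  stringsAsFactors = FALSE
)

SAMPLE_SELECTION_COL <- "tissue"
SAMPLE_SELECTION_GREP <- "blood"

# Keep the samples whose entry in the selection column contains the pattern.
keep <- grepl(SAMPLE_SELECTION_GREP, pheno[[SAMPLE_SELECTION_COL]])
selected <- pheno[keep, , drop = FALSE]
print(selected$sample_id)
```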

REF_CT_COLUMN

Column name in RNB_SET used to extract methylation information on the reference cell types.

PHENO_COLUMNS

Vector of column names in the phenotypic table of RNB_SET that are kept and exported for further exploration.

PREPARE_TRUE_PROPORTIONS

Flag indicating if true proportions are either available in RNB_SET or to be estimated with Houseman's reference-based deconvolution approach.

TRUE_A_TOKEN

String present in the column names of RNB_SET used for selecting the true proportions of the corresponding cell types.

HOUSEMAN_A_TOKEN

Similar to TRUE_A_TOKEN, but selects the proportions estimated by Houseman's method rather than the true proportions.

ID_COLUMN

Sample-specific ID column name in RNB_SET

FILTER_COVERAGE

Flag indicating if site filtering based on coverage is to be conducted.

MIN_COVERAGE

Minimum number of reads required in each sample for the site to be passed on to MeDeCom.

MIN_COVG_QUANT

Lower quantile of coverages. Coverage values below this quantile are ignored in the analysis.

MAX_COVG_QUANT

Upper quantile of coverages. Coverage values above this quantile are ignored in the analysis.
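
How MIN_COVERAGE, MIN_COVG_QUANT and MAX_COVG_QUANT could act together is sketched below in base R on an invented coverage matrix. The sketch assumes that the quantiles are computed over all coverage values and that a site is dropped as soon as one sample falls below the read threshold or outside the quantile range; the package may combine these criteria differently.

```r
# Toy coverage matrix (sites x samples); all numbers are invented.
covg <- matrix(
  c(10, 12,  11,
     3, 15,  14,
    20, 22, 500,
     9,  8,  10,
    13, 11,  12),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:5), paste0("S", 1:3))
)

MIN_COVERAGE <- 5
MIN_COVG_QUANT <- 0.05
MAX_COVG_QUANT <- 0.95

# Quantile thresholds over all coverage values (an assumption of this sketch).
lower <- quantile(covg, MIN_COVG_QUANT, na.rm = TRUE)
upper <- quantile(covg, MAX_COVG_QUANT, na.rm = TRUE)

# A site passes if no sample is below the read threshold ...
enough_reads <- rowSums(covg < MIN_COVERAGE, na.rm = TRUE) == 0
# ... and no sample lies outside the quantile range.
in_range <- rowSums(covg < lower | covg > upper, na.rm = TRUE) == 0

keep <- enough_reads & in_range
filtered <- covg[keep, , drop = FALSE]
print(rownames(filtered))
```

In this toy matrix the low-coverage site and the site with one extreme coverage value are the ones removed.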

FILTER_NA

Flag indicating if sites with any missing values are to be removed or not.
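
The effect of FILTER_NA = TRUE can be pictured as a complete-case filter over the rows of the methylation matrix. The following base R sketch uses invented values and is only an illustration of that idea, not the package's code.

```r
# Toy methylation matrix (sites x samples); values are invented.
meth <- matrix(
  c(0.1, 0.8,
     NA, 0.5,
    0.4, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), c("S1", "S2"))
)

# Keep only the sites without any missing value across samples.
complete <- complete.cases(meth)
meth_filtered <- meth[complete, , drop = FALSE]
print(rownames(meth_filtered))
```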

FILTER_SNP

Flag indicating if annotated SNPs are to be removed from the list of sites according to RnBeads' SNP list.

snp.list

Path to a file containing positions of known SNPs to be removed from the analysis if FILTER_SNP is TRUE. The coordinates must be provided in the same genome assembly as RNB_SET. The file must be a tab-separated value (tsv) file with exactly one header line and the following columns: 1st column: chromosome, 2nd column: position of the SNP on the chromosome.
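
A file in the layout expected for snp.list could be produced and read back as in the sketch below. It assumes that the chromosome and the position are the first and second tab-separated fields after a single header line; the header names, coordinates and the matching logic are invented for illustration and are not taken from the package.

```r
# Write a toy SNP list; header names and coordinates are invented.
snp_file <- tempfile(fileext = ".tsv")
snps <- data.frame(
  chromosome = c("chr1", "chr2"),
  position = c(10468L, 20500L)
)
write.table(snps, snp_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back: one header line, tab-separated fields.
snp_tab <- read.delim(snp_file, header = TRUE, stringsAsFactors = FALSE)

# Toy site annotation to be filtered against the SNP list.
sites <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  position = c(10468L, 10470L, 20500L)
)

# Flag the sites whose chromosome/position pair appears in the SNP list.
is_snp <- paste(sites$chromosome, sites$position) %in%
  paste(snp_tab[[1]], snp_tab[[2]])
sites_kept <- sites[!is_snp, , drop = FALSE]
print(sites_kept)
```

The path stored in snp_file is what would be handed over as snp.list.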

FILTER_SOMATIC

Flag indicating if only somatic probes are to be kept.

Value

A list with four elements:


lutsik/DecompPipeline documentation built on Oct. 13, 2019, 1:51 a.m.