make_dataset: Making the Penda Dataset

View source: R/preprocessing.R

make_dataset    R Documentation

Making the Penda Dataset

Description

This function builds the Penda dataset from controls and cases: both matrices are pre-filtered, and genes are sorted by their median expression in the controls.
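The sorting step can be pictured with a small sketch. The helper sort_by_ctrl_median is hypothetical and not part of penda. It assumes genes are rows, samples are columns, both matrices share row names, and the order is ascending; the page does not state the direction, so ascending is an assumption.

```r
# Hypothetical helper (not penda's implementation): reorder the genes (rows)
# of both matrices by their median expression across the control samples.
# Assumes genes are rows, samples are columns, and both matrices share row names.
sort_by_ctrl_median <- function(controls, data_case) {
  # Median per gene over control samples, ignoring NA values
  med <- apply(controls, 1, median, na.rm = TRUE)
  # Ascending order of the control medians (direction is an assumption)
  genes <- rownames(controls)[order(med)]
  list(data_ctrl = controls[genes, , drop = FALSE],
       data_case = data_case[genes, , drop = FALSE])
}

ctrl <- matrix(c(5, 6, 7,  1, 2, 3,  9, 8, 7), nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), paste0("c", 1:3)))
case <- matrix(1:6, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), paste0("t", 1:2)))
res <- sort_by_ctrl_median(ctrl, case)
rownames(res$data_ctrl)  # "g2" "g1" "g3" (control medians 2, 6, 8)
```

Indexing the case matrix by gene name, rather than by position, keeps the two matrices aligned even if their rows were not in the same order to begin with.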

Usage

make_dataset(
  controls,
  data_case,
  detectlowvalue = TRUE,
  detectNA = TRUE,
  threshold = 0.99,
  val_min = NA,
  bimod = TRUE
)

Arguments

controls

The first data matrix to analyze (e.g., control samples).

data_case

The second data matrix to analyze (e.g., tumor samples).

detectlowvalue

If detectlowvalue is TRUE, genes with values below val_min in more than 'threshold*100' % of samples are removed.
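A minimal sketch of this low-value filter follows. The function filter_low_genes is hypothetical; it assumes the proportion is computed over the pooled control and case samples, that both matrices list the same genes in the same row order, and that NA entries are ignored when counting low values.

```r
# Hypothetical sketch (not penda's code): drop genes whose values fall below
# val_min in more than threshold * 100 % of the pooled samples.
filter_low_genes <- function(controls, data_case, val_min, threshold = 0.99) {
  # Assumes identical gene order (rows) in both matrices
  all_samples <- cbind(controls, data_case)
  # Fraction of samples below val_min for each gene; NA entries are skipped
  prop_low <- rowMeans(all_samples < val_min, na.rm = TRUE)
  # Genes with only NA values give NaN here and are dropped as well
  keep <- !is.na(prop_low) & prop_low <= threshold
  list(data_ctrl = controls[keep, , drop = FALSE],
       data_case = data_case[keep, , drop = FALSE])
}

ctrl <- rbind(g1 = c(0, 0, 0), g2 = c(5, 6, 7))
case <- rbind(g1 = c(0, 1), g2 = c(4, 5))
# g1 is below 1 in 4 of 5 samples (0.8 > 0.5), so only g2 survives
res <- filter_low_genes(ctrl, case, val_min = 1, threshold = 0.5)
rownames(res$data_ctrl)  # "g2"
```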

detectNA

If detectNA is TRUE, genes and samples with more than 'threshold*100' % of NA values are removed.
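The NA filter can be sketched on a single matrix as below. The helper filter_na is hypothetical, and removing genes before samples is an assumption; the page does not say in which order the two filters are applied.

```r
# Hypothetical sketch (not penda's code): remove genes (rows), then samples
# (columns), whose fraction of NA values exceeds threshold.
filter_na <- function(mat, threshold = 0.99) {
  # Gene filter: fraction of NA per row
  mat <- mat[rowMeans(is.na(mat)) <= threshold, , drop = FALSE]
  # Sample filter on the remaining genes: fraction of NA per column
  mat[, colMeans(is.na(mat)) <= threshold, drop = FALSE]
}

m <- rbind(g1 = c(1, NA, 3), g2 = c(NA, NA, NA), g3 = c(4, NA, 6))
colnames(m) <- c("s1", "s2", "s3")
res <- filter_na(m, threshold = 0.5)
res  # rows g1, g3 and columns s1, s3 remain
```

Because the sample filter runs on the already-filtered genes, a sample's NA fraction is measured only over the genes that were kept; applying the filters in the other order could give a different result.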

threshold

The maximum proportion of values below val_min, or of NA values, tolerated for each gene or sample.

val_min

The minimum accepted value. If val_min is NA, it is computed as specified by the parameter 'bimod'.

bimod

If bimod is TRUE and val_min is NA, the values are assumed to follow a bimodal distribution and val_min is estimated with mixtools::normalmixEM as the value of the first peak. If bimod is FALSE and val_min is NA, the distribution is assumed to be unimodal and val_min is set to its 0.1 quantile.
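The two ways of choosing val_min can be sketched as follows. The function estimate_val_min is hypothetical. It assumes the input is the pooled expression values, and it takes the "first peak" to be the smaller of the two means fitted by mixtools::normalmixEM. That reading is an interpretation, not something this page confirms. The mixtools package is only needed when bimod is TRUE.

```r
# Hypothetical sketch (not penda's code) of the val_min choice driven by 'bimod'.
estimate_val_min <- function(x, bimod = TRUE) {
  # Flatten a matrix to a vector and drop missing values
  x <- as.vector(x)
  x <- x[!is.na(x)]
  if (bimod) {
    # Assumption: fit a two-component Gaussian mixture and use the smaller
    # component mean as the "first peak"; requires the mixtools package
    fit <- mixtools::normalmixEM(x, k = 2)
    min(fit$mu)
  } else {
    # Unimodal case: the 0.1 quantile of the values (R's default type 7)
    unname(quantile(x, probs = 0.1))
  }
}

estimate_val_min(1:10, bimod = FALSE)  # 1.9
```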

Value

This function returns a list containing the preprocessed data_ctrl and data_case matrices, and the vector 'info' with the parameters used.

Examples

# Example for make_dataset function
data_ctrl = penda::penda_data_ctrl
data_case = penda::penda_data_case
dataset = penda::make_dataset(controls = data_ctrl,
                              data_case = data_case,
                              detectlowvalue = TRUE,
                              detectNA = TRUE,
                              threshold = 0.99,
                              val_min = NA,
                              bimod = TRUE)
data_ctrl = dataset$data_ctrl
data_case = dataset$data_case

CDecamps/penda documentation built on March 29, 2024, 3:26 a.m.