View source: R/preprocessing.R
make_dataset | R Documentation |
This function makes the Penda dataset, with controls and cases pre-filtered and sorted by median in controls.
make_dataset(
controls,
data_case,
detectlowvalue = TRUE,
detectNA = TRUE,
threshold = 0.99,
val_min = NA,
bimod = TRUE
)
controls |
The first matrix with data to analyze (e.g., control samples). |
data_case |
The second matrix with data to analyze (e.g., tumor samples). |
detectlowvalue |
If detectlowvalue is TRUE, genes with low values in more than 'threshold*100' % of samples are removed. |
detectNA |
If detectNA is TRUE, genes and samples with more than 'threshold*100' % of NA values are removed. |
threshold |
The maximum proportion of expression values below val_min, or of NA values, tolerated for each gene or sample. |
val_min |
The minimum value accepted. If val_min is NA, this value is computed as specified by the parameter 'bimod'. |
bimod |
If bimod is TRUE and val_min is NA, val_min is computed by mixtools::normalmixEM for a bimodal distribution, to find the value of the first peak. If bimod is FALSE and val_min is NA, val_min is computed for a unimodal distribution using a quantile of 0.1. |
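The low-value filter described by 'detectlowvalue', 'threshold' and 'val_min' can be illustrated with a minimal base-R sketch. This is not penda's implementation: the toy matrix `expr` and the helper `filter_low_genes` are hypothetical, the strict `<` comparison is an assumption, and only the unimodal fallback (0.1 quantile) is shown because the bimodal case depends on mixtools::normalmixEM.

```r
# Hypothetical toy matrix: 10 genes (rows) x 10 samples (columns).
expr <- rbind(
  rep(0, 10),                          # gene1: unexpressed in every sample
  matrix(5 + (0:89) / 10, nrow = 9)    # gene2..gene10: clearly expressed
)
rownames(expr) <- paste0("gene", 1:10)

# Unimodal fallback (bimod = FALSE, val_min = NA): 0.1 quantile of all values.
val_min <- unname(quantile(expr, probs = 0.1, na.rm = TRUE))

# Hypothetical helper: drop genes whose proportion of values below val_min
# exceeds `threshold` (strict `<` is an assumption, not taken from penda).
filter_low_genes <- function(m, val_min, threshold = 0.99) {
  prop_low <- rowMeans(m < val_min, na.rm = TRUE)
  m[prop_low <= threshold, , drop = FALSE]
}

filtered <- filter_low_genes(expr, val_min, threshold = 0.99)
rownames(filtered)
```

With the default threshold of 0.99, a gene is dropped only when nearly all of its samples are low, so in this toy data only gene1 is removed.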
This function returns a list with the preprocessed data_ctrl and data_case, and the vector 'info' with the different parameters.
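As a rough illustration of the "sorted by median in controls" step and of the shape of the returned list, here is a base-R sketch under stated assumptions. The matrices `ctrl` and `case` are hypothetical toy data, ascending order is an assumption, and the 'info' element is omitted because its contents are not detailed here.

```r
# Hypothetical toy matrices sharing the same genes (rows).
ctrl <- matrix(c(9, 8, 7,
                 1, 2, 3,
                 5, 5, 6),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste0("ctrl", 1:3)))
case <- matrix(c(4, 4,
                 2, 9,
                 6, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(rownames(ctrl), paste0("case", 1:2)))

# Order genes by their median in controls (ascending order is an assumption).
gene_order <- order(apply(ctrl, 1, median, na.rm = TRUE))

# Apply the same order to both matrices so that rows stay aligned.
result <- list(
  data_ctrl = ctrl[gene_order, , drop = FALSE],
  data_case = case[gene_order, , drop = FALSE]
)
rownames(result$data_ctrl)
```

Reordering both matrices with the same index keeps each gene on the same row in controls and cases.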
# Example for make_dataset function
data_ctrl = penda::penda_data_ctrl
data_case = penda::penda_data_case
dataset = penda::make_dataset(controls = data_ctrl,
data_case = data_case,
detectlowvalue = TRUE,
detectNA = TRUE,
threshold = 0.99,
val_min = NA,
bimod = TRUE)
data_ctrl = dataset$data_ctrl
data_case = dataset$data_case