pre_proc_data: Pre-process Data
In SparseDC: Implementation of SparseDC Algorithm


This function pre-processes the data so that SparseDC can be applied. SparseDC requires data that have been normalized for sequencing depth, log-transformed and centralized on a gene-by-gene basis. For the sequencing depth normalization, we recommend that users apply one of the many methods developed for normalizing scRNA-Seq data before using SparseDC and then set norm = FALSE. If norm = TRUE, this function normalizes the data by dividing each sample by its total number of reads. With log = TRUE, the function log-transforms the data by applying log(x + 1) to each of the data sets. By far the most important pre-processing step for SparseDC is the centralization of the data. Centralized data are a core component of the SparseDC algorithm and are necessary both for accurately clustering the cells and for identifying marker genes. We therefore recommend that all users centralize their data using this function and that only experienced users set center = FALSE.

pre_proc_data(dat1, dat2, norm = TRUE, log = TRUE, center = TRUE)

`dat1`	The data for the first condition with samples (cells) as columns and features (genes) as rows.
`dat2`	The data for the second condition with samples (cells) as columns and features (genes) as rows.
`norm`	This parameter controls whether the data are normalized for sequencing depth by dividing each column by the total number of reads for that sample. We recommend that users apply one of the many methods for normalizing scRNA-Seq data beforehand and so set this to `FALSE`. The default value is `TRUE`.
`log`	This parameter controls whether the data is transformed using `log(x + 1)`. The default value is `TRUE`.
`center`	This parameter controls whether the data are centered on a gene-by-gene basis. We recommend that all users center their data prior to applying SparseDC and that only experienced users set this to `FALSE`. The default value is `TRUE`.
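
The three steps controlled by `norm`, `log` and `center` can be sketched in base R as follows. This is a minimal illustration, not the SparseDC source: `sketch_pre_proc` and the toy matrices are hypothetical names introduced here, and the sketch assumes that centering subtracts each gene's mean taken over the cells of both conditions pooled together, which this page does not spell out.

```r
# Hypothetical re-implementation for illustration only; not SparseDC's code.
sketch_pre_proc <- function(dat1, dat2, norm = TRUE, log = TRUE, center = TRUE) {
  dat1 <- as.matrix(dat1)
  dat2 <- as.matrix(dat2)
  if (norm) {
    # Divide each column (cell) by its total number of reads.
    dat1 <- sweep(dat1, 2, colSums(dat1), "/")
    dat2 <- sweep(dat2, 2, colSums(dat2), "/")
  }
  if (log) {
    # log1p(x) computes log(x + 1).
    dat1 <- log1p(dat1)
    dat2 <- log1p(dat2)
  }
  if (center) {
    # ASSUMPTION: gene means are computed over both conditions pooled.
    gene_means <- rowMeans(cbind(dat1, dat2))
    # Subtracting a vector of length nrow() removes gene i's mean from row i.
    dat1 <- dat1 - gene_means
    dat2 <- dat2 - gene_means
  }
  list(dat1, dat2)
}

# Toy count matrices: genes as rows, cells as columns (values are made up).
set.seed(1)
toy_A <- matrix(rpois(12, lambda = 5) + 1, nrow = 4)  # 4 genes x 3 cells
toy_B <- matrix(rpois(8, lambda = 5) + 1, nrow = 4)   # 4 genes x 2 cells

out <- sketch_pre_proc(toy_A, toy_B)

# Under the pooled-centering assumption, each gene's mean across all
# cells is zero up to floating-point error.
pooled_means <- rowMeans(cbind(out[[1]], out[[2]]))
```

Under this assumption, a gene that is expressed at different levels in the two conditions keeps that difference after centering, because the same gene mean is subtracted from both data sets.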

This function returns the two pre-processed datasets stored as a list.

# Load the package, which provides data_biase and condition_biase
library(SparseDC)
set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into condition A and B
data_A <- data_test[ , which(condition_biase == "A")]
data_B <- data_test[ , which(condition_biase == "B")]
# Pre-process the data
pre_data <- pre_proc_data(data_A, data_B, norm = FALSE, log = TRUE,
                          center = TRUE)
# Extract Data
pdata_A <- pre_data[[1]]
pdata_B <- pre_data[[2]]