Preprocess_GeneExpression: The Preprocess_GeneExpression function

Description

Pre-processes gene expression data from TCGA.

Usage

Preprocess_GeneExpression(CancerSite, MAdirectories,
  MissingValueThresholdGene = 0.3, MissingValueThresholdSample = 0.1)

Arguments

CancerSite

Character string of length 1 giving the TCGA cancer code (for example, "OV" for ovarian cancer).

MAdirectories

Character vector of directories containing the downloaded data. It can be the object returned by the Download_GeneExpression function.

MissingValueThresholdGene

Threshold for missing values per gene, given as a proportion between 0 and 1. Genes whose proportion of NAs is greater than this threshold are removed. Default is 0.3.

MissingValueThresholdSample

Threshold for missing values per sample, given as a proportion between 0 and 1. Samples whose proportion of NAs is greater than this threshold are removed. Default is 0.1.

Details

Pre-processing includes removing samples and genes with too many NAs, imputing the remaining NAs, and performing batch correction.
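
The missing-value filtering step can be illustrated with a small base R sketch. This is not the package's internal code: the helper filter_missing, the toy matrix, the genes-in-rows layout, and the order of the two filters (genes first, then samples) are all assumptions made for illustration. Imputation and batch correction are not shown.

```r
# Toy expression matrix; genes in rows and samples in columns (assumed layout).
expr <- matrix(
  c( 1, NA, NA,  4,
     2,  3,  4,  5,
     3,  4,  5,  6,
     4,  5,  6,  7,
     5,  6,  7,  8,
     6,  7,  8,  9,
     7,  8,  9, 10,
     8,  9, 10, 11,
     9, 10, 11, NA,
    10, 11, 12, NA),
  nrow = 10, byrow = TRUE,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4))
)

# Hypothetical helper: drop genes, then samples, whose proportion of NAs
# is greater than the given threshold.
filter_missing <- function(x, gene_threshold = 0.3, sample_threshold = 0.1) {
  gene_na <- rowMeans(is.na(x))      # proportion of NAs per gene
  x <- x[gene_na <= gene_threshold, , drop = FALSE]
  sample_na <- colMeans(is.na(x))    # proportion of NAs per sample, after gene filtering
  x[, sample_na <= sample_threshold, drop = FALSE]
}

filtered <- filter_missing(expr)
dim(filtered)
```

Here gene1 has half of its values missing and is dropped; sample4 is then missing 2 of the 9 remaining genes and is dropped as well, so no NAs are left in this particular toy matrix. On real data some NAs would normally survive the filters, which is why an imputation step follows.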

Value

List containing the pre-processed data matrices for cancer and normal samples.

Examples

## Not run: 

# Optional: register a cluster to run in parallel
library(doParallel)
cl <- makeCluster(5)
registerDoParallel(cl)

# Gene expression data for ovarian cancer
cancerSite <- "OV"
targetDirectory <- paste0(getwd(), "/")

# Downloading gene expression data
GEdirectories <- Download_GeneExpression(cancerSite, targetDirectory, TRUE)

# Processing gene expression data
GEProcessedData <- Preprocess_GeneExpression(cancerSite, GEdirectories)

# Saving gene expression processed data
saveRDS(GEProcessedData, file = paste0(targetDirectory, "GE_", cancerSite, "_Processed.rds"))

stopCluster(cl)

## End(Not run)

gevaertlab/MethylMix documentation built on May 13, 2019, 11:53 p.m.