Prep4DeepDEP: Main function of Prep4DeepDEP


View source: R/Prep4DeepDEP.r

Description

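Prep4DeepDEP generates the genomic and gene fingerprint data tables from the user's datasets. Prep4DeepDEP has two main modes: “Training”, which prepares data tables of genomics and gene dependencies for model training, and “Prediction”, which prepares data tables of genomics for the samples whose gene dependencies are to be predicted.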

Please refer to the paper and the DeepDEP package (https://codeocean.com/capsule/3348251/tree) for how to use the generated data tables for DeepDEP model training and prediction.

Usage

Prep4DeepDEP(
  exp.data = NULL,
  mut.data = NULL,
  meth.data = NULL,
  cna.data = NULL,
  dep.data = NULL,
  mode = c("Training", "Prediction"),
  filename.out = "data_out"
)
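All data-frame arguments expect sample names as column names. When loading tables from tab-delimited files, R's default name checking rewrites names such as "TCGA-A1-0001" to "TCGA.A1.0001", which would no longer match across data types. The sketch below is a minimal illustration, assuming a hypothetical file layout with a header column called "Gene"; the inline text stands in for a real file.

```r
# Hypothetical tab-delimited expression table: first column = gene symbols,
# remaining columns = samples. Inline text stands in for a real file.
exp_txt <- "Gene\tTCGA-A1-0001\tTCGA-A1-0002
CCND1\t5.2\t3.1
TP53\t7.8\t6.4"

# check.names = FALSE keeps sample names such as "TCGA-A1-0001" intact;
# the default (TRUE) would rewrite them to "TCGA.A1.0001".
exp.data <- read.delim(text = exp_txt, check.names = FALSE,
                       stringsAsFactors = FALSE)

str(exp.data)
```

The same `check.names = FALSE` setting applies when reading mutation, methylation, and dependency tables.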

Arguments

exp.data

Gene expression data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to genes and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., CCND1) as the first column. Row names are not used by this function. Expression levels are represented as log2(TPM+1) per gene.
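If expression is available as raw TPM values, the layout above can be produced as in this minimal sketch. The TPM values, sample names, and the first-column name "Gene" are hypothetical placeholders.

```r
# Hypothetical TPM matrix: genes in rows, samples in columns.
tpm <- matrix(c(0, 15, 255, 3, 1023, 7),
              nrow = 3,
              dimnames = list(c("CCND1", "TP53", "EGFR"),
                              c("SampleA", "SampleB")))

# Apply the log2(TPM + 1) transform described above.
log_tpm <- log2(tpm + 1)

# Gene symbols go in the first column and sample names become column names;
# row names are reset because the function does not use them.
exp.data <- data.frame(Gene = log_tpm[, 0] |> rownames(), log_tpm,
                       check.names = FALSE, row.names = NULL,
                       stringsAsFactors = FALSE)
```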

mut.data

Mutation data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to genes and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., TP53) as the first column. Row names are not used by this function. Mutations are represented by 0/1 binary values per gene, with 1s denoting missense and nonsense mutations, frameshift insertions and deletions, and splice-site mutations.
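A minimal sketch of turning per-sample mutation calls into the 0/1 gene-by-sample table described above. It assumes the calls use MAF-style column names (Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification); these names and the example calls are assumptions, so adapt them to your own caller's output.

```r
# Hypothetical mutation calls with MAF-style column names (an assumption).
calls <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "BRCA1"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S2"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Missense_Mutation", "Frame_Shift_Del"),
  stringsAsFactors = FALSE
)

# Classes counted as mutated per the description above: missense, nonsense,
# frameshift insertions/deletions, and splice-site mutations.
keep <- c("Missense_Mutation", "Nonsense_Mutation",
          "Frame_Shift_Ins", "Frame_Shift_Del", "Splice_Site")
hits <- calls[calls$Variant_Classification %in% keep, ]

genes <- sort(unique(calls$Hugo_Symbol))
samples <- sort(unique(calls$Tumor_Sample_Barcode))

# Build a 0/1 gene x sample matrix; gene-sample pairs without a qualifying
# call stay 0. A character index matrix is matched against the dimnames.
mut <- matrix(0L, nrow = length(genes), ncol = length(samples),
              dimnames = list(genes, samples))
mut[cbind(hits$Hugo_Symbol, hits$Tumor_Sample_Barcode)] <- 1L

mut.data <- data.frame(Gene = genes, mut, check.names = FALSE,
                       row.names = NULL, stringsAsFactors = FALSE)
```

Here the silent TP53 call in S2 is dropped, so TP53 is 1 only in S1.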

meth.data

DNA methylation data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to probes and samples, respectively. The data frame should contain sample names as column names and probe IDs (e.g., cg00000292) as the first column. Row names are not used by this function. DNA methylation is measured by beta values per probe of Infinium® HumanMethylation27 or HumanMethylation450 BeadChips.
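Because the function expects beta values, methylation data stored as M-values would need converting first. This minimal sketch uses the standard relation beta = 2^M / (2^M + 1) and checks that the result lies in [0, 1]; the M-values and sample names are hypothetical.

```r
# Hypothetical methylation matrix stored as M-values (probes x samples).
m_values <- matrix(c(0, 1, -1, 3),
                   nrow = 2,
                   dimnames = list(c("cg00000292", "cg00002426"),
                                   c("SampleA", "SampleB")))

# Convert M-values to beta values: beta = 2^M / (2^M + 1).
beta <- 2^m_values / (2^m_values + 1)

# Beta values must lie in [0, 1] (missing values are ignored by this check).
stopifnot(all(beta >= 0 & beta <= 1, na.rm = TRUE))

# Probe IDs go in the first column; sample names become column names.
meth.data <- data.frame(Probe = rownames(beta), beta, check.names = FALSE,
                        row.names = NULL, stringsAsFactors = FALSE)
```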

cna.data

Copy number alteration (CNA) data (a data.frame object) of cell lines or tumors. CNA should be prepared as segmented copy-number profiles in the .seg file format against the hg19 reference genome. Example CNA data can be downloaded from the CCLE portal (https://portals.broadinstitute.org/ccle/data). Each row of the data frame is one CNA segment of one sample, and the columns hold the segment information. The following columns are required: CCLE_name (sample name), Chromosome (numeric, without the ‘Chr’ prefix), Start (numeric), End (numeric), and Segment_Mean (on the log2(CN/2) scale).
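A minimal sketch of reshaping segments with absolute copy numbers into the required columns. The sample names, coordinates, and the input column "CN" are hypothetical. Only autosomes are shown: `as.numeric()` would turn "X" or "Y" into NA, and how the function expects sex chromosomes to be encoded is not stated here. A CN of 0 would give a Segment_Mean of -Inf.

```r
# Hypothetical segments with absolute copy number (CN) per segment.
segs <- data.frame(
  CCLE_name  = c("CELLLINE1_LUNG", "CELLLINE1_LUNG", "CELLLINE2_BREAST"),
  Chromosome = c("chr1", "Chr7", "17"),
  Start      = c(1000000, 55000000, 7500000),
  End        = c(2000000, 56000000, 7700000),
  CN         = c(2, 4, 1),
  stringsAsFactors = FALSE
)

# Required format: numeric chromosome without a "chr" prefix, numeric
# coordinates (hg19), and Segment_Mean on the log2(CN/2) scale.
cna.data <- data.frame(
  CCLE_name    = segs$CCLE_name,
  Chromosome   = as.numeric(sub("^chr", "", segs$Chromosome, ignore.case = TRUE)),
  Start        = as.numeric(segs$Start),
  End          = as.numeric(segs$End),
  Segment_Mean = log2(segs$CN / 2),
  stringsAsFactors = FALSE
)
```

A diploid segment (CN = 2) maps to 0, a gain to 4 copies maps to 1, and a single-copy loss maps to -1.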

dep.data

Gene symbols of dependency genes of interest (DepOIs), with or without the user’s in-house gene dependency scores. For the “Training” mode, this argument is required and expects a data.frame object whose rows and columns correspond to DepOIs and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., TP53) as the first column. For the “Prediction” mode, this argument is optional and expects a data.frame object with a single column of gene symbols (e.g., TP53) of the DepOIs the user would like to predict. If the argument is left NULL, the 1298 default DepOIs studied in the original paper are used.
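The two accepted shapes of dep.data can be sketched as follows. The gene list, sample names, and dependency scores are hypothetical placeholders, and the first-column name "Gene" is an assumption.

```r
# Prediction mode: a single column of DepOI gene symbols.
dep_pred <- data.frame(Gene = c("TP53", "KRAS", "MYC"),
                       stringsAsFactors = FALSE)

# Training mode: DepOIs in rows, samples in columns, gene symbols first.
# The scores below are placeholders, not real dependency values.
scores <- matrix(c(-0.1, -1.2, -0.8, 0.05, -0.9, -1.1),
                 nrow = 3,
                 dimnames = list(dep_pred$Gene, c("SampleA", "SampleB")))
dep_train <- data.frame(Gene = rownames(scores), scores,
                        check.names = FALSE, row.names = NULL,
                        stringsAsFactors = FALSE)
```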

mode

“Training” or “Prediction”. The “Training” mode creates data tables of genomics and gene dependencies for all pairs of cancer cell lines (CCLs) and DepOIs (number of samples = number of CCLs x number of DepOIs). The “Prediction” mode generates data tables of genomics for all samples (number of samples = number of CCLs/tumors). In both modes, functional fingerprints are generated for the genes in “dep.data” (or, in the “Prediction” mode with dep.data left NULL, for the default DepOIs).
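The sample counts stated above can be checked with a small sketch. It only illustrates the arithmetic of CCL-DepOI pairing with hypothetical names; it does not reproduce the row order or column layout of the actual output files.

```r
ccls   <- c("CCL1", "CCL2", "CCL3")   # hypothetical cell line names
depois <- c("TP53", "KRAS")           # hypothetical DepOIs

# Training mode: one sample per CCL-DepOI pair.
pairs <- expand.grid(CCL = ccls, DepOI = depois, stringsAsFactors = FALSE)
nrow(pairs)   # 3 x 2 = 6

# Prediction mode: genomic tables have one sample per CCL/tumor.
length(ccls)  # 3
```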

filename.out

Path and prefix for the output files.

Details

Note

The “Prediction” mode can be slow and memory-intensive when large numbers of samples and DepOIs are provided, because the size of the generated data and output files scales with #samples x #DepOIs.
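Since the output size scales with #samples x #DepOIs, one possible workaround is to split the DepOI list into batches and run the “Prediction” mode once per batch with a different output prefix. This is an untested sketch: the batch size and the "data_out_batchNN" prefixes are hypothetical, and whether batching actually lowers peak memory depends on internals this page does not describe.

```r
# Hypothetical list of DepOIs to predict.
depois <- c("TP53", "KRAS", "MYC", "EGFR", "PTEN")
batch_size <- 2

# Split the DepOIs into consecutive batches of at most batch_size genes.
batch_id <- ceiling(seq_along(depois) / batch_size)
batches <- split(depois, batch_id)

# One single-column dep.data frame and one output prefix per batch.
dep_batches <- lapply(batches, function(g) {
  data.frame(Gene = g, stringsAsFactors = FALSE)
})
prefixes <- sprintf("data_out_batch%02d", seq_along(dep_batches))

# Each batch could then be passed to the function, e.g.:
# Prep4DeepDEP(exp.data = exp.data, mut.data = mut.data,
#              meth.data = meth.data, cna.data = cna.data,
#              dep.data = dep_batches[[i]], mode = "Prediction",
#              filename.out = prefixes[i])
```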


chenlabgccri/Prep4DeepDEP documentation built on Sept. 3, 2021, 7:16 a.m.