Prep4DeepDEP generates the genomic and gene fingerprint data tables from the user's datasets. Prep4DeepDEP has two main modes:
The “Prediction” mode generates data for DeepDEP to predict gene dependency scores of unscreened CCLs or tumors. It extracts and orders the required genomic features from the user’s genome-wide datasets, and generates the functional fingerprints of gene dependencies of interest (DepOIs) from a user-provided list or the default 1,298 genes studied in the paper. For the copy number alteration (CNA) data, an embedded R function (PrepCNA) converts copy-number segments to bins (every 10k bases in the genome) and calculates per-bin CNA scores.
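The segment-to-bin conversion can be illustrated with a minimal sketch. The scoring rule used by PrepCNA is not stated here, so the sketch assumes that a bin's score is the overlap-weighted mean of Segment_Mean; the function name bin_cna and the toy chromosome length are hypothetical.

```r
# Toy segmented copy-number profile for one sample (columns follow cna.data).
seg <- data.frame(
  CCLE_name    = "CCL1",
  Chromosome   = 1,
  Start        = c(1, 15001),
  End          = c(15000, 30000),
  Segment_Mean = c(0.5, -1.0),
  stringsAsFactors = FALSE
)

bin_size <- 10000  # 10k-base bins, as described above

# ASSUMPTION: a bin's score is the overlap-weighted mean of Segment_Mean.
# This is an illustrative rule, not necessarily the one PrepCNA applies.
bin_cna <- function(seg, bin_size, chrom_end) {
  bin_start <- seq(1, chrom_end, by = bin_size)
  bin_end   <- pmin(bin_start + bin_size - 1, chrom_end)
  score <- vapply(seq_along(bin_start), function(i) {
    # Number of bases each segment shares with bin i (0 if disjoint).
    overlap <- pmax(0, pmin(seg$End, bin_end[i]) - pmax(seg$Start, bin_start[i]) + 1)
    if (sum(overlap) == 0) return(NA_real_)
    sum(overlap * seg$Segment_Mean) / sum(overlap)
  }, numeric(1))
  data.frame(Start = bin_start, End = bin_end, CNA = score)
}

bins <- bin_cna(seg, bin_size, chrom_end = 30000)
print(bins)
```

A bin that straddles two segments receives a value between the two segment means, and a bin with no overlapping segment is left as NA.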
The “Training” mode generates data to train a new DeepDEP model using the user's genome-wide genomic data and gene dependency scores from an in-house CRISPR screening experiment. It creates data tables of genomics and gene dependencies for all CCL-DepOI pairs (number of samples = number of CCLs x number of DepOIs). Functional fingerprints are generated based on the list of genes available in the gene dependency dataset.
Please refer to the paper and the DeepDEP package (https://codeocean.com/capsule/3348251/tree) for how to use the generated data tables for DeepDEP model training and prediction.
exp.data |
Gene expression data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to genes and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., CCND1) as the first column. Row names are not used by this function. Expression levels are given as log2(TPM+1) per gene. |
mut.data |
Mutation data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to genes and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., TP53) as the first column. Row names are not used by this function. Mutations are represented by 0/1 binary values per gene, with 1s denoting missense and nonsense mutations, frameshift insertions and deletions, and splice-site mutations. |
meth.data |
DNA methylation data (a data.frame object) of cell lines or tumors. Rows and columns of the data frame correspond to probes and samples, respectively. The data frame should contain sample names as column names and probe IDs (e.g., cg00000292) as the first column. Row names are not used by this function. DNA methylation is given as beta values per probe from the Infinium® HumanMethylation27 or HumanMethylation450 BeadChips. |
cna.data |
Copy number alteration (CNA) data (a data.frame object) of cell lines or tumors. CNA should be prepared as segmented copy-number profiles using the .seg file format against the reference genome hg19. Example CNA data can be downloaded from the CCLE portal (https://portals.broadinstitute.org/ccle/data). Rows and columns of the data frame correspond to CNA segments per sample and CNA information, respectively. The following columns are required: CCLE_name (sample name), Chromosome (numeric, without the ‘Chr’ prefix), Start (numeric), End (numeric), and Segment_Mean (in the log2(CN/2) scale). |
dep.data |
Gene symbols of gene dependencies of interest (DepOIs), with or without the user’s in-house gene dependency scores. For the “Training” mode, this argument is required and expects a data.frame object whose rows and columns correspond to DepOIs and samples, respectively. The data frame should contain sample names as column names and gene symbols (e.g., TP53) as the first column. For the “Prediction” mode, this argument is optional and expects a data.frame object with a single column of gene symbols (e.g., TP53) of the DepOIs that the user would like to predict. If the argument is left NULL, the 1,298 default genes studied in the original paper will be used. |
mode |
“Training” or “Prediction”. The “Training” mode creates data tables of genomics and gene dependencies for all CCL-DepOI pairs (number of samples = number of CCLs x number of DepOIs). The “Prediction” mode generates data tables of genomics for all samples (number of samples = number of CCLs/tumors). In both modes, functional fingerprints are generated based on the genes listed in “dep.data”. |
filename.out |
Path and prefix for the output files. |
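The input layouts described above can be sketched with toy data frames. All values are made up, and the name of the first column ("Gene") is an assumption, since only its position and content are specified here. The final call is shown commented out because it requires the package to be installed; it simply passes the arguments by the names listed above.

```r
# Toy inputs in the layouts described above (all values are made up).
exp.data <- data.frame(
  Gene = c("CCND1", "TP53"),   # gene symbols in the first column
  CCL1 = c(3.2, 5.1),          # log2(TPM + 1)
  CCL2 = c(0.0, 4.7),
  stringsAsFactors = FALSE
)
mut.data <- data.frame(
  Gene = c("TP53", "KRAS"),
  CCL1 = c(1, 0),              # 1 = mutated, 0 = not mutated
  CCL2 = c(0, 1),
  stringsAsFactors = FALSE
)
cna.data <- data.frame(
  CCLE_name    = c("CCL1", "CCL1", "CCL2"),
  Chromosome   = c(1, 1, 1),
  Start        = c(1, 50001, 1),
  End          = c(50000, 100000, 100000),
  Segment_Mean = c(0.1, -0.8, 0.0),   # log2(CN/2)
  stringsAsFactors = FALSE
)
dep.data <- data.frame(Gene = c("TP53", "KRAS"), stringsAsFactors = FALSE)

# Quick format checks before calling the function.
required_cna_cols <- c("CCLE_name", "Chromosome", "Start", "End", "Segment_Mean")
stopifnot(
  is.character(exp.data[[1]]),
  all(unlist(mut.data[-1]) %in% c(0, 1)),
  identical(names(exp.data)[-1], names(mut.data)[-1]),
  all(required_cna_cols %in% names(cna.data))
)

# Hypothetical call (commented out; needs the Prep4DeepDEP package):
# Prep4DeepDEP(exp.data = exp.data, mut.data = mut.data, cna.data = cna.data,
#              dep.data = dep.data, mode = "Prediction",
#              filename.out = file.path(tempdir(), "toy"))
```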
For each type of genomic data, Prep4DeepDEP extracts and orders the genomic features that are required to run the Python DeepDEP tool (4539 mutations, 6016 gene expressions, 7460 CNA bins, and 6617 methylation probes). At least one of the four genomic profiles must be provided in order to run Prep4DeepDEP. If multiple genomic profiles are provided, only the samples contained in the first genomic profile will be analyzed across the provided genomic profiles. Please make sure the sample names (CCL name or tumor ID) are consistent across genomic profiles and dep.data.
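The extraction and ordering step can be pictured with a small sketch. It is not the package's internal code: required_genes is a hypothetical stand-in for the fixed feature lists, and order_features is an illustrative helper. It takes the sample set from the first profile, checks that the second profile uses the same names, and reorders rows to the required feature order.

```r
# Hypothetical stand-in for the fixed list of required features.
required_genes <- c("TP53", "KRAS", "CCND1")

exp.data <- data.frame(
  Gene = c("CCND1", "TP53", "MYC"),
  CCL1 = c(3.2, 5.1, 7.0),
  CCL2 = c(0.0, 4.7, 6.5),
  stringsAsFactors = FALSE
)
mut.data <- data.frame(
  Gene = c("KRAS", "TP53", "CCND1"),
  CCL2 = c(1, 0, 0),
  CCL1 = c(0, 1, 0),
  CCL3 = c(0, 0, 1),   # not in the first profile, so it is dropped
  stringsAsFactors = FALSE
)

# Samples are taken from the first profile provided.
samples <- names(exp.data)[-1]
stopifnot(all(samples %in% names(mut.data)))  # names must be consistent

# Reorder rows to the required feature order and keep only the shared samples.
# Features absent from the input become rows of NA.
order_features <- function(df, features, samples) {
  out <- df[match(features, df[[1]]), samples, drop = FALSE]
  rownames(out) <- features
  out
}

exp.ordered <- order_features(exp.data, required_genes, samples)
mut.ordered <- order_features(mut.data, required_genes, samples)
print(exp.ordered)
print(mut.ordered)
```

Genes outside the required list (MYC here) are discarded, and required genes missing from a profile (KRAS in the expression table) appear as NA rows.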
For each DepOI, Prep4DeepDEP generates the binary status of 3115 functional fingerprints based on the chemical and genetic perturbation (CGP) signatures of the MSigDB database (https://www.gsea-msigdb.org/gsea/msigdb).
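As a rough illustration of a binary fingerprint, the sketch below marks whether each DepOI is a member of each gene set. Reading "binary status" as gene-set membership is an assumption, and the three gene sets are hypothetical stand-ins for the MSigDB CGP signatures.

```r
# Hypothetical gene sets standing in for MSigDB CGP signatures.
gene_sets <- list(
  SET_A = c("TP53", "CCND1"),
  SET_B = c("KRAS"),
  SET_C = c("TP53", "KRAS", "MYC")
)
depois <- c("TP53", "KRAS", "BRCA1")

# ASSUMPTION: status is 1 if the DepOI belongs to the gene set, 0 otherwise.
fingerprint <- vapply(
  gene_sets,
  function(gs) as.integer(depois %in% gs),
  integer(length(depois))
)
rownames(fingerprint) <- depois  # rows = DepOIs, columns = gene sets
print(fingerprint)
```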
Output files of the function: a txt file for each genomic profile is written to the path/filename indicated by filename.out. Another txt file is written for the gene fingerprints. When running in the “Training” mode, a dependency score file is also written that contains the reformatted dependency scores from the user-provided dep.data.
In the “Training” mode, all output samples are generated by the CCL-DepOI combinations; i.e., samples are C1G1 (CCL1-DepOI1), C1G2, …, C2G1, C2G2, …
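The pair ordering above (DepOI varying fastest within each CCL) can be reproduced with expand.grid. This is a minimal sketch; the "CCL-DepOI" label format built with paste is only for illustration and is not necessarily how the output files name their samples.

```r
ccls   <- c("CCL1", "CCL2")
depois <- c("TP53", "KRAS", "MYC")

# expand.grid varies its first argument fastest, so DepOI is listed first
# to obtain the order C1G1, C1G2, ..., C2G1, C2G2, ...
pairs <- expand.grid(DepOI = depois, CCL = ccls, stringsAsFactors = FALSE)
pairs <- pairs[, c("CCL", "DepOI")]

# Illustrative label only; the real output naming may differ.
pairs$sample <- paste(pairs$CCL, pairs$DepOI, sep = "-")
print(pairs)
```

The number of rows equals the number of CCLs times the number of DepOIs.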
Missing values in exp.data are filled with the mean value of the corresponding genomic feature in the 278 CCLs used in our study. Missing values in mut.data are filled with the median across CCLs. Missing values in meth.data are filled with zero.
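The filling rules can be sketched on small matrices. The reference means below are hypothetical numbers standing in for the per-gene means of the 278 CCLs, which are not reproduced here, and fill_with_reference is an illustrative helper rather than a function of the package.

```r
# Hypothetical per-gene reference means (stand-ins for the 278-CCL means).
ref_mean <- c(TP53 = 4.0, KRAS = 2.5)

exp.mat <- matrix(
  c(5.1, NA, NA, 3.0), nrow = 2,
  dimnames = list(c("TP53", "KRAS"), c("CCL1", "CCL2"))
)

# Replace each NA with the reference value of its row (gene).
fill_with_reference <- function(mat, ref) {
  idx <- which(is.na(mat), arr.ind = TRUE)
  mat[idx] <- ref[rownames(mat)[idx[, "row"]]]
  mat
}
exp.filled <- fill_with_reference(exp.mat, ref_mean)

# Methylation: missing beta values are set to zero.
meth.mat <- matrix(
  c(0.8, NA), nrow = 1,
  dimnames = list("cg00000292", c("CCL1", "CCL2"))
)
meth.mat[is.na(meth.mat)] <- 0

print(exp.filled)
print(meth.mat)
```

The same helper could be applied to mutation data with a vector of reference medians in place of means.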
The “Prediction” mode can be slow and memory-heavy if huge numbers of samples and DepOIs are provided since the generated data and output files have the sample size of #samples x #DepOIs.
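When the output has one row per sample-DepOI pair, the table size grows with the product of the two counts. The back-of-the-envelope sketch below uses the feature counts given in the Details; it assumes every value is held in memory as an 8-byte double and ignores any overhead, so it is only a rough guide. The function name estimate_gib is hypothetical.

```r
# Feature counts per row, taken from the Details above.
n_features <- c(mutation = 4539, expression = 6016, cna = 7460,
                methylation = 6617, fingerprint = 3115)

# ASSUMPTION: one row per sample-DepOI pair, 8 bytes per stored value.
estimate_gib <- function(n_samples, n_depois, bytes_per_value = 8) {
  n_rows <- n_samples * n_depois
  n_rows * sum(n_features) * bytes_per_value / 1024^3
}

size_gib <- estimate_gib(n_samples = 100, n_depois = 1298)
print(round(size_gib, 1))
```

Under these assumptions, 100 samples with the default 1,298 DepOIs already amount to roughly 27 GiB, which is why limiting the number of DepOIs or processing samples in batches may be worthwhile.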