TPM | R Documentation |
The 'TPM' dataset contains the 22Q2 CCLE "Transcripts Per Million" RNA-seq gene expression data for protein-coding genes. This dataset includes data from 19221 genes, 1406 cell lines, 33 primary diseases and 30 lineages. The columns of 'TPM' are: 'depmap_id', a foreign key identifying the cancer cell line; 'cell_line', the common CCLE name of the cancer cell line; 'gene', containing both the HUGO gene name and the Ensembl ID; 'gene_name', containing only the HUGO gene name; 'ensembl_id', containing only the Ensembl ID; and 'rna_expression', which contains the numerical expression level of the protein-coding gene on the log2(TPM+1) scale. This dataset can be loaded into the R environment with the 'depmap_TPM' function.
TPM
A data frame with 27024726 rows (one per gene and cell line pair) and 6 variables:
depmap_id: Cell line foreign key (e.g. "ACH-000956")
cell_line: Name of cancer cell line (e.g. "22RV1_PROSTATE")
gene: HUGO symbol and Ensembl ID (e.g. "TSPAN6 (ENSG00000000003)")
gene_name: HUGO symbol (e.g. "TSPAN6")
ensembl_id: Ensembl ID (e.g. "ENSG00000044574")
rna_expression: RNA expression of the protein-coding gene, given as log2(TPM+1)
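As a rough illustration of this layout, the sketch below builds a tiny base R data frame with the same six columns and converts 'rna_expression' back to the TPM scale. The expression values are invented for illustration, and the second gene ("GENE2") and its ID are placeholders rather than real entries in the dataset.

```r
# Toy data frame mimicking the six TPM columns (values are made up).
toy_tpm <- data.frame(
  depmap_id      = c("ACH-000956", "ACH-000956"),
  cell_line      = c("22RV1_PROSTATE", "22RV1_PROSTATE"),
  gene           = c("TSPAN6 (ENSG00000000003)", "GENE2 (ENSG_PLACEHOLDER)"),
  gene_name      = c("TSPAN6", "GENE2"),
  ensembl_id     = c("ENSG00000000003", "ENSG_PLACEHOLDER"),
  rna_expression = c(3, 0),
  stringsAsFactors = FALSE
)

# rna_expression is log2(TPM + 1), so invert the transform to recover TPM.
toy_tpm$tpm <- 2^toy_tpm$rna_expression - 1
```

Because of the +1 offset, an 'rna_expression' of 0 corresponds to a TPM of 0, so unexpressed genes stay finite on the log scale.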
This data originates from the 'CCLE_expression.csv' file taken from the 22Q2 [Broad Institute](https://depmap.org/portal/download/) cancer dependency study. The derived dataset found in the 'depmap' package features the addition of a foreign key 'depmap_id' found in the first column of this dataset, which was added from the 'metadata' dataset. This dataset has been converted to a long format tibble. Variable names from the original dataset were converted to lower case, put in snake case, and abbreviated where feasible.
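The wide-to-long conversion described above can be sketched in base R as follows. This is not the code used to build the package dataset. The wide layout (cell lines as rows, one "SYMBOL (ID)" column per gene), the toy 'metadata' table, the second cell line, the second gene and all expression values are assumptions made for illustration.

```r
# Toy wide matrix: rows are cell lines, columns are genes (values are made up).
wide <- matrix(
  c(3, 0, 1, 2),
  nrow = 2,
  dimnames = list(
    c("22RV1_PROSTATE", "HYPOTHETICAL_LINE"),
    c("TSPAN6 (ENSG00000000003)", "GENE2 (ENSG_PLACEHOLDER)")
  )
)

# Toy metadata table supplying the foreign key (second ID is a placeholder).
metadata <- data.frame(
  depmap_id = c("ACH-000956", "ACH-PLACEHOLDER"),
  cell_line = c("22RV1_PROSTATE", "HYPOTHETICAL_LINE"),
  stringsAsFactors = FALSE
)

# Long format: one row per (cell line, gene) pair.
long <- data.frame(
  cell_line      = rep(rownames(wide), times = ncol(wide)),
  gene           = rep(colnames(wide), each = nrow(wide)),
  rna_expression = as.vector(wide),
  stringsAsFactors = FALSE
)

# Add the foreign key by matching on the cell line name.
long$depmap_id <- metadata$depmap_id[match(long$cell_line, metadata$cell_line)]

# Split "SYMBOL (ID)" into its two parts.
long$gene_name  <- sub(" \\(.*$", "", long$gene)
long$ensembl_id <- sub("^.*\\((.*)\\)$", "\\1", long$gene)
```

The long table has one row for every combination of cell line and gene, which is why the row count of the full dataset equals the number of genes multiplied by the number of cell lines.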
- 19Q1: Initial dataset consisted of a data frame with 67360300 rows (one per gene and cell line pair) and 6 variables representing 57820 genes, 1165 cell lines, 33 primary diseases, 32 lineages.
- 19Q2: removes 1618 genes, adds 36 cell lines, removes one primary disease and adds 1 lineage
- 19Q3: removes 37058 genes, adds 9 cell lines, removes 2 primary diseases. Now a 23164240 by 6 dataframe.
- 19Q4: 0 genes, 39 cancer cell lines, 0 primary diseases and 1 lineage
- 20Q1: adds 31 cell lines
- 20Q2: adds 34 cell lines. 'expression' changed to 'rna_expression'
- 20Q3: adds 1 cell line
- 20Q4: removes 38 genes and 71 cell lines, adds 1 primary disease and 3 lineages
- 21Q1: removes 5 genes
- 21Q2: adds 3 cell lines. Additionally, a bug was fixed where the Entrez ID appeared as Ensembl_ID. This was changed for all previous versions of this dataset from 19Q3 to 21Q2
- 21Q3: removes 2 cell lines
- 21Q4: adds 12 cell lines
- 22Q1: adds 4 cell lines and 1 lineage
- 22Q2: adds 44 genes, 13 cell lines and removes 8 lineages
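The row counts quoted in this documentation follow from the long format, where rows = genes x cell lines. The short check below uses only the counts stated in the text; the variable names are chosen here for illustration.

```r
# Rows in long format = genes x cell lines (counts taken from the text).
rows_22q2 <- 19221 * 1406
rows_19q1 <- 57820 * 1165

# 19Q3: apply the 19Q2 and 19Q3 changes to the 19Q1 counts.
genes_19q3 <- 57820 - 1618 - 37058
lines_19q3 <- 1165 + 36 + 9
rows_19q3  <- genes_19q3 * lines_19q3

# Compare against the row counts stated for each release.
c(rows_22q2 == 27024726, rows_19q1 == 67360300, rows_19q3 == 23164240)
```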
DepMap, Broad Institute: https://depmap.org/portal/download/
Tsherniak, A., Vazquez, F., Montgomery, P. G., Weir, B. A., Kryukov, G., Cowley, G. S., ... & Meyers, R. M. (2017). Defining a cancer dependency map. Cell, 170(3), 564-576.
DepMap, Broad (2019): DepMap Achilles 19Q1 Public. https://figshare.com/articles/DepMap_Achilles_19Q1_Public/7655150. Fileset.
Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784.
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
## Not run:
depmap_TPM()
## End(Not run)