TPM: TPM_22Q2

TPM R Documentation

TPM_22Q2

Description

The 'TPM' dataset contains the 22Q2 CCLE "Transcripts Per Million" RNAseq gene expression data for protein-coding genes. This dataset includes data from 19221 genes, 1406 cell lines, 33 primary diseases and 30 lineages. The columns of 'TPM' are: 'depmap_id', a foreign key identifying the cancer cell line; 'cell_line', the common CCLE name of the cancer cell line; 'gene', containing both the HUGO gene name of the measured gene and its Ensembl ID; 'gene_name', containing only the HUGO gene name; 'ensembl_id', containing only the Ensembl ID; and 'rna_expression', which contains the numerical expression level of the protein-coding gene on a log2(TPM+1) scale. This dataset can be loaded into the R environment with the 'depmap_TPM' function.

Usage

TPM

Format

A data frame with 27024726 rows (one per gene and cell line pair) and 6 variables:

depmap_id

Cell line foreign key (e.g. "ACH-000956")

cell_line

Name of cancer cell line (e.g. "22RV1_PROSTATE")

gene

HUGO symbol and Ensembl ID (e.g. "TSPAN6 (ENSG00000000003)")

gene_name

HUGO symbol (e.g. "TSPAN6")

ensembl_id

Ensembl ID (e.g. ENSG00000044574)

rna_expression

RNA expression level of the protein-coding gene, on a log2(TPM+1) scale
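Because 'rna_expression' is stored on a log2(TPM+1) scale, a TPM value can be recovered as 2^x - 1. The following is a minimal base R sketch of that back-transformation, together with a check of the stated row count. It uses a small hand-made data frame that mimics the six columns; the expression values are invented for illustration, and the extra 'tpm' column is not part of the dataset.

```r
# Toy data frame with the same six columns as 'TPM'.
# The rna_expression values are invented for illustration only.
toy_tpm <- data.frame(
  depmap_id      = rep("ACH-000956", 3),
  cell_line      = rep("22RV1_PROSTATE", 3),
  gene           = c("TSPAN6 (ENSG00000000003)",
                     "TNMD (ENSG00000000005)",
                     "DPM1 (ENSG00000000419)"),
  gene_name      = c("TSPAN6", "TNMD", "DPM1"),
  ensembl_id     = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  rna_expression = c(3, 0, 1),
  stringsAsFactors = FALSE
)

# Invert log2(TPM + 1): TPM = 2^x - 1.
# 'tpm' is a helper column added here, not a column of the dataset.
toy_tpm$tpm <- 2^toy_tpm$rna_expression - 1

# One row per gene and cell line pair: 19221 genes x 1406 cell lines.
expected_rows <- 19221 * 1406   # 27024726
```

An expression value of 0 therefore corresponds to a TPM of 0, and a value of 1 to a TPM of 1.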

Details

This data originates from the 'CCLE_expression.csv' file taken from the 22Q2 [Broad Institute](https://depmap.org/portal/download/) cancer dependency study. The derived dataset found in the 'depmap' package features the addition of a foreign key 'depmap_id' found in the first column of this dataset, which was added from the 'metadata' dataset. This dataset has been converted to a long format tibble. Variable names from the original dataset were converted to lower case, put in snake case, and abbreviated where feasible.
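The conversion to long format can be illustrated with a small sketch. This is not the code used to build the package dataset: it is a base R illustration on an invented wide table, assuming one row per cell line and one column per gene, with column names in the "SYMBOL (ID)" form shown in the Format section. The second 'depmap_id' is a placeholder, and all expression values are made up.

```r
# Hypothetical wide table: one row per cell line, one column per gene.
# Values are invented; "ACH-000001" is used only as a placeholder key.
wide <- data.frame(
  depmap_id = c("ACH-000956", "ACH-000001"),
  "TSPAN6 (ENSG00000000003)" = c(3.0, 4.5),
  "TNMD (ENSG00000000005)"   = c(0.0, 1.0),
  check.names = FALSE,        # keep the "SYMBOL (ID)" column names intact
  stringsAsFactors = FALSE
)

gene_cols <- setdiff(names(wide), "depmap_id")

# Stack the gene columns so that each row is one (cell line, gene) pair.
long <- data.frame(
  depmap_id      = rep(wide$depmap_id, times = length(gene_cols)),
  gene           = rep(gene_cols, each = nrow(wide)),
  rna_expression = unlist(wide[gene_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split "SYMBOL (ID)" into its two parts.
pattern <- "^(\\S+) \\((.+)\\)$"
long$gene_name  <- sub(pattern, "\\1", long$gene)
long$ensembl_id <- sub(pattern, "\\2", long$gene)
```

A wide table with 2 cell lines and 2 genes becomes a long table with 4 rows, which is the same relationship that gives 19221 x 1406 rows in the full dataset.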

Change log

- 19Q1: Initial dataset consisted of a data frame with 67360300 rows (one per gene and cell line pair) and 6 variables representing 57820 genes, 1165 cell lines, 33 primary diseases, 32 lineages.

- 19Q2: removes 1618 genes, adds 36 cell lines, removes one primary disease and adds 1 lineage

- 19Q3: removes 37058 genes, adds 9 cell lines, removes 2 primary diseases. Now a 23164240 by 6 dataframe.

- 19Q4: 0 genes, 39 cancer cell lines, 0 primary diseases and 1 lineage

- 20Q1: adds 31 cell lines

- 20Q2: adds 34 cell lines. 'expression' changed to 'rna_expression'

- 20Q3: adds 1 cell line

- 20Q4: removes 38 genes and 71 cell lines, adds 1 primary disease and 3 lineages

- 21Q1: removes 5 genes

- 21Q2: adds 3 cell lines. Additionally, a bug was fixed where the Entrez ID appeared as Ensembl_ID. This was changed for all previous versions of this dataset from 19Q3 to 21Q2

- 21Q3: removes 2 cell lines

- 21Q4: adds 12 cell lines

- 22Q1: adds 4 cell lines and 1 lineage

- 22Q2: adds 44 genes, 13 cell lines and removes 8 lineages

Source

DepMap, Broad Institute: https://depmap.org/portal/download/

References

Tsherniak, A., Vazquez, F., Montgomery, P. G., Weir, B. A., Kryukov, G., Cowley, G. S., ... & Meyers, R. M. (2017). Defining a cancer dependency map. Cell, 170(3), 564-576.

DepMap, Broad (2019): DepMap Achilles 19Q1 Public. https://figshare.com/articles/DepMap_Achilles_19Q1_Public/7655150. Fileset.

Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784.

Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).

Examples

## Not run: 
depmap_TPM()

## End(Not run)
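Once loaded, a common first step is to pull out the expression of a single gene across cell lines. The sketch below shows one way to do this in base R. The helper 'gene_expression' is hypothetical and not part of the 'depmap' package, and it is demonstrated on a small invented data frame rather than the real download; the second cell line and all expression values are placeholders.

```r
# With the real data one would start from, for example:
#   tpm <- depmap::depmap_TPM()
# Here a toy stand-in with the relevant columns is used instead.
toy_tpm <- data.frame(
  depmap_id      = c("ACH-000956", "ACH-999999", "ACH-000956", "ACH-999999"),
  cell_line      = c("22RV1_PROSTATE", "TOYLINE_TISSUE",
                     "22RV1_PROSTATE", "TOYLINE_TISSUE"),
  gene_name      = c("TSPAN6", "TSPAN6", "TNMD", "TNMD"),
  rna_expression = c(2.5, 4.0, 0.0, 0.5),   # invented log2(TPM+1) values
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows for one gene, highest expression first.
gene_expression <- function(tpm, symbol) {
  hits <- tpm[tpm$gene_name == symbol,
              c("depmap_id", "cell_line", "rna_expression")]
  hits[order(hits$rna_expression, decreasing = TRUE), ]
}

tspan6 <- gene_expression(toy_tpm, "TSPAN6")
```

The same function should work on the full dataset, since it relies only on the 'gene_name', 'depmap_id', 'cell_line' and 'rna_expression' columns described in the Format section.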


UCLouvain-CBIO/depmap documentation built on Aug. 18, 2024, 9:46 p.m.