The gene_tpm
import will take some time, as this file is 614mb
library(tidyverse) #https://eyeintegration.nei.nih.gov -> Data -> Data Download for links metadata <- read_tsv('https://hpc.nih.gov/~mcgaugheyd/eyeIntegration/2019_metadata_04.tsv.gz') gene_tpm <- read_tsv('https://hpc.nih.gov/~mcgaugheyd/eyeIntegration/2019_gene_TPM_04.tsv.gz')
metadata %>% sample_n(10) gene_tpm[1:10,c(1,100:110)]
pivot_longer
This makes it easier for the computer to work with the data
gene_tpm_long <- gene_tpm %>% pivot_longer(cols=2:ncol(gene_tpm), names_to = 'sample_accession', values_to = 'TPM') gene_tpm_long %>% sample_n(10)
left_join
to label each sample with the tissue and sub_tissue fieldsgene_tpm_long <- left_join(gene_tpm_long, metadata %>% select(sample_accession, Tissue, Sub_Tissue), by = 'sample_accession') gene_tpm_long %>% sample_n(10)
Sub_tissue
levelWe will take the mean gene (ID
) expression of each individual sample, by the Sub_Tissue
assignment
gene_tpm_long_by_sub_tissue <- gene_tpm_long %>% group_by(ID, Sub_Tissue) %>% summarise(TPM = mean(TPM)) gene_tpm_long_by_sub_tissue %>% sample_n(10)
gene_tpm_long_by_sub_tissue %>% filter(grepl('Cornea', Sub_Tissue)) %>% group_by(Sub_Tissue) %>% top_n(3, wt = TPM) %>% mutate(TPM = log2(TPM + 1)) %>% arrange(Sub_Tissue)
Sub_Tissue
Sub_Tissues
do we have?r gene_tpm_long_by_sub_tissue$Sub_Tissue %>% unique() %>% length()
gene_tpm_long_by_sub_tissue$Sub_Tissue %>% unique() %>% length()
Count
is the number of (sub) tissues that the gene expression is 1 log2(TPM + 1) (double, or two fold) higher in Cornea - Adult
. The max we can have is 74.
Mean Delta log2(TPM+1)
is the average (mean) delta fold change between Cornea - Adult and the other tissues. Each log2(TPM+1) is a doubling in expressed.
The Mean Cornea - Adult log2(TPM + 1)
is the average Cornea - Adult
expression.
This column is useful to remove genes with overall low expression.
We see that we have over 400 genes that are:
Sub_Tissues
Cornea - Adult
above log(TPM + 1) > 10 (which we use as a rough "highly expressed" cut-off)left_join(gene_tpm_long_by_sub_tissue %>% filter(Sub_Tissue == 'Cornea - Adult Tissue') %>% rename(cTPM = TPM), gene_tpm_long_by_sub_tissue %>% filter(!Sub_Tissue == 'Cornea - Adult Tissue'), by = c('ID')) %>% mutate(Delta = log2(cTPM + 1) - log2(TPM + 1)) %>% filter(Delta > 1) %>% group_by(ID) %>% summarise(Count = n(), `Mean Delta log2(TPM+1)` = mean(Delta), `Mean Cornea - Adult log2(TPM + 1)` = mean(log2(cTPM + 1))) %>% arrange(-Count, -`Mean Delta log2(TPM+1)`) %>% filter(Count >= 70, `Mean Cornea - Adult log2(TPM + 1)` > 10) %>% DT::datatable()
What about genes in other Cornea sub tissues?
Genes that are LOWER in expression in cornea?
devtools::session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.