In davemcg/eyeIntegration_app: eyeIntegration Shiny App Code

knitr::opts_chunk$set(
  message = FALSE,  warning = FALSE,
  collapse = TRUE,
  fig.width = 5, fig.height = 2.5,
  comment = "#>"
)

Intro

Here we go through a process where we pull the raw gene expression counts and sample metadata, inspect the metata, do some custom filtering by samples and gene, plot the data, aggregate the data and export to a csv file.

Load libraries, import count data and metadata

The counts import will take some time, as this file is ~200mb

library(tidyverse)
library(edgeR)
# if the above library(tidyverse) step throws an error, you need to install tidyverse as follows:
# install.packages('tidyverse')
# BiocManager::install('edgeR')

# https://eyeintegration.nei.nih.gov -> Data -> Data Download for links

# you may want to just paste these links into your browser and download the files
# you then would have to update the path to find them on your local computer
# this will save downloading the 200mb file each time you run this
metadata <- data.table::fread("https://hpc.nih.gov/~mcgaugheyd/eyeIntegration/2023/eyeIntegration23_meta_2023_09_01.built.csv.gz")
gene_counts <- data.table::fread("https://hpc.nih.gov/~mcgaugheyd/eyeIntegration/2023/gene_counts.csv.gz")

Let's peek at the two data frames we loaded into R

metadata %>% sample_n(10)

gene_counts[1:10,c(1,100:110)]

Quickly inspect what kind of metadata and sample counts we have

metadata %>% colnames() %>% sort()
metadata %>% group_by(Tissue, Sub_Tissue, Source, Age, Perturbation, Sex_ML) %>% summarise(Count = n()) %>% DT::datatable()

We can aggregate data to the sub_tissue level in a few steps.

First convert to CPM

These are raw counts. So a sample "A" with twice as many reads as sample "B" will have each gene appear to have about twice the expression. So we will use CPM normalization to normalize the total counts.

gene_counts_mat <- gene_counts[,2:ncol(gene_counts)]
gene_cpm_mat <- edgeR::cpm(gene_counts_mat)
# quick check on first sample
gene_cpm_mat[,1] %>% sum()

# add back gene names
row.names(gene_cpm_mat) <- gene_counts$Gene

Grab human readable gene names conversion table

gene_tx_table <- data.table::fread("https://hpc.nih.gov/~mcgaugheyd/eyeIntegration/2023/gene_anno.csv.gz")
gene_tx_table %>% select(-Name) %>% sample_n(10)

Make the data long with `pivot_longer`

This is to facilitate adding metadata to the counts

gene_counts_long <- gene_cpm_mat %>% 
  as_tibble(rownames = 'Gene') %>% 
  pivot_longer(cols=2:ncol(gene_counts), names_to = 'sample_accession', values_to = 'CPM')

gene_counts_long %>% sample_n(10)

Now `left_join` to label each sample

We chose to label with the following:

Tissue (eg. cornea, retina)
Sub_Tissue (e.g. macula)
Source (e.g. iPSC)
Age (e.g. adult)
Sex_ML (machine learnt sex labels)
study_title

gene_counts_long <- left_join(gene_counts_long, 
                           metadata %>% 
                             select(sample_accession, Tissue, Sub_Tissue, 
                                    Source, Age, Sex_ML, study_title) %>% 
                             unique(), 
                           by = 'sample_accession')

gene_counts_long %>% sample_n(10)

Now we can create a custom expression plot

The eyeIntegration data is presented by the Tissue - Sub_Tissue - Source - Age - Perturbation grouping

As we are working in R we can easily do some custom filtering and grouping.

Let's do this: 1. Cornea and Skin only 2. Filter to retain only Primary culture or Native sources of tissue 3. Filter Skin to remove Transformed fibroblasts 3. Group by Tissue - Source - Sex_ML

Plot LUM, MITF, and DCN expression. We use a fancier "geom_quasirandom" to give the points some shaping. We don't use this on the web app because it is computationally expensive.

plot_data <- gene_counts_long %>% 
  filter(Tissue %in% c("Cornea","Skin"),
         !Sub_Tissue %in% 'Transformed fibroblasts',
         Source %in% c("Native","Primary Culture")) %>% 
  mutate(ID = paste(Tissue, Source, Age, Sex_ML, sep = ' - ')) %>% 
  left_join(gene_tx_table %>% select(gene_id, gene_name) %>% unique(), by = c("Gene" = "gene_id")) %>% 
  filter(gene_name %in% c("LUM","MITF","DCN")) 

plot_data %>% 
  ggplot(aes(y=ID,x=log2(CPM+1), color = study_title)) +
  geom_boxplot(color = 'black') +
  ggbeeswarm::geom_quasirandom() + 
  facet_wrap(~gene_name) +
  scale_color_manual(values = pals::alphabet2() %>% unname())

Data in table form

A sort of fancy set of steps where we:

Aggregate expression to the study_title level (i.e. if a study has eight samples, we take the mean of those eight samples)
"Widen" the data from its long form to make it fit more tightly in an output file (and in a form that most people are familiar with in Excel)

table <- plot_data %>% 
  group_by(gene_name, study_title, ID) %>% 
  summarise(meanCPM = log2(mean(CPM)+1)) %>% 
  pivot_wider(values_from = meanCPM, names_from = c(ID,study_title))

table

Output table to csv for excel viewing

table %>% write_csv(file = 'my_table.eyeIntegration.csv')

sessionInfo()

davemcg/eyeIntegration_app documentation built on May 18, 2024, 1:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

davemcg/eyeIntegration_app
eyeIntegration Shiny App Code

In davemcg/eyeIntegration_app: eyeIntegration Shiny App Code

Intro

Load libraries, import count data and metadata

Let's peek at the two data frames we loaded into R

Quickly inspect what kind of metadata and sample counts we have

We can aggregate data to the sub_tissue level in a few steps.

First convert to CPM

Grab human readable gene names conversion table

Make the data long with `pivot_longer`

Now `left_join` to label each sample

Now we can create a custom expression plot

Data in table form

Output table to csv for excel viewing

R Package Documentation

Browse R Packages

We want your feedback!

davemcg/eyeIntegration_app eyeIntegration Shiny App Code

In davemcg/eyeIntegration_app: eyeIntegration Shiny App Code

Intro

Load libraries, import count data and metadata

Let's peek at the two data frames we loaded into R

Quickly inspect what kind of metadata and sample counts we have

We can aggregate data to the sub_tissue level in a few steps.

First convert to CPM

Grab human readable gene names conversion table

Make the data long with pivot_longer

Now left_join to label each sample

Now we can create a custom expression plot

Data in table form

Output table to csv for excel viewing

R Package Documentation

Browse R Packages

We want your feedback!

davemcg/eyeIntegration_app
eyeIntegration Shiny App Code

Make the data long with `pivot_longer`

Now `left_join` to label each sample