The goal of DoakThesis2020 is to consolidate the data and R scripts from Naqvi et al. with added data from a variety of sources, all of which were used by Doak in her Reed College Senior Thesis.
You can install DoakThesis2020 from GitHub (github.com/maddydoak/DoakThesis2020) with:
# Do the following once
install.packages("devtools")
# Then install the package
devtools::install_github("maddydoak/DoakThesis2020")
Several required packages must be installed through BiocManager. Please run the following for the package to work correctly:
# First run this
install.packages("BiocManager")
# Then run
BiocManager::install(c("edgeR", "limma", "qvalue"))
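To confirm that the Bioconductor dependencies are available before loading the package, a quick check along these lines may help. This is a minimal sketch: `find_missing_packages` is a hypothetical helper written for illustration and is not part of DoakThesis2020.

```r
# Hypothetical helper (not part of DoakThesis2020): returns the names of the
# packages in `pkgs` that cannot be loaded from the current library paths.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The three Bioconductor packages named in the install step
bioc_packages <- c("edgeR", "limma", "qvalue")
missing_packages <- find_missing_packages(bioc_packages)

if (length(missing_packages) > 0) {
  message("Still missing: ", paste(missing_packages, collapse = ", "))
}
```

If any names are reported, rerunning `BiocManager::install()` with those names should resolve the problem.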
Input data MUST be quantified with Salmon; only quant.sf files are accepted. The supporting files must also follow the formats shown in the comments below. The following example incorporates one additional species into the main dataset and then runs the entire pipeline to determine conserved sex-biased genes.
# Current files with data for all previous species (human, macaque, mouse, rat, dog, Xenopus laevis):
library(tidyverse)
library(DoakThesis2020)
data("doakMetadata")
data("doakOrthologs")
data("doakTPM")
data("doakCounts")
# To get TPM/counts data from RNA-Seq quantification in Salmon
# Path to a txt2gene file with the following format
txt2gene_filepath <- "data/txt2gene.csv"
# Format:
# sample_txt2gene <- data.frame(transcript_id = c("gene1", "gene2", "gene3"),
# gene_id = c("ABC123", "DEF456", "GHI789"))
# Path to a folder filled with quant.sf files
quant_filepath <- "data/salmon_output/"
prefix <- "SpeciesA"
sample_names <- c("Male_Brain_1", "Male_Brain_2", "Male_Brain_3", "Male_Brain_4", "Male_Brain_5",
                  "Female_Brain_1", "Female_Brain_2", "Female_Brain_3", "Female_Brain_4", "Female_Brain_5")
all_data <- getQuantData(txt2gene_filepath, quant_filepath, prefix, sample_names)
count_data <- all_data$counts
tpm_data <- all_data$TPM
orthologs <- as_tibble(read.csv("data/orthologs.csv", header = TRUE, na.strings = c("", "NA")))
orthologs <- dplyr::select(orthologs, gene_name, ref_ID)
# Format:
# sample_orthologs <- data.frame(gene_name = c("ABC123", "DEF456", "GHI789"),
# ref_ID = c("gene1", "gene2", "gene3"))
tissues <- c("Brain")
sexes <- c("Male", "Male", "Male", "Male", "Male",
"Female", "Female", "Female", "Female", "Female")
final_data <- addSpecies(doakMetadata, doakOrthologs, doakTPM, doakCounts,
                         prefix, orthologs, tpm_data, count_data,
                         tissues, sexes)
metadata <- final_data$metadata
all_tpm <- final_data$TPM
all_counts <- final_data$counts
orthologs <- final_data$orthologs
# Returns a list of two dataframes: conserved sex-biased genes with just direction (+/- 1) and the same genes with coefficients
conserved <- doSexBiasAnalysis(metadata, all_tpm, all_counts, orthologs)
conserved_direction <- conserved$binary
conserved_coefficients <- conserved$coef
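Because only Salmon quant.sf files are accepted, it can be worth checking the input folder before calling getQuantData. The sketch below assumes the standard Salmon header (Name, Length, EffectiveLength, TPM, NumReads). `check_quant_sf` is a hypothetical helper that is not part of DoakThesis2020, and the demo file is made-up data written to a temporary directory. How getQuantData itself reads or validates files is not shown here.

```r
# Column names that Salmon writes to quant.sf
salmon_columns <- c("Name", "Length", "EffectiveLength", "TPM", "NumReads")

# Hypothetical helper (not part of DoakThesis2020): TRUE when the file at
# `path` exists, is tab-delimited, and has the standard Salmon header.
check_quant_sf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- names(read.delim(path, nrows = 1, check.names = FALSE))
  identical(header, salmon_columns)
}

# Demonstration on a small made-up file in a temporary directory
demo_dir <- tempfile("salmon_output")
dir.create(demo_dir)
demo_file <- file.path(demo_dir, "quant.sf")
write.table(
  data.frame(Name = c("tx1", "tx2"), Length = c(1500, 900),
             EffectiveLength = c(1350.2, 750.8), TPM = c(12.5, 0),
             NumReads = c(240, 0)),
  demo_file, sep = "\t", quote = FALSE, row.names = FALSE
)

# Check every quant.sf file found under the folder
quant_files <- list.files(demo_dir, pattern = "quant\\.sf$",
                          recursive = TRUE, full.names = TRUE)
files_ok <- vapply(quant_files, check_quant_sf, logical(1))
```

In practice, `demo_dir` would be replaced with the folder passed as `quant_filepath`, and any file flagged FALSE inspected before running the rest of the example.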