display_code <- FALSE knitr::opts_knit$set(global.par = TRUE) knitr::opts_chunk$set(warning = FALSE, message = FALSE, echo = display_code)
library(devtools) load_all() library(tidyverse) genome <- "Mus_musculus.GRCm38.94"
Set a cut-off for the minimum number of times a category must appear in the 100 tests.
min_appearances <- 15
Generate lists of mouse genes biased by the length of the genes.
The gene info file contains the lengths of all the genes in the mouse genome. These can be plotted so that some thresholds for length categories can be determined.
genfo_file_name <- paste0("../data/", genome, "_gene_info.txt") column_types <- "cccddcccddd" genfo <- read_tsv(genfo_file_name, col_types = column_types) bg_genes <- genfo %>% select(gene_name) %>% distinct()
bg_genes <- bg_genes[[1]] random_genes <- sapply(1:100, function(i){ bg_genes[ceiling(runif(200, min = 0, max = length(bg_genes) - 1))] })
Run the gene lists through a GO overrepresentation analysis.
go_results1 <- overrep_test(all_go_categories, random_genes[,1], bg_genes) go_results1 random_gene_results <- apply(random_genes, 2, function(query){ overrep_test(all_go_categories, query, bg_genes)#, pval_threshold = 0.2) }) save(random_gene_results, file = "../data/random_genes.rda")
We don't want to run the GO analysis each time the document is knitted as it takes too long. The code was run once and the data saved as an .rda object that can be quickly loaded in to the R session.
See how many significant categories were returned.
load("../data/random_genes.rda") sum(sapply(random_gene_results, is.null)) not_null <- !(sapply(random_gene_results, is.null)) filt <- random_gene_results[not_null] filt[[1]] filt[[2]]
Only r sum(!sapply(random_gene_results, is.null))
sets of genes out of 200 returned any significant results which is reassuring.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.