library(DT) motifEnrichment_wLogo <- readRDS("../data/motifs/TFenrichment_V2.rds") GO_data <- readRDS("../data/GO_enrichment.rds") library(testisAtlas) load2("../data/cache") load_component_orderings()
Steady-state self-renewal Spermatogonia
print_tsne(50, point_size = 0.5) print_loadings_scores(50) print_gene_list(50) go_volcano_plot(component = "V50N") head(GO_data[["V50N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V50P") head(GO_data[["V50P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V50N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
In spermatogonial tsne location, but just enrichment not very spematogonial?
print_tsne(7, point_size = 0.5) print_loadings_scores(7) print_gene_list(7) go_volcano_plot(component = "V7P") head(GO_data[["V7P"]][c(2,7,8,10)], 20) head(GO_data[["V7N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V7N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Nxf2 "restricted exclusively to spermatogonia" The role of spermatogonially expressed germ cell-specific genes in mammalian meiosis
print_tsne(31, point_size = 0.5) print_loadings_scores(31) print_gene_list(31) go_volcano_plot(component = "V31P") head(GO_data[["V31P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V31P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Foxo1
print_tsne(33, point_size = 0.5) print_loadings_scores(33) print_gene_list(33) go_volcano_plot(component = "V33N") head(GO_data[["V33N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V33N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Dmrt1 DMRT1 Is Required for Mouse Spermatogonial Stem Cell Maintenance and Replenishment Dnmt3b & Dnmt3a Dynamic expression of DNMT3a and DNMT3b isoforms during male germ cell development in the mouse. Dnmt1 Uchl1 Marks cells on edge of tubule in HPA, Asymmetric distribution of UCH-L1 in spermatogonia is associated with maintenance and differentiation of spermatogonial stem cells. Map7d2 Marks cells on edge of tubule in HPA Scml2 SCML2 establishes the male germline epigenome through regulation of histone H2A ubiquitination.
Overlaps with component 33, but with more late (~pachytene) expression
print_tsne(27, point_size = 0.5) print_loadings_scores(27) print_gene_list(27) go_volcano_plot(component = "V27P") head(GO_data[["V27P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V27P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(2, point_size = 0.5) print_loadings_scores(2) print_gene_list(2) go_volcano_plot(component = "V2P") head(GO_data[["V2P"]][c(2,7,8,10)], 20) head(GO_data[["V2N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V2P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(5, point_size = 0.5) print_loadings_scores(5) print_gene_list(5) go_volcano_plot(component = "V5N") head(GO_data[["V5N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V5N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Well known meiotic genes Prdm9, Dmc1, Tex12. Prdm9 is expressed in pre-lepotene and early-leptotene doi.org/10.1007/s00412-015-0511-3. Dmc1 is also expressed in leptotene doi.org/10.1016/S1097-2765(00)80070-2, Dmc1KO germ cells progress up until an abnormal looking zygotene doi.org/10.1016/S1097-2765(00)80069-6.
GO hits for meiosis, synapsis, chromosome segregation, recombination etc.
Smc1b - Structural Maintenance Of Chromosomes 1B, cohesin
Fmr1nb & Fmr1 (fragile X mental retardation 1, trinucleotide repeat expansion disease, can cause Premature ovarian aging, required for fertility)
Taf4b (Diseases associated with TAF4B include Spermatogenic Failure 13.)
Differential expression of members of the E2F family of transcription factors in rodent testes "E2F-1 protein is stage-specific and most abundant in leptotene to early pachytene spermatocytes"
Similar cells and genes to component 5 (LZ). Also has Prdm9. Ccnb3 - known expression in "leptotene and zygotene" "blocks the mitotic cell cycle in late anaphase" Characterization and expression of mammalian cyclin b3, a prepachytene meiotic cyclin. Meiob - ssDNA exonucleoase, required for recombination, colocalises with RPA foci MEIOB exhibits single-stranded DNA-binding and exonuclease activities and is essential for meiotic recombination. & MEIOB Targets Single-Strand DNA and Is Necessary for Meiotic Recombination Rad51ap2 - RAD51AP2, a novel vertebrate- and meiotic-specific protein, shares a conserved RAD51-interacting C-terminal domain with RAD51AP1/PIR51. see also Molecular basis for enhancement of the meiotic DMC1 recombinase by RAD51 associated protein 1 (RAD51AP1). M1ap - Meiosis I arrest abnormalities lead to severe oligozoospermia in meiosis 1 arresting protein (M1ap)-deficient mice. Gm960 aka TOPOVIBL / C11orf80 - "TOPOVIBL protein interacts and forms a complex with SPO11 and is required for meiotic DSB formation" The TopoVIB-Like protein family is required for meiotic DNA double-strand break formation.
print_tsne(44, point_size = 0.5) print_loadings_scores(44) print_gene_list(44) go_volcano_plot(component = "V44N") head(GO_data[["V44N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V44N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) #aheatmap(t(SDAresults$loadings[[1]][,head(names(sort(-abs(SDAresults$loadings[[1]][44,]))),35)]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of acrosomal genes for each component", scale="row")
print_tsne(13, point_size = 0.5) print_loadings_scores(13) print_gene_list(13) go_volcano_plot(component = "V13N") head(GO_data[["V13N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V13P") head(GO_data[["V13P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V13N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) datatable(motifEnrichment_wLogo[geneSet=="V13P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Pbx3? - in component 5
print_tsne(47, point_size = 0.5) print_loadings_scores(47) print_gene_list(47) go_volcano_plot(component = "V47N") head(GO_data[["V47N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V47P") head(GO_data[["V47P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V47N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) datatable(motifEnrichment_wLogo[geneSet=="V47P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(23, point_size = 0.5) print_loadings_scores(23) print_gene_list(23) go_volcano_plot(component = "V23P") head(GO_data[["V23P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V23P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(38, point_size = 0.5) print_loadings_scores(38) print_gene_list(38) go_volcano_plot(component = "V38P") head(GO_data[["V38P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V38P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Many genes previously found to be upregulated in Hormad1KO by microarray have high loadings, including non XY genes.
Hormad1KO <- fread("../data/previous_studies/Shin/Hormad1_KO_differentially_expressed_genes.csv", header = FALSE) Hormad1KO[!V1 %in% colnames(data)]$V1 # rename old gene https://www.ncbi.nlm.nih.gov/gene/195726 Hormad1KO[V1=="Gm47", V1:="Scml1"] Hormad1KO_genes <- Hormad1KO[V1 %in% colnames(data)]$V1 str(Hormad1KO_genes) tmp <- data.table(loading = sort(SDAresults$loadings[[1]][38,],T), gene = names(sort(SDAresults$loadings[[1]][38,],T))) tmp$rank <- seq_along(tmp$loading) ggplot(tmp, aes(rank, loading)) + geom_point() + geom_rug(data=tmp[gene %in% Hormad1KO_genes], sides = "b", colour="red", size=0.1) plot(sort(round(SDAresults$loadings[[1]][38,Hormad1KO[V1 %in% colnames(data)]$V1],4), T), ylab="Component 38 Loading", xlab="Genes overexpressed in Hormad1 KO (ordered by p value)") hist(rank(-SDAresults$loadings[[1]][38,])[Hormad1KO$V1], breaks=200, xlab='Component 38 loading rank') tripKOgenesranked <- names(sort(-SDAresults$loadings[[1]][38,])) #seq_along(tripKOgenesranked) tmp <- sapply(1:1000, function(x) sum(Hormad1KO$V1 %in% tripKOgenesranked[1:x])/x) plot(tmp, ylab="fraction of genes overexpress in Trim13 KO", xlab="Top X genes of component 38") tmp[200] tmp <- sapply(1:1000, function(x) sum(Hormad1KO$V1 %in% tripKOgenesranked[1:x])/nrow(Hormad1KO)) plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38") tmp[200]
In addition, many of the genes are also those statistically significantly overexpressed in Trip13 KO mice link. Trip13 is required for removing Hormad1 from synapsed chromosomes. Mouse HORMAD1 and HORMAD2, Two Conserved Meiotic Chromosomal Proteins, Are Depleted from Synapsed Chromosome Axes with the Help of TRIP13 AAA-ATPase & Mouse TRIP13/PCH2 Is Required for Recombination and Normal Higher-Order Chromosome Structure during Meiosis
The autosomal genes:
Trip13KO <- fread("../data/previous_studies/Ortega/TripKO.csv", header = FALSE) Trip13KO$LFC <- as.numeric(fread("../data/previous_studies/Ortega/TripKOLFC.txt", header = FALSE)) Trip13KO[V1=="RP23-280E8.2", V1:="Tex13c2"] Trip13KO[V1=="BC023829", V1:="Tmem185a"] Trip13KO[V1=="Gm5753", V1:="Fthl17-ps3"] Trip13KO[V1=="A830080D01Rik", V1:="Bclaf3"] Trip13KO[V1=="Ppp1r3fos", V1:="Flicr"] #Trip13KO[V1=="5730457N03Rik", V1:="Evx1os"] #Trip13KO[V1=="RP23-328P19.4", V1:="Gm42923"] str(Trip13KO[!V1 %in% colnames(SDAresults$loadings[[1]])]$V1) Trip13KO <- Trip13KO[V1 %in% colnames(SDAresults$loadings[[1]])] str(Trip13KO) tmp <- data.table(loading = sort(SDAresults$loadings[[1]][38,],T), gene = names(sort(SDAresults$loadings[[1]][38,],T))) tmp$rank <- seq_along(tmp$loading) tmp[gene %in% Trip13KO$V1, KO := "Trip13"] tmp[gene %in% Hormad1KO_genes, KO := "Hormad1"] c38_rug_plot <- ggplot(tmp, aes(rank, loading,colour=KO)) + geom_point(colour="black") + geom_rug(data=tmp[gene %in% Trip13KO$V1], sides = "b", length=unit(0.1,"npc")) + geom_rug(data=tmp[gene %in% Hormad1KO_genes], sides = "t", length=unit(0.1,"npc")) + scale_y_continuous(expand=c(0.1,0.1)) + ylab("Component\n 38 Gene Loading") + xlab("Rank of gene by loading") + scale_color_brewer(palette = "Set1")+ guides(colour = guide_legend(override.aes = list(size=5))) + theme(legend.position = c(0.6,0.6)) + labs(colour="KO Overexpressed Genes") c38_rug_plot saveRDS(c38_rug_plot, "../data/plots/c38_rug_plot.rds") # scale_color_manual(values=c("Trip13"="#377EB8", "Hormad1"="#E41A1C")) + hormad1_enrich <- component_enrichment(Hormad1KO_genes) trip13_enrich <- component_enrichment(Trip13KO$V1) hormad1_enrich$KO <- "Enrichment of Hormad1 KO Overexpressed Genes" trip13_enrich$KO <- "Enrichment of Trip13 KO Overexpressed Genes" trip13_enrich <- rbind(trip13_enrich, hormad1_enrich) set.seed(123) trip13_enrich_plot <- manhatten_plot(trip13_enrich[!component %in% gsub("V","",half_exclusion_comps)], topn = 4, repel_force = 10, legend_position = c(0.1,0.5)) + facet_wrap(~KO, scales='free_y', nrow=1) trip13_enrich_plot saveRDS(trip13_enrich_plot, "../data/plots/trip13_enrich_plot.rds") plot(SDAresults$loadings[[1]][38,Trip13KO$V1], ylab="Component 38 Loading", xlab="Genes overexpressed in Trim18 KO (ordered by p value)") plot(rank(-SDAresults$loadings[[1]][38,])[Trip13KO$V1], Trip13KO$LFC, ylab="Log Fold Change in Trim18 KO", xlab='Component 38 loading rank') hist(rank(-SDAresults$loadings[[1]][38,])[Trip13KO$V1], breaks=200, xlab='Component 38 loading rank') tripKOgenesranked <- names(sort(-SDAresults$loadings[[1]][38,])) #seq_along(tripKOgenesranked) tmp <- sapply(1:1000, function(x) sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/x) plot(tmp, ylab="fraction of genes overexpress in Trim13 KO", xlab="Top X genes of component 38") tmp[200] tmp <- sapply(1:1000, function(x) sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/nrow(Trip13KO)) plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38") tmp[200] tmp <- sapply(seq_along(tripKOgenesranked), function(x) sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/nrow(Trip13KO)) plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38") str(fisher.test(matrix(c(x, 200-x, 176-x, 19262-200-176+x), 2, 2), alternative='greater'))
print_tsne(42, point_size = 0.5) print_loadings_scores(42) print_gene_list(42) go_volcano_plot(component = "V42N") head(GO_data[["V42N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V42N",-c("enrichedGenes","geneSet"), with=FALSE][1:40], escape = FALSE, filter="top", options=list(pageLength=5))
An ancient transcription factor initiates the burst of piRNA production during early meiosis in mouse testes. "A-MYB initiates pachytene piRNA production"
Ldhc - Lactate Dehydrogenase C - testis version of glycolytic LDHA
library(ggraph) graph <- readRDS("V42_network_graph") ggraph(graph, layout = 'nicely') + geom_edge_link(aes(colour = motif)) + geom_node_point() + theme_graph()
# from "A-MYB (MYBL1) transcription factor is a master regulator of male meiosis" # (Supp tables are PDFs...) #"Nek2","Moap1", P38ip=Supt20, "Mat2a", Mybl1_genes <- c("Morc2b", "Piwil1", "Als2cr12", "Cklf", "Cntd1", "Cenpp", "Poteg", "Znrd1as", "Nol8", "Taf9", "Supt20", "Rpl39l", "Tbl2", "Snx1", "Ralgps2", "Hsp90aa1", "Ube2t", "Ccnb3", "Rfx2") #aheatmap(t(SDAresults$loadings[[1]][,Mybl1_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of predicted direct targets of MYBL1 by ChIPseq") aheatmap(t(SDAresults$loadings[[1]][,Mybl1_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings by component of \n predicted direct targets of MYBL1 by ChIPseq", scale="row") pirna_genes <- c("Piwil1", "Tdrd1", "Tdrd5", "Mov10l1", "Tdrd12", "Piwil2", "Fkbp6", "Mael", "Pld6", "Asz1", "Exd1", "Ddx4", "Tdrkh", "Tdrd9") #aheatmap(t(SDAresults$loadings[[1]][,pirna_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings of pirna genes for each component") aheatmap(t(SDAresults$loadings[[1]][,pirna_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of pirna genes for each component", scale="row")
Similar genes to C42 but turning off (negative loadings + positive cell scores). Apart from Hormad1 KO. No enrichment for positive.
print_tsne(39, point_size = 0.5) print_loadings_scores(39) print_gene_list(39) go_volcano_plot(component = "V39N") head(GO_data[["V39N"]][c(2,7,8,10)], 20) head(GO_data[["V39P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V39N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
The negative loadings contain many DUF622 genes. Mlh3 positive loadings opposite to the rest.
Similar genes to DUF622 / α-takusan component previously. Prss46 (not DUF622), 1700001F09Rik, Gm5458, Gm8362, BC061237, Gm3453 etc.
A number of these genes have been assigned to the SPEER family (SPErm-associated glutamate (E)-Rich protein (SPEER)). By FISH "clear and stage-specific signals are seen in late pachytene spermatocytes and round spermatids" and "marked temporal pattern of expression ... appearing on Day 16 ... coincides with the first appearance of spermatocytes" from SPEER--a new family of testis-specific genes from the mouse. (2003). These genes have high loadings in this component: Speer4b (-0.19), Speer3 (-0.16), Speer4a (-0.15), Speer4e (-0.14). The expression observations are consistent with the placement of this component as the first post-meiotic component. Some of the genes with with this domain are expressed in the brain as part of a larger family Takusan: a large gene family that regulates synaptic activity. (2007).
The DUF622 gene family is absent from non-rodent genomes - it arose from a recent duplication of Dlg5. Many of these genes were previously not present in the mouse genome assembly Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse.
One study "observed that portions of chromosome 14 and of the sex chromosomes share specific features, such as enrichment in H3K9me3 and the presence of multicopy genes that are specifically expressed in round spermatids, suggesting that parts of chromosome 14 are under the same evolutionary constraints than the sex chromosomes." These portions of chr14 "encompass ampliconic genes of the α-takusan family" Expression and epigenomic landscape of the sex chromosomes in mouse post-meiotic male germ cells.
print_tsne(20, point_size = 0.5) print_loadings_scores(20) print_gene_list(20) go_volcano_plot(component = "V20N") head(GO_data[["V20N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V20P") head(GO_data[["V20P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V20N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) datatable(motifEnrichment_wLogo[geneSet=="V20P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
DUF622 <- fread("../../data/MGImarkerQuery_20170327_215120.txt")$V8 # extract gene symbols str(DUF622) DUF622 <- DUF622[(DUF622 %in% colnames(SDAresults$loadings[[1]]))] str(DUF622) data.table(t(SDAresults$loadings[[1]][30, DUF622, drop=F]), keep.rownames = T)[order(-V30)] data.table(t(SDAresults$loadings[[1]][20, DUF622, drop=F]), keep.rownames = T)[order(V20)] sum(names(sort(SDAresults$loadings[[1]][20,]))[1:500] %in% DUF622) sum(names(sort(-SDAresults$loadings[[1]][30,]))[1:500] %in% DUF622) sum(names(sort(SDAresults$loadings[[1]][42,]))[1:500] %in% DUF622) aheatmap(t(SDAresults$loadings[[1]][,DUF622]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings of DUF622 genes for each component", scale='row')
There are also some in the next component C30 and the previous one C42 but of the top 500 genes in C30 only 3 are DUF622 and of the top 500 in C42 only 1 is DUF622. In the top 500 of C20 35 are DUF622 genes.
grid.arrange(grobs=list(print_marker("Gm3453") + simplify, print_marker("Ssxb1") + simplify, print_marker("Ssxb2") + simplify, print_marker("Ssxb3") + simplify), nrow=2, heights = c(1,1))
Only Cul4a have negative scores, many genes with negative loadings are those overexpressed in Cul4a.
print_tsne(25, point_size = 0.5) print_loadings_scores(25) print_gene_list(25) go_volcano_plot(component = "V25N") head(GO_data[["V25N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V25N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
The cul4a association becomes clearer if we transform the scores and flip the colouring - then compare to the position of groups in tsne. See also the codistribution ofsome specific genes with large negative loadings with the Cul4a cells.
ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=asinh(V25))) + geom_point(size=0.5) + scale_color_viridis(direction=1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=group)) + geom_point(size=0.5) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=`Rps13-ps1`)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Hist1h2al)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Gm10036)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Jakmip2)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Tagln2)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime") ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Lpo)) + geom_point(size=0.5) + scale_color_viridis(direction=-1) + theme(legend.position = "bottom") + ggtitle("t-SNE - cells coloured by PseudoTime")
As before Acrv1, Spaca3 & 1, Sox5, Lyzl1 & 6
GO terms for fertilization, sperm egg recognition
print_tsne(30, point_size = 0.5) print_loadings_scores(30) print_gene_list(30) go_volcano_plot(component = "V30P") head(GO_data[["V30P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V30P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
acrosomal_genes <- c( "Lyzl1", "Spaca3", "Lyzl4", "Spaca5", "Lyzl6", "Spata31", "Spata9", "Spaca1", "Acrv1", "Spaca4", "Spaca7", "1700016D06Rik") aheatmap(t(SDAresults$loadings[[1]][,acrosomal_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of acrosomal genes for each component", scale="row") # from "RFX2 Is a Major Transcriptional Regulator of Spermiogenesis." rfx2_genes <- readxl::read_excel("../data/supplementary_data/Kistler/journal.pgen.1005368.s011.XLSX", skip=2) setnames(rfx2_genes, make.names(names(rfx2_genes))) #rfx2_genes <- data.table(rfx2_genes)[ChIP.Seq=="yes"][order(logFC.at.P30)][1:50]$Ensembl.ID rfx2_genes <- data.table(rfx2_genes)[order(Pvalue.at.P30)][!is.na(Pvalue.at.P30)]$Ensembl.ID rfx2_genes <- data.table(rfx2_genes)[ChIP.Seq=="yes"][order(Pvalue.at.P30)][!is.na(Pvalue.at.P30)]$Ensembl.ID # 1700026D08Rik -> Cfap161, Ttc18 -> Cfap70, Ccdc135 -> Drc7, Ccdc11 -> Cfap53, 9330101J02Rik -> Cfap46, Adck4 -> Coq8b library(biomaRt) ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host = "www.ensembl.org", dataset = "mmusculus_gene_ensembl") mapTab2 <- getBM(attributes = c("external_gene_name",'ensembl_gene_id'), filter = "ensembl_gene_id", values = rfx2_genes, mart = ensembl, uniqueRows=TRUE) rfx2_genes <- mapTab2[match(rfx2_genes, mapTab2$ensembl_gene_id),]$external_gene_name rfx2_genes[which(!rfx2_genes %in% colnames(SDAresults$loadings[[1]]))] rfx2_genes <- rfx2_genes[which(rfx2_genes %in% colnames(SDAresults$loadings[[1]]))] aheatmap(t(SDAresults$loadings[[1]][,rfx2_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row") aheatmap(t(SDAresults$pips[[1]][,rfx2_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row") #ranks <- apply(SDAresults$loadings[[1]], 1, function(x) qnorm(rank(x) / (ncol(SDAresults$loadings[[1]])+1) )) #ranks <- apply(SDAresults$loadings[[1]], 1, function(x) qnorm(rank(abs(x)) / (ncol(SDAresults$loadings[[1]])+1) )) #ranks <- apply(SDAresults$loadings[[1]], 1, function(x) rank(abs(x))) #19263-
aheatmap(asinh(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,rfx2_genes])), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq") aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m168_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row") aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m168_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq") aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m1478_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")
aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,rfx2_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row") tmp <- t(SDAresults$pips[[1]][,rfx2_genes]==1) mode(tmp) <- "numeric" aheatmap(tmp[, QC_meiotic_components_ordered], breaks=0, Rowv = NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")
Spata31: (in humans SPATA31A1,6,3,7,5, C1, E1, akak VAD1.3 (aka AEP1 - Acrosome Expressed Protein)): The testis-specific VAD1.3/AEP1 interacts with β-actin and syntaxin 1 and directs peri-nuclear/Golgi expression with bipartite nucleus localization (BNL) sequence. "It is expressed at the acrosomal region of spermatids from postnatal day 25" Targeted disruption of the spermatid-specific gene Spata31 causes male infertility. "very specific localization restricted to the acrosome position" Acrosome-specific gene AEP1: Identification, characterization and roles in spermatogenesis"AEP1 localizes to the acrosome"
Spata45: expressed in testis
Lyzl4: Lyzl4, a novel mouse sperm-related protein, is involved in fertilization "located on the spermatozoa of acrosomal region"
Spaca7: Identification of a new mouse sperm acrosome-associated protein
Zp3r Recombinant mouse sperm ZP3-binding protein (ZP3R/sp56) forms a high order oligomer that binds eggs and inhibits mouse fertilization in vitro. "localized in the acrosomal matrix" Function of the acrosomal matrix: zona pellucida 3 receptor (ZP3R/sp56) is not essential for mouse fertilization.
Spata9 aka NYD-SP16 Human testicular protein NYD-SP16 is involved in sperm capacitation and the acrosome reaction.
Lyzl4os (lysozyme-like 4) specifically expressed in the testis
Spaca5 (Sperm Acrosome Associated 5)
Tekt2 & Tekt4 - Associated with commitment to meiosis, expressed post meiotically - A Global Expression Switch Marks Pachytene Initiation during Mouse Male Meiosis
Tssk1 & Tssk4 "Tssk mRNAs are almost exclusively expressed post meiotically in the testis" Expression and localization of five members of the testis-specific serine kinase (Tssk) family in mouse and human sperm and testis..
Izumo1 & Izumo2 The immunoglobulin superfamily protein Izumo is required for sperm to fuse with eggs Juno is the egg Izumo receptor and is essential for mammalian fertilization
Crem - "The essential role of the Crem gene in normal sperm development is widely accepted and is confirmed by azoospermia in male mice lacking the Crem gene" Novel Insights into the Downstream Pathways and Targets Controlled by Transcription Factors CREM in the Testis
Lrcc34 (immunostained by Min) @ 0.5
library(ggraph) graph <- readRDS("V30_network_graph") ggraph(graph, layout = 'nicely') + geom_edge_link(aes(colour = motif)) + geom_node_point() + theme_graph()
Positive and negative loadings, overalpping C30
print_tsne(35, point_size = 0.5) print_loadings_scores(35) print_gene_list(35) go_volcano_plot(component = "V35N") head(GO_data[["V35N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V35P") head(GO_data[["V35P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V35N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) datatable(motifEnrichment_wLogo[geneSet=="V35P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(15, point_size = 0.5) print_loadings_scores(15) print_gene_list(15) go_volcano_plot(component = "V15N") head(GO_data[["V15N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V15N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
Hemgn - hemogen, has testis specific isoform , "specifically expressed in round spermatids" - Alternative promoters and polyadenylation regulate tissue-specific expression of Hemogen isoforms during hematopoiesis and spermatogenesis
Fam71a / GARI-L4 involved in golgi morphology
Fam71b - unknown function but expressed in testis Proteomic characterization of the human sperm nucleus. Has late human protein expression.
Cst13 (Cymg1 / Cystatin-T) - similar to CRES, expressed in testis at sexual maturity
Ccin - Calicin, basic protein of the sperm head cytoskeleton (calyx), binds actin
Actrt3 - Actin Related Protein T3 (ARPM1) - "expressed exclusively in the testis, particularly in haploid germ cells. ArpM1 protein first appears in the round spermatid" - binds to Profilin 3 another actin binding gene Nuclear localization of profilin III–ArpM1 complex in mouse spermiogenesis
Ccdc159 causes infertility but little known
Prss37 "especially in the elongating spermatids" Prss37 is required for male fertility in the mouse.
Calr3 - Calreticulin 3 (Calsperin), endoplasmic reticulum, "calmegin (CLGN) and calsperin (CALR3) are expressed as germ cell-specific counterparts of CANX and CALR, respectively" & "the onset of CLGN and CALR3 expression was considered to be meiotic and post-meiotic, respectively" Calsperin Is a Testis-specific Chaperone Required for Sperm Fertility, involved in ADAM3 maturation. "CALR3 and PDILT were only expressed in haploid spermatids" Protein disulfide isomerase homolog PDILT is required for quality control of sperm membrane protein ADAM3 and male fertility
Ccser2 (Ccser1 in previous component & in this at rank 85)
Tekt5 (Tekt2 & Tekt4 in previous component) - Associated with commitment to meiosis, expressed post meiotically - A Global Expression Switch Marks Pachytene Initiation during Mouse Male Meiosis
Banf2 - Barrier To Autointegration Factor 2
Znf37 - KRAB ZNF protein
Many well known late spermatogenesis infertility genes.
print_tsne(17, point_size = 0.5) print_loadings_scores(17) print_gene_list(17) go_volcano_plot(component = "V17P") head(GO_data[["V17P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V17P",-c("enrichedGenes"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
NB Atf3 is Crem family TF - CREM, a master-switch of the transcriptional cascade in male germ cells - Novel Insights into the Downstream Pathways and Targets Controlled by Transcription Factors CREM in the Testis - Cell-specific occupancy of an extended repertoire of CREM and CREB binding loci in male germ cells.
general reviews: - Specialized rules of gene transcription in male germ cells: the CREM paradigm. - Genetic control of spermiogenesis: insights from the CREM gene and implications for human infertility
crem_genes <- readxl::read_excel("../data/supplementary_data/Kosir/Table_S4.xls", skip=2) setnames(crem_genes, make.names(names(crem_genes))) crem_genes <- crem_genes$Symbol[1:100] crem_genes <- crem_genes[which(crem_genes %in% colnames(SDAresults$loadings[[1]]))] crem_genes <- crem_genes[1:50] aheatmap(t(SDAresults$loadings[[1]][,crem_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Crem KO & Bound by Crem in ChIPseq", scale="row")
Oaz3, Gapdhs, Smcp
print_tsne(18, point_size = 0.5) print_loadings_scores(18) print_gene_list(18) go_volcano_plot(component = "V18N") head(GO_data[["V18N"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V18P") head(GO_data[["V18P"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V18N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5)) datatable(motifEnrichment_wLogo[geneSet=="V18P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
print_tsne(34, point_size = 1) print_loadings_scores(34) print_gene_list(34) go_volcano_plot(component = "V34P") head(GO_data[["V34P"]][c(2,7,8,10)], 20) go_volcano_plot(component = "V34N") head(GO_data[["V34N"]][c(2,7,8,10)], 20) datatable(motifEnrichment_wLogo[geneSet=="V34N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
tmp <- data.table(SDAresults$scores, cell_index = 1:nrow(SDAresults$scores), experiment = gsub("[A-Z]+\\.","",rownames(SDAresults$scores)), group = gsub("_.*","",gsub("[A-Z]+\\.","",rownames(SDAresults$scores)))) tmp <- melt(tmp, id.vars = c("experiment","cell_index","group")) ggplot(tmp[variable %in% paste0("V",1:15)], aes(experiment, value, colour=group)) + geom_boxplot() + facet_wrap(~variable, scales = "free_y") + scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) ggplot(tmp[variable %in% paste0("V",16:30)], aes(experiment, value, colour=group)) + geom_boxplot() + facet_wrap(~variable, scales = "free_y") + scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) ggplot(tmp[variable %in% paste0("V",31:45)], aes(experiment, value, colour=group)) + geom_boxplot() + facet_wrap(~variable, scales = "free_y") + scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) ggplot(tmp[variable %in% paste0("V",46:50)], aes(experiment, value, colour=group)) + geom_boxplot() + facet_wrap(~variable, scales = "free_y") + scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
Others (Pol II GTFs): - TBL1 (TRF2) - TAF4b - TAF7l - GTF2a1l (ALF)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.