Meiotic Components

library(DT)
motifEnrichment_wLogo <- readRDS("../data/motifs/TFenrichment_V2.rds")
GO_data <- readRDS("../data/GO_enrichment.rds")

library(testisAtlas)
load2("../data/cache")
load_component_orderings()

50 Gfra1+ Stem Cells

Steady-state self-renewal Spermatogonia

print_tsne(50, point_size = 0.5)
print_loadings_scores(50)
print_gene_list(50)

go_volcano_plot(component = "V50N")
head(GO_data[["V50N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V50P")
head(GO_data[["V50P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V50N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

7 Intermediate Spermatogonia

In spermatogonial tsne location, but just enrichment not very spematogonial?

print_tsne(7, point_size = 0.5)
print_loadings_scores(7)
print_gene_list(7)

go_volcano_plot(component = "V7P")
head(GO_data[["V7P"]][c(2,7,8,10)], 20)

head(GO_data[["V7N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V7N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Nxf2 "restricted exclusively to spermatogonia" The role of spermatogonially expressed germ cell-specific genes in mammalian meiosis

31 Undiff Spermatogonia

print_tsne(31, point_size = 0.5)
print_loadings_scores(31)
print_gene_list(31)

go_volcano_plot(component = "V31P")
head(GO_data[["V31P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V31P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

33 Diff Spermatogonia

print_tsne(33, point_size = 0.5)
print_loadings_scores(33)
print_gene_list(33)

go_volcano_plot(component = "V33N")
head(GO_data[["V33N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V33N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Dmrt1 DMRT1 Is Required for Mouse Spermatogonial Stem Cell Maintenance and Replenishment Dnmt3b & Dnmt3a Dynamic expression of DNMT3a and DNMT3b isoforms during male germ cell development in the mouse. Dnmt1 Uchl1 Marks cells on edge of tubule in HPA, Asymmetric distribution of UCH-L1 in spermatogonia is associated with maintenance and differentiation of spermatogonial stem cells. Map7d2 Marks cells on edge of tubule in HPA Scml2 SCML2 establishes the male germline epigenome through regulation of histone H2A ubiquitination.

27 Early ?

Overlaps with component 33, but with more late (~pachytene) expression

print_tsne(27, point_size = 0.5)
print_loadings_scores(27)
print_gene_list(27)

go_volcano_plot(component = "V27P")
head(GO_data[["V27P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V27P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

2 pre-leptotene

print_tsne(2, point_size = 0.5)
print_loadings_scores(2)
print_gene_list(2)

go_volcano_plot(component = "V2P")
head(GO_data[["V2P"]][c(2,7,8,10)], 20)
head(GO_data[["V2N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V2P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

5 Zygotene / Leptotene

print_tsne(5, point_size = 0.5)
print_loadings_scores(5)
print_gene_list(5)

go_volcano_plot(component = "V5N")
head(GO_data[["V5N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V5N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Well known meiotic genes Prdm9, Dmc1, Tex12. Prdm9 is expressed in pre-lepotene and early-leptotene doi.org/10.1007/s00412-015-0511-3. Dmc1 is also expressed in leptotene doi.org/10.1016/S1097-2765(00)80070-2, Dmc1KO germ cells progress up until an abnormal looking zygotene doi.org/10.1016/S1097-2765(00)80069-6.

GO hits for meiosis, synapsis, chromosome segregation, recombination etc.

CGT & CST

44 Leptotene/Zygotene V2 ??

Similar cells and genes to component 5 (LZ). Also has Prdm9. Ccnb3 - known expression in "leptotene and zygotene" "blocks the mitotic cell cycle in late anaphase" Characterization and expression of mammalian cyclin b3, a prepachytene meiotic cyclin. Meiob - ssDNA exonucleoase, required for recombination, colocalises with RPA foci MEIOB exhibits single-stranded DNA-binding and exonuclease activities and is essential for meiotic recombination. & MEIOB Targets Single-Strand DNA and Is Necessary for Meiotic Recombination Rad51ap2 - RAD51AP2, a novel vertebrate- and meiotic-specific protein, shares a conserved RAD51-interacting C-terminal domain with RAD51AP1/PIR51. see also Molecular basis for enhancement of the meiotic DMC1 recombinase by RAD51 associated protein 1 (RAD51AP1). M1ap - Meiosis I arrest abnormalities lead to severe oligozoospermia in meiosis 1 arresting protein (M1ap)-deficient mice. Gm960 aka TOPOVIBL / C11orf80 - "TOPOVIBL protein interacts and forms a complex with SPO11 and is required for meiotic DSB formation" The TopoVIB-Like protein family is required for meiotic DNA double-strand break formation.

print_tsne(44, point_size = 0.5)
print_loadings_scores(44)
print_gene_list(44)

go_volcano_plot(component = "V44N")
head(GO_data[["V44N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V44N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

#aheatmap(t(SDAresults$loadings[[1]][,head(names(sort(-abs(SDAresults$loadings[[1]][44,]))),35)]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of acrosomal genes for each component", scale="row")

13 X inactive

print_tsne(13, point_size = 0.5)
print_loadings_scores(13)
print_gene_list(13)

go_volcano_plot(component = "V13N")
head(GO_data[["V13N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V13P")
head(GO_data[["V13P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V13N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

datatable(motifEnrichment_wLogo[geneSet=="V13P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Pbx3? - in component 5

47 Early Pachy?

print_tsne(47, point_size = 0.5)
print_loadings_scores(47)
print_gene_list(47)

go_volcano_plot(component = "V47N")
head(GO_data[["V47N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V47P")
head(GO_data[["V47P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V47N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

datatable(motifEnrichment_wLogo[geneSet=="V47P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

23 Hormad1 KO

print_tsne(23, point_size = 0.5)
print_loadings_scores(23)
print_gene_list(23)

go_volcano_plot(component = "V23P")
head(GO_data[["V23P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V23P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

38 X activation (Hormad1 KO)

print_tsne(38, point_size = 0.5)
print_loadings_scores(38)
print_gene_list(38)

go_volcano_plot(component = "V38P")
head(GO_data[["V38P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V38P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Many genes previously found to be upregulated in Hormad1KO by microarray have high loadings, including non XY genes.

Hormad1KO <- fread("../data/previous_studies/Shin/Hormad1_KO_differentially_expressed_genes.csv", header = FALSE)

Hormad1KO[!V1 %in% colnames(data)]$V1

# rename old gene https://www.ncbi.nlm.nih.gov/gene/195726
Hormad1KO[V1=="Gm47", V1:="Scml1"]

Hormad1KO_genes <- Hormad1KO[V1 %in% colnames(data)]$V1

str(Hormad1KO_genes)

tmp <- data.table(loading = sort(SDAresults$loadings[[1]][38,],T),
                  gene = names(sort(SDAresults$loadings[[1]][38,],T)))
tmp$rank <- seq_along(tmp$loading)
ggplot(tmp, aes(rank, loading)) + geom_point() + geom_rug(data=tmp[gene %in% Hormad1KO_genes], sides = "b", colour="red", size=0.1)


plot(sort(round(SDAresults$loadings[[1]][38,Hormad1KO[V1 %in% colnames(data)]$V1],4), T), ylab="Component 38 Loading", xlab="Genes overexpressed in Hormad1 KO (ordered by p value)")

hist(rank(-SDAresults$loadings[[1]][38,])[Hormad1KO$V1], breaks=200, xlab='Component 38 loading rank')

tripKOgenesranked <- names(sort(-SDAresults$loadings[[1]][38,]))

#seq_along(tripKOgenesranked)
tmp <- sapply(1:1000, function(x) sum(Hormad1KO$V1 %in% tripKOgenesranked[1:x])/x)
plot(tmp, ylab="fraction of genes overexpress in Trim13 KO", xlab="Top X genes of component 38")
tmp[200]

tmp <- sapply(1:1000, function(x)
  sum(Hormad1KO$V1 %in% tripKOgenesranked[1:x])/nrow(Hormad1KO))
plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38")
tmp[200]

Hormad1 KO

from Hormad1 Mutation Disrupts Synaptonemal Complex Formation, Recombination, and Chromosome Segregation in Mammalian Meiosis

In addition, many of the genes are also those statistically significantly overexpressed in Trip13 KO mice link. Trip13 is required for removing Hormad1 from synapsed chromosomes. Mouse HORMAD1 and HORMAD2, Two Conserved Meiotic Chromosomal Proteins, Are Depleted from Synapsed Chromosome Axes with the Help of TRIP13 AAA-ATPase & Mouse TRIP13/PCH2 Is Required for Recombination and Normal Higher-Order Chromosome Structure during Meiosis

The autosomal genes:

Trip13KO <- fread("../data/previous_studies/Ortega/TripKO.csv", header = FALSE)
Trip13KO$LFC <- as.numeric(fread("../data/previous_studies/Ortega/TripKOLFC.txt", header = FALSE))

Trip13KO[V1=="RP23-280E8.2", V1:="Tex13c2"]
Trip13KO[V1=="BC023829", V1:="Tmem185a"]
Trip13KO[V1=="Gm5753", V1:="Fthl17-ps3"]
Trip13KO[V1=="A830080D01Rik", V1:="Bclaf3"]
Trip13KO[V1=="Ppp1r3fos", V1:="Flicr"]

#Trip13KO[V1=="5730457N03Rik", V1:="Evx1os"]
#Trip13KO[V1=="RP23-328P19.4", V1:="Gm42923"]

str(Trip13KO[!V1 %in% colnames(SDAresults$loadings[[1]])]$V1)

Trip13KO <- Trip13KO[V1 %in% colnames(SDAresults$loadings[[1]])]
str(Trip13KO)


tmp <- data.table(loading = sort(SDAresults$loadings[[1]][38,],T),
                  gene = names(sort(SDAresults$loadings[[1]][38,],T)))
tmp$rank <- seq_along(tmp$loading)
tmp[gene %in% Trip13KO$V1, KO := "Trip13"]
tmp[gene %in% Hormad1KO_genes, KO := "Hormad1"]


c38_rug_plot <- ggplot(tmp, aes(rank, loading,colour=KO)) +
    geom_point(colour="black") +
    geom_rug(data=tmp[gene %in% Trip13KO$V1], sides = "b", length=unit(0.1,"npc")) +
    geom_rug(data=tmp[gene %in% Hormad1KO_genes], sides = "t", length=unit(0.1,"npc")) +
    scale_y_continuous(expand=c(0.1,0.1)) +
    ylab("Component\n 38 Gene Loading") + xlab("Rank of gene by loading") +
    scale_color_brewer(palette = "Set1")+
    guides(colour = guide_legend(override.aes = list(size=5))) +
    theme(legend.position = c(0.6,0.6)) + labs(colour="KO Overexpressed Genes")

c38_rug_plot

saveRDS(c38_rug_plot, "../data/plots/c38_rug_plot.rds")

# scale_color_manual(values=c("Trip13"="#377EB8", "Hormad1"="#E41A1C")) +

hormad1_enrich <- component_enrichment(Hormad1KO_genes)
trip13_enrich <- component_enrichment(Trip13KO$V1)

hormad1_enrich$KO <- "Enrichment of Hormad1 KO Overexpressed Genes"
trip13_enrich$KO <- "Enrichment of Trip13 KO Overexpressed Genes"

trip13_enrich <- rbind(trip13_enrich, hormad1_enrich)

set.seed(123)
trip13_enrich_plot <- manhatten_plot(trip13_enrich[!component %in% gsub("V","",half_exclusion_comps)], topn = 4, repel_force = 10, legend_position = c(0.1,0.5)) + facet_wrap(~KO, scales='free_y', nrow=1)

trip13_enrich_plot

saveRDS(trip13_enrich_plot, "../data/plots/trip13_enrich_plot.rds")

plot(SDAresults$loadings[[1]][38,Trip13KO$V1], ylab="Component 38 Loading", xlab="Genes overexpressed in Trim18 KO (ordered by p value)")

plot(rank(-SDAresults$loadings[[1]][38,])[Trip13KO$V1], Trip13KO$LFC, ylab="Log Fold Change in Trim18 KO", xlab='Component 38 loading rank')

hist(rank(-SDAresults$loadings[[1]][38,])[Trip13KO$V1], breaks=200, xlab='Component 38 loading rank')

tripKOgenesranked <- names(sort(-SDAresults$loadings[[1]][38,]))

#seq_along(tripKOgenesranked)
tmp <- sapply(1:1000, function(x) sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/x)
plot(tmp, ylab="fraction of genes overexpress in Trim13 KO", xlab="Top X genes of component 38")
tmp[200]

tmp <- sapply(1:1000, function(x)
  sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/nrow(Trip13KO))
plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38")
tmp[200]

tmp <- sapply(seq_along(tripKOgenesranked), function(x)
  sum(Trip13KO$V1 %in% tripKOgenesranked[1:x])/nrow(Trip13KO))
plot(tmp, ylab="Fraction of overexpressed genes in top X genes", xlab="Top X genes of component 38")

str(fisher.test(matrix(c(x, 200-x, 176-x, 19262-200-176+x), 2, 2), alternative='greater'))

42 Pachytene

print_tsne(42, point_size = 0.5)
print_loadings_scores(42)
print_gene_list(42)

go_volcano_plot(component = "V42N")
head(GO_data[["V42N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V42N",-c("enrichedGenes","geneSet"), with=FALSE][1:40], escape = FALSE, filter="top", options=list(pageLength=5))
library(ggraph)
graph <- readRDS("V42_network_graph")
ggraph(graph, layout = 'nicely') + 
    geom_edge_link(aes(colour = motif)) + 
    geom_node_point() +
    theme_graph()
# from "A-MYB (MYBL1) transcription factor is a master regulator of male meiosis"
# (Supp tables are PDFs...)

#"Nek2","Moap1", P38ip=Supt20, "Mat2a",
Mybl1_genes <- c("Morc2b",
"Piwil1",
"Als2cr12",
"Cklf",
"Cntd1",
"Cenpp",
"Poteg",
"Znrd1as",
"Nol8",
"Taf9",
"Supt20",
"Rpl39l",
"Tbl2",
"Snx1",
"Ralgps2",
"Hsp90aa1",
"Ube2t",
"Ccnb3",
"Rfx2")

#aheatmap(t(SDAresults$loadings[[1]][,Mybl1_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of predicted direct targets of MYBL1 by ChIPseq")

aheatmap(t(SDAresults$loadings[[1]][,Mybl1_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings by component of \n predicted direct targets of MYBL1 by ChIPseq", scale="row")

pirna_genes <- c("Piwil1",
"Tdrd1",
"Tdrd5",
"Mov10l1",
"Tdrd12",
"Piwil2",
"Fkbp6",
"Mael",
"Pld6",
"Asz1",
"Exd1",
"Ddx4",
"Tdrkh",
"Tdrd9")

#aheatmap(t(SDAresults$loadings[[1]][,pirna_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings of pirna genes for each component")

aheatmap(t(SDAresults$loadings[[1]][,pirna_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of pirna genes for each component", scale="row")

39 Pachy2

Similar genes to C42 but turning off (negative loadings + positive cell scores). Apart from Hormad1 KO. No enrichment for positive.

print_tsne(39, point_size = 0.5)
print_loadings_scores(39)
print_gene_list(39)

go_volcano_plot(component = "V39N")
head(GO_data[["V39N"]][c(2,7,8,10)], 20)

head(GO_data[["V39P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V39N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

20 DUF622

The negative loadings contain many DUF622 genes. Mlh3 positive loadings opposite to the rest.

Similar genes to DUF622 / α-takusan component previously. Prss46 (not DUF622), 1700001F09Rik, Gm5458, Gm8362, BC061237, Gm3453 etc.

A number of these genes have been assigned to the SPEER family (SPErm-associated glutamate (E)-Rich protein (SPEER)). By FISH "clear and stage-specific signals are seen in late pachytene spermatocytes and round spermatids" and "marked temporal pattern of expression ... appearing on Day 16 ... coincides with the first appearance of spermatocytes" from SPEER--a new family of testis-specific genes from the mouse. (2003). These genes have high loadings in this component: Speer4b (-0.19), Speer3 (-0.16), Speer4a (-0.15), Speer4e (-0.14). The expression observations are consistent with the placement of this component as the first post-meiotic component. Some of the genes with with this domain are expressed in the brain as part of a larger family Takusan: a large gene family that regulates synaptic activity. (2007).

The DUF622 gene family is absent from non-rodent genomes - it arose from a recent duplication of Dlg5. Many of these genes were previously not present in the mouse genome assembly Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse.

One study "observed that portions of chromosome 14 and of the sex chromosomes share specific features, such as enrichment in H3K9me3 and the presence of multicopy genes that are specifically expressed in round spermatids, suggesting that parts of chromosome 14 are under the same evolutionary constraints than the sex chromosomes." These portions of chr14 "encompass ampliconic genes of the α-takusan family" Expression and epigenomic landscape of the sex chromosomes in mouse post-meiotic male germ cells.

print_tsne(20, point_size = 0.5)
print_loadings_scores(20)
print_gene_list(20)

go_volcano_plot(component = "V20N")
head(GO_data[["V20N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V20P")
head(GO_data[["V20P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V20N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

datatable(motifEnrichment_wLogo[geneSet=="V20P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
DUF622 <- fread("../../data/MGImarkerQuery_20170327_215120.txt")$V8 # extract gene symbols
str(DUF622)
DUF622 <- DUF622[(DUF622 %in% colnames(SDAresults$loadings[[1]]))]
str(DUF622)

data.table(t(SDAresults$loadings[[1]][30, DUF622, drop=F]), keep.rownames = T)[order(-V30)]
data.table(t(SDAresults$loadings[[1]][20, DUF622, drop=F]), keep.rownames = T)[order(V20)]

sum(names(sort(SDAresults$loadings[[1]][20,]))[1:500] %in% DUF622)
sum(names(sort(-SDAresults$loadings[[1]][30,]))[1:500] %in% DUF622)
sum(names(sort(SDAresults$loadings[[1]][42,]))[1:500] %in% DUF622)

aheatmap(t(SDAresults$loadings[[1]][,DUF622]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings of DUF622 genes for each component", scale='row')

There are also some in the next component C30 and the previous one C42 but of the top 500 genes in C30 only 3 are DUF622 and of the top 500 in C42 only 1 is DUF622. In the top 500 of C20 35 are DUF622 genes.

grid.arrange(grobs=list(print_marker("Gm3453") + simplify,
                        print_marker("Ssxb1") + simplify,
                        print_marker("Ssxb2") + simplify,
                        print_marker("Ssxb3") + simplify),
            nrow=2, heights = c(1,1))

25 Cul4a

Only Cul4a have negative scores, many genes with negative loadings are those overexpressed in Cul4a.

print_tsne(25, point_size = 0.5)
print_loadings_scores(25)
print_gene_list(25)

go_volcano_plot(component = "V25N")
head(GO_data[["V25N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V25N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

The cul4a association becomes clearer if we transform the scores and flip the colouring - then compare to the position of groups in tsne. See also the codistribution ofsome specific genes with large negative loadings with the Cul4a cells.

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=asinh(V25))) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=group)) +
  geom_point(size=0.5) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=`Rps13-ps1`)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Hist1h2al)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Gm10036)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Jakmip2)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Tagln2)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

ggplot(cell_data, aes(Tsne1_QC1, Tsne2_QC1, color=Lpo)) +
  geom_point(size=0.5) +
  scale_color_viridis(direction=-1) +
  theme(legend.position = "bottom") +
  ggtitle("t-SNE - cells coloured by PseudoTime")

30 Acrosome - post meiotic

As before Acrv1, Spaca3 & 1, Sox5, Lyzl1 & 6

GO terms for fertilization, sperm egg recognition

print_tsne(30, point_size = 0.5)
print_loadings_scores(30)
print_gene_list(30)

go_volcano_plot(component = "V30P")
head(GO_data[["V30P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V30P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))
acrosomal_genes <- c(
"Lyzl1",
"Spaca3",
"Lyzl4",
"Spaca5",
"Lyzl6",
"Spata31",
"Spata9",
"Spaca1",
"Acrv1",
"Spaca4",
"Spaca7",
"1700016D06Rik")

aheatmap(t(SDAresults$loadings[[1]][,acrosomal_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="(row scaled) Gene loadings of acrosomal genes for each component", scale="row")


# from "RFX2 Is a Major Transcriptional Regulator of Spermiogenesis."
rfx2_genes <- readxl::read_excel("../data/supplementary_data/Kistler/journal.pgen.1005368.s011.XLSX", skip=2)
setnames(rfx2_genes, make.names(names(rfx2_genes)))

#rfx2_genes <- data.table(rfx2_genes)[ChIP.Seq=="yes"][order(logFC.at.P30)][1:50]$Ensembl.ID
rfx2_genes <- data.table(rfx2_genes)[order(Pvalue.at.P30)][!is.na(Pvalue.at.P30)]$Ensembl.ID
rfx2_genes <- data.table(rfx2_genes)[ChIP.Seq=="yes"][order(Pvalue.at.P30)][!is.na(Pvalue.at.P30)]$Ensembl.ID

# 1700026D08Rik -> Cfap161, Ttc18 -> Cfap70, Ccdc135 -> Drc7, Ccdc11 -> Cfap53, 9330101J02Rik -> Cfap46, Adck4 -> Coq8b
library(biomaRt)
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host = "www.ensembl.org", dataset = "mmusculus_gene_ensembl")
mapTab2 <- getBM(attributes = c("external_gene_name",'ensembl_gene_id'),
                filter = "ensembl_gene_id", values = rfx2_genes, mart = ensembl, uniqueRows=TRUE)
rfx2_genes <- mapTab2[match(rfx2_genes, mapTab2$ensembl_gene_id),]$external_gene_name

rfx2_genes[which(!rfx2_genes %in% colnames(SDAresults$loadings[[1]]))]

rfx2_genes <- rfx2_genes[which(rfx2_genes %in% colnames(SDAresults$loadings[[1]]))]

aheatmap(t(SDAresults$loadings[[1]][,rfx2_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row")

aheatmap(t(SDAresults$pips[[1]][,rfx2_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row")

#ranks <- apply(SDAresults$loadings[[1]], 1, function(x) qnorm(rank(x) / (ncol(SDAresults$loadings[[1]])+1) ))
#ranks <- apply(SDAresults$loadings[[1]], 1, function(x) qnorm(rank(abs(x)) / (ncol(SDAresults$loadings[[1]])+1) ))
#ranks <- apply(SDAresults$loadings[[1]], 1, function(x) rank(abs(x))) #19263-
aheatmap(asinh(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,rfx2_genes])), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")

aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m168_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row")

aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m168_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")

aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,m1478_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")
aheatmap(t(SDAresults$loadings[[1]][QC_meiotic_components_ordered,rfx2_genes]), breaks=0, Rowv=NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq", scale="row")

tmp <- t(SDAresults$pips[[1]][,rfx2_genes]==1)
mode(tmp) <- "numeric"
aheatmap(tmp[, QC_meiotic_components_ordered], breaks=0, Rowv = NA, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Rfx2 KO & Bound by Rfx2 in ChIPseq")

Zbtb12 (transfac 2824) vs Rfx (cisbp 2392)

library(ggraph)
graph <- readRDS("V30_network_graph")
ggraph(graph, layout = 'nicely') + 
    geom_edge_link(aes(colour = motif)) + 
    geom_node_point() +
    theme_graph()

35 Post Acrosomal?

Positive and negative loadings, overalpping C30

print_tsne(35, point_size = 0.5)
print_loadings_scores(35)
print_gene_list(35)

go_volcano_plot(component = "V35N")
head(GO_data[["V35N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V35P")
head(GO_data[["V35P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V35N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

datatable(motifEnrichment_wLogo[geneSet=="V35P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

15 Round Spermatid

print_tsne(15, point_size = 0.5)
print_loadings_scores(15)
print_gene_list(15)

go_volcano_plot(component = "V15N")
head(GO_data[["V15N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V15N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

17 Spermiogenesis

Many well known late spermatogenesis infertility genes.

print_tsne(17, point_size = 0.5)
print_loadings_scores(17)
print_gene_list(17)

go_volcano_plot(component = "V17P")
head(GO_data[["V17P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V17P",-c("enrichedGenes"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

NB Atf3 is Crem family TF - CREM, a master-switch of the transcriptional cascade in male germ cells - Novel Insights into the Downstream Pathways and Targets Controlled by Transcription Factors CREM in the Testis - Cell-specific occupancy of an extended repertoire of CREM and CREB binding loci in male germ cells.

general reviews: - Specialized rules of gene transcription in male germ cells: the CREM paradigm. - Genetic control of spermiogenesis: insights from the CREM gene and implications for human infertility

crem_genes <- readxl::read_excel("../data/supplementary_data/Kosir/Table_S4.xls", skip=2)
setnames(crem_genes, make.names(names(crem_genes)))

crem_genes <- crem_genes$Symbol[1:100]
crem_genes <- crem_genes[which(crem_genes %in% colnames(SDAresults$loadings[[1]]))]
crem_genes <- crem_genes[1:50]

aheatmap(t(SDAresults$loadings[[1]][,crem_genes]), breaks=0, Colv = NA, cexRow = 0.7, layout = "_*", main="Gene loadings by component of genes downregulated \n in Crem KO & Bound by Crem in ChIPseq", scale="row")

18 Late Spermiogenesis 1

Oaz3, Gapdhs, Smcp

print_tsne(18, point_size = 0.5)
print_loadings_scores(18)
print_gene_list(18)

go_volcano_plot(component = "V18N")
head(GO_data[["V18N"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V18P")
head(GO_data[["V18P"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V18N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

datatable(motifEnrichment_wLogo[geneSet=="V18P",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

34 Late Spermiogenesis 2

print_tsne(34, point_size = 1)
print_loadings_scores(34)
print_gene_list(34)

go_volcano_plot(component = "V34P")
head(GO_data[["V34P"]][c(2,7,8,10)], 20)

go_volcano_plot(component = "V34N")
head(GO_data[["V34N"]][c(2,7,8,10)], 20)

datatable(motifEnrichment_wLogo[geneSet=="V34N",-c("enrichedGenes","geneSet"), with=FALSE][1:40,], escape = FALSE, filter="top", options=list(pageLength=5))

Batch Effect Hunting

tmp <- data.table(SDAresults$scores,
                  cell_index = 1:nrow(SDAresults$scores),
                  experiment = gsub("[A-Z]+\\.","",rownames(SDAresults$scores)),
                  group = gsub("_.*","",gsub("[A-Z]+\\.","",rownames(SDAresults$scores))))
tmp <- melt(tmp, id.vars = c("experiment","cell_index","group"))

ggplot(tmp[variable %in% paste0("V",1:15)], aes(experiment, value, colour=group)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(tmp[variable %in% paste0("V",16:30)], aes(experiment, value, colour=group)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(tmp[variable %in% paste0("V",31:45)], aes(experiment, value, colour=group)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(tmp[variable %in% paste0("V",46:50)], aes(experiment, value, colour=group)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_manual(values=c(brewer.pal(9,"Set1"),"black")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Previous Studies GRN

From Regulatory complexity revealed by integrated cytological and RNA-seq analyses of meiotic substages in mouse spermatocytes

From Expression dynamics, relationships, and transcriptional regulations of diverse transcripts in mouse spermatogenic cells

Others (Pol II GTFs): - TBL1 (TRF2) - TAF4b - TAF7l - GTF2a1l (ALF)



MyersGroup/testisAtlas documentation built on Nov. 29, 2020, 9:21 p.m.