knitr::opts_chunk$set(echo = TRUE)
This tutorial will walk you through the use of the Model.Microbiome package for modeling, and how it can be used to test new software. The primary function of Model.Microbiome is to create a series of randomly generated communities. These communities are subsampled to simulate the sequencing process. Then any normalization effort may be applied to these data, and compared for accuracy. This model software does not attempt to model PCR, sequencing, or any other bias that may be introduced on the benchtop or by the DNA isolation procedure. These should be addressed separately. This software only attempts to address inferential bias that is introduced during the sequence count normalization process.
Table of contents: Introduction to the model and it's basic function/structure: Exploring effects of sampling depth: Exploring effects of sparcity: Analysis from the paper: Toolkit for new functions:
First, let's install Model.Microbiome:
devtools::install_github("djeppschmidt/Model.Microbiome")
Now load all the other packages we need (or download and load if you haven't already):
# load required packages #### library(Model.Microbiome) library(reshape2) library(ggplot2) library(vegan) library(dplyr) library(plyr) library(phyloseq) library(viridis) library(ranacapa) library(edgeR) library(limma) library(GLDEX) library(stats)
The central function of this package creates an array (or list of objects) that include a data table of taxon abundance that represents the community being sampled; a subsampled data table (i.e. the corresponding sequencing output), and any outputs from normalization procedures that the user may choose. These community datasets are organized as phyloseq objects that contain the otu table, taxa names, and environmental data. Each of the datasets created has the same experimental design; there are 5 replicates of six experimental conditions. The community data is generated using a stepwise process. First, the environmental metadata is defined. There are five environmental parameters; three of which are related to the experimental design. These are randomly generated each time a new community is made; the specific values are drawn from a normal distribution with specific mean and variance for each experimental condition. The last two environmental variables have a mean and variance that is not captured by the categorical experimental design (ie is pulled from the same distribution across the dataset). The end result is a table of "environmental data" for each sample that includes values that are reflected in the experimental design, and those that are not. This is important as environmental studies are often designed with a tacit understanding of the environmental gradient; in such cases important environmental variables may or may not behave as anticipated by the researcher. Including variables that may be of interest but are not a-priori accurately predicted by the researcher is important for ensuring that statistical softwares do not over-fit taxon abundance to the experimental design and lose resolution on unanticipated environmental drivers.
Model.Microbiome includes 700 individual species equations that relate species abundance to the five environmental parameters. After the environment is created, the community is assembled. In this step, the user may vary the proportion of taxa that may be detected at a global scale; within a group; and within a particular sample. For example, we might say that 20 taxa may exist globally, 30 taxa may exist that are endemic to their experimental condition; and 10 taxa may exist only in the single site. In this case model.microbiome will randomly select 20 taxa from the list of 700 to be in the species list of all sites. Then it will randomly select 30 taxa for each experimental condition that will be in the species list of all sites that are in that experimental condition. And finally, Model.Microbiome will select 10 taxa for each site from the 700. These taxon lists are compiled so that each site includes uniuqe, groupwise, and globally distributed taxa. Then their abundance is calculated. Within each round of selection taxa are selected without replacement, but taxa are selected with replacement between rounds. This means that site-wise selection is blinded to the global selection, and to other site-specific selection. Therefore it is possible that a taxon is selected as a global taxon and a site-specific; or that two sites may include the same organism as site-specific. This is an intentional behavior to model sparcity and the role of historicity in ecological process. By modulating the proportion of global, group, and sitewise taxa, we can test how each normalization procedure handles sparcity. Another thing to note is that just because the abundance of a taxon is calculated at a site does not mean it has a positive count value. Many taxa in the global set will only have positive values in a subset of experimental conditions. This is an intended effect.
Once the species lists are created for each site, then Model.Microbiome calculates their abundance.
The user designates the number of communities that are created, and selects parameters to define the sparcity (i.e. how likely a taxon will appear only in one sample), mean sampling depth (i.e. sequencing depth), variance of sample-wise sequencing depth around the mean, etc. The following procedure will outline two methods for how one may designate the normalization procedure.
Model.Microbiome uses get() to import functions, giving the user platform the flexibility define new normalization functions, and import them into the analysis chain. There are two ways to include new functions. The first is to include them as a parameter in the core function. This requires defining a function. Here is an example:
# define the normalization function Relative.Abundance<-function(x){require(phyloseq) } # create a list of names method<-c("Relative.Abundance") # reps = number of test cases generated # commonN = number of taxa that will be calculated for all samples # groupN = number of taxa that will be calculated within groups # singleN = number of taxa that will be calculated only in a single sample # D = sequencing depth (ie number of samples taken on average) # V = variance of sequencing depth # method = list of names of applied methods testrun<-suppressWarnings(benchmark.MM2(reps=10, commonN=73, groupN=1, singleN=1, D=500, V=250, method))
The second approach also demonstrates the core functionality that allows the user to also add any statistical function/analysis to the array. Again for this process, we define a function
# define the normalization function Relative.Abundance<-function(x){require(phyloseq) } # create a list of names method<-c("Relative.Abundance") testrun2<-
We then do our first modeling run. In this case, reps is the number of times that we repeat the model with the same conditions. CommonN is the number of taxa that are modeled acorss all sites; groupN is the number of taxa that are modeled at each site; singleN are the number of taxa that are modeled at single sites; D is the smean ampling depth, V is the variation around that depth. These are the parameters that are necessary to set up and execute the sampling run. It should be noted that large values for the reps (e.g. greater than 100) may take longer than an hour to run. 1000's of reps could take days, depending on the processing power of your machine. I recommend starting with 10 reps to get the feel of the output structure.
#Test runs: C.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=73, groupN=1, singleN=1, D=500, V=250)) runtest<-run.analysis2(commonN=73, groupN=1, singleN=1, D=500, V=250, method=c("RA")) #
Notice that each repeated run of the function can be referenced from the object (e.g. 'C.D500$rep1$...'). The output format for this model is as lists of lists (or a complex array). It contains phyloseq objects of the community. Within each
# mostly universal taxa # benchmark with total = 75; different sequencing depth (low sparsity) C.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=73, groupN=1, singleN=1, D=500, V=250)) C.D1000<-suppressWarnings(benchmark.MM(reps=10, commonN=73, groupN=1, singleN=1, D=1000, V=500)) C.D2000<-suppressWarnings(benchmark.MM(reps=10, commonN=73, groupN=1, singleN=1, D=2000, V=1000)) C.D10000<-suppressWarnings(benchmark.MM(reps=10, commonN=73, groupN=1, singleN=1, D=10000, V=5000)) # increasing rare taxa (medium-low sparsity) #G.D500<-suppressWarnings(benchmark.MM(reps=1, commonN=50, groupN=15, singleN=10, D=500, V=250)) # benchmark with total = 75; different sequencing depth G.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=50, groupN=15, singleN=10, D=500, V=250)) G.D1000<-suppressWarnings(benchmark.MM(reps=10, commonN=50, groupN=15, singleN=10, D=1000, V=500)) G.D2000<-suppressWarnings(benchmark.MM(reps=10, commonN=50, groupN=15, singleN=10, D=2000, V=1000)) G.D10000<-suppressWarnings(benchmark.MM(reps=10, commonN=50, groupN=15, singleN=10, D=10000, V=5000)) # even distribution of rare taxa (medium sparcity) # benchmark with total = 75; different sequencing depth E.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=25, groupN=25, singleN=25, D=500, V=250)) E.D1000<-suppressWarnings(benchmark.MM(reps=10, commonN=25, groupN=25, singleN=25, D=1000, V=500)) E.D2000<-suppressWarnings(benchmark.MM(reps=10, commonN=25, groupN=25, singleN=25, D=2000, V=1000)) E.D10000<-suppressWarnings(benchmark.MM(reps=10, commonN=25, groupN=25, singleN=25, D=10000, V=5000)) # (medium-high sparsity) # benchmark with total = 75; different sequencing depth S.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=15, groupN=20, singleN=40, D=500, V=250)) S.D1000<-suppressWarnings(benchmark.MM(reps=10, commonN=15, groupN=20, singleN=40, D=1000, V=500)) S.D2000<-suppressWarnings(benchmark.MM(reps=10, commonN=15, groupN=20, singleN=40, D=2000, V=1000)) S.D10000<-suppressWarnings(benchmark.MM(reps=10, commonN=15, groupN=20, singleN=40, D=10000, V=5000)) # (High sparsity) # benchmark with total = 75; different sequencing depth V.D500<-suppressWarnings(benchmark.MM(reps=10, commonN=10, groupN=15, singleN=50, D=500, V=250)) V.D1000<-suppressWarnings(benchmark.MM(reps=10, commonN=10, groupN=15, singleN=50, D=1000, V=500)) V.D2000<-suppressWarnings(benchmark.MM(reps=10, commonN=10, groupN=15, singleN=50, D=2000, V=1000)) V.D10000<-suppressWarnings(benchmark.MM(reps=10, commonN=10, groupN=15, singleN=50, D=10000, V=5000))
# boxplot of R squared from model box.df<-data.frame("reference"=C.D500$rep1$model$R$lm,"deseqVST"=C.D500$rep1$deseqVST$DMI$R$lm,"limmaVST"=C.D500$rep1$limma$DMI$R$lm,"RA"=C.D500$rep1$RA$DMI$R$lm,"erare"=C.D500$rep1$eRare$DMI$R$lm,"scaled"=C.D500$rep1$scaled$DMI$R$lm) box.df$names<-rownames(box.df) m.box<-melt(data=box.df, id.vars="names", measure.vars = c("reference", "RA", "scaled", "erare", "limmaVST", "deseqVST")) keep<-names(C.D500$rep1$scaled$LII$R)[C.D500$rep1$scaled$LII$R>0.9] box.df2<-data.frame("reference"=C.D500$rep1$model$R$lm[names(C.D500$rep1$model$R$lm)%in%keep],"deseqVST"=C.D500$rep1$deseqVST$DMI$R$lm[names(C.D500$rep1$deseqVST$DMI$R$lm)%in%keep],"limmaVST"=C.D500$rep1$limma$DMI$R$lm[names(C.D500$rep1$limma$DMI$R$lm)%in%keep],"RA"=C.D500$rep1$RA$DMI$R$lm[names(C.D500$rep1$RA$DMI$R$lm)%in%keep],"erare"=C.D500$rep1$eRare$DMI$R$lm[names(C.D500$rep1$eRare$DMI$R$lm)%in%keep],"scaled"=C.D500$rep1$scaled$DMI$R$lm[names(C.D500$rep1$scaled$DMI$R$lm)%in%keep]) box.df$names<-rownames(box.df) box.df2$names<-rownames(box.df2) m.box<-melt(data=box.df, id.vars="names", measure.vars = c("reference", "RA", "scaled", "erare", "limmaVST", "deseqVST")) m.box2<-melt(data=box.df2, id.vars="names", measure.vars = c("reference", "RA", "scaled", "erare", "limmaVST", "deseqVST")) ggplot(m.box, aes(x=variable, y=value))+ geom_violin() + #geom_boxplot()+ geom_jitter()+ theme_classic() ggplot(m.box2, aes(x=variable, y=value))+ geom_violin() + #geom_boxplot()+ geom_jitter()+ theme_classic()
om<-prune_taxa(taxa_names(C.D500$rep1$model$comm)%in%keep,C.D500$rep1$model$comm) plot_bar(om) om2<-prune_taxa(taxa_names(C.D500$rep1$raw$comm)%in%keep,C.D500$rep1$raw$comm) names<-data.frame("names"=taxa_names(om2), "number"=c(1:length(taxa_names(om2)))) rownames(names)<-names$names #tax_table(om2)<-names plot_bar(om2)
om3<-otu_table(prune_taxa(taxa_names(C.D500$rep1$model$comm)%in%keep,C.D500$rep1$model$comm)) om4<-otu_table(prune_taxa(taxa_names(C.D500$rep1$scaled$comm)%in%keep,C.D500$rep1$deseqVST$comm))
box.LII<-data.frame("deseqVST"=C.D500$rep1$deseqVST$LII$R,"limmaVST"=C.D500$rep1$limma$LII$R,"RA"=C.D500$rep1$RA$LII$R,"erare"=C.D500$rep1$eRare$LII$R,"scaled"=C.D500$rep1$scaled$LII$R) box.LII$names<-c(1:length(box.LII$RA)) #box.LII$names<-rownames(box.df) m.boxLII<-melt(data=box.LII, id.vars="names", measure.vars = c("RA", "scaled", "erare", "limmaVST", "deseqVST")) ggplot(m.boxLII, aes(x=variable, y=value))+ geom_violin() + #geom_boxplot()+ geom_jitter()+ theme_classic()
# index workspace table(C.D10000$rep3$model$SpeciesMeta$scaledLII==1) table(C.D10000$rep3$model$SpeciesMeta$deseqLII==1) table(C.D10000$rep3$model$SpeciesMeta$limmaLII==1) table(C.D10000$rep3$model$SpeciesMeta$RALII==1) table(S.D500$rep3$model$SpeciesMeta$scaledLII==1) table(S.D500$rep3$model$SpeciesMeta$deseqLII==1) table(S.D500$rep3$model$SpeciesMeta$limmaLII==1) table(S.D500$rep3$model$SpeciesMeta$RALII==1) S.D10000 length(i$rawLII[i$rawLII<0.1])
ext.LII2<-function(tst, threshold){ out<-list("raw"=c(rep(NA, length(tst))),"RA"=c(rep(NA, length(tst))), "scaled"=c(rep(NA, length(tst))), "pRare"=c(rep(NA, length(tst))), "eRare"=c(rep(NA, length(tst))), "deseqVST"=c(rep(NA, length(tst))),"deseqVST_scaled"=c(rep(NA, length(tst))), "limmaVST"=c(rep(NA, length(tst)))) for(i in 1:length(tst)) {out$raw[i]<-length(tst[[i]]$model$SpeciesMeta$rawLII[tst[[i]]$model$SpeciesMeta$rawLII<threshold])/length(tst[[i]]$model$SpeciesMeta$rawLII)} for(i in 1:length(tst)) {out$RA[i]<-length(tst[[i]]$model$SpeciesMeta$RALII[tst[[i]]$model$SpeciesMeta$RALII<threshold])/length(tst[[i]]$model$SpeciesMeta$RALII)} for(i in 1:length(tst)) {out$scaled[i]<-length(tst[[i]]$model$SpeciesMeta$scaledLII[tst[[i]]$model$SpeciesMeta$scaledLII<threshold])/length(tst[[i]]$model$SpeciesMeta$scaledLII)} for(i in 1:length(tst)) {out$pRare[i]<-length(tst[[i]]$model$SpeciesMeta$pRareLII[tst[[i]]$model$SpeciesMeta$pRareLII<threshold])/length(tst[[i]]$model$SpeciesMeta$pRareLII)} for(i in 1:length(tst)) {out$eRare[i]<-length(tst[[i]]$model$SpeciesMeta$eRareLII[tst[[i]]$model$SpeciesMeta$eRareLII<threshold])/length(tst[[i]]$model$SpeciesMeta$eRareLII)} for(i in 1:length(tst)) {out$deseqVST[i]<-length(tst[[i]]$model$SpeciesMeta$deseqLII[tst[[i]]$model$SpeciesMeta$deseqLII<threshold])/length(tst[[i]]$model$SpeciesMeta$deseqLII)} for(i in 1:length(tst)) {out$deseqVST_scaled[i]<-length(tst[[i]]$model$SpeciesMeta$deseqSC.LII[tst[[i]]$model$SpeciesMeta$deseqSC.LII<threshold])/length(tst[[i]]$model$SpeciesMeta$deseqSC.LII)} for(i in 1:length(tst)) {out$limmaVST[i]<-length(tst[[i]]$model$SpeciesMeta$limmaLII[tst[[i]]$model$SpeciesMeta$limmaLII<threshold])/length(tst[[i]]$model$SpeciesMeta$limmaLII)} out1<-as.data.frame(out) out1 #out2<-boxplot(out1) #out4<-c("table"=out1,"plot"=out2) #out4 } # benchmark with total = 75; different sequencing depth LP10_C.D500<-ext.LII2(C.D500, threshold=0.1) LP10_C.D1000<-ext.LII2(C.D1000,threshold=0.1) LP10_C.D2000<-ext.LII2(C.D2000,threshold=0.1) LP10_C.D10000<-ext.LII2(C.D10000,threshold=0.1) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth LP10_G.D500<-ext.LII2(G.D500,threshold=0.1) LP10_G.D1000<-ext.LII2(G.D1000,threshold=0.1) LP10_G.D2000<-ext.LII2(G.D2000,threshold=0.1) LP10_G.D10000<-ext.LII2(G.D10000,threshold=0.1) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth LP10_E.D500<-ext.LII2(E.D500,threshold=0.1) LP10_E.D1000<-ext.LII2(E.D1000,threshold=0.1) LP10_E.D2000<-ext.LII2(E.D2000,threshold=0.1) LP10_E.D10000<-ext.LII2(E.D10000,threshold=0.1) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth LP10_S.D500<-ext.LII2(S.D500,threshold=0.1) LP10_S.D1000<-ext.LII2(S.D1000,threshold=0.1) LP10_S.D2000<-ext.LII2(S.D2000,threshold=0.1) LP10_S.D10000<-ext.LII2(S.D10000,threshold=0.1) LP10_V.D500<-ext.LII2(S.D500,threshold=0.1) LP10_V.D1000<-ext.LII2(S.D1000,threshold=0.1) LP10_V.D2000<-ext.LII2(S.D2000,threshold=0.1) LP10_V.D10000<-ext.LII2(S.D10000,threshold=0.1) # prepare Loss of information for plotting LP10_C.D500$Level<-c(rep(500, 10)) LP10_C.D1000$Level<-c(rep(1000, 10)) LP10_C.D2000$Level<-c(rep(2000, 10)) LP10_C.D10000$Level<-c(rep(10000, 10)) LP10_G.D500$Level<-c(rep(500, 10)) LP10_G.D1000$Level<-c(rep(1000, 10)) LP10_G.D2000$Level<-c(rep(2000, 10)) LP10_G.D10000$Level<-c(rep(10000, 10)) LP10_E.D500$Level<-c(rep(500, 10)) LP10_E.D1000$Level<-c(rep(1000, 10)) LP10_E.D2000$Level<-c(rep(2000, 10)) LP10_E.D10000$Level<-c(rep(10000, 10)) LP10_S.D500$Level<-c(rep(500, 10)) LP10_S.D1000$Level<-c(rep(1000, 10)) LP10_S.D2000$Level<-c(rep(2000, 10)) LP10_S.D10000$Level<-c(rep(10000, 10)) LP10_V.D500$Level<-c(rep(500, 10)) LP10_V.D1000$Level<-c(rep(1000, 10)) LP10_V.D2000$Level<-c(rep(2000, 10)) LP10_V.D10000$Level<-c(rep(10000, 10)) # rep LP10_C.D500$rep<-c(1:10) LP10_C.D1000$rep<-c(11:20) LP10_C.D2000$rep<-c(21:30) LP10_C.D10000$rep<-c(31:40) LP10_G.D500$rep<-c(41:50) LP10_G.D1000$rep<-c(51:60) LP10_G.D2000$rep<-c(61:70) LP10_G.D10000$rep<-c(71:80) LP10_E.D500$rep<-c(81:90) LP10_E.D1000$rep<-c(91:100) LP10_E.D2000$rep<-c(101:110) LP10_E.D10000$rep<-c(111:120) LP10_S.D500$rep<-c(121:130) LP10_S.D1000$rep<-c(131:140) LP10_S.D2000$rep<-c(141:150) LP10_S.D10000$rep<-c(161:170) LP10_V.D500$rep<-c(181:190) LP10_V.D1000$rep<-c(201:210) LP10_V.D2000$rep<-c(211:220) LP10_V.D10000$rep<-c(221:230) LIP10.C<-do.call("rbind", list(LP10_C.D500,LP10_C.D1000,LP10_C.D2000,LP10_C.D10000)) LP10.C2<-melt(data=LIP10.C, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LP10.C2<-data_summary2(LP10.C2, varname="value", groupnames=c("Level", "variable")) LIP10.G<-do.call("rbind", list(LP10_G.D500,LP10_G.D1000,LP10_G.D2000,LP10_G.D10000)) LP10.G2<-melt(data=LIP10.G, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LP10.G2<-data_summary2(LP10.G2, varname="value", groupnames=c("Level", "variable")) LIP10.E<-do.call("rbind", list(LP10_E.D500,LP10_E.D1000,LP10_E.D2000,LP10_E.D10000)) LP10.E2<-melt(data=LIP10.E, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LP10.E2<-data_summary2(LP10.E2, varname="value", groupnames=c("Level", "variable")) LIP10.S<-do.call("rbind", list(LP10_S.D500,LP10_S.D1000,LP10_S.D2000,LP10_S.D10000)) LP10.S2<-melt(data=LIP10.S, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LP10.S2<-data_summary2(LP10.S2, varname="value", groupnames=c("Level", "variable")) LIP10.V<-do.call("rbind", list(LP10_V.D500,LP10_V.D1000,LP10_V.D2000,LP10_V.D10000)) LP10.V2<-melt(data=LIP10.V, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LP10.V2<-data_summary2(LP10.V2, varname="value", groupnames=c("Level", "variable")) ggplot(LP10.C2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(0.2)) + geom_line(position=position_dodge(0.2))+ geom_point(position=position_dodge(0.2))+ xlab("Seq. Depth")+ ylab("Percent of taxa with LI < 0.1")+ scale_color_viridis_d(option="C")+ theme_classic() ggplot(LP10.G2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(0.2)) + geom_line(position=position_dodge(0.2))+ geom_point(position=position_dodge(0.2))+ xlab("Seq. Depth")+ ylab("Percent of taxa with LI < 0.1")+ scale_color_viridis_d(option="C")+ theme_classic() ggplot(LP10.E2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(0.2)) + geom_line(position=position_dodge(0.2))+ geom_point(position=position_dodge(0.2))+ xlab("Seq. Depth")+ ylab("Percent of taxa with LI < 0.1")+ scale_color_viridis_d(option="C")+ theme_classic() ggplot(LP10.S2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(0.2)) + geom_line(position=position_dodge(0.2))+ geom_point(position=position_dodge(0.2))+ xlab("Seq. Depth")+ ylab("Percent of taxa with LI < 0.1")+ scale_color_viridis_d(option="C")+ theme_classic() ggplot(LP10.V2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(0.2)) + geom_line(position=position_dodge(0.2))+ geom_point(position=position_dodge(0.2))+ xlab("Seq. Depth")+ ylab("Percent of taxa with LI < 0.1")+ scale_color_viridis_d(option="C")+ theme_classic()
LIP10.C$sparsity<-c(rep(1, 40)) LIP10.E$sparsity<-c(rep(3, 40)) LIP10.G$sparsity<-c(rep(2, 40)) LIP10.S$sparsity<-c(rep(4, 40)) LIP10.V$sparsity<-c(rep(5, 40)) LIP10.stats<-do.call("rbind", list(LIP10.C, LIP10.E, LIP10.G,LIP10.S, LIP10.V)) LIP10.stats$sparsity<-as.factor(LIP10.stats$sparsity) LIP10.stats$rep<-as.factor(LIP10.stats$rep) LIP10.stats$Level<-as.factor(LIP10.stats$Level) #Permanova.stats$sparsity<-as.factor(Permanova.stats$sparsity) LIP10.stats<-melt(data=LIP10.stats, id.vars=c("Level","rep", "sparsity"), measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) #summary(aov(value~variable*sparsity*Level+variable:rep, data=LIP10.stats)) summary(aov(value~variable*sparsity*Level, data=LIP10.stats)) #TukeyHSD(aov(value~variable*sparsity+variable:rep, data=LIP10.stats)) TukeyHSD(aov(value~variable*sparsity*Level, data=LIP10.stats))
ggrare(S.D500$rep1$raw$comm, 50, color="Factor") ggrare(S.D500$rep1$model$comm, 500, color="Factor") ggplot(C.D500$rep3$model$SpeciesMeta, aes(x=log(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep3$model$SpeciesMeta, aes(x=log(M.Eval),y=deseqLII,col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep3$model$SpeciesMeta, aes(x=log(M.Eval),y=limmaLII,col=abs(limmaDMI)))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=rawLII,y=rawDMI,col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=RALII,y=RADMI,col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=limmaLII,y=limmaDMI,col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=eRareLII,y=eRareDMI,col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=deseqDMI,y=scaledDMI, col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") #ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=deseqLII,y=scaledLII, col=log(M.Eval)))+geom_point()+ scale_color_viridis_c(option="C") #ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=deseqLII,y=deseqDMI,col=sd_abundance))+geom_point() #ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=deseqLII,y=deseqDMI,col=mean_abundance))+geom_point()
ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep2$model$SpeciesMeta, aes(x=log10(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep3$model$SpeciesMeta, aes(x=log10(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep4$model$SpeciesMeta, aes(x=log10(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep5$model$SpeciesMeta, aes(x=log10(M.Eval),y=scaledLII,col=scaledDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=deseqLII,col=scaledLII/deseqLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep2$model$SpeciesMeta, aes(x=log10(M.Eval),y=deseqLII,col=scaledLII/deseqLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep3$model$SpeciesMeta, aes(x=log10(M.Eval),y=deseqLII,col=scaledLII/deseqLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep4$model$SpeciesMeta, aes(x=log10(M.Eval),y=deseqLII,col=scaledLII/deseqLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep5$model$SpeciesMeta, aes(x=log10(M.Eval),y=deseqLII,col=scaledLII/deseqLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=limmaLII,col=scaledLII/limmaLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep2$model$SpeciesMeta, aes(x=log10(M.Eval),y=limmaLII,col=scaledLII/limmaLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep3$model$SpeciesMeta, aes(x=log10(M.Eval),y=limmaLII,col=scaledLII/limmaLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep4$model$SpeciesMeta, aes(x=log10(M.Eval),y=limmaLII,col=scaledLII/limmaLII))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D10000$rep5$model$SpeciesMeta, aes(x=log10(M.Eval),y=limmaLII,col=scaledLII/limmaLII))+geom_point()+ scale_color_viridis_c(option="C")
ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ theme_classic() ggplot(C.D1000$rep2$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ theme_classic() ggplot(C.D1000$rep3$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ theme_classic() ggplot(C.D1000$rep4$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ theme_classic() ggplot(C.D1000$rep5$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ theme_classic()
ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(rawLII),col=rawDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Log raw LII value", limits=c(-8, 0))+ theme_classic() ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(RALII),col=RADMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Log relative abundance LII value", limits=c(-8, 0))+ theme_classic() ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(scaledLII),col=scaledDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Logs scaled LII value", limits=c(-8, 0))+ theme_classic() ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Log deseq LII value", limits=c(-8, 0))+ theme_classic() ggplot(C.D1000$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(limmaLII),col=limmaDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Log limma LII value", limits=c(-8, 0))+ theme_classic() ggplot(C.D1000$rep1$model$SpeciesMeta[C.D1000$rep1$model$SpeciesMeta$deseqLII==0,], aes(x=log10(M.Eval),y=log(limmaLII),col=limmaDMI))+ geom_point()+ scale_color_viridis_c(option="C")+ scale_x_continuous(name="Log E value", limits=c(-4, 0)) + scale_y_continuous(name="Log limma LII value", limits=c(-8, 0))+ theme_classic()
t<-data.frame("names"=taxa_names(C.D1000$rep1$model$comm), "names2"=taxa_names(C.D1000$rep1$model$comm), "names3"=taxa_names(C.D1000$rep1$model$comm)) rownames(t)<-taxa_names(C.D1000$rep1$model$comm) t<-tax_table(t) taxa_names(t)<-taxa_names(C.D1000$rep1$model$comm) tax_table(C.D1000$rep1$model$comm)<-t tax_table(C.D1000$rep1$limma$comm)<-t tax_table(C.D1000$rep1$deseqVST$comm)<-t tax_table(C.D1000$rep1$scaled$comm)<-t keep<-row.names(C.D1000$rep1$model$SpeciesMeta)[C.D1000$rep1$model$SpeciesMeta$rawLII==0] reference_r<-prune_taxa(taxa_names(C.D1000$rep1$model$comm)%in%keep, C.D1000$rep1$model$comm) limma_r<-prune_taxa(taxa_names(C.D1000$rep1$limma$comm)%in%keep, C.D1000$rep1$limma$comm) deseq_r<-prune_taxa(taxa_names(C.D1000$rep1$deseqVST$comm)%in%keep, C.D1000$rep1$deseqVST$comm) scaled_r<-prune_taxa(taxa_names(C.D1000$rep1$scaled$comm)%in%keep, C.D1000$rep1$scaled$comm) plot_bar(reference_r, fill="ta1") plot_bar(limma_r, fill="ta1") plot_bar(deseq_r, fill="ta1") plot_bar(scaled_r, fill="ta1")
ggplot(C.D500$rep1$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep2$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep3$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep4$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C") ggplot(C.D500$rep5$model$SpeciesMeta, aes(x=log10(M.Eval),y=log(deseqLII),col=deseqDMI))+geom_point()+ scale_color_viridis_c(option="C")
sample_sums(C.D500$rep1$raw$comm) C.D500$rep1$metrics$skewness C.D500$rep1$metrics$stats C.D500$rep1$metrics$Richness C.D500$rep1$metrics$lost RI<-ext.RI(C.D500) sk2<-NULL for(i in 1:length(C.D500)){sk2[i]<-median(apply(X = otu_table(C.D500[[i]]$raw$comm), MARGIN=2,FUN = function(x){skewness(x)}))} sk<-NULL for(i in 1:length(C.D500)){sk[i]<-C.D500[[i]]$metrics$skewness} ri<-NULL for(i in 1:length(C.D500)){ri[i]<-estimate_richness(C.D500[[i]]$raw$comm, measures="Observed")}#/C.D500[[i]]$metrics$Richness} #estimate_richness(C.D500$rep1$raw$comm) for(i in 1:length(C.D500)){ri[i]<-mean(ri[[i]])} #C.D500$rep1$metrics$Richness-estimate_richness(C.D500$rep1$raw$comm, measures="Observed")
# explore data ggrare(C.D500$rep1$raw$comm, 50, color="Factor") ggrare(C.D500$rep1$model$comm, 500, color="Factor") ggrare(C.D500$rep2$raw$comm, 50, color="Factor") ggrare(C.D500$rep2$model$comm, 500, color="Factor") ggrare(C.D500$rep2$raw$comm, 50, color="Factor") ggrare(C.D500$rep2$model$comm, 500, color="Factor") ggrare(C.D500$rep37$raw$comm, 50, color="Factor") ggrare(C.D500$rep37$model$comm, 500, color="Factor") View(as.data.frame(as.matrix(otu_table(C.D500$rep2$raw$comm)))) View(as.data.frame(as.matrix(otu_table(C.D500$rep1$raw$comm)))) plot(RI$scaled/RI$deseqVST, sk) abline(v=1,col="red", lwd=3, lty=2) plot(RI$scaled/RI$deseqVST, sk2) abline(v=1,col="red", lwd=3, lty=2) plot(RI$scaled/RI$deseqVST, ri) abline(v=1,col="red", lwd=3, lty=2) plot(sk,ri, cex=c(RI$scaled/RI$deseqVST))
# PERMANOVA output #### # benchmark with total = 75; different sequencing depth P_C.D500<-ext.PERMANOVA2(C.D500) P_C.D1000<-ext.PERMANOVA2(C.D1000) P_C.D2000<-ext.PERMANOVA2(C.D2000) P_C.D10000<-ext.PERMANOVA2(C.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth P_G.D500<-ext.PERMANOVA(G.D500) P_G.D1000<-ext.PERMANOVA(G.D1000) P_G.D2000<-ext.PERMANOVA(G.D2000) P_G.D10000<-ext.PERMANOVA(G.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth P_S.D500<-ext.PERMANOVA(S.D500) P_S.D1000<-ext.PERMANOVA(S.D1000) P_S.D2000<-ext.PERMANOVA(S.D2000) P_S.D10000<-ext.PERMANOVA(S.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth P_E.D500<-ext.PERMANOVA(E.D500) P_E.D1000<-ext.PERMANOVA(E.D1000) P_E.D2000<-ext.PERMANOVA(E.D2000) P_E.D10000<-ext.PERMANOVA(E.D10000) P_V.D500<-ext.PERMANOVA(V.D500) P_V.D1000<-ext.PERMANOVA(V.D1000) P_V.D2000<-ext.PERMANOVA(V.D2000) P_V.D10000<-ext.PERMANOVA(V.D10000)
# prepare permanova data for plotting #### P_C.D500$Level<-c(rep(500, 10)) P_C.D1000$Level<-c(rep(1000, 10)) P_C.D2000$Level<-c(rep(2000, 10)) P_C.D10000$Level<-c(rep(10000, 10)) P_G.D500$Level<-c(rep(500, 10)) P_G.D1000$Level<-c(rep(1000, 10)) P_G.D2000$Level<-c(rep(2000, 10)) P_G.D10000$Level<-c(rep(10000, 10)) P_E.D500$Level<-c(rep(500, 10)) P_E.D1000$Level<-c(rep(1000, 10)) P_E.D2000$Level<-c(rep(2000, 10)) P_E.D10000$Level<-c(rep(10000, 10)) P_S.D500$Level<-c(rep(500, 10)) P_S.D1000$Level<-c(rep(1000, 10)) P_S.D2000$Level<-c(rep(2000, 10)) P_S.D10000$Level<-c(rep(10000, 10)) P_V.D500$Level<-c(rep(500, 10)) P_V.D1000$Level<-c(rep(1000, 10)) P_V.D2000$Level<-c(rep(2000, 10)) P_V.D10000$Level<-c(rep(10000, 10)) #rep P_C.D500$rep<-c(1:10) P_C.D1000$rep<-c(11:20) P_C.D2000$rep<-c(21:30) P_C.D10000$rep<-c(31:40) P_G.D500$rep<-c(51:60) P_G.D1000$rep<-c(61:70) P_G.D2000$rep<-c(71:80) P_G.D10000$rep<-c(81:90) P_E.D500$rep<-c(91:100) P_E.D1000$rep<-c(101:110) P_E.D2000$rep<-c(111:120) P_E.D10000$rep<-c(121:130) P_S.D500$rep<-c(131:140) P_S.D1000$rep<-c(141:150) P_S.D2000$rep<-c(151:160) P_S.D10000$rep<-c(161:170) P_V.D500$rep<-c(171:180) P_V.D1000$rep<-c(181:190) P_V.D2000$rep<-c(191:200) P_V.D10000$rep<-c(201:210)
# permanova plots ext.PERMANOVA2<-function(tst){ out<-list("raw"=c(rep(NA, length(tst))),"RA"=c(rep(NA, length(tst))), "scaled"=c(rep(NA, length(tst))), "pRare"=c(rep(NA, length(tst))), "eRare"=c(rep(NA, length(tst))), "deseqVST"=c(rep(NA, length(tst))),"deseqVST_scaled"=c(rep(NA, length(tst))), "limmaVST"=c(rep(NA, length(tst)))) for(i in 1:length(tst)) {out$raw[i]<-sum(tst[[i]]$raw$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$RA[i]<-sum(tst[[i]]$RA$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$scaled[i]<-sum(tst[[i]]$scaled$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$pRare[i]<-sum(tst[[i]]$pRare$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$eRare[i]<-sum(tst[[i]]$eRare$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$deseqVST[i]<-sum(tst[[i]]$deseqVST$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$deseqVST_scaled[i]<-sum(tst[[i]]$deseqVST_scaled$PERMANOVA$aov.tab$R2[2:6])} for(i in 1:length(tst)) {out$limmaVST[i]<-sum(tst[[i]]$limma$PERMANOVA$aov.tab$R2[2:6])} out1<-as.data.frame(out) out1 #out2<-boxplot(out1) #out4<-c("table"=out1,"plot"=out2) #out4 } Permanova.C<-do.call("rbind", list(P_C.D500,P_C.D1000,P_C.D2000,P_C.D10000)) Permanova.C2<-melt(data=Permanova.C, id.vars="Level", measure.vars = c("raw", "RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) library(ggplot2) Permanova.C3<-data_summary2(Permanova.C2, varname="value", groupnames=c("Level", "variable")) Permanova.C2<-data_summary(Permanova.C2, varname="value", groupnames=c("Level", "variable")) ggplot(Permanova.C2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Mean +/- se)")+ theme_classic() ggplot(Permanova.C3, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Median +/- min/max)")+ theme_classic()
# permanova plots Permanova.G<-do.call("rbind", list(P_G.D500,P_G.D1000,P_G.D2000,P_G.D10000)) Permanova.G2<-melt(data=Permanova.G, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) Permanova.G3<-data_summary2(Permanova.G2, varname="value", groupnames=c("Level", "variable")) Permanova.G2<-data_summary(Permanova.G2, varname="value", groupnames=c("Level", "variable")) ggplot(Permanova.G2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Mean +/- se)")+ theme_classic() ggplot(Permanova.G3, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Median +/- min/max)")+ theme_classic()
# permanova plots Permanova.E<-do.call("rbind", list(P_E.D500,P_E.D1000,P_E.D2000,P_E.D10000)) Permanova.E2<-melt(data=Permanova.E, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) Permanova.E3<-data_summary2(Permanova.E2, varname="value", groupnames=c("Level", "variable")) Permanova.E2<-data_summary(Permanova.E2, varname="value", groupnames=c("Level", "variable")) ggplot(Permanova.E2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Mean +/- se)")+ theme_classic() ggplot(Permanova.E3, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Median +/- min/max)")+ theme_classic()
# permanova plots Permanova.S<-do.call("rbind", list(P_S.D500,P_S.D1000,P_S.D2000,P_S.D10000)) Permanova.S2<-melt(data=Permanova.S, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) Permanova.S3<-data_summary2(Permanova.S2, varname="value", groupnames=c("Level", "variable")) Permanova.S2<-data_summary(Permanova.S2, varname="value", groupnames=c("Level", "variable")) ggplot(Permanova.S2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Mean +/- se)")+ theme_classic() ggplot(Permanova.S3, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Median +/- min/max)")+ theme_classic()
# permanova plots Permanova.V<-do.call("rbind", list(P_V.D500,P_V.D1000,P_V.D2000,P_V.D10000)) Permanova.V2<-melt(data=Permanova.V, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) Permanova.V3<-data_summary2(Permanova.V2, varname="value", groupnames=c("Level", "variable")) Permanova.V2<-data_summary(Permanova.V2, varname="value", groupnames=c("Level", "variable")) ggplot(Permanova.V2, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=value-sd, ymax=value+sd), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Mean +/- se)")+ theme_classic() ggplot(Permanova.V3, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("PERMANOVA R.sq. Index Value (Median +/- min/max)")+ theme_classic()
Permanova.C$sparsity<-c(rep(1, 400)) Permanova.E$sparsity<-c(rep(3, 400)) Permanova.G$sparsity<-c(rep(2, 400)) Permanova.S$sparsity<-c(rep(4, 400)) Permanova.V$sparsity<-c(rep(5, 400)) Permanova.stats<-do.call("rbind", list(Permanova.C, Permanova.E, Permanova.G, Permanova.S, Permanova.V)) Permanova.stats$sparsity<-as.factor(Permanova.stats$sparsity) Permanova.stats$rep<-as.factor(Permanova.stats$rep) #Permanova.stats$sparsity<-as.factor(Permanova.stats$sparsity) P.stats<-melt(data=Permanova.stats, id.vars=c("Level","rep", "sparsity"), measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) summary(aov(value~variable*sparsity+variable:rep, data=P.stats)) TukeyHSD(aov(value~variable*sparsity+variable:rep, data=P.stats))
# LII output #### # benchmark with total = 75; different sequencing depth L_C.D500<-ext.LII(C.D500) L_C.D1000<-ext.LII(C.D1000) L_C.D2000<-ext.LII(C.D2000) L_C.D10000<-ext.LII(C.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth L_G.D500<-ext.LII(G.D500) L_G.D1000<-ext.LII(G.D1000) L_G.D2000<-ext.LII(G.D2000) L_G.D10000<-ext.LII(G.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth L_E.D500<-ext.LII(E.D500) L_E.D1000<-ext.LII(E.D1000) L_E.D2000<-ext.LII(E.D2000) L_E.D10000<-ext.LII(E.D10000) # seq depth at mostly universal taxa # benchmark with total = 75; different sequencing depth L_S.D500<-ext.LII(S.D500) L_S.D1000<-ext.LII(S.D1000) L_S.D2000<-ext.LII(S.D2000) L_S.D10000<-ext.LII(S.D10000) L_V.D500<-ext.LII(S.D500) L_V.D1000<-ext.LII(S.D1000) L_V.D2000<-ext.LII(S.D2000) L_V.D10000<-ext.LII(S.D10000) # prepare Loss of information for plotting L_C.D500$Level<-c(rep(500, 100)) L_C.D1000$Level<-c(rep(1000, 100)) L_C.D2000$Level<-c(rep(2000, 100)) L_C.D10000$Level<-c(rep(10000, 100)) L_G.D500$Level<-c(rep(500, 100)) L_G.D1000$Level<-c(rep(1000, 100)) L_G.D2000$Level<-c(rep(2000, 100)) L_G.D10000$Level<-c(rep(10000, 100)) L_E.D500$Level<-c(rep(500, 100)) L_E.D1000$Level<-c(rep(1000, 100)) L_E.D2000$Level<-c(rep(2000, 100)) L_E.D10000$Level<-c(rep(10000, 100)) L_S.D500$Level<-c(rep(500, 100)) L_S.D1000$Level<-c(rep(1000, 100)) L_S.D2000$Level<-c(rep(2000, 100)) L_S.D10000$Level<-c(rep(10000, 100)) L_V.D500$Level<-c(rep(500, 100)) L_V.D1000$Level<-c(rep(1000, 100)) L_V.D2000$Level<-c(rep(2000, 100)) L_V.D10000$Level<-c(rep(10000, 100)) # rep L_C.D500$rep<-c(1:100) L_C.D1000$rep<-c(101:200) L_C.D2000$rep<-c(201:300) L_C.D10000$rep<-c(301:400) L_G.D500$rep<-c(401:500) L_G.D1000$rep<-c(501:600) L_G.D2000$rep<-c(601:700) L_G.D10000$rep<-c(701:800) L_E.D500$rep<-c(801:900) L_E.D1000$rep<-c(901:1000) L_E.D2000$rep<-c(1001:1100) L_E.D10000$rep<-c(1101:1200) L_S.D500$rep<-c(1201:1300) L_S.D1000$rep<-c(1301:1400) L_S.D2000$rep<-c(1401:1500) L_S.D10000$rep<-c(1601:1700) L_V.D500$rep<-c(1801:1900) L_V.D1000$rep<-c(2001:2100) L_V.D2000$rep<-c(2101:2200) L_V.D10000$rep<-c(2201:2300)
# plot LII #### LII.C<-do.call("rbind", list(L_C.D500,L_C.D1000,L_C.D2000,L_C.D10000)) LII.C2<-melt(data=LII.C, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LII.C2<-data_summary2(LII.C, varname="value", groupnames=c("Level", "variable")) LII.C500<-LII.C[LII.C$Level==500,] LII.C1000<-LII.C[LII.C$Level==1000,] LII.C2000<-LII.C[LII.C$Level==2000,] LII.C10000<-LII.C[LII.C$Level==10000,] ggplot(LII.C, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line()+ geom_point()+ xlab("Seq. Depth")+ ylab("Lost Information Index Value (Median +/- min/max)")+ theme_classic()
LII.G<-do.call("rbind", list(L_G.D500,L_G.D1000,L_G.D2000,L_G.D10000)) LII.G2<-melt(data=LII.G, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LII.G2<-data_summary2(LII.G2, varname="value", groupnames=c("Level", "variable")) LII.G500<-LII.G[LII.G$Level==500,] LII.G1000<-LII.G[LII.G$Level==1000,] LII.G2000<-LII.G[LII.G$Level==2000,] LII.G10000<-LII.G[LII.G$Level==10000,] ggplot(LII.G, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line()+ geom_point()+ xlab("Seq. Depth")+ ylab("Lost Information Index Value (Median +/- min/max)")+ theme_classic()
LII.E<-do.call("rbind", list(L_E.D500,L_E.D1000,L_E.D2000,L_E.D10000)) LII.E2<-melt(data=LII.E, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LII.E2<-data_summary2(LII.E2, varname="value", groupnames=c("Level", "variable")) LII.E500<-LII.E[LII.E$Level==500,] LII.E1000<-LII.E[LII.E$Level==1000,] LII.E2000<-LII.E[LII.E$Level==2000,] LII.E10000<-LII.E[LII.E$Level==10000,] ggplot(LII.E, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Lost Information Index Value (Median +/- min/max)")+ theme_classic()
LII.S<-do.call("rbind", list(L_S.D500,L_S.D1000,L_S.D2000,L_S.D10000)) LII.S2<-melt(data=LII.S, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LII.S2<-data_summary2(LII.S2, varname="value", groupnames=c("Level", "variable")) LII.S500<-LII.S[LII.S$Level==500,] LII.S1000<-LII.S[LII.S$Level==1000,] LII.S2000<-LII.S[LII.S$Level==2000,] LII.S10000<-LII.S[LII.S$Level==10000,] ggplot(LII.S, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Lost Information Index Value (Median +/- min/max)")+ theme_classic()
LII.V<-do.call("rbind", list(L_V.D500,L_V.D1000,L_V.D2000,L_V.D10000)) LII.V2<-melt(data=LII.V, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) LII.V2<-data_summary2(LII.V2, varname="value", groupnames=c("Level", "variable")) LII.V500<-LII.V[LII.V$Level==500,] LII.V1000<-LII.V[LII.V$Level==1000,] LII.V2000<-LII.V[LII.V$Level==2000,] LII.V10000<-LII.V[LII.V$Level==10000,] ggplot(LII.V, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Lost Information Index Value (Median +/- min/max)")+ theme_classic()
with(LII.V, plot())
LII.C$sparsity<-c(rep(1, 400)) LII.E$sparsity<-c(rep(3, 400)) LII.G$sparsity<-c(rep(2, 400)) LII.S$sparsity<-c(rep(4, 400)) LII.V$sparsity<-c(rep(5, 400)) L.stats<-do.call("rbind", list(LII.C, LII.E, LII.G, LII.S, LII.V)) L.stats<-melt(data=L.stats, id.vars=c("Level","rep", "sparsity"), measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) summary(aov(value~variable*sparsity+variable:rep, data=L.stats)) TukeyHSD(aov(value~variable*sparsity+variable:rep, data=L.stats))
# extract over-modeling index #### O_C.D500<-ext.OI(C.D500) O_C.D1000<-ext.OI(C.D1000) O_C.D2000<-ext.OI(C.D2000) O_C.D10000<-ext.OI(C.D10000) O_G.D500<-ext.OI(G.D500) O_G.D1000<-ext.OI(G.D1000) O_G.D2000<-ext.OI(G.D2000) O_G.D10000<-ext.OI(G.D10000) O_E.D500<-ext.OI(E.D500) O_E.D1000<-ext.OI(E.D1000) O_E.D2000<-ext.OI(E.D2000) O_E.D10000<-ext.OI(E.D10000) O_S.D500<-ext.OI(S.D500) O_S.D1000<-ext.OI(S.D1000) O_S.D2000<-ext.OI(S.D2000) O_S.D10000<-ext.OI(S.D10000) O_V.D500<-ext.OI(V.D500) O_V.D1000<-ext.OI(V.D1000) O_V.D2000<-ext.OI(V.D2000) O_V.D10000<-ext.OI(V.D10000) # prepare to plot overmodeling index O_C.D500$Level<-c(rep(500, 100)) O_C.D1000$Level<-c(rep(1000, 100)) O_C.D2000$Level<-c(rep(2000, 100)) O_C.D10000$Level<-c(rep(10000, 100)) O_G.D500$Level<-c(rep(500, 100)) O_G.D1000$Level<-c(rep(1000, 100)) O_G.D2000$Level<-c(rep(2000, 100)) O_G.D10000$Level<-c(rep(10000, 100)) O_E.D500$Level<-c(rep(500, 100)) O_E.D1000$Level<-c(rep(1000, 100)) O_E.D2000$Level<-c(rep(2000, 100)) O_E.D10000$Level<-c(rep(10000, 100)) O_S.D500$Level<-c(rep(500, 100)) O_S.D1000$Level<-c(rep(1000, 100)) O_S.D2000$Level<-c(rep(2000, 100)) O_S.D10000$Level<-c(rep(10000, 100)) O_V.D500$Level<-c(rep(500, 100)) O_V.D1000$Level<-c(rep(1000, 100)) O_V.D2000$Level<-c(rep(2000, 100)) O_V.D10000$Level<-c(rep(10000, 100)) # rep O_C.D500$rep<-c(1:100) O_C.D1000$rep<-c(101:200) O_C.D2000$rep<-c(201:300) O_C.D10000$rep<-c(301:400) O_G.D500$rep<-c(401:500) O_G.D1000$rep<-c(601:700) O_G.D2000$rep<-c(701:800) O_G.D10000$rep<-c(801:900) O_E.D500$rep<-c(901:1000) O_E.D1000$rep<-c(1001:1100) O_E.D2000$rep<-c(1101:1200) O_E.D10000$rep<-c(1201:1300) O_S.D500$rep<-c(1301:1400) O_S.D1000$rep<-c(1401:1500) O_S.D2000$rep<-c(1501:1600) O_S.D10000$rep<-c(1601:1700) O_V.D500$rep<-c(1701:1800) O_V.D1000$rep<-c(1801:1900) O_V.D2000$rep<-c(1901:2000) O_V.D10000$rep<-c(2001:2100)
# plot overmodeling index #### OI.C<-do.call("rbind", list(O_C.D500,O_C.D1000,O_C.D2000,O_C.D10000)) OI.C2<-melt(data=OI.C, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) OI.C2<-data_summary2(OI.C2, varname="value", groupnames=c("Level", "variable")) ggplot(OI.C, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Overmodeling Index Value (Median +/- min/max)")+ theme_classic()
OI.G<-do.call("rbind", list(O_G.D500,O_G.D1000,O_G.D2000,O_G.D10000)) OI.G2<-melt(data=OI.G, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) OI.G2<-data_summary2(OI.G2, varname="value", groupnames=c("Level", "variable")) ggplot(OI.G, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Overmodeling Index Value (Median +/- min/max)")+ theme_classic()
OI.E<-do.call("rbind", list(O_E.D500,O_E.D1000,O_E.D2000,O_E.D10000)) OI.E2<-melt(data=OI.E, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) OI.E2<-data_summary2(OI.E2, varname="value", groupnames=c("Level", "variable")) ggplot(OI.E, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Overmodeling Index Value (Median +/- min/max)")+ theme_classic()
OI.S<-do.call("rbind", list(O_S.D500,O_S.D1000,O_S.D2000,O_S.D10000)) OI.S2<-melt(data=OI.S, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) OI.S2<-data_summary2(OI.S2, varname="value", groupnames=c("Level", "variable")) ggplot(OI.S, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Overmodeling Index Value (Median +/- min/max)")+ theme_classic()
OI.V<-do.call("rbind", list(O_V.D500,O_V.D1000,O_V.D2000,O_V.D10000)) OI.V2<-melt(data=OI.V, id.vars="Level", measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) OI.V2<-data_summary2(OI.V2, varname="value", groupnames=c("Level", "variable")) ggplot(OI.C, aes(x=Level, y=value, group = variable, color=variable))+ geom_errorbar(aes(ymin=low, ymax=high), width=.1, position=position_dodge(300)) + geom_line(position=position_dodge(300))+ geom_point(position=position_dodge(300))+ xlab("Seq. Depth")+ ylab("Overmodeling Index Value (Median +/- min/max)")+ theme_classic()
with(plot())
OI.C$sparsity<-c(rep(1, 400)) OI.E$sparsity<-c(rep(3, 400)) OI.G$sparsity<-c(rep(2, 400)) OI.S$sparsity<-c(rep(4, 400)) OI.V$sparsity<-c(rep(5, 400)) O.stats<-do.call("rbind", list(OI.C, OI.E, OI.G, OI.S, OI.V)) #O.stats$sparsity<-as.factor(O.stats$sparsity) O.stats<-melt(data=O.stats, id.vars=c("Level", "rep", "sparsity"), measure.vars = c("raw","RA", "scaled", "eRare", "pRare", "limmaVST", "deseqVST", "deseqVST_scaled")) summary(aov(value~variable*sparsity+variable:rep, data=O.stats)) TukeyHSD(aov(value~variable*sparsity+variable:rep, data=O.stats))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.