Community Assembly Model Inference

Load/install packages needed for this script.

require(CAMI)
require(randomForest)
require(abc)

Simulation of Data

We will simulate community assembly data using the function SimCommunityAssembly. The function first simulates a regional community phylogeny and then evolves traits along that phylogeny. Using that trait information, the local community is assembled under a neutral, habitat filtering, or competitive exclusion process.

For the power analysis, there are two elements of CAMI we need to investigate. The first is whether randomForest and ABC can classify the simulated community data as the correct community assembly model. The second is whether the amount of data used, i.e. the size of the community, affects the power of RF and ABC to perform model selection.
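Before committing to the full run, it can help to sanity-check the simulator with a handful of simulations at a single sample size. The sketch below mirrors the arguments used in the loop that follows; the exact structure of the returned object may differ slightly between CAMI versions.

#quick sanity check: a few filtering simulations at one sample size
test.sim <- SimCommunityAssembly(sims = 10, N = 100, local = 0.5,
                                 traitsim = "BM", comsim = "filtering", tau = c(1, 60))
str(test.sim, max.level = 1)    #inspect the components of the returned object
head(test.sim$summary.stats)    #summary statistics, one row per simulation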

#This script ran with 1000 sims per sample size for 20 sample sizes and took approximately 24 hours
#You can load the saved output instead of re-simulating (see the Random Forest section below)
sample.sizes <- seq(50,1000,50)
sims <- 1000
SimOutput <- list()

for (i in 1:length(sample.sizes)) {

  #BM models
  BM.neutral <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "neutral")
  BM.filtering <- SimCommunityAssembly(sims = sims, N = sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "filtering", tau = c(1,60))
  BM.competition <- SimCommunityAssembly(sims = sims, N = sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "competition", tau = c(1,60))

  #OU models
  OU.neutral <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "neutral")
  OU.filtering <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "filtering",  tau = c(1,60))
  OU.competition <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "competition",  tau = c(1,60))

  SimOutput[[i]] <- list(BM.neutral, BM.filtering, BM.competition, OU.neutral, OU.filtering, OU.competition)
  print(paste("completed simulations of sample size", sample.sizes[i]))

}

#save output object, which is a list of lists
date <- Sys.Date()
save(SimOutput, file=paste("SimOutput",date,".Rdata", sep=""))

Random Forest

We will use the simulated data from each of the datasets to build several randomForest classifiers. As each classifier is built, the out-of-bag (OOB) error rate is estimated simultaneously. This error rate tells us how well randomForest can classify these data. We will also examine the error rates with respect to community sample size to see whether they decrease as communities get larger.

#Load dataset (recommended) if not interested in re-simulating

setwd("/Users/Megan/Documents/github/CAMI/Data/PowerAnalysis/")
load(file="SimOutput_PwrAnalysis.Rdata")

#create an empty list for the classifiers and an empty vector for their error rates
RF.Objects <- list()
RF.ErrRates <- c()


for (i in 1:length(sample.sizes)){

  #get all summary stats from each of the models simulated for this sample size (we simulated 6 models)
  all.sum.stats <- c()
  for (k in 1:length(SimOutput[[1]])){
    all.sum.stats <- rbind(all.sum.stats, SimOutput[[i]][[k]]$summary.stats)
  }

  #Correspond each simulation to the model it was simulated under using a categorical variable. 
  #This is the response variable in the random forest analysis
  Model.Index <- rep(c("BM.neut", "BM.filt", "BM.comp", "OU.neut", "OU.filt", "OU.comp"), each=nrow(all.sum.stats)/length(SimOutput[[1]]))
  Ref.Table <- na.omit(data.frame(all.sum.stats[, c(3,4,15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30)], Model.Index))

  #set number of trees here
  ntree=1000

  #build random forest classifier and record the error rate
  RF.Objects[[i]] <- randomForest(Model.Index ~., data=Ref.Table, ntree=ntree, importance=T)
  RF.ErrRates <- c(RF.ErrRates, RF.Objects[[i]]$err.rate[ntree,1]*100)

  print(paste("finished RF classifier for sample size", sample.sizes[i], "with error rate", RF.ErrRates[i]))
}

RF.Objects_TraitsOnly_BMOU <-RF.Objects
RF.ErrRates_TraitsOnly_BMOU <- RF.ErrRates

setwd("/Users/Megan/Documents/Ruffleyetal2018EcolLetters/Data/PowerAnalysis/")
save(RF.ErrRates, file="RF.ErrRates.Rdata")

Now we can check whether the error rates for random forest decrease with increasing sample size. Let's plot sample size with OOB error rate and see.

plot(sample.sizes, RF.ErrRates, type = "b", lty = 1, lwd = 2, ylab= "OOB Error Rates", xlab = "Sample Size",
     ylim = c(0,40), bty="l", pch=19)

plot(sample.sizes, RF.ErrRates_TraitsOnly_BMOU, type = "b", lty = 1, lwd = 2, ylab= "OOB Error Rates", xlab = "Sample Size",
     ylim = c(0,50), bty="l", pch=19)

We can see in the plot that the error rates decrease with increasing sample size, which is what we expect. However, the OOB error rates are not as low as we would like.

We can view an RF classifier and see what information is in the confusion matrix. This matrix conveys more than the OOB rate alone: it gives the number of simulations classified as the correct model and also shows which models the misclassified simulations are assigned to.

We can look at the first RF object that has a very low error rate. This RF classifier was constructed with regional communities that had 100 - 150 species.
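For example, the confusion matrix can be pulled directly from the fitted randomForest object (a quick sketch; rows are the true models, columns the predicted models, and the last column the class-specific error rate).

#confusion matrix of the first classifier
RF.Objects[[1]]$confusion
#class-specific error rates only
RF.Objects[[1]]$confusion[, "class.error"]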

RF.ModProbs.Table <- matrix(NA, 20, 7)
for (i in 1:20) {
  RF.ModProbs.Table[i,] <- c(RF.Objects[[i]]$err.rate[1000,4], RF.Objects[[i]]$err.rate[1000,3], RF.Objects[[i]]$err.rate[1000,2],
                             RF.Objects[[i]]$err.rate[1000,7], RF.Objects[[i]]$err.rate[1000,6], RF.Objects[[i]]$err.rate[1000,5],
                             RF.Objects[[i]]$err.rate[1000,1])
}
setwd("/Users/Megan/Documents/Ruffleyetal2018EcolLetters/Data/PowerAnalysis/")
write.table(RF.ModProbs.Table*100, sep = "\t", file = "RF.ModProbs.Table.csv") 

We can also look at the RF classifiers that used larger datasets and see what the confusion matrix tells us.

RF.Objects[[18]]

RF.Objects[[1]]

This RF classifier was constructed with regional communities that had 950 - 999 species, and we can see that simulations that are incorrectly classified are most commonly assigned to the wrong model of trait evolution while the community assembly model is still correct. This could be because, depending on the rate of trait evolution, distinguishing between BM and OU after the assembly process has completed may be quite difficult.

For the last inspection of power for Random Forest, we will classify models by their community assembly process only, rather than also trying to classify the model of trait evolution used for simulation. Now, rather than comparing six models, we will only be comparing three. Then we can inspect, again, whether error rates decrease with increasing sample size.

#create an empty list for the classifiers and an empty vector for their error rates
RF.Objects_2 <- list()
RF.ErrRates_2 <- c()

for (i in 1:length(sample.sizes)){

  #get all summary stats from each of the models simulated for this sample size (we simulated 6 models)
  all.sum.stats <- c()
  for (k in 1:length(SimOutput[[i]])){
    all.sum.stats <- rbind(all.sum.stats, SimOutput[[i]][[k]]$summary.stats)
  }

  #Correspond each simulation to the model it was simulated under using a categorical variable. 
  #This is the response variable in the random forest analysis

  #I just removed the "BM" and "OU" marks before each of these categorical variables
  Model.Index <- rep(rep(c("neut", "filt", "comp", "neut", "filt", "comp"), each=nrow(all.sum.stats)/length(SimOutput[[1]])))
  Ref.Table <- na.omit(data.frame(all.sum.stats[, c(3,4,15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30)], Model.Index))

  #set number of trees here
  ntree=1000

  #build random forest classifier and record the error rate
  RF.Objects_2[[i]] <- randomForest(Model.Index ~., data=Ref.Table, ntree=ntree, importance=T)
  RF.ErrRates_2 <- c(RF.ErrRates_2, RF.Objects_2[[i]]$err.rate[ntree,1]*100)

  print(paste("finished RF classifier for sample size", sample.sizes[i], "with error rate", RF.ErrRates_2[i]))
}

RF.Objects_TraitsOnly_OU <-RF.Objects_2
RF.ErrRates_TraitsOnly_OU <- RF.ErrRates_2

getwd()
setwd("/Users/Megan/Documents/GitHub/CAMI/Data/PowerAnalysis/")
load(file="RF.ErrRates.Rdata")
#save(RF.ErrRates_2, file="RF.ErrRates_2.Rdata")
load(file="RF.ErrRates_2.Rdata")

We can compare the error rates between the two analyses. The error rates are much lower when we are only distinguishing between the three community assembly models. When we try to distinguish between community assembly model AND the model of trait evolution, the error rates are much higher.
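As a quick numerical check (a sketch assuming both error-rate vectors are still in memory), we can summarize the drop in OOB error across sample sizes:

#average reduction in OOB error (%) when classifying assembly model only
mean(RF.ErrRates - RF.ErrRates_2)
#side-by-side comparison across sample sizes
cbind(sample.sizes, six.models = RF.ErrRates, three.models = RF.ErrRates_2)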

RF.ModProbs.Table2 <- matrix(NA, 20, 4)
for (i in 1:20) {
  RF.ModProbs.Table2[i,] <- c(RF.Objects_2[[i]]$err.rate[1000,4], RF.Objects_2[[i]]$err.rate[1000,3], RF.Objects_2[[i]]$err.rate[1000,2],
                              RF.Objects_2[[i]]$err.rate[1000,1])
}
setwd("/Users/Megan/Documents/Ruffleyetal2018EcolLetters/Data/PowerAnalysis/")
write.table(RF.ModProbs.Table2*100, sep = "\t", file = "RF.ModProbs.Table2.csv") 

plot(sample.sizes, RF.ErrRates, type = "b", lty = 1, pch=19, lwd = 2, ylab= "OOB Error Rates (%)", xlab = "Sample Size",
     ylim = c(0,40), bty="l")
points(sample.sizes, RF.ErrRates_2, type = "b", pch=15, lty = 3, lwd = 2)
legend("topright", legend = c("RF.T+C", "RF.C"), lty = c(1,3), pch = c(19,15), bty="n", lwd = c(3,3), cex = 1)

Approximate Bayesian Computation

Before we can perform a power analysis using ABC, we need to determine which summary statistics to use. randomForest is able to handle all 30 of the statistics, but ABC suffers more severely from the curse of dimensionality and will not perform well with too many summary statistics. We can look at the variable importance plots from our RF objects (one with low error rates) to determine which summary statistics are most useful in those classifications.

varImpPlot(RF.Objects_2[[18]])

We will use the top 10 most important summary statistics for our ABC analysis.
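Rather than reading them off the plot, the top statistics can also be ranked programmatically (a sketch; here ranked by the mean decrease in the Gini index, one of the importance measures randomForest reports when importance = TRUE).

#rank summary statistics by mean decrease in Gini and keep the top 10
imp <- importance(RF.Objects_2[[18]])
top10 <- rownames(imp)[order(imp[, "MeanDecreaseGini"], decreasing = TRUE)][1:10]
top10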

Model Selection

We will use the same simulated data as in the RF power analysis to perform cross-validation with ABC. For this, we will run 500 cross-validation replicates for each model at each sample size and measure how often ABC determines the correct model. The "abc" R package has a built-in function to perform this cross validation.
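To see what a single cross-validation looks like before running the full loop, cv4postpr can be called on one reference table and summarized (a minimal sketch with few replicates, assuming Model.Index and all.sum.stats for one sample size are in memory as constructed in the loop below):

#small illustrative cross-validation run on one sample size
cv.test <- cv4postpr(Model.Index, all.sum.stats, nval = 50, tols = 0.01, method = "rejection")
summary(cv.test)   #confusion matrix and mean posterior model probabilities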

ABC.Objects <- list()
ABC.Power <- matrix(NA, length(sample.sizes), 3)
colnames(ABC.Power) <- c("ErrorRate", "MPP", "MPPwhenTrue")
er <- matrix(NA, 20,6)

for (i in 1:length(sample.sizes)){

  #get all summary stats from each of the models simulated for this sample size (we simulated 6 models)
  all.sum.stats <- c()
  for (k in 1:length(SimOutput[[1]])){
    all.sum.stats <- rbind(all.sum.stats, SimOutput[[i]][[k]]$summary.stats)
  }

  #pull out top 10 summary statistics as determined by RF
  all.sum.stats <- all.sum.stats[,c(4, 9, 10, 12, 13, 16, 18, 19, 20, 27)]

  #Correspond each simulation to the model it was simulated under using a categorical variable. 
  Model.Index <- rep(rep(c("BM.neut", "BM.filt", "BM.comp", "OU.neut", "OU.filt", "OU.comp"), each=nrow(all.sum.stats)/length(SimOutput[[1]])))

  #run cross validation analysis
  nval=500
  cv <- cv4postpr(Model.Index, all.sum.stats, nval=nval, tols=.01, method="rejection") 
  ABC.Objects[[i]] <- cv

  #Extract info from each run about accuracy of classifying each model
  model.names <- c("BM.comp", "BM.filt", "BM.neut", "OU.comp", "OU.filt",  "OU.neut")
  tmp.matrix <- matrix(NA, length(model.names), 3)

    for (j in 1:6){
        mat <- cv$model.probs$tol0.01[rownames(cv$model.probs$tol0.01)==model.names[j],j]
        est <- cv$estim$tol0.01[names(cv$estim$tol0.01)==model.names[j]]
        tmp.matrix[j,1] <- (sum(est!=model.names[j])/nval)*100 #turn into the error rate %
        tmp.matrix[j,2] <- mean(mat)
        tmp.matrix[j,3] <- mean(mat[est==model.names[j]])
    }

  er[i,] <- c(tmp.matrix[3,1], tmp.matrix[2,1], tmp.matrix[1,1], tmp.matrix[6,1], tmp.matrix[5,1], tmp.matrix[4,1])
  ABC.Power[i,] <- apply(tmp.matrix, 2, mean)
    print(paste("finished ABC cross validation for sample size", sample.sizes[i], "with error rate", ABC.Power[i,1]))
}

setwd("/Users/Megan/Documents/GitHub/CAMI/Data/PowerAnalysis/")
#save(ABC.Power, file="ABC.Power.Rdata")
#write.table(er, file="ABCtable_ErrRates.csv",sep ="\t")

We can plot these ABC error rates against sample size.

plot(sample.sizes, ABC.Power[,1], type = "b", lty = 1, lwd = 2, ylab= "Error Rate (%)", xlab = "Sample Size",
     ylim = c(0,50), bty="l", pch=19, col="gray60")

As in the randomForest analysis, we can also see whether model classification improves when we classify only the models of community assembly and do not also try to classify the model of trait evolution. For this, we only need slight modifications to the code.

ABC.Objects_2 <- list()
ABC.Power_2 <- matrix(NA, length(sample.sizes), 3)
colnames(ABC.Power_2) <- c("ErrorRate", "MPP", "MPPwhenTrue")
er_2 <- matrix(NA, 20, 3)

for (i in 1:length(sample.sizes)){

  #get all summary stats from each of the models simulated for this sample size (we simulated 6 models)
  all.sum.stats <- data.frame()
  for (k in 1:length(SimOutput[[1]])){
    all.sum.stats <- rbind(all.sum.stats, SimOutput[[i]][[k]]$summary.stats)
  }
    all.sum.stats <- data.frame(all.sum.stats)
  #pull out top 10 summary statistics as determined by RF
  all.sum.stats <- all.sum.stats[,c(4, 9, 10, 12, 13, 16, 18, 19, 20, 27)]

  #Correspond each simulation to the model it was simulated under using a categorical variable. 
  Model.Index <- rep(rep(c("neut", "filt", "comp", "neut", "filt", "comp"), each=nrow(all.sum.stats)/length(SimOutput[[1]])))
  #Model.Index <- rep(rep(c("BM.neut", "BM.filt", "BM.comp", "OU.neut", "OU.filt", "OU.comp"), each=nrow(all.sum.stats)/length(SimOutput[[1]])))

  #run cross validation analysis
  nval=500
  cv <- cv4postpr(Model.Index, all.sum.stats, nval=nval, tols=.01, method="rejection") 
  ABC.Objects_2[[i]] <- cv

  #Extract info from each run about accuracy of classifying each model
  model.names <- c("comp", "filt", "neut")
  #model.names <- c("BM.comp", "BM.filt", "BM.neut", "OU.comp", "OU.filt",  "OU.neut")
  tmp.matrix <- matrix(NA, length(model.names), 3)

    for (j in 1:length(model.names)){
        mat <- cv$model.probs$tol0.01[rownames(cv$model.probs$tol0.01)==model.names[j],j]
        est <- cv$estim$tol0.01[names(cv$estim$tol0.01)==model.names[j]]
        tmp.matrix[j,1] <- (sum(est!=model.names[j])/nval)*100 #turn into the error rate %
        tmp.matrix[j,2] <- mean(mat)
        tmp.matrix[j,3] <- mean(mat[est==model.names[j]])
    }

  print(tmp.matrix)
  er_2[i,] <- c(tmp.matrix[3,1], tmp.matrix[2,1], tmp.matrix[1,1])
  ABC.Power_2[i,] <- apply(tmp.matrix, 2, mean)
    print(paste("finished ABC cross validation for sample size", sample.sizes[i], "with error rate", ABC.Power_2[i,1]))
}

setwd("/Users/Megan/Documents/Github/CAMI/Data/PowerAnalysis/")
save(ABC.Power_2, file="ABC.Power_2.Rdata")
write.table(er_2[,1:3], file="ABCtable_ErrRates2.csv",sep ="\t")

We can plot these ABC results along with the RF results.

setwd("/Users/Megan/Documents/Github/CAMI/data/PowerAnalysis/")
load(file="RF.ErrRates.Rdata")
load(file="RF.ErrRates_2.Rdata")
load(file="ABC.Power.Rdata")
load(file="ABC.Power_2.Rdata")

##Red and Black
plot(sample.sizes, RF.ErrRates, type = "b", lty = 1, pch=19, lwd = 2, ylab= "OOB Error Rates (%)", xlab = "Sample Size",
     ylim = c(0,50), bty="l")
points(sample.sizes, RF.ErrRates_2, type = "b", pch=15, lty = 1, lwd = 2)
points(sample.sizes, ABC.Power[,1], type = "b", pch=19, lty = 1, lwd = 2, col = "red")
points(sample.sizes, ABC.Power_2[,1], type = "b", pch=15, lty = 1, lwd = 2, col = "red")
legend("topright", legend = c("RF.T+C", "RF.C", "ABC.T+C", "ABC.C"), lty = c(1,1,1,1), pch = c(19,15,19,15), bty="n", 
       lwd = c(3,3,3,3), cex = 1, col = c("black", "black", "red", "red"))

##Distinguishing between all 6 models (+ trait evolution)
plot(sample.sizes, RF.ErrRates, type = "b", lty = 1, pch=19, lwd = 2, ylab= "Error Rate (%)", xlab = "Community Sample Size",
     ylim = c(0,80), bty="l", col="black")
points(sample.sizes, ABC.Power[,1], type = "b", pch=15, lty = 1, lwd = 2, col = "black")
legend("topright", legend = c("RF", "ABC"), lty = c(1,1,1,1), pch = c(19,15,19,15), bty="n", 
       lwd = c(3,3,3,3), cex = 1, col = "black")

#Distinguishing just between community assembly models
plot(sample.sizes, RF.ErrRates_2, type = "b", lty = 1, pch=19, lwd = 2, ylab= "Error Rate (%)", xlab = "Sample Size",
     ylim = c(0,30), bty="l", col=rgb(.4, .7, .7, 1))
points(sample.sizes, ABC.Power_2[,1], type = "b", pch=15, lty = 1, lwd = 2, col = "gray33")
legend("topright", legend = c("RF", "ABC"), lty = c(1,1,1,1), pch = c(19,15,19,15), bty="n", 
       lwd = c(3,3,3,3), cex = 1, col = c(rgb(.4, .7, .7, 1), "gray33"))

Phylogenetic Dispersion Metrics

To measure phylogenetic dispersion we will calculate two metrics, mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), using the picante package (Kembel et al. 2010). For each simulation we can use the functions ses.mpd and ses.mntd (also from the picante package) to compare observed measures of MPD and MNTD to expectations under a null model of phylogenetic community structure.
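CAMI computes these dispersion statistics internally (next code block), but for orientation, a standalone toy example of calling ses.mpd and ses.mntd directly is sketched below; the tree and community here are made up purely for illustration and are not part of the CAMI workflow.

require(picante)
require(ape)

#toy example: a random 20-species regional tree and one local community of 10 species
phy <- rcoal(20)
pres <- sample(c(rep(1, 10), rep(0, 10)))                     #presence/absence for each tip
comm <- matrix(pres, nrow = 1, dimnames = list("local", phy$tip.label))
phydist <- cophenetic(phy)                                    #pairwise phylogenetic distances

ses.mpd(comm, phydist, null.model = "taxa.labels", runs = 99)$mpd.obs.p     #MPD quantile in the null
ses.mntd(comm, phydist, null.model = "taxa.labels", runs = 99)$mntd.obs.p   #MNTD quantile in the null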

We can use the same function as we used before to simulate community assembly data, only this time, instead of outputting the summary statistics, we output the phylogenetic dispersion metrics along with the output of the ses.mpd and ses.mntd functions. To do this we set the function's boolean arguments output.sum.stats to FALSE and output.phydisp.stats to TRUE.

#This script ran with 1000 sims per sample size and took approximately 48 hours on the UI server
sample.sizes <- seq(50,1000,50)
sims <- 1000
SimOutput_PhyDisp <- list()

for (i in 1:length(sample.sizes)) {

  #BM models
  BM.neutral <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "neutral", output.sum.stats = FALSE, output.phydisp.stats = TRUE)
  BM.filtering <- SimCommunityAssembly(sims = sims, N = sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "filtering", tau = c(1,60), output.sum.stats = FALSE, output.phydisp.stats = TRUE)
  BM.competition <- SimCommunityAssembly(sims = sims, N = sample.sizes[i], local = 0.5, traitsim = "BM", comsim = "competition", tau = c(1,60), output.sum.stats = FALSE, output.phydisp.stats = TRUE)

  #OU models
  OU.neutral <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "neutral", output.sum.stats = FALSE, output.phydisp.stats = TRUE)
  OU.filtering <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "filtering",  tau = c(1,60), output.sum.stats = FALSE, output.phydisp.stats = TRUE)
  OU.competition <- SimCommunityAssembly(sims = sims, N =  sample.sizes[i], local = 0.5, traitsim = "OU", comsim = "competition",  tau = c(1,60), output.sum.stats = FALSE, output.phydisp.stats = TRUE)

  SimOutput_PhyDisp[[i]] <- list(BM.neutral, BM.filtering, BM.competition, OU.neutral, OU.filtering, OU.competition)
  print(paste("completed simulations of sample size", sample.sizes[i]))

}

save(SimOutput_PhyDisp, file="SimOutput_PhyDisp.Rdata")

After the data are simulated, we can assess how accurate the results are. For this, if either test statistic (MPD or MNTD) falls in the lower 2.5% of the null distribution, habitat filtering is inferred because the data are more clustered than expected by chance. If either test statistic falls in the upper 2.5%, competitive exclusion is inferred because the data are more overdispersed than expected by chance. We will infer these models using both the phylogenetic data and the trait data.

We will calculate power in much the same way as we did with RF and ABC, by calculating how often MPD and MNTD correctly classify the data we simulated under the various models.
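The loop below applies this rule column by column; conceptually it is a simple decision on the null-model quantile of each statistic, as in this sketch (classify_dispersion is a hypothetical helper, and 0.025/0.975 are the two-tailed 5% cutoffs):

#hypothetical helper: classify one simulation from the null-model quantile (p) of MPD or MNTD
classify_dispersion <- function(p) {
  if (p < 0.025) "filtering"          #significantly clustered
  else if (p > 0.975) "competition"   #significantly overdispersed
  else "neutral"                      #within the null expectation
}
classify_dispersion(0.01)   #returns "filtering"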

setwd("/Users/Megan/Documents/Github/CAMI/data/PhyDispData/")
load(file="SimOutput_PhyDisp.Rdata")

PhyDispError <- matrix(NA, length(SimOutput_PhyDisp), 12)
colnames(PhyDispError) <- c("Phy.mpd.neut", "Phy.mpd.filt", "Phy.mpd.comp","Phy.mntd.neut", "Phy.mntd.filt", "Phy.mntd.comp",
                            "Tr.mpd.neut", "Tr.mpd.filt", "Tr.mpd.compt", "Tr.mntd.neut", "Tr.mntd.filt", "Tr.mntd.compt")
sims=1000

for (i in 1:length(SimOutput_PhyDisp)) {

  for (k in 1:6){
    if (k == 1 || k == 4){
      PhyDispError[i, 1] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,7] > 0.025  & SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,7] < 0.975) / sims)
      PhyDispError[i, 4] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,15] > 0.025  & SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,15] < 0.975) / sims)
      PhyDispError[i, 7] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,23] > 0.025  & SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,23] < 0.975) / sims)
      PhyDispError[i, 10] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,31] > 0.025  & SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,31] < 0.975) / sims)
    }
    if (k == 2 || k == 5){
      PhyDispError[i, 2] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,7] < 0.025) / sims)
      PhyDispError[i, 5] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,15] < 0.025) / sims)
      PhyDispError[i, 8] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,23] < 0.025) / sims)
      PhyDispError[i, 11] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,31] < 0.025) / sims)
    }
    if (k == 3 || k == 6){
      PhyDispError[i, 3] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,7] > 0.975) / sims)
      PhyDispError[i, 6] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,15] > 0.975) / sims)
      PhyDispError[i, 9] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,23] > 0.975) / sims)
      PhyDispError[i, 12] <- 1 - (sum( SimOutput_PhyDisp[[i]][[k]]$phydisp.stats[,31] > 0.975) / sims)
    }

  }
}

setwd("/Users/Megan/Documents/Github/CAMI/data/PhyDispData/")
load(file="PhyDispError.Rdata")

##plot the error rates
phy.error <- c()
tr.error <- c()
for (i in 1:nrow(PhyDispError)){
  phy.error <- c(phy.error, mean(PhyDispError[i, 1:6]))
  tr.error <- c(tr.error, mean(PhyDispError[i, 7:12]))
}

Now we can plot all of our results together and compare them.

##Color with gray background
par(bg="gray93")
plot(sample.sizes/2, phy.error, type = "b", lty = 1, pch=15, lwd = 2.1, ylab= "Error Rate", xlab = "Community Sample Size",
     ylim = c(0,1), bty="l", col="#e66101", cex=1.2)
points(sample.sizes/2, tr.error, type = "b", pch=15, lty = 1, lwd = 2.1, col = "#fdb863", cex=1.2)
points(sample.sizes/2, ABC.Power_2[,1]/100, type = "b", pch=16, lty = 1, lwd = 2.1, col = "#b2abd2", cex=1.2)
points(sample.sizes/2, RF.ErrRates_2[]/100, type = "b", pch=16, lty = 1, lwd = 2.1, col = "#5e3c99", cex=1.2)
legend("top", legend = c("Disp.Phy", "Disp.Traits", "ABC", "RF"), lty = c(1,1,1,1), pch = c(15,15,16,16), bty="n", 
       lwd = c(3,3,3,3), cex = 1.1, col = c("#e66101", "#fdb863", "#b2abd2", "#5e3c99"), ncol=2)

##Black and White with symbols
plot(sample.sizes/2, phy.error, type = "b", lty = 1, pch=22, lwd = 1.5, ylab= "Error Rate", xlab = "Community Sample Size",
     ylim = c(0,1), bty="l", col="black", bg="gray95", cex=1.4)
axis(2, ylim=c(0,2))
points(sample.sizes/2, tr.error, type = "b", pch=23, lty = 1, lwd = 1.5,col="black", bg = "gray85", cex=1.4)
points(sample.sizes/2, ABC.Power_2[,1]/100, type = "b", pch=24, lty = 1, lwd = 1.5, col = "black", bg="gray55", cex=1.1)
points(sample.sizes/2, RF.ErrRates_2/100, type = "b", pch=21, lty = 1, lwd = 1.5, col = "black", bg="gray35", cex=1.2)
legend("topright", legend = c("Disp.Phy", "Disp.Traits", "ABC", "RF"), lty = c(1,1,1,1), pch = c(22,23,24,21), bty="n", 
       lwd = c(1.5, 1.5, 1.5, 1.5), pt.cex = c(1.4, 1.4, 1.1, 1.2), col = "black", pt.bg=c("gray95", "gray85", "gray55", "gray35"), ncol=1)

#black and white no symbols
lwd=3
#tiff('Fig2.tiff', units="in", width=6, height=4.5, res=2000, compression = 'lzw')
pdf(file="Fig2_power.pdf", width=6, height=4.5)
plot(sample.sizes/2, phy.error, type = "l", lty = 3, pch=22, lwd =lwd, ylab= "Error Rate", xlab = "Community Size", ylim = c(0,1), bty="l", col="black", cex.lab=1)
points(sample.sizes/2, tr.error, type = "l", pch=23, lty = 1, lwd = lwd,col="gray40")
points(sample.sizes/2, ABC.Power_2[,1]/100, type = "l", pch=24, lty = 1, lwd = lwd, col = "gray75")
points(sample.sizes/2, RF.ErrRates_2/100, type = "l", pch=21, lty = 2, lwd = lwd, col = "gray20")
legend("topright", legend = c("Disp.Phy", "Disp.Traits", "ABC", "RF"), lty = c(3,1,1,2), bty="n", 
       lwd =rep(3,4), col = c("black", "gray45", "gray75", "gray20") ,seg.len=7, ncol=1)

