In walmes/RDASC: Reproducible Data Analysis of Cientific Cooperations

Session definitions

export_tiff <- TRUE

library(knitr)
opts_chunk$set(dpi = 100,
               dev = "png",
               warning = FALSE,
               error = FALSE,
               message = FALSE,
               dev.args = list(family = "Roboto Condensed"))
# options(width = 68)

# rm(list = objects())

# List of packages.
pkg <- c("tidyverse", "agricolae", "emmeans", "lme4", "lmerTest")

# Loads each package.
sapply(pkg, FUN = library, character.only = TRUE, logical.return = TRUE)

devtools::load_all()

Area under the micelial growth curve

Exploratory data analysis

# Prepare table of data.
ogt <- as_tibble(RDASC::pistachio_anthracnose$ogrotem)
ogt$exp <- factor(ogt$exp)
attr(ogt, "spec") <- NULL
str(ogt)

# Get the mean diameter (4 mm is the plug size).
ogt$diameter <- (ogt$mm1 + ogt$mm2)/2 - 4

# with(ogt, c(min(mm1, na.rm = TRUE),
#             min(mm2, na.rm = TRUE),
#             min(diameter, na.rm = TRUE)))

# Curves for each isolate according to experimental conditions.
ggplot(data = ogt,
       mapping = aes(x = day,
                     y = diameter,
                     color = iso)) +
    facet_grid(facets = exp ~ tem) +
    geom_point() +
    stat_summary(geom = "line", fun.y = "mean") +
    labs(x = "Days",
         y = "Diameter (mm)",
         color = "Isolate")

# Loads a function to calculate area under a curve by trapezoidal
# method.
# source("https://raw.githubusercontent.com/walmes/wzRfun/master/R/auc.R")
# args(auc)
# body(auc)

# Frequency tables.
xtabs(~iso + spp, data = ogt)
ftable(xtabs(~exp + tem + iso, data = ogt))

# AUC by each experimental unit = exp > spp > iso > tem > rep.
aumgc <- ogt %>%
    group_by(exp, spp, iso, tem, rep) %>%
    summarise(auc = auc(time = day, resp = diameter)) %>%
    ungroup()
aumgc

# Convert come experimental controled variables to factor.
aumgc <- aumgc %>%
    mutate_at(vars(exp, spp, iso, rep), factor)
str(aumgc)

# Visualize.
ggplot(data = aumgc,
       mapping = aes(x = tem, y = auc, color = exp)) +
    facet_wrap(facets = ~iso) +
    geom_point() +
    stat_summary(geom = "line", fun.y = "mean") +
    labs(x = "Temperature",
         y = "AUC",
         color = "Experiment")

# Removes outlier.
aumgc[with(aumgc,
           iso == "12D46" &
           exp == 3 &
           tem == 25 &
           auc < 50), ]$auc <- NA

Optimal temperature for each isolate

exp: represents the contour conditions of each experiment, so it represents the blocking factor.
iso: is the effect of isolate.
rep: is each independent replication of experiment $\times$ isolate $\times$ temperature. Response could be averaged across replications.

# Averages the AUC.
aumgcm <- aumgc %>%
    group_by(exp, iso, tem) %>%
    summarise(aucm = mean(auc, na.rm = TRUE)) %>%
    ungroup()

# Visualize.
ggplot(data = aumgcm,
       mapping = aes(x = tem, y = aucm, color = exp)) +
    facet_wrap(facets = ~iso) +
    geom_point() +
    geom_line() +
    labs(x = "Temperature",
         y = "AUC",
         color = "Experiment")

# Uses a "naive" model (in terms of variance components) to check the
# model assumptions. All factores here are qualitative.
m0 <- lm(aucm ~ exp + iso * factor(tem),
         data = aumgcm)

# Residual plots.
par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

# NOTE: no strong evidence against model assumptions.

By the above model, the growth pattern does rely on main effects. Interactions between isolate and temperature is not relevant at 5% significance level. Despite the above model can be useful, the greater interest is in comparing optimal growth temperatures. In the next section, the optimal temperature will be estimated for each experimental unit and then these data will be submited to the analysis of variance.

Optimal temperature estimation

Estimation of optimal temperature is done for each experimental condition: experiment $\times$ isolate. After estimation, a model for hyphotesis on equality of the optimal values will be employed.

# # Each experimental unit.
# gg <- ggplot(data = aumgc,
#              mapping = aes(x = tem, y = auc, color = rep, group = rep)) +
#     facet_grid(facets = exp ~ iso) +
#     geom_point() +
#     expand_limits(y = c(0, NA)) +
#     geom_hline(yintercept = 25, linetype = 3) +
#     labs(x = "Temperature",
#          y = "AUC")

# Each experimental cell.
gg <- ggplot(data = aumgcm,
       mapping = aes(x = tem, y = aucm)) +
    facet_grid(facets = exp ~ iso) +
    geom_point() +
    expand_limits(y = c(0, NA)) +
    geom_hline(yintercept = 25, linetype = 3) +
    labs(x = "Temperature",
         y = "AUC")

gg +
    geom_smooth(method = "lm",
                formula = y ~ poly(x, degree = 4),
                se = FALSE)


# ATTENTION FIXME: the values of diameter must be subtracted from the
# plug initial diameter? Is 25 mm?

A 4 degree polynomial will be used to fit the AUC as a function of temperature. After fitting, the optimal temperature will be determined by a numerical optmization method.

#-----------------------------------------------------------------------
# Polynomial approach.

# Function that fits a model equation to data and determine the optimal
# temperature.
fit_poly <- function(data,
                     formula = auc ~ poly(tem, degree = 4),
                     interval = c(10, 35),
                     maximum = TRUE) {
    m0 <- lm(formula = formula, data = data)
    pred <- function(x) predict(m0, newdata = list(tem = x))
    t_opt <- optimise(f = pred,
                      interval = interval,
                      maximum = maximum)
    return(data.frame(t_opt = t_opt$maximum,
                      sqr = deviance(m0),
                      ll = logLik(m0)))
}

# # Testing.
# fit_poly(data = aumgcm[with(aumgcm, iso == "11J23" & exp == "1"),
#                        c("aucm", "tem" )])

#-----------------------------------------------------------------------
# Optimal temperature.

# Fits for each experimental cell.
temps <- aumgcm %>%
    group_by(iso, exp) %>%
    summarise(topt_p = list(
                  fit_poly(data = data.frame(aucm, tem),
                           formula = aucm ~ poly(tem, degree = 4)))) %>%
    ungroup() %>%
    unnest(topt_p)

# # Fits for each experimental unit.
# temps <- aumgc %>%
#     group_by(iso, exp, rep) %>%
#     summarise(topt_p = list(
#                   fit_poly(data = data.frame(auc, tem),
#                            formula = auc ~ poly(tem, degree = 4)))) %>%
#     ungroup() %>%
#     unnest()

# See the results for the polynomial approach.
gg +
    geom_smooth(method = "lm",
                formula = y ~ poly(x, degree = 4),
                se = FALSE,
                color = "purple",
                size = 0.6) +
    geom_vline(data = temps,
               mapping = aes(xintercept = t_opt),
               color = "purple",
               linetype = 2)

Isolate comparison

# Filter by the chosen approach.
da <- temps %>%
    select(iso, exp, t_opt)

# Fit the model according to the experiment layout.
m0 <- lm(t_opt ~ exp + iso, data = da)

# Residual plots.
par(mfrow = c(2, 2))
plot(m0)
layout(1)

# ANOVA table.
anova(m0)

# Tukey HSD test on isolate means.
tk <- HSD.test(m0, trt = "iso", group = TRUE, console = FALSE)
tk <- tk$groups %>%
    rownames_to_column() %>%
    rename("iso" = "rowname")
tk

# Least squares means.
lsm <- emmeans(m0, specs = "iso") %>%
    as.data.frame() %>%
    mutate(iso = as.character(iso))

# Merge these two tables.
tb_means <- full_join(x = lsm, y = tk, by = "iso")
tb_means

# Segment plot displaying means with confidence interval.
ggplot(data = tb_means,
       mapping = aes(y = fct_reorder(iso, emmean), x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.1f%s", emmean, groups)),
              vjust = 0, nudge_y = 0.10) +
    labs(y = "Isolate",
         x = expression("Temperature" ~ (degree * C)))

Pairwise comparisons were made using Tukey HSD test at a 5% nominal significance level.

Sporulation capability

Exploratory data analysis

# Prepare table of data.
spo <- as_tibble(RDASC::pistachio_anthracnose$spo_vt)
attr(spo, "spec") <- NULL

# Convert variables to factor.
spo$exp <- factor(spo$exp)
spo$spp <- factor(spo$spp,
                  levels = c("Cf", "Ck"),
                  labels = c("C. fioriniae", "C. karstii"))
str(spo)

# Summaries.
summary(spo)

# Table frequency.
xtabs(~spp + iso, data = spo)
xtabs(~iso + cv, data = spo)

# Creating the context or offset variable for the count (spo).
spo$offset <- with(spo, slice *
                        c(16, 4)[match(index, c(160000, 10000))])

# Creates the variable concentration.
# Numbers correspond to 1000 in slide and 16 in cells.
spo$con <- with(spo, 1000 * 16 * spo/offset)

ggplot(data = spo,
       mapping = aes(x = tem, y = con, color = iso)) +
    facet_grid(facets = exp ~ cv) +
    geom_jitter(width = 0.5) +
    stat_summary(geom = "line", fun.y = "mean") +
    scale_y_log10()

Model fitting

Data were transform to log scale to avoid violations in model assumptions. These data could be analysed assuming a probability distribution for count data. Here we decided to use a Gaussian model to the transformed response variable. This results should not differ from the count data model.

# Convert from numerical to factor.
spo$tem <- factor(spo$tem)

# Fit the model.
# m0 <- lm(con ~ exp + iso * cv * tem, data = spo)
# MASS::boxcox(m0)
m0 <- lm(log(con) ~ exp + iso * cv * tem, data = spo)

# Check for any violation.
par(mfrow = c(2, 2))
plot(m0)
layout(1)

# Anova table.
anova(m0)

# Update model dropping no relevant experimental terms.
m1 <- update(m0, . ~ exp + iso)
anova(m1, m0)

# Gets the estimated marginal means for isolates.
emm <- emmeans(m1, specs = ~iso)
emm %>% as.data.frame()

# Values in the response scale.
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate_at(c("emmean", "lower.CL", "upper.CL"), exp) %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

# Results.
ggplot(data = tb_means,
       mapping = aes(y = iso, x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0,
              nudge_y = 0.05,
              show.legend = FALSE) +
    labs(x = "Number of spores",
         # x = expression(Log[10] ~ "of sporulation"),
         y = "Isolates")

Morphological conidia characterization

Exploratory data analysis

# Import the data.
cs <- as_tibble(RDASC::pistachio_anthracnose$cs)
cs$exp <- factor(cs$exp)
cs$spp <- factor(cs$spp,
                 levels = c("Cf", "Ck"),
                 labels = c("C. fioriniae", "C. karstii"))
attr(cs, "spec") <- NULL
str(cs)

# Summaries.
summary(cs)

# Curves for each isolate according to experimental conditions.
ggplot(data = cs,
       mapping = aes(x = len,
                     y = wid,
                     color = kare)) +
    facet_wrap(facets = ~exp) +
    geom_point()

# Find the convex hull of the points being plotted.
cs_hull <- cs %>%
    group_by(exp, kare) %>%
    slice(chull(len, wid))

# Show the convex hull for each isolate.
ggplot(data = cs,
       mapping = aes(x = len,
                     y = wid,
                     color = kare)) +
    facet_wrap(facets = ~exp) +
    geom_polygon(data = cs_hull, alpha = 0.1) +
    geom_point() +
    labs(x = expression("Conidia length" ~ (mu * m)),
         y = expression("Conidia width" ~ (mu * m)))

# ggplot(data = cs,
#        mapping = aes(x = len,
#                      color = kare)) +
#     facet_wrap(facets = ~exp) +
#     geom_density()
#
# ggplot(data = cs,
#        mapping = aes(x = wid,
#                      color = kare)) +
#     facet_wrap(facets = ~exp) +
#     geom_density()
#
# ggplot(data = cs,
#        mapping = aes(x = vol,
#                      color = kare)) +
#     facet_wrap(facets = ~exp) +
#     geom_density()

cs %>%
    gather(key = "variable", value = "value", len, wid, vol) %>%
    group_by(variable) %>%
    mutate(value = scale(value)) %>%
    ggplot(mapping = aes(x = value,
                         color = kare)) +
    facet_grid(facets = exp ~ variable, scales = "free") +
    geom_density() +
    geom_rug()

# Empirical cummulative density function.
cs %>%
    gather(key = "variable", value = "value", len, wid, vol) %>%
    group_by(variable) %>%
    mutate(value = scale(value)) %>%
    ggplot(mapping = aes(x = value,
                         color = kare)) +
    facet_grid(facets = exp ~ variable, scales = "free") +
    stat_ecdf() +
    geom_rug()

Conidia length

# Analysis done at experimental unit since individual values are samples
# from the same experimental unit (replicates and not repetitions).
csm <- cs %>%
    group_by(exp, kare) %>%
    summarise(n = n(),
              len = mean(len),
              wid = mean(wid),
              vol = mean(vol))
csm

# To keep the estimated means.
tb_results <- list()

# Models that uses the number of replicates as weighting variable.
m0 <- lm(len ~ exp + kare, data = csm, weight = n)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

# DANGER: Tukey test is based on geometric mean of the number of
# replicates. Since the number of replications are very enequal, this
# aproximation is not suitable.
# tt <- HSD.test(m0, "kare", console = FALSE)
# tt$groups

emm <- emmeans(m0, specs = ~kare)
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

tb_results[["len"]] <- tb_means

# Segment plot displaying means with confidence interval.
gg_len <- ggplot(data = tb_means,
       # mapping = aes(y = fct_reorder(kare, emmean), x = emmean)) +
       mapping = aes(y = kare, x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0, nudge_y = 0.10) +
    labs(x = expression("Conidia length" ~ (mu * m)),
         y = "Isolates", title = "(A)")

Conidia width

# Models that uses the number of replicates as weighting variable.
m0 <- lm(wid ~ exp + kare, data = csm, weight = n)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

emm <- emmeans(m0, specs = ~kare)
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

tb_results[["wid"]] <- tb_means

# Segment plot displaying means with confidence interval.
gg_wid <- ggplot(data = tb_means,
       # mapping = aes(y = fct_reorder(kare, emmean), x = emmean)) +
       mapping = aes(y = kare, x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0, nudge_y = 0.10) +
    labs(x = expression("Conidia width" ~ (mu * m)),
         y = "Isolates", title = "(B)")

Conidia volume

# Models that uses the number of replicates as weighting variable.
m0 <- lm(vol^(1/3) ~ exp + kare, data = csm, weight = n)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

emm <- emmeans(m0, specs = ~kare)
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

tb_results[["vol"]] <- tb_means

# Segment plot displaying means with confidence interval.
gg_vol <- ggplot(data = tb_means,
       # mapping = aes(y = fct_reorder(kare, emmean), x = emmean)) +
       mapping = aes(y = kare, x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0, nudge_y = 0.10) +
    labs(x = expression("Cubic root of conidia volume" ~ (mu * m^3)),
         y = "Isolates", title = "(C)")

# All morphological characteristics on the same plot.
gridExtra::grid.arrange(gg_len, gg_wid, gg_vol, ncol = 1)

Germination

Exploratory data analysis

# Imports data.
germ <- as_tibble(RDASC::pistachio_anthracnose$ogertem)
germ$exp <- factor(germ$exp)
germ$tim <- factor(germ$tim)
germ$spp <- factor(germ$spp,
                   levels = c("Cf", "Ck"),
                   labels = c("C. fioriniae", "C. karstii"))
attr(germ, "spec") <- NULL
str(germ)

# Summaries.
summary(germ)

# # Curves for each isolate according to experimental conditions.
# ggplot(data = germ,
#        mapping = aes(x = tem,
#                      y = ger/of,
#                      color = tim)) +
#     facet_grid(facets = exp ~ iso) +
#     geom_point() +
#     stat_summary(geom = "line", fun.y = "mean")

# Curves for each isolate according to experimental conditions.
ggplot(data = filter(germ, tem > 10),
       mapping = aes(x = tem,
                     y = ger/of,
                     color = tim)) +
    facet_grid(facets = exp ~ iso) +
    geom_jitter(width = 1, height = 0) +
    # stat_summary(geom = "line", fun.y = "mean") +
    geom_smooth(method = "glm",
                formula = y ~ poly(x, degree = 3),
                # formula = y ~ splines::bs(x, df = 2, degree = 3),
                # formula = y ~ splines::ns(x, df = 3),
                se = FALSE,
                method.args = list(family = "quasibinomial"))

Fit curves for each experimental point

# Function that fits a model equation to data and determine the optimal
# temperature.
fit_poly <- function(data) {
    m0 <- glm(formula = cbind(ger, of - ger) ~ poly(tem, degree = 3),
              data = data,
              family = quasibinomial)
    pval <- tail(summary(m0)$coeff[, "Pr(>|t|)"], n = 1)
    if (pval > 0.05) {
        update(m0, . ~ poly(tem, degree = 2))
    }
    pred <- function(x) predict(m0,
                                newdata = list(tem = x),
                                type = "response")
    t_opt <- optimise(f = pred,
                      interval = c(10, 35),
                      maximum = TRUE)
    return(data.frame(temp_opt = t_opt$maximum,
                      germ_opt = t_opt$objective,
                      n_par = length(coef(m0)),
                      deviance = deviance(m0)))
}

# A experimental condition to test.
data <- germ[with(germ, iso == "11J23" &
                        exp == "1" &
                        tim == "6" &
                        tem > 10), ]

# Testing.
fit_poly(data = data)

# Apply to all experimental units except one.
temp_opt <- germ %>%
    filter(tem > 10 & !(tim == "12" & exp == "2" & iso == "3G23")) %>%
    group_by(exp, iso, tim) %>%
    do({
        fit_poly(.)
    }) %>%
    as.data.frame()

# ggplot(data = temp_opt,
#        mapping = aes(x = temp_opt, y = germ_opt, color = tim)) +
#     facet_wrap(facets = ~iso) +
#     geom_point()

gridExtra::grid.arrange(
               ggplot(data = temp_opt,
                      mapping = aes(x = iso, y = temp_opt,
                                    color = tim, group = tim)) +
               geom_point() +
               stat_summary(geom = "line", fun.y = "mean") +
               labs(y = "Optimal temperature for germination",
                    x = "Isolates"),
               ggplot(data = temp_opt,
                      mapping = aes(x = iso, y = germ_opt,
                                    color = tim, group = tim)) +
               geom_point() +
               stat_summary(geom = "line", fun = "mean") +
               labs(y = "Maximum germination",
                    x = "Isolates")
           )

Isolate comparison

# Model.
m0 <- lm(temp_opt ~ exp + iso * tim, data = temp_opt)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

emm <- emmeans(m0, specs = ~iso)
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

# Segment plot displaying means with confidence interval.
gg_temp <- ggplot(data = tb_means,
       mapping = aes(y = fct_reorder(iso, emmean), x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0,
              nudge_y = 0.10,
              show.legend = FALSE) +
    labs(x = expression("Optimal temperature for germination" ~ (""^degree * C)),
         y = "Isolates", title = "(A)")

# Analysis on the logit scale of the estimated germination.
temp_opt$logit_germ_opt <- binomial()$linkfun(temp_opt$germ_opt)
m0 <- lm(logit_germ_opt ~ exp + iso * tim, data = temp_opt[-8, ])

# m0 <- lm(germ_opt ~ exp + iso, data = temp_opt)
# MASS::boxcox(m0)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0)

m0 <- lm(logit_germ_opt ~ exp + iso + tim, data = temp_opt[-8, ])

emm <- emmeans(m0, specs = ~iso)
tb_means <- multcomp::cld(emm, Letters = letters) %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group) %>%
    mutate_at(c("emmean", "lower.CL", "upper.CL"), binomial()$linkinv)
tb_means

# ATTENTION: these values of germination is a mean of the predicted
# germination for the levels of tim. So, interpret them with caution. In
# fact, what is relevant is the contrast between isolates that represent
# the isolate effect.

# Segment plot displaying means with confidence interval.
gg_germ <- ggplot(data = tb_means,
       mapping = aes(y = fct_reorder(iso, emmean), x = emmean)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0,
              nudge_y = 0.10,
              show.legend = FALSE) +
    labs(x = expression("Proportion of maximum germination"),
         y = "Isolates", title = "(B)")

gridExtra::grid.arrange(gg_temp, gg_germ, ncol = 1)

# Get the means for all combinations despite the effects are additive.
emm <- emmeans(m0, specs = ~iso + tim) %>%
    # multcomp::cld(Letters = letters) %>%
    as.data.frame() %>%
    # mutate(.group = str_trim(.group)) %>%
    # rename(cld = .group) %>%
    mutate_at(c("emmean", "lower.CL", "upper.CL"), binomial()$linkinv)

pd <- position_dodge(0.1)
ggplot(data = emm,
       mapping = aes(x = fct_reorder(iso, emmean),
                     y = emmean,
                     color = tim,
                     group = tim)) +
    geom_point(position = pd) +
    geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                  position = pd,
                  width = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f", emmean)),
              position = pd,
              hjust = -0.2,
              show.legend = FALSE) +
    labs(y = expression("Maximum germination"),
         x = "Isolates",
         color = "Incubation\nperiod (h)")

Appressory formation

# str(RDASC::pistachio_anthracnose$af)
af <- RDASC::pistachio_anthracnose$af %>%
    as_tibble()

af$spp <- factor(af$spp,
                 levels = c("Cf", "Ck"),
                 labels = c("C. fioriniae", "C. karstii"))

af <- af %>%
    mutate_at(c("exp", "spp", "iso"), as.factor)
af

ftable(xtabs(~exp + spp + iso, data = af))

ggplot(data = af,
       mapping = aes(x = iso,
                     y = app/tot,
                     color = spp,
                     shape = exp)) +
    geom_jitter(width = 0.1)

m0 <- glm(cbind(app, tot - app) ~ exp + iso,
          data = af,
          family = quasibinomial)

par(mfrow = c(2, 2))
plot(m0)
layout(1)

anova(m0, test = "F")

emm <- emmeans(m0, specs = ~iso)
tb_means <- multcomp::cld(emm, Letters = letters, type = "response") %>%
    as.data.frame() %>%
    mutate(.group = str_trim(.group)) %>%
    rename(cld = .group)
tb_means

# Segment plot displaying means with confidence interval.
gg_af <- ggplot(data = tb_means,
       mapping = aes(y = fct_reorder(iso, prob), x = prob)) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = asymp.LCL, xmax = asymp.UCL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", prob, cld)),
              vjust = 0,
              nudge_y = 0.05,
              show.legend = FALSE) +
    labs(x = expression("Appressory formation"),
         y = "Isolates")
gg_af

Patogenicity in vivo

# ATTENTION: new data from 2020 are available.
vv <- RDASC::pistachio_anthracnose$pato_vv
vv$cv[vv$cv == "RA"] <- "Red Aleppo"
vv$cv <- droplevels(vv$cv)
vv$spp <- factor(vv$spp,
                 levels = c("Cf", "Ck"),
                 labels = c("C. fioriniae", "C. karstii"))
str(vv)

# Incomplete crossed factorial design. But it is a complete factorial
# design in each year.
ftable(xtabs(~cv + spp + yr, data = vv))

# Each experimental codition has 3 trees in sequence and 10 clusters per
# tree.
ggplot(data = vv,
       mapping = aes(x = arb, y = hea/tot, color = spp)) +
    facet_grid(facets = yr ~ cv) +
    geom_point() +
    stat_summary(geom = "line", fun = "mean")

ggplot(data = vv,
       mapping = aes(x = spp,
                     y = hea/tot,
                     color = factor(arb),
                     group = 1)) +
    facet_grid(facets = yr ~ cv) +
    geom_jitter(width = 0.1) +
    stat_summary(geom = "line", fun = "mean")

#-----------------------------------------------------------------------
# Modelo de 3 componentes de variância.

# Variáveis que podem ser usadas na análise.
vv$prop <- vv$hea/vv$tot
# vv$asin <- asin(sqrt(vv$hea/vv$tot))
# vv$logit <- binomial()$linkfun(0.95 * (vv$hea/vv$tot - 0.5) + 0.5)

# Para guardar as médias.
tb_means <- list()

#--------------------------------------------
# 2017.

# Ajusta o modelo.
a <- 2017
mm0 <- lmer(prop ~ cv * spp + (1 | cv:arb) + (1 | cv:arb:spp),
            data = filter(vv, yr == a),
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~cv, data = filter(vv, yr == a)) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

#--------------------------------------------
# 2018.

# Ajusta o modelo.
a <- 2018
mm0 <- lmer(prop ~ cv * spp + (1 | cv:arb) + (1 | cv:arb:spp),
            data = filter(vv, yr == a),
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~cv, data = filter(vv, yr == a)) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

#--------------------------------------------
# 2019.

# Ajusta o modelo.
a <- 2019
mm0 <- lmer(prop ~ cv * spp + (1 | cv:arb) + (1 | cv:arb:spp),
            data = filter(vv, yr == a),
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~cv, data = filter(vv, yr == a)) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

#--------------------------------------------
# 2020.

# NOTE: This year only one spp were observed.

# Ajusta o modelo.
a <- 2020
mm0 <- lmer(prop ~ cv + (1 | cv:arb),
            data = filter(vv, yr == a),
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~cv, data = filter(vv, yr == a)) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

#-----------------------------------------------------------------------
# Gráficos.

tb_m <- tb_means %>%
    bind_rows(.id = "yr") %>%
    rename(cld = .group) %>%
    mutate(cld = str_trim(cld))

# Segment plot displaying means with confidence interval.
gg_yr <- ggplot(data = tb_m,
                mapping = aes(y = cv, x = emmean)) +
    facet_wrap(facets = ~yr, ncol = 1, scale = "free_y") +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                  height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0,
              nudge_y = 0.125,
              show.legend = FALSE) +
    labs(x = "Proportion of healthy nuts",
         y = "Cultivars")
gg_yr

Time of sensibility

# ATTENTION: new data from 2020 are available.
tos <- RDASC::pistachio_anthracnose$tos
tos$mo <- factor(tos$mo, levels = unique(tos$mo)[c(4, 5, 1, 2, 3)])
tos$spp <- factor(tos$spp,
                  levels = c("Cf", "Ck"),
                  labels = c("C. fioriniae", "C. karstii"))
str(tos)

# Incomplete crossed factorial design. But it is a complete factorial
# design in each year.
ftable(xtabs(~cv + spp + yr + mo, data = tos))

ggplot(data = tos,
       mapping = aes(x = mo, y = hea/tot, color = spp, group = spp)) +
    facet_wrap(facets = ~yr) +
    geom_jitter(width = 0.1) +
    stat_summary(geom = "line", fun = "mean")

# Criando a estrutura de unidades experimentais.
#
# UE: as árvores são os blocos.
# UE para mês de inoculação: 1/5 de cada árvore.
# UE para espécie: 1/2 de 1/5 de cada árvore.
# UA: 10 cachos por UE de espécie.

tos <- tos %>%
    mutate(ue_bl = interaction(yr, arb, drop = FALSE),
           ue_mo = interaction(yr, arb, mo, drop = FALSE),
           ue_spp = interaction(yr, arb, mo, spp, drop = FALSE))

tos %>%
    group_by(yr) %>%
    summarise_at(vars(ue_bl, ue_mo, ue_spp, tot),
                 ~ifelse(is.factor(.),
                         nlevels(droplevels(.)),
                         length(.)))

#-----------------------------------------------------------------------
# Variáveis.

# Variáveis que podem ser usadas na análise.
# tos$prop <- tos$hea/tos$tot
# tos$asin <- asin(sqrt(tos$hea/tos$tot))
tos$logit <- binomial()$linkfun(0.95 * (tos$hea/tos$tot - 0.5) + 0.5)
tos$y <- tos$logit

# Para guardar as médias.
tb_means <- list()

#--------------------------------------------
# 2017.

# Ajusta o modelo.
a <- 2017
tb <- droplevels(filter(tos, yr == a, mo %in% c("June", "July")))
mm0 <- lmer(y ~ spp * mo + (1 | ue_bl) + (1 | ue_mo) + (1 | ue_spp),
            data = tb,
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~spp, data = tb) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

u <- emmeans(mm0, specs = ~1, data = tb) %>%
    as.data.frame() %>%
    rename("yr" = "1") %>%
    mutate(yr = a)

tb_means[[as.character(a)]] <- bind_rows(tb_means[[as.character(a)]], u)
tb_means[[as.character(a)]]

#--------------------------------------------
# 2018.

# Ajusta o modelo.
a <- 2018
tb <- droplevels(filter(tos, yr == a))
mm0 <- lmer(y ~ spp * mo + (1 | ue_bl) + (1 | ue_mo) + (1 | ue_spp),
            data = tb,
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~spp, data = tb) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

u <- emmeans(mm0, specs = ~1, data = tb) %>%
    as.data.frame() %>%
    rename("yr" = "1") %>%
    mutate(yr = a)

tb_means[[as.character(a)]] <- bind_rows(tb_means[[as.character(a)]], u)
tb_means[[as.character(a)]]

#--------------------------------------------
# 2019.

# Ajusta o modelo.
a <- 2019
tb <- droplevels(filter(tos, yr == a))
mm0 <- lmer(y ~ spp * mo + (1 | ue_bl) + (1 | ue_mo) + (1 | ue_spp),
            data = tb,
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~mo | spp, data = tb) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

#--------------------------------------------
# 2020.

# Ajusta o modelo.
a <- 2020
tb <- droplevels(filter(tos, yr == a))
mm0 <- lmer(y ~ mo + (1 | ue_bl) + (1 | ue_mo) + (1 | ue_spp),
            data = tb,
            weights = tot)

# # Diagnóstico.
# qqnorm(residuals(mm0))
# plot(residuals(mm0) ~ fitted(mm0))

# Estimativa dos componentes de variância.
VarCorr(mm0)

# Quadro de testes de hipótese.
anova(mm0)

# Médias ajustadas.
tb_means[[as.character(a)]] <-
    emmeans(mm0, specs = ~mo, data = tb) %>%
    multcomp::cld(Letters = letters) %>%
    as.data.frame()
tb_means[[as.character(a)]]

u <- emmeans(mm0, specs = ~1, data = tb) %>%
    as.data.frame() %>%
    rename("yr" = "1") %>%
    mutate(yr = a)

tb_means[[as.character(a)]] <- bind_rows(tb_means[[as.character(a)]], u)
tb_means[[as.character(a)]]

#-----------------------------------------------------------------------
# Gráfico.

tb_means

tb_m <- tb_means %>%
    bind_rows(.id = "yr") %>%
    rename(cld = .group) %>%
    mutate(cld = str_trim(cld))

# Transformação inversa da aplicada para a análise.
tb_m <- tb_m %>%
    mutate_at(c("emmean", "lower.CL", "upper.CL"), binomial()$linkinv)

# Segment plot displaying means with confidence interval.
ggplot(data = filter(tb_m, yr %in% c("2017", "2018"), !is.na(spp)),
       mapping = aes(y = spp, x = emmean)) +
    facet_wrap(facets = ~yr, ncol = 1) +
    geom_point() +
    geom_errorbarh(mapping = aes(xmin = lower.CL, xmax = upper.CL),
                   height = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              vjust = 0,
              nudge_y = 0.05,
              show.legend = FALSE) +
    labs(x = "Proportion of healthy nuts",
         y = "Species")

ggplot(data = filter(tb_m, yr %in% c("2019")),
       mapping = aes(x = mo, y = emmean)) +
    facet_wrap(facets = ~spp, nrow = 1) +
    geom_point() +
    geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                   width = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              hjust = 0,
              nudge_x = 0.05,
              show.legend = FALSE) +
    labs(y = "Proportion of healthy nuts",
         x = "Months")

ggplot(data = filter(tb_m, yr %in% c("2020"), !is.na(cld)),
       mapping = aes(x = mo, y = emmean)) +
    geom_point() +
    geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                   width = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f%s", emmean, cld)),
              hjust = 0,
              nudge_x = 0.05,
              show.legend = FALSE) +
    labs(y = "Proportion of healthy nuts",
         x = "Months")

filter(tb_m, is.na(spp), is.na(mo))

ggplot(data = filter(tb_m, is.na(spp), is.na(mo)),
       mapping = aes(x = yr, y = emmean)) +
    geom_point() +
    geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                   width = 0.1) +
    geom_text(mapping = aes(label = sprintf("%0.2f", emmean)),
              hjust = 0,
              nudge_x = 0.05,
              show.legend = FALSE) +
    labs(y = "Proportion of healthy nuts",
         x = "Seasons")

Weather data log

pos <- RDASC::pistachio_anthracnose$pos_weather
str(pos)

pos_l <- pos %>%
    gather(key = "serie", value = "value", -dt)

ggplot(data = pos_l,
       mapping = aes(x = dt, y = value)) +
    facet_grid(facets = serie ~ ., scale = "free_y",
               labeller = labeller(serie = c("dew" = "Dew point temp.",
                                             "lw" = "Leaf wetness",
                                             "rh" = "Rel. humidity",
                                             "tem" = "Temperature"))) +
    geom_line(color = "gray50") +
    geom_smooth(method = "loess", se = FALSE, span = 0.05,
                color = "black") +
    labs(x = "Days", y = "Value")

walmes/RDASC documentation built on Jan. 10, 2021, 8:02 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

walmes/RDASC
Reproducible Data Analysis of Cientific Cooperations

In walmes/RDASC: Reproducible Data Analysis of Cientific Cooperations

Session definitions

Area under the micelial growth curve

Exploratory data analysis

Optimal temperature for each isolate

Optimal temperature estimation

Isolate comparison

Sporulation capability

Exploratory data analysis

Model fitting

Morphological conidia characterization

Exploratory data analysis

Conidia length

Conidia width

Conidia volume

Germination

Exploratory data analysis

Fit curves for each experimental point

Isolate comparison

Appressory formation

Patogenicity in vivo

Time of sensibility

Weather data log

R Package Documentation

Browse R Packages

We want your feedback!

walmes/RDASC Reproducible Data Analysis of Cientific Cooperations

In walmes/RDASC: Reproducible Data Analysis of Cientific Cooperations

Session definitions

Area under the micelial growth curve

Exploratory data analysis

Optimal temperature for each isolate

Optimal temperature estimation

Isolate comparison

Sporulation capability

Exploratory data analysis

Model fitting

Morphological conidia characterization

Exploratory data analysis

Conidia length

Conidia width

Conidia volume

Germination

Exploratory data analysis

Fit curves for each experimental point

Isolate comparison

Appressory formation

Patogenicity in vivo

Time of sensibility

Weather data log

R Package Documentation

Browse R Packages

We want your feedback!

walmes/RDASC
Reproducible Data Analysis of Cientific Cooperations