In pkuhnert/diet: Performs an analysis of diet data using univariate trees

The diet package allows you to perform an analysis of diet data. The analyses comprise a mixture of exploratory plots of the data in addition to a classification tree analysis that allows you to predict the diet composition and understand what environmental and spatial variables might be influencing the predictions. The methods rely on the \code{rpart} package, well known for classification tree analysis.

Data

Two R datasets are included in the package. The first is a diet matrix object \texttt{yftDMraw}. The second is a predator-prey R object. Both are automatically loaded into the R workspace when the package is installed and attached. The \texttt{yftDMraw} R object is a (571 $\times$ 40) data frame, where each row represents a fish sample. Columns 12 through to 40 contain the prey wet weight for any prey species found in the stomach contents of the fish. The first 11 columns contain information about the sampling ID, date and location, sea surface temperature, size of the fish and the species of the fish. The \texttt{yftPPraw} R object is a (717 $\times$ 13) data frame that expands the fish diet information so each row represents a prey proportion (last column) for a specific prey species.

Below is R code for reading in a diet matrix, assigning prey colours (which have been defined in the PreyTaxonSort R data object) and updating the diet matrix with "Group" assigned prey taxa codes.

library(diet)

write.csv(yftDMraw, file = "yftDMraw.csv", row.names = FALSE)
yftpp1 <- read.dm(filenm = "yftDMraw.csv",
                  labels = list(FullnessL = "Fullness", DateL = "Date"),
                  Datef = "%m/%d/%Y", diet.ind.start = 12, p = 0.01)
val <- apc(x = yftpp1, preyfile = PreyTaxonSort, check = TRUE)
node.colsY <- val$cols
dietPP <- val$x   
head(dietPP)

Here is an example of how to read in a predator-prey matrix. As in the previous example, we assign colours to the prey species and update the diet matrix with the prey taxa codes. This format is required for the analysis and we also include a 'predator' column.

write.csv(yftPPraw, file = "yftPPraw.csv", row.names = FALSE)
yftpp2 <- read.pp(filenm = "yftPPraw.csv",
                  labels = list(PredatorL = "TripSetPredNo", TripSetL = "TripSetNo",
                                SpeciesL = "Family", FullnessL = "Fullness",
                                DateL = "Date", WeightL = "PropW", PreyGrpL = "Family"),
                  Datef = "%m/%d/%Y", p = 0.01,
                  Xvars = c("Lat", "Lon", "Year", "Quarter", "Length", "SST"))
pal <- c(topo.colors(10)[1:2], heat.colors(10)[1:2], terrain.colors(25)[1:8])
val <- apc(x = yftpp2, preyfile = PreyTaxonSort, palette = pal, check = TRUE)
node.colsY <- val$cols
dietPP <- val$x   
head(dietPP)

# Create a predator column
dietPP$Predator <- as.factor(rep("YFT", nrow(dietPP)))
head(dietPP)

Exploring diet data

After reading in the data, we can then produce a number of exploratory plots and summaries.

explore.diet <- plot(x = dietPP, LonID = "Lon", LatID = "Lat", 
                     Xvar = c("Quarter", "Year", "SST", "Length", "Lat", "Lon"),  
                     PredSpID = "Predator", mapxlim = c(-125, -75), mapylim = c(0, 30),
                     SmXvar = c("SST", "Length"), PredIDno = "TripSetPredNo", col = "gold3",
                     Factor = "Predator", prey.cols = node.colsY)

We can extract some summary statistics from the data that summarise the predictor variables:

explore.diet$dataS1

We can also extract information on the number of observations (nobs), number of predators (npred) and number of prey (nprey)

explore.diet$dataS2

A map showing where the samples were collected can be produced through the following:

explore.diet$expl1

The composition of species and proportion through time can be produced by calling the following plot:

print(explore.diet$plotSpComp)

Maps showing the predictor variable's spatial distribution can be produced by calling the following plot:

print(explore.diet$smplot)

Building a model using the diet package

We now fit a classification tree to the diet data using the methodology outlined in Kuhnert et al. (XXXX). This is achieved using the \texttt{dpart} function. We specify a complexity parameter of 0.001 to grow a large tree initially. We also specify \texttt{minsplit=10}, which does not split on a node with 10 or fewer observations. The tree is displayed using the plot function, where we pass the node colours. We also have an option to make the splits uniform. By specifying \texttt{uniform=FALSE} we produce a tree that highlights splits that are important in the model. We choose \texttt{uniform=TRUE} for the initial model as it is quite large.

yft.dp <- dpart(Group ~ Lat + Lon + Year + Quarter + SST  + Length,
                data = dietPP, weights = W, minsplit = 10,
                cp = 0.001)
plot(yft.dp, node.cols=node.colsY, uniform = TRUE)
# summary(yft.dp)  # not run as the output is large
print(yft.dp, setID = "TripSetNo")

We now prune the model using Breiman et al. (1984) 1SE rule.

yft.pr <- prune(yft.dp, se = 1)
plot(yft.pr, node.cols = node.colsY)
plot(yft.pr, node.cols = node.colsY, uniform = FALSE)

We can extract a variable importance ranking to determine which variables are the most important.

vi <- importance(yft.pr)

Maps of diversity can be produced using the \texttt{diversity} function. This index is formed from the gini index of diversity and provides an indication of how spatially diverse the ocean is in terms of prey consumed by the predator.

D <- diversity(object = yft.pr, LatID = "Lat", LonID = "Lon")

Once we are happy with the decision tree, we can explore nodes of the tree and produce a map showing the samples that fell into that node, along with a summary of the composition of prey represented by that node. The following code uses the \texttt{grab} function to allow a user to select a node for exploration. As this code is interactive, we do not produce any ouput here.

val <- grab(object = yft.pr, LatID = "Lat", LonID = "Lon", setID = "TripSetNo",
            node.cols = node.colsY, cex = 1, 
            mapcol = "gold3")
val <- grabmulti(object = yft.pr, LatID = "Lat", LonID = "Lon", setID = "TripSetNo",
                 node.cols = node.colsY, cex = 1, 
                 mapcol = "gold3")