IDENTcc

Catch and correct potential errors in data sets from IDENT sites. The package currently implements general functions for single-variable error detection and tracing, as well as a higher-level function for correcting mortality assessments.

The package uses graph-based representations of data sequences as well as sparse matrices to improve performance. For best results, R built against an optimized BLAS implementation (such as OpenBLAS) is recommended.
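To see which BLAS the current R session is linked against, base R offers two introspection calls. The snippet below is a minimal sketch, assuming R 3.4.0 or later; the variable names are illustrative, and the BLAS entry may be an empty string on platforms where R cannot determine the library path.

```r
# Report the BLAS and LAPACK libraries used by the current R session.
# Both calls are part of base R (available since R 3.4.0).
blas_path   <- extSoftVersion()[["BLAS"]]  # path of the BLAS library, or ""
lapack_path <- La_library()                # path of the LAPACK library

cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")
```

A path that mentions `openblas` (or another optimized library) suggests that an optimized BLAS is in use; the reference BLAS shipped with R usually appears as `libRblas`.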

Installation

# Install the package from GitHub via the `devtools` package:
# install.packages("devtools")
devtools::install_github("dschoenig/IDENTcc")

Example: Mortality

A common error is a tree being classified as Dead in one year and as Alive in a later year. If the later classification is trusted and assumed to be true, the preceding values of Dead should be replaced by Alive.

The correction generally involves three steps using lower-level functions:

# Sequence of measurements containing several errors
x_test <- c("Alive", "Almost dead", "Dead", "Dead", "Dead", "Alive", "Dead", 
            "Dead", "Alive", "Dead", "Cut and Resprout", "Dead", "Alive")

# 1. Detect erroneous transitions from one value to the next
dead_alive <- detect_transitions(x = x_test, transition = c("Dead", "Alive"))

# 2. Trace erroneous values back from the transitions
chains <- backtrace_values(x = x_test, backtrace_val = "Dead", starts = dead_alive)

# 3. Replace chains of erroneous values
replace_backtraced(x = x_test, backtrace = chains, replace_val = "Alive")
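To make the logic of the three steps concrete, here is a minimal base R sketch that works on a character vector. It is an illustration of the idea described above, not the package's implementation (which relies on graph representations and sparse matrices); the helper `replace_before_transition()` is hypothetical and not part of IDENTcc.

```r
x_test <- c("Alive", "Almost dead", "Dead", "Dead", "Dead", "Alive", "Dead",
            "Dead", "Alive", "Dead", "Cut and Resprout", "Dead", "Alive")

# Hypothetical helper (not part of IDENTcc): replace every unbroken run of
# `from` values that ends directly before a `from` -> `to` transition.
replace_before_transition <- function(x, from = "Dead", to = "Alive") {
  n <- length(x)
  out <- x
  if (n < 2) return(out)
  # 1. Detect transitions: positions i with x[i] == from and x[i + 1] == to
  starts <- which(x[-n] == from & x[-1] == to)
  for (i in starts) {
    # 2. Trace back from the transition while the value is still `from`
    j <- i
    while (j >= 1 && x[j] %in% from) {
      # 3. Replace the traced value
      out[j] <- to
      j <- j - 1
    }
  }
  out
}

x_new <- replace_before_transition(x_test)
which(x_new != x_test)  # positions that were replaced
```

In this sketch, the Dead value at position 10 is left unchanged because it is followed by Cut and Resprout rather than by Alive, so it does not belong to a chain that ends in a Dead to Alive transition.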

For a standard inventory table, the same task may be performed using lazarus():

# Backwards replace a chain of subsequent "Dead" values that occur before a
# transition from "Dead" to "Alive"

data(mortality)

lazarus(mortality,
        variable = "StateDesc",
        sort_var = "YearInv",
        transition = c("Dead", "Alive"),
        backtrace_val = "Dead", replace_val = "Alive",
        append = TRUE)

In this case, the (fictitious) data set is returned with two additional columns: StateDesc_replace, indicating whether an entry was replaced (1) or not (0), and StateDesc_new, holding the new sequence of values after the replacements. The data set is automatically partitioned by Block, Plot, and Pos (i.e. the position of the individual). For each position, the sequence of values is sorted by the variable (i.e. column) specified with sort_var. In the example, values are sorted by the year of the inventory, in ascending order.
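The partition-sort-replace workflow can be sketched in base R on a tiny made-up table. This is an assumption-laden illustration, not the code behind lazarus(): the data frame `inv` and the helper `fix_sequence()` are hypothetical, and only the column names follow the description above.

```r
# Tiny made-up inventory table (not the package's `mortality` data),
# with the rows of position "p1" deliberately out of order
inv <- data.frame(
  Block = "A", Plot = 1,
  Pos = c("p1", "p1", "p1", "p2", "p2", "p2"),
  YearInv = c(2011, 2009, 2010, 2009, 2010, 2011),
  StateDesc = c("Alive", "Alive", "Dead", "Alive", "Dead", "Dead"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace runs of `from` that end right before `to`
fix_sequence <- function(x, from = "Dead", to = "Alive") {
  n <- length(x)
  out <- x
  if (n < 2) return(out)
  for (i in which(x[-n] == from & x[-1] == to)) {
    j <- i
    while (j >= 1 && x[j] %in% from) {
      out[j] <- to
      j <- j - 1
    }
  }
  out
}

# Partition row indices by Block, Plot and Pos
grp <- interaction(inv$Block, inv$Plot, inv$Pos, drop = TRUE)
groups <- split(seq_len(nrow(inv)), grp)

inv$StateDesc_new <- inv$StateDesc
for (rows in groups) {
  rows <- rows[order(inv$YearInv[rows])]  # sort each sequence by year
  inv$StateDesc_new[rows] <- fix_sequence(inv$StateDesc[rows])
}

# Flag replaced entries with 1, all others with 0
inv$StateDesc_replace <- as.integer(inv$StateDesc_new != inv$StateDesc)
inv
```

Only the 2010 entry of position `p1` is flagged here: once sorted by year, its sequence reads Alive, Dead, Alive. Position `p2` ends in Dead, so no Dead to Alive transition exists and nothing is replaced.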

Ideally, the column specified with sort_var is of type numeric or integer. If it is a factor, sorting will be based on factor levels. This may introduce errors if the ordering of factor levels is not as expected:

data(mortality)

# Change the factor levels of column `Inv` to an "alphabetic" sort
levels(mortality$Inv) <- as.character(c(1, 10, 2:9))
levels(mortality$Inv)

# Rerun error "correction" with problematic sorting according to column `Inv`
lazarus(mortality,
        variable = "StateDesc",
        sort_var = "Inv",
        transition = c("Dead", "Alive"),
        backtrace_val = "Dead", replace_val = "Alive",
        append = TRUE)

# This results in 44 entries being replaced (instead of 24), because the
# inventory with ID `10` (corresponding to 2018) is treated as occurring
# directly after the inventory with ID `1` (corresponding to 2009), which
# yields an incorrect sequence of values
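The sorting pitfall can be reproduced in base R without the package. The sketch below uses a small made-up factor of inventory IDs (the names `inv_id` and `inv_num` are illustrative) and shows one common safeguard: converting the factor to numbers via its labels before sorting.

```r
# Inventory IDs stored as a factor; levels default to alphabetic order,
# so "10" is placed before "2"
inv_id <- factor(c("1", "2", "10", "3"))
levels(inv_id)

# Ordering by the factor follows the level order, not the numeric value
order(inv_id)

# Convert via the labels (not the internal integer codes) to sort numerically
inv_num <- as.integer(as.character(inv_id))
order(inv_num)
```

Note that `as.integer(inv_id)` on its own would return the internal level codes rather than the IDs, which is why the conversion goes through `as.character()` first.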

For more information, see the help files of the respective functions.



dschoenig/IDENTcc documentation built on May 16, 2019, 4:07 a.m.