loess_aicc_optimized: Fitting of AICC optimized loess (groupwise)

View source: R/loess_aicc_optimized.R

loess_aicc_optimized    R Documentation

Fitting of AICC optimized loess (groupwise)

Description

Given a data.table, three columns may be specified as the dependent variable, the independent variable, and the group, respectively, to fit one loess model per group whose span is optimized by AICC. The predicted values and the lower/upper values of the confidence band are added as new columns. Alternatively, the function may simply return the full optimized model object for each group.
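The idea of choosing the loess span by AICC can be sketched in base R as follows. This is a minimal illustration only, not the package's implementation: the helper name aicc_loess, the simulated data, and the particular AICC formula (a commonly used form for smoothers, based on the trace of the hat matrix) are assumptions made for this sketch.

```r
# Hypothetical helper: AICC of a fitted loess object.
# Assumed form: log(sigma^2) + 1 + 2 * (tr(H) + 1) / (n - tr(H) - 2)
aicc_loess <- function(fit) {
  n <- fit$n                           # number of observations
  tr_hat <- fit$trace.hat              # trace of the smoother (hat) matrix
  sigma2 <- sum(fit$residuals^2) / n   # residual variance estimate
  log(sigma2) + 1 + 2 * (tr_hat + 1) / (n - tr_hat - 2)
}

# Simulated data for a single group (illustration only)
set.seed(1)
d <- data.frame(x = 1:100)
d$y <- sin(d$x / 10) + rnorm(100, sd = 0.3)

# Search the span interval for the span with the lowest AICC
opt <- optimize(
  function(s) aicc_loess(loess(y ~ x, data = d, span = s, degree = 2)),
  interval = c(0.1, 0.9)
)
best_span <- opt$minimum

# Refit with the selected span
best_fit <- loess(y ~ x, data = d, span = best_span, degree = 2)
```

Note that optimize() performs a one-dimensional search and may settle on a local minimum of the criterion; a grid search over the span interval is a possible alternative.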

Usage

loess_aicc_optimized(x, independent_var = NULL,
  independent_var_range = NULL, dependent_var = NULL, groups = NULL,
  group_subset = NULL, span_interval = NULL, degree = 2,
  family = "gaussian", normalize = TRUE, confidence_interval = 0.95,
  output = "model")

Arguments

x

A data.table with data in long format (use data.table::melt if your data is in wide format).

independent_var

Name of column containing the independent variable.

independent_var_range

By default, loess is fitted over the full range of the independent variable. A restricted range may be specified here, usually as an integer vector.

dependent_var

Name of the column containing the values to be fitted by loess.

groups

Name of the column containing the names of the groups.

group_subset

By default, loess is fitted for all groups. A subset may be specified as a character vector.

span_interval

The interval of span values to be searched during AICC optimization of the loess span. By default (NULL), c(.1, .9) is used.

degree

Degree parameter passed to loess. By default 2.

family

Family parameter passed to loess. By default "gaussian".

normalize

Normalize parameter passed to loess. By default TRUE.

confidence_interval

The confidence level used to calculate the upper/lower values of the confidence band for plotting. By default 0.95. See the Details section for how the band is calculated.

output

"fitted_values" adds new columns to x containing the fitted values, e.g., for plotting. "model" (the default) returns the full optimized model object for each group.
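As noted for the x argument above, wide data can be brought into long format with data.table::melt before calling the function. The following is a small sketch; the column names year, topic_1, topic_2, topic, and value are made up for illustration.

```r
library(data.table)

# Hypothetical wide table: one column per group
wide <- data.table(year = 1990:1992,
                   topic_1 = c(0.2, 0.4, 0.3),
                   topic_2 = c(0.1, 0.1, 0.2))

# Long format: one row per year and group;
# variable.factor = FALSE keeps the group names as character
long <- melt(wide,
             id.vars = "year",
             variable.name = "topic",
             value.name = "value",
             variable.factor = FALSE)
```

The resulting columns could then be passed as independent_var = "year", groups = "topic", and dependent_var = "value".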

Details

For the default confidence interval of 0.95, the upper / lower values of the confidence band are calculated as model$fit +/- qt(0.975, model$df) * model$se. This is only an approximation of the true confidence interval based on the standard error, see: https://stackoverflow.com/questions/22717930/how-to-get-the-confidence-intervals-for-lowess-fit-using-r
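The band calculation described above can be reproduced with base R alone. The sketch below uses the built-in cars data set and predict() with se = TRUE; the variable names (level, t_crit, lwr, upr) are chosen for illustration and are not taken from the package source.

```r
# Fit a plain loess model on a built-in data set
fit <- loess(dist ~ speed, data = cars)

# Fitted values together with standard errors and degrees of freedom
pred <- predict(fit, se = TRUE)

# Two-sided t quantile for the chosen level (0.975 quantile for 0.95)
level <- 0.95
t_crit <- qt(1 - (1 - level) / 2, pred$df)

# Approximate pointwise confidence band
lwr <- pred$fit - t_crit * pred$se.fit
upr <- pred$fit + t_crit * pred$se.fit
```

The band is pointwise, i.e., it is computed separately for each value of the independent variable.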

Value

If output = "fitted_values", the original x with three additional columns named "aicc_loess_fit", "aicc_loess_lwr", and "aicc_loess_upr". The latter two columns contain the lower and upper values of the specified confidence interval. Entries in these additional columns that correspond to excluded groups or to excluded values of the independent variable are NA_real_.

If output = "model", a two-column data.table is returned containing the group names and the associated fitted model object, one row per group.

Examples



# The following example shows how AICC loess
# produces a fit that is more sensitive to the data
# than standard loess with a span of 0.75.
# Whether such a less smooth appearance makes sense is, of course, case-specific.


library(textility)
library(data.table)
years <- 1990:2010
set.seed(42)
topics <- c(rep(1, length(years)), rep(2, length(years)))
values1 <- jitter(rnorm(length(years), mean = 0.5, sd = .25), amount = pi)
values2 <- jitter(rnorm(length(years), mean = 0.25, sd = 0), amount = exp(1))
data <- data.table(year = years, topic = topics, value = c(values1, values2))

# first option: output = "fitted_values"
data <- loess_aicc_optimized(x = data
                             , independent_var = "year"
                             , independent_var_range = 1991:2009
                             , dependent_var = "value"
                             , groups = "topic"
                             , group_subset = NULL
                             , span_interval = NULL
                             , degree = 2
                             , family = "gaussian"
                             , normalize = TRUE
                             , output = "fitted_values")
# check the new content
print(data)

plot(data[topic == 1,year], data[topic == 1,value], col = 1, ylim = c(min(data$value), max(data$value)))
# loess uses span = 0.75 by default; it is stated explicitly in the call below for clarity
lines(data[topic == 1,year], predict(loess(value ~ year, span = 0.75, data = data[topic == 1,])), col = 1)
lines(data[topic == 1,year], data[topic == 1,aicc_loess_fit], col = 2)
# add aicc loess confidence intervals
lines(data[topic == 1,year], data[topic == 1,aicc_loess_lwr], col = 2, lty = "dashed")
lines(data[topic == 1,year], data[topic == 1,aicc_loess_upr], col = 2, lty = "dashed")
legend(x = 1990, y = max(data[topic == 1,value]), c("loess_aicc (dashed: conf. intv.)", "loess_standard"), fill = c(2,1))

plot(data[topic == 2,year], data[topic == 2,value], col = 1)
lines(data[topic == 2,year], predict(loess(value ~ year, data = data[topic == 2,])), col = 1)
lines(data[topic == 2,year], data[topic == 2,aicc_loess_fit], col = 2)
legend(x = 1990, y = max(data[topic == 2,value]), c("loess_aicc", "loess_standard"), fill = c(2,1))

# second option: output = "model"
models <- loess_aicc_optimized(x = data
                               , independent_var = "year"
                               , dependent_var = "value"
                               , groups = "topic"
                               , output = "model")
models
#    group model_aicc_loess
# 1:     1          <loess>
# 2:     2          <loess>
class(models[1,2])
# [1] "data.table" "data.frame"
class(models[1,2][[1]])
# [1] "list"
class(models[1,2][[1]][[1]])
# [1] "loess"

manuelbickel/textility documentation built on Nov. 25, 2022, 9:07 p.m.