Nothing
################################################################################
# Function: cont_cdftest (exported)
# Programmer Tom Kincaid
# Date: October 23, 2020
# Revised: December 15, 2020 to allow use of the Horvitz-Thompson and
# Yates-Grundy variance estimators and to use a new function named
# survey_design to create the survey design object
# Revised: January 28, 2021 to replace "warn.vec" with "warn_vec"
# Revised: February 23, 2021 to correct errors in the code for creating the
# warnings data frame (warn_df)
# Revised: April 7, 2021 to ensure that the dframe argument does not contain
# zero rows
# Revised: April 29, 2021 to ensure that the dframe argument only belongs to
# class "data.frame"
# Revised: May 4, 2021 to avoid warning messages being generated during creation
# of help files
# Revised: May 6, 2021 to ensure that sf objects do not belong to class tbl_df
# Revised: June 8, 2021 to simplify specification of the values required for
# calculation of the finite population correction factor and to
# eliminate use of the finite population correction factor with the
# local mean variance estimator
# Revised: September 9, 2021 to revise the documentation for argument popsize
#
#' Cumulative distribution function (CDF) inference for a probability survey
#'
#' This function organizes input and output for conducting inference regarding
#' cumulative distribution functions (CDFs) generated by a probability survey.
#' For every response variable and every subpopulation (domain) variable,
#' differences between CDFs are tested for every pair of subpopulations within
#' the domain. Data input to the function can be either a single survey or
#' multiple surveys (two or more). If the data contain multiple surveys, then
#' the domain variables will reference those surveys and (potentially)
#' subpopulations within those surveys. The inferential procedures divide the
#' CDFs into a discrete set of intervals (classes) and then utilize procedures
#' that have been developed for analysis of categorical data from probability
#' surveys. Choices for inference are the Wald, adjusted Wald, Rao-Scott first
#' order corrected (mean eigenvalue corrected), and Rao-Scott second order
#' corrected (Satterthwaite corrected) test statistics. The default test
#' statistic is the adjusted Wald statistic. The input data argument can be
#' either a data frame or a simple features (sf) object. If an sf object is
#' used, coordinates are extracted from the geometry column in the object,
#' arguments xcoord and ycoord are assigned values "xcoord" and "ycoord",
#' respectively, and the geometry column is dropped from the object.
#'
#' @param dframe Data frame containing survey design variables, response
#' variables, and subpopulation (domain) variables.
#'
#' @param vars Vector composed of character values that identify the
#' names of response variables in the \code{dframe} data frame.
#'
#' @param subpops Vector composed of character values that identify the
#' names of subpopulation (domain) variables in the \code{dframe} data frame.
#' If a value is not provided, the value \code{"All_Sites"} is assigned to the
#' subpops argument and a factor variable named \code{"All_Sites"} that takes
#' the value \code{"All Sites"} is added to the \code{dframe} data frame. The
#' default value is \code{NULL}.
#'
#' @param surveyID Character value providing name of the survey ID variable in
#' the \code{dframe} data frame. If this argument equals \code{NULL}, then
#' the dframe data frame contains data for a single survey. The default value
#' is \code{NULL}.
#'
#' @param siteID Character value providing name of the site ID variable in
#' the \code{dframe} data frame. For a two-stage sample, the site ID variable
#' identifies stage two site IDs. The default value is \code{"siteID"}.
#'
#' @param weight Character value providing name of the survey design weight
#' variable in the \code{dframe} data frame. For a two-stage sample, the
#' weight variable identifies stage two weights. The default value is
#' \code{"weight"}.
#'
#' @param xcoord Character value providing name of the x-coordinate variable in
#' the \code{dframe} data frame. For a two-stage sample, the x-coordinate
#' variable identifies stage two x-coordinates. Note that x-coordinates are
#' required for calculation of the local mean variance estimator. The default
#' value is \code{NULL}.
#'
#' @param ycoord Character value providing name of the y-coordinate variable in
#' the \code{dframe} data frame. For a two-stage sample, the y-coordinate
#' variable identifies stage two y-coordinates. Note that y-coordinates are
#' required for calculation of the local mean variance estimator. The default
#' value is \code{NULL}.
#'
#' @param stratumID Character value providing name of the stratum ID variable in
#' the \code{dframe} data frame. The default value is \code{NULL}.
#'
#' @param clusterID Character value providing the name of the cluster
#' (stage one) ID variable in the \code{dframe} data frame. Note that cluster
#' IDs are required for a two-stage sample. The default value is \code{NULL}.
#'
#' @param weight1 Character value providing name of the stage one weight
#' variable in the \code{dframe} data frame. The default value is \code{NULL}.
#'
#' @param xcoord1 Character value providing the name of the stage one
#' x-coordinate variable in the \code{dframe} data frame. Note that x
#' coordinates are required for calculation of the local mean variance
#' estimator. The default value is \code{NULL}.
#'
#' @param ycoord1 Character value providing the name of the stage one
#' y-coordinate variable in the \code{dframe} data frame. Note that
#' y-coordinates are required for calculation of the local mean variance
#' estimator. The default value is \code{NULL}.
#'
#' @param sizeweight Logical value that indicates whether size weights should be
#' used during estimation, where \code{TRUE} uses size weights and
#' \code{FALSE} does not use size weights. To employ size weights for a
#' single-stage sample, a value must be supplied for argument weight. To
#' employ size weights for a two-stage sample, values must be supplied for
#' arguments \code{weight} and \code{weight1}. The default value is \code{FALSE}.
#'
#' @param sweight Character value providing the name of the size weight variable
#' in the \code{dframe} data frame. For a two-stage sample, the size weight
#' variable identifies stage two size weights. The default value is
#' \code{NULL}.
#'
#' @param sweight1 Character value providing name of the stage one size weight
#' variable in the \code{dframe} data frame. The default value is \code{NULL}.
#'
#' @param sweight1 Character value providing name of the stage one size weight
#' variable in the \code{dframe} data frame. The default value is \code{NULL}.
#'
#' @param fpc Object that specifies values required for calculation of the
#' finite population correction factor used during variance estimation. The
#' object must match the survey design in terms of stratification and whether
#' the design is single-stage or two-stage. For an unstratified design, the
#' object is a vector. The vector is composed of a single numeric value for a
#' single-stage design. For a two-stage unstratified design, the object is a
#' named vector containing one more than the number of clusters in the sample,
#' where the first item in the vector specifies the number of clusters in the
#' population and each subsequent item specifies the number of stage two units
#' for the cluster. The name for the first item in the vector is arbitrary.
#' Subsequent names in the vector identify clusters and must match the cluster
#' IDs. For a stratified design, the object is a named list of vectors, where
#' names must match the strata IDs. For each stratum, the format of the
#' vector is identical to the format described for unstratified single-stage
#' and two-stage designs. Note that the finite population correction factor
#' is not used with the local mean variance estimator.
#'
#' Example fpc for a single-stage unstratified survey design:
#'
#' \verb{fpc <- 15000}
#'
#' Example fpc for a single-stage stratified survey design:
#'
#' \verb{fpc <- list(
#' Stratum_1 = 9000,
#' Stratum_2 = 6000)
#' }
#'
#' Example fpc for a two-stage unstratified survey design:
#'
#' \verb{fpc <- c(
#' Ncluster = 150,
#' Cluster_1 = 150,
#' Cluster_2 = 75,
#' Cluster_3 = 75,
#' Cluster_4 = 125,
#' Cluster_5 = 75)
#' }
#'
#' Example fpc for a two-stage stratified survey design:
#'
#' \verb{fpc <- list(
#' Stratum_1 = c(
#' Ncluster_1 = 100,
#' Cluster_1 = 125,
#' Cluster_2 = 100,
#' Cluster_3 = 100,
#' Cluster_4 = 125,
#' Cluster_5 = 50),
#' Stratum_2 = c(
#' Ncluster_2 = 50,
#' Cluster_1 = 75,
#' Cluster_2 = 150,
#' Cluster_3 = 75,
#' Cluster_4 = 75,
#' Cluster_5 = 125))
#' }
#'
#' @param popsize Object that provides values for the population argument of the
#' \code{calibrate} or \code{postStratify} functions in the survey package. If
#' a value is provided for popsize, then either the \code{calibrate} or
#' \code{postStratify} function is used to modify the survey design object
#' that is required by functions in the survey package. Whether to use the
#' \code{calibrate} or \code{postStratify} function is dictated by the format
#' of popsize, which is discussed below. Post-stratification adjusts the
#' sampling and replicate weights so that the joint distribution of a set of
#' post-stratifying variables matches the known population joint distribution.
#' Calibration, generalized raking, or GREG estimators generalize
#' post-stratification and raking by calibrating a sample to the marginal
#' totals of variables in a linear regression model. For the \code{calibrate}
#' function, the object is a named list, where the names identify factor
#' variables in the \code{dframe} data frame. Each element of the list is a
#' named vector containing the population total for each level of the
#' associated factor variable. For the \code{postStratify} function, the
#' object is either a data frame, table, or xtabs object that provides the
#' population total for all combinations of selected factor variables in the
#' \code{dframe} data frame. If a data frame is used for \code{popsize}, the
#' variable containing population totals must be the last variable in the data
#' frame. If a table is used for \code{popsize}, the table must have named
#' \code{dimnames} where the names identify factor variables in the
#' \code{dframe} data frame. If the popsize argument is equal to \code{NULL},
#' then neither calibration nor post-stratification is performed. The default
#' value is \code{NULL}.
#'
#' Example popsize for calibration:
#'
#' \verb{popsize <- list(
#' Ecoregion = c(
#' East = 750,
#' Central = 500,
#' West = 250),
#' Type = c(
#' Streams = 1150,
#' Rivers = 350))
#' }
#'
#' Example popsize for post-stratification using a data frame:
#'
#' \verb{popsize <- data.frame(
#' Ecoregion = rep(c("East", "Central", "West"),
#' rep(2, 3)),
#' Type = rep(c("Streams", "Rivers"), 3),
#' Total = c(575, 175, 400, 100, 175, 75))
#' }
#'
#' Example popsize for post-stratification using a table:
#'
#' \verb{popsize <- with(MySurveyFrame,
#' table(Ecoregion, Type))}
#'
#' Example popsize for post-stratification using an xtabs object:
#'
#' \verb{popsize <- xtabs(~Ecoregion + Type,
#' data = MySurveyFrame)}
#'
#' @param vartype Character value providing the choice of the variance
#' estimator, where \code{"Local"} indicates the local mean estimator,
#' \code{"SRS"} indicates the simple random sampling estimator, \code{"HT"}
#' indicates the Horvitz-Thompson estimator, and \code{"YG"} indicates the
#' Yates-Grundy estimator. The default value is \code{"Local"}.
#'
#' @param jointprob Character value providing the choice of joint inclusion
#' probability approximation for use with Horvitz-Thompson and Yates-Grundy
#' variance estimators, where \code{"overton"} indicates the Overton
#' approximation, \code{"hr"} indicates the Hartley-Rao approximation, and
#' \code{"brewer"} equals the Brewer approximation. The default value is
#' \code{"overton"}.
#'
#' @param testname Name of the test statistic to be reported in the output
#' data frame. Choices for the name are: \code{"Wald"}, \code{"adjWald"},
#' \code{"RaoScott_First"}, and \code{"RaoScott_Second"}, which correspond to
#' the Wald statistic, adjusted Wald statistic, Rao-Scott first-order
#' corrected statistic, and Rao-Scott second-order corrected statistic,
#' respectively. The default is \code{"adjWald"}.
#'
#' @param nclass Number of classes into which the CDFs will be divided
#' (binned), which must equal at least \code{2}. The default is \code{3}.
#'
#' @return Data frame of CDF test results for all pairs of subpopulations
#' within each population type for every response variable. The data frame
#' includes the test statistic specified by argument \code{testname} plus its
#' degrees of freedom and p-value.
#'
#' @author Tom Kincaid \email{Kincaid.Tom@@epa.gov}
#'
#' @keywords survey
#'
#' @seealso
#' \describe{
#' \item{\code{\link{cdf_plot}}}{ for visualizing CDF plots}
#' \item{\code{\link{cont_cdfplot}}}{ for making CDF plots output to pdfs}
#' }
#'
#' @examples
#' n <- 200
#' mysiteID <- paste("Site", 1:n, sep = "")
#' dframe <- data.frame(
#' siteID = mysiteID,
#' wgt = runif(n, 10, 100),
#' xcoord = runif(n),
#' ycoord = runif(n),
#' stratum = rep(c("Stratum1", "Stratum2"), n / 2),
#' Resource_Class = sample(c("Agr", "Forest", "Urban"), n, replace = TRUE)
#' )
#' ContVar <- numeric(n)
#' tst <- dframe$Resource_Class == "Agr"
#' ContVar[tst] <- rnorm(sum(tst), 10, 1)
#' tst <- dframe$Resource_Class == "Forest"
#' ContVar[tst] <- rnorm(sum(tst), 10.1, 1)
#' tst <- dframe$Resource_Class == "Urban"
#' ContVar[tst] <- rnorm(sum(tst), 10.5, 1)
#' dframe$ContVar <- ContVar
#' myvars <- c("ContVar")
#' mysubpops <- c("Resource_Class")
#' mypopsize <- data.frame(
#' Resource_Class = rep(c("Agr", "Forest", "Urban"), rep(2, 3)),
#' stratum = rep(c("Stratum1", "Stratum2"), 3),
#' Total = c(2500, 1500, 1000, 500, 600, 450)
#' )
#' cont_cdftest(dframe,
#' vars = myvars, subpops = mysubpops, siteID = "siteID",
#' weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
#' stratumID = "stratum", popsize = mypopsize, testname = "RaoScott_First"
#' )
#' @export
################################################################################
cont_cdftest <- function(dframe, vars, subpops = NULL, surveyID = NULL, siteID = "siteID",
weight = "weight", xcoord = NULL, ycoord = NULL, stratumID = NULL,
clusterID = NULL, weight1 = NULL, xcoord1 = NULL, ycoord1 = NULL,
sizeweight = FALSE, sweight = NULL, sweight1 = NULL, fpc = NULL,
popsize = NULL, vartype = "Local", jointprob = "overton",
testname = "adjWald", nclass = 3) {
# Create a vector for error messages
error_ind <- FALSE
error_vec <- NULL
# Create a data frame for warning messages
warn_ind <- FALSE
warn_df <- NULL
fname <- "cont_cdftest"
# Check for a valid test statistic name
temp <- match(testname, c(
"Wald", "adjWald", "RaoScott_First",
"RaoScott_Second"
), nomatch = 0)
if (temp == 0) {
error_ind <- TRUE
msg <- paste0("\nThe value provided for argument testname, \"", testname, "\", is not a valid test statistic name.\n")
error_vec <- c(error_vec, msg)
}
# As necessary, reassign the test statistic name to match the values required
# by the svychisq_localmean function
if (testname == "RaoScott_First") {
testname <- "Chisq"
} else if (testname == "RaoScott_Second") {
testname <- "F"
}
# Check for the minimum number of classes
if (testname %in% c("Wald", "adjWald")) {
if (nclass < 2) {
error_ind <- TRUE
msg <- paste("\nThe number of classes into which the CDFs will be divided (binned) must equal \nat least two.")
error_vec <- c(error_vec, msg)
}
} else {
if (nclass < 3) {
error_ind <- TRUE
msg <- paste("\nThe number of classes into which the CDFs will be divided (binned) must equal \nat least three")
error_vec <- c(error_vec, msg)
}
}
# Ensure that the dframe argument was provided
if (missing(dframe) | is.null(dframe)) {
stop("\nThe dframe argument must be provided.\n")
}
# If the dframe argument is an sf object, extract coordinates from the
# geometry column, assign values "xcoord" and "ycoord" to arguments xcoord and
# ycoord, respectively, and drop the geometry column from the object
if ("sf" %in% class(dframe)) {
temp <- st_coordinates(dframe)
xcoord <- "xcoord"
dframe$xcoord <- temp[, "X"]
ycoord <- "ycoord"
dframe$ycoord <- temp[, "Y"]
dframe <- st_set_geometry(dframe, NULL)
}
# If the dframe argument is a tibble or does not belong to class
# "data.frame", coerce the argument to class "data.frame"
if ("tbl_df" %in% class(dframe) | !("data.frame" %in% class(dframe))) {
dframe <- as.data.frame(dframe)
}
# Ensure that the dframe argument does not contain zero rows
if (nrow(dframe) == 0) {
stop("\nThe dframe argument contains zero rows.\n")
}
# Ensure that unused levels are dropped from factor variables in the dframe
# data frame
dframe <- droplevels(dframe)
# As necessary, ensure that the dframe data frame contains the survey ID
# variable
if (!is.null(surveyID)) {
if (!(surveyID %in% names(dframe))) {
ind1 <- FALSE
error_ind <- TRUE
msg <- paste0("The name provided for the surveyID argument, \"", surveyID, "\", does not occur among \nthe names for the dframe data frame.\n")
error_vec <- c(error_vec, msg)
} else {
survey_names <- as.vector(unique(dframe[, surveyID]))
ind1 <- TRUE
}
}
# Ensure that the dframe data frame contains the site ID variable
if (!(siteID %in% names(dframe))) {
ind2 <- FALSE
error_ind <- TRUE
msg <- paste0("The name provided for the siteID argument, \"", siteID, "\", does not occur among \nthe names for the dframe data frame.\n")
error_vec <- c(error_vec, msg)
} else {
ind2 <- TRUE
}
# Check site IDs for repeat values and, as necessary, create unique site IDs
# and output a warning message
if (is.null(surveyID)) {
if (ind2) {
IDs <- dframe[, siteID]
temp <- sapply(split(IDs, IDs), length)
if (any(temp > 1)) {
warn_ind <- TRUE
temp.str <- vecprint(names(temp)[temp > 1])
warn <- paste("The following site ID values occur more than once among the values that were \ninput to the function:\n", temp.str)
act <- "Unique site ID values were created.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = NA,
subpop = NA, indicator = NA, stratum = NA, warning = I(warn), action = I(act)
))
dframe[, siteID] <- uniqueID(dframe[, siteID])
}
}
} else {
if (ind1 & ind2) {
IDs <- dframe[, siteID]
for (i in survey_names) {
eval(parse(text = paste0("tst <- dframe[, surveyID] == \"", i, "\"")))
temp <- sapply(split(IDs[tst], IDs[tst]), length)
if (any(temp > 1)) {
warn_ind <- TRUE
temp.str <- vecprint(names(temp)[temp > 1])
warn <- paste("The following site ID values occur more than once among survey", i, "values \nthat were input to the function:\n", temp.str)
act <- "Unique site ID values were created.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = NA,
subpop = NA, indicator = NA, stratum = NA, warning = I(warn), action = I(act)
))
dframe[, siteID][tst] <- uniqueID(dframe[, siteID][tst])
}
}
}
}
# Ensure that the dframe data frame contains the survey weight variable
if (!(weight %in% names(dframe))) {
error_ind <- TRUE
msg <- paste0("The name provided for the weight argument, \"", weight, "\", does not occur among \nthe names for the dframe data frame.\n")
error_vec <- c(error_vec, msg)
}
# As necessary, create a stratum variable that incorporates survey ID values
if (!is.null(surveyID)) {
if (!is.null(stratumID)) {
dframe$stratumID <- paste(dframe[, surveyID], dframe[, stratumID],
sep = "."
)
stratumID <- "stratumID"
} else {
stratumID <- surveyID
}
}
# Assign names to the variables required for calculation of the finite
# population correction factor
if (is.null(fpc)) {
fpcfactor_ind <- FALSE
fpcsize <- NULL
Ncluster <- NULL
stage1size <- NULL
} else {
fpcfactor_ind <- TRUE
if (is.null(clusterID)) {
fpcsize <- "fpcsize"
Ncluster <- NULL
stage1size <- NULL
} else {
fpcsize <- NULL
Ncluster <- "Ncluster"
stage1size <- "stage1size"
}
}
# Create a list containing names of survey design variables
design_names <- list(
siteID = siteID,
weight = weight,
xcoord = xcoord,
ycoord = ycoord,
stratumID = stratumID,
clusterID = clusterID,
weight1 = weight1,
xcoord1 = xcoord1,
ycoord1 = ycoord1,
sweight = sweight,
sweight1 = sweight1,
fpcsize = fpcsize,
Ncluster = Ncluster,
stage1size = stage1size
)
# Ensure that a value was provided for the vars (response variable names)
# argument
if (is.null(vars)) {
error_ind <- TRUE
msg <- "A value must be provided for the vars (response variable names) argument.\n"
error_vec <- c(error_vec, msg)
}
# If a value was not provided for the subpops (subpopulation names) argument,
# assign the value "All_Sites" to the subpops argument and create a factor
# named "All_Sites" in the dframe data frame that takes the value "All Sites"
if (is.null(subpops)) {
subpops <- "All_Sites"
dframe$All_Sites <- "All Sites"
dframe$All_Sites <- factor(dframe$All_Sites)
}
# Check input arguments
temp <- input_check(dframe, design_names, NULL, vars, NULL, NULL, subpops,
sizeweight, fpc, popsize, vartype, jointprob,
conf = 95,
error_ind = error_ind, error_vec = error_vec
)
dframe <- temp$dframe
vars <- temp$vars_cont
subpops <- temp$subpops
popsize <- temp$popsize
vartype <- temp$vartype
jointprob <- temp$jointprob
error_ind <- temp$error_ind
error_vec <- temp$error_vec
# As necessary, output a message indicating that error messages were generated
# during execution of the program
if (error_ind) {
error_vec <<- error_vec
if (length(error_vec) == 1) {
message("During execution of the program, an error message was generated. The error \nmessage is stored in a vector named 'error_vec'. Enter the following command \nto view the error message: errorprnt()\n")
} else {
message(paste("During execution of the program,", length(error_vec), "error messages were generated. The error \nmessages are stored in a vector named 'error_vec'. Enter the following \ncommand to view the error messages: errorprnt()\n"))
}
if (warn_ind) {
warn_df <<- warn_df
if (nrow(warn_df) == 1) {
message("During execution of the program, a warning message was generated. The warning \nmessage is stored in a data frame named 'warn_df'. Enter the following command \nto view the warning message: warnprnt()\n")
} else {
message(paste("During execution of the program,", nrow(warn_df), "warning messages were generated. The warning \nmessages are stored in a data frame named 'warn_df'. Enter the following \ncommand to view the warning messages: warnprnt() \nTo view a subset of the warning messages (say, messages number 1, 3, and 5), \nenter the following command: warnprnt(m=c(1,3,5))\n"))
}
}
stop("See the preceding message(s).")
}
# Assign a logical value to the indicator variable for a stratified sample
stratum_ind <- !is.null(stratumID)
# For a stratified sample, remove strata that contain a single site
if (stratum_ind) {
dframe[, stratumID] <- factor(dframe[, stratumID])
stratum_levels <- levels(dframe[, stratumID])
nstrata <- length(stratum_levels)
ind <- FALSE
for (i in 1:nstrata) {
tst <- dframe[, stratumID] == stratum_levels[i]
if (sum(tst) == 1) {
warn_ind <- TRUE
warn <- paste0("The stratum named \"", stratum_levels[i], "\" contains a single value and was removed from the analysis.\n")
act <- "Stratum was removed from the analysis.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = NA,
subpop = NA, indicator = NA, stratum = NA, warning = I(warn), action = I(act)
))
dframe <- dframe[!tst, ]
ind <- TRUE
}
}
if (ind) {
dframe[, stratumID] <- factor(dframe[, stratumID])
stratum_levels <- levels(dframe[, stratumID])
nstrata <- length(stratum_levels)
dframe <- droplevels(dframe)
}
}
# Assign a logical value to the indicator variable for a two-stage sample
cluster_ind <- !is.null(clusterID)
# Create the survey design object
design <- survey_design(
dframe, siteID, weight, stratum_ind, stratumID, cluster_ind, clusterID,
weight1, sizeweight, sweight, sweight1, fpcfactor_ind, fpcsize, Ncluster,
stage1size, vartype, jointprob
)
# If popsize is not equal to NULL, then call either the postStratify or
# calibrate function, as appropriate
if (!is.null(popsize)) {
if (all(class(popsize) %in% c("data.frame", "table", "xtabs"))) {
if ("data.frame" %in% class(popsize)) {
pnames <- names(popsize)[-ncol(popsize)]
} else {
pnames <- names(dimnames(popsize))
}
design <- postStratify(design, make.formula(pnames), popsize)
} else {
cnames <- cal_names(make.formula(names(popsize)), design)
pop_totals <- numeric(length(cnames))
names(pop_totals) <- cnames
pop_totals[1] <- sum(popsize[[1]])
k <- 2
for (i in names(popsize)) {
temp <- popsize[[i]]
for (j in 2:length(temp)) {
pop_totals[k] <- temp[j]
k <- k + 1
}
}
design <- calibrate(design, make.formula(names(popsize)), pop_totals)
}
}
# If popsize is not equal to NULL and vartype equals "Local", then assign
# adjusted weights to the appropriate weight variable(s) in the
# design$variables data frame
if (!is.null(popsize) && vartype == "Local") {
if (cluster_ind) {
design$variables$wgt2 <- weights(design) / design$variables$wgt1
} else {
design$variables$wgt <- weights(design)
}
}
# Create the contsum (results) data frame
contsum <- NULL
# Loop through all subpopulations (domains)
for (itype in subpops) {
# Identify levels of the subpopulation
lev_itype <- levels(dframe[, itype])
nlev_itype <- length(lev_itype)
# Check whether the number of levels of the subpopulation is greater than one
if (nlev_itype == 1) {
warn_ind <- TRUE
warn <- paste("Population type", itype, "contains a single subpopulation. \nNo CDF tests could be performed\n")
act <- "None.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = I(itype), subpop = NA, indicator = NA,
stratum = NA, warning = I(warn), action = I(act)
))
next
}
# Loop through all response variables (vars)
for (ivar in vars) {
# Determine the set of upper bounds for the response variable
temp <- dframe[!is.na(dframe[, ivar]), ivar]
bounds <- sort(temp)[floor(seq(length(temp) / nclass, length(temp),
length = nclass
))]
design$variables$colvar <- cut(dframe[, ivar], c(-Inf, bounds))
# Loop through all combinations of levels of the subpopulation
# Begin the loop for the the first level of the combination
for (isubpop1 in 1:(nlev_itype - 1)) {
# Select sites in the first level
subpop1_ind <- dframe[, itype] == lev_itype[isubpop1]
# Determine whether the level is empty
if (all(is.na(dframe[subpop1_ind, ivar]))) {
warn_ind <- TRUE
warn <- paste("Subpopulation", lev_itype[isubpop1], "of population type", itype, "for indicator", ivar, "\ncontains no data.\n")
act <- "None.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = I(itype),
subpop = I(lev_itype[isubpop1]), indicator = ivar, stratum = NA,
warning = I(warn), action = I(act)
))
next
}
# Determine whether the level contains a single value
if (sum(!is.na(dframe[subpop1_ind, ivar])) == 1) {
warn_ind <- TRUE
warn <- paste("Subpopulation", lev_itype[isubpop1], "of population type", itype, "for indicator", ivar, "\ncontains a single value. No analysis was performed.\n")
act <- "None.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = I(itype),
subpop = I(lev_itype[isubpop1]), indicator = ivar, stratum = NA,
warning = I(warn), action = I(act)
))
next
}
# Begin the loop for the second level of the combination
for (isubpop2 in (isubpop1 + 1):nlev_itype) {
# Select sites in the second level
subpop2_ind <- dframe[, itype] == lev_itype[isubpop2]
# Determine whether level is empty
if (all(is.na(dframe[subpop2_ind, ivar]))) {
warn_ind <- TRUE
warn <- paste("Subpopulation", lev_itype[isubpop2], "of population type", itype, "for indicator", ivar, "\ncontains no data.\n")
act <- "None.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = I(itype),
subpop = I(lev_itype[isubpop2]), indicator = ivar, stratum = NA,
warning = I(warn), action = I(act)
))
next
}
# Determine whether the level contains a single value
if (sum(!is.na(dframe[subpop2_ind, ivar])) == 1) {
warn_ind <- TRUE
warn <- paste("Subpopulation", lev_itype[isubpop2], "of population type", itype, "for indicator", ivar, "\ncontains a single value. No analysis was performed.\n")
act <- "None.\n"
warn_df <- rbind(warn_df, data.frame(
func = I(fname), subpoptype = I(itype),
subpop = I(lev_itype[isubpop2]), indicator = ivar, stratum = NA,
warning = I(warn), action = I(act)
))
next
}
# Perform the CDF test for the response variable
rowvar <- rep(NA, nrow(dframe))
rowvar[subpop1_ind] <- "Subpopulation 1"
rowvar[subpop2_ind] <- "Subpopulation 2"
design$variables$rowvar <- factor(rowvar,
levels = c("Subpopulation 1", "Subpopulation 2")
)
var_totals <- NULL
var_means <- NULL
if (vartype == "Local") {
warn_vec <- c(itype, lev_itype[isubpop1], lev_itype[isubpop2], ivar)
if (testname %in% c("Wald", "adjWald")) {
temp <- cdftest_localmean_total(
design, design_names, warn_ind, warn_df, warn_vec
)
var_totals <- temp$varest
warn_ind <- temp$warn_ind
warn_df <- temp$warn_df
} else {
temp <- cdftest_localmean_prop(
design, design_names, warn_ind, warn_df, warn_vec
)
var_means <- temp$varest
warn_ind <- temp$warn_ind
warn_df <- temp$warn_df
}
}
temp <- svychisq_localmean(
~ rowvar + colvar, design, testname, vartype, var_totals, var_means
)
# Assign estimates for the response variable to the contsum data frame
contsum <- rbind(
contsum,
data.frame(
Type = itype,
Subpopulation_1 = lev_itype[isubpop1],
Subpopulation_2 = lev_itype[isubpop2],
Indicator = ivar,
statistic = temp$statistic,
Degrees_of_Freedom_1 = temp$parameter[1],
Degrees_of_Freedom_2 = temp$parameter[2],
p_Value = temp$p.value
)
)
# End of the loops for combinations of levels of the subpopulation
}
}
# End of the loop for response variables
}
# End of the loop for subpopulations
}
# Assign a value for the column in the output data frame that contains test
# statistic values
if (testname == "Wald") {
names(contsum)[5] <- "Wald Statistic"
} else if (testname == "adjWald") {
names(contsum)[5] <- "Adjusted Wald Statistic"
} else if (testname == "Chisq") {
names(contsum)[5] <- "Rao-Scott First Order Statistic"
} else {
names(contsum)[5] <- "Rao-Scott Second Order Statistic"
}
# As necessary, modify dimension names
row.names(contsum) <- 1:nrow(contsum)
if (testname == "Chisq") {
contsum <- contsum[, -7]
names(contsum)[6] <- "Degrees_of_Freedom"
}
# As necessary, output a message indicating that warning messages were generated
# during execution of the program
if (warn_ind) {
warn_df <<- warn_df
if (nrow(warn_df) == 1) {
message("During execution of the program, a warning message was generated. The warning \nmessage is stored in a data frame named 'warn_df'. Enter the following command \nto view the warning message: warnprnt()\n")
} else {
message(paste("During execution of the program,", nrow(warn_df), "warning messages were generated. The warning \nmessages are stored in a data frame named 'warn_df'. Enter the following \ncommand to view the warning messages: warnprnt() \nTo view a subset of the warning messages (say, messages number 1, 3, and 5), \nenter the following command: warnprnt(m=c(1,3,5))\n"))
}
}
# Return the data frame
contsum
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.