This function creates an object of class spsurvey.analysis that contains all of the information necessary to use the analysis functions in the spsurvey library.
spsurvey.analysis(sites=NULL, subpop=NULL, design=NULL, data.cat=NULL,
data.cont=NULL, siteID=NULL, wgt=NULL, sigma=NULL, var.sigma=NULL,
xcoord=NULL, ycoord=NULL, stratum=NULL, cluster=NULL, wgt1=NULL, xcoord1=NULL,
ycoord1=NULL, popsize=NULL, popcorrect=FALSE, pcfsize=NULL, N.cluster=NULL,
stage1size=NULL, support=NULL, sizeweight=FALSE, swgt=NULL, swgt1=NULL,
vartype="Local", conf=95, pctval=c(5,10,25,50,75,90,95))

sites 
a data frame consisting of two variables: the first variable is site IDs and the second variable is a logical vector indicating which sites to use in the analysis. If this data frame is not provided, then the data frame will be created, where (1) site IDs are obtained either from the design argument, the siteID argument, or both (when siteID is a formula); and (2) a variable named use.sites that contains the value TRUE for all sites is created. The default is NULL. 
subpop 
a data frame describing sets of populations and subpopulations for which estimates will be calculated. The first variable is site IDs and each subsequent variable identifies a Type of population, where the variable name is used to identify Type. A Type variable identifies each site with one of the subpopulations of that Type. If this data frame is not provided, then the data frame will be created, where (1) site IDs are obtained either from the design argument, the siteID argument, or both (when siteID is a formula); and (2) a single Type variable named all.sites that contains the value "All Sites" for all sites is created. The default is NULL. 
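As a rough illustration of the defaults described for sites and subpop, the following base R sketch builds equivalent data frames by hand. The site IDs and the object names (ids, default.sites, default.subpop, custom.sites) are made up for the example, and this is not the function's internal code.

```r
# Hypothetical site IDs; in practice these come from design or siteID.
ids <- paste0("Site", 1:5)

# Equivalent of the default sites data frame: every site is used.
default.sites <- data.frame(siteID = ids,
                            use.sites = rep(TRUE, length(ids)))

# Equivalent of the default subpop data frame: one Type, one subpopulation.
default.subpop <- data.frame(siteID = ids,
                             all.sites = rep("All Sites", length(ids)))

# To drop a site from the analysis, set its logical value to FALSE.
custom.sites <- default.sites
custom.sites$use.sites[ids == "Site3"] <- FALSE
```

Passing a data frame like custom.sites as the sites argument would exclude Site3, since only sites flagged TRUE are retained in the output.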
design 
a data frame consisting of design variables. If variable names
are provided as formulas in the corresponding arguments, then the
formulas are interpreted using this data frame. If this data frame is
not provided, then the data frame will be created from inputs to the
design variables in the argument list. The default is NULL. If
variable names are not provided as formulas, then variables should be
named as follows: 
data.cat 
a data frame of categorical response variables. The first variable is site IDs. Subsequent variables are response variables. Missing data (NA) is allowed. The default is NULL. 
data.cont 
a data frame of continuous response variables. The first variable is site IDs. Subsequent variables are response variables. Missing data (NA) is allowed. The default is NULL. 
siteID 
site IDs. This variable can be input directly or as a formula and must be supplied either as this argument or in the design data frame. The default is NULL. 
wgt 
the final adjusted weight (inverse of the sample inclusion probability) for each site, which is either the weight for a single-stage sample or the stage two weight for a two-stage sample. The default is NULL. 
sigma 
measurement error variance. This variable must be a vector containing a value for each response variable and must have the names attribute set to identify the response variable names. Missing data (NA) is allowed. The default is NULL. 
var.sigma 
variance of the measurement error variance. This variable must be a vector containing a value for each response variable and must have the names attribute set to identify the response variable names. Missing data (NA) is allowed. The default is NULL. 
xcoord 
x-coordinate for location for each site, which is either the x-coordinate for a single-stage sample or the stage two x-coordinate for a two-stage sample. The default is NULL. 
ycoord 
y-coordinate for location for each site, which is either the y-coordinate for a single-stage sample or the stage two y-coordinate for a two-stage sample. The default is NULL. 
stratum 
the stratum codes. This variable can be input directly or as a formula. The default is NULL. 
cluster 
the stage one sampling unit (primary sampling unit or cluster) codes. This variable can be input directly or as a formula. The default is NULL. 
wgt1 
the final adjusted stage one weights. This variable can be input directly or as a formula. The default is NULL. 
xcoord1 
the stage one x-coordinates for location. This variable can be input directly or as a formula. The default is NULL. 
ycoord1 
the stage one y-coordinates for location. This variable can be input directly or as a formula. The default is NULL. 
popsize 
known size of the resource, which is used to perform ratio
adjustment to estimators expressed using measurement units for the
resource. For a finite resource, this argument is either the total number
of sampling units or the known sum of size-weights. For an extensive
resource, this argument is the measure of the resource, i.e., either known
total length for a linear resource or known total area for an areal
resource. The argument must be in the form of a list containing an
element for each population Type in the subpop data frame, where NULL is a
valid choice for a population Type. The list must be named using the
column names for the population Types in subpop. If a population Type
doesn't contain subpopulations, then each element of the list is either a
single value for an unstratified sample or a vector containing a value for
each stratum for a stratified sample, where elements of the vector are
named using the stratum codes. If a population Type contains
subpopulations, then each element of the list is a list containing an
element for each subpopulation, where the list is named using the
subpopulation names. The element for each subpopulation will be either a
single value for an unstratified sample or a named vector of values for a
stratified sample. The default is NULL. 
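The nesting rules for popsize can be easier to follow with a small example. The sketch below, in base R only, writes out the unstratified and stratified forms for a hypothetical subpop data frame with two population Types. All object names and sizes are assumptions chosen for illustration.

```r
# Hypothetical subpop data frame with two population Types.
subpop <- data.frame(siteID = paste0("Site", 1:4),
                     All.Sites = rep("All Sites", 4),
                     Resource.Class = c("Good", "Good", "Poor", "Poor"))

# Unstratified sample: a single value per population or subpopulation.
popsize.unstrat <- list(All.Sites = 5500,
                        Resource.Class = list(Good = 4000, Poor = 1500))

# Stratified sample: a vector named by stratum code in each position.
popsize.strat <- list(
  All.Sites = c(Stratum1 = 3500, Stratum2 = 2000),
  Resource.Class = list(Good = c(Stratum1 = 2500, Stratum2 = 1500),
                        Poor = c(Stratum1 = 1000, Stratum2 = 500)))

# The list names must match the Type columns of subpop (all but site IDs).
types <- names(subpop)[-1]
names.ok <- identical(names(popsize.strat), types) &&
  identical(names(popsize.unstrat), types)
```

A check like names.ok is only a convenience for catching naming mismatches before calling the function; it is not part of the spsurvey interface.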
popcorrect 
a logical value that indicates whether finite or continuous population correction factors should be employed during variance estimation, where TRUE = use the correction factor and FALSE = do not use the correction factor. The default is FALSE. To employ the correction factor for a single-stage sample, values must be supplied for argument pcfsize and for the support variable of the design argument. To employ the correction factor for a two-stage sample, values must be supplied for arguments N.cluster and stage1size, and for the support variable of the design argument. 
pcfsize 
size of the resource, which is required for calculation of finite and continuous population correction factors for a single-stage sample. For a stratified sample this argument must be a vector containing a value for each stratum and must have the names attribute set to identify the stratum codes. The default is NULL. 
N.cluster 
the number of stage one sampling units in the resource, which is required for calculation of finite and continuous population correction factors for a two-stage sample. For a stratified sample this variable must be a vector containing a value for each stratum and must have the names attribute set to identify the stratum codes. The default is NULL. 
stage1size 
size of the stage one sampling units of a two-stage sample, which is required for calculation of finite and continuous population correction factors for a two-stage sample. This variable must have the names attribute set to identify the stage one sampling unit codes. For a stratified sample, the names attribute must be set to identify both stratum codes and stage one sampling unit codes using a convention where the two codes are separated by the & symbol, e.g., "Stratum 1&Cluster 1". The default is NULL. 
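The naming convention for a stratified two-stage sample can be sketched in base R as follows. The stratum codes, cluster codes, and sizes are invented for illustration; only the "&" separator and the stratum-first order are taken from the description above.

```r
# Hypothetical stratum and stage one sampling unit (cluster) codes.
stratum <- c("Stratum 1", "Stratum 1", "Stratum 2")
cluster <- c("Cluster 1", "Cluster 2", "Cluster 1")

# Made-up sizes of the three stage one sampling units.
stage1size <- c(25, 40, 30)

# Stratified sample: join the two codes with "&", stratum code first.
names(stage1size) <- paste(stratum, cluster, sep = "&")

# N.cluster for a stratified sample: one value per stratum,
# named with the stratum codes.
N.cluster <- c("Stratum 1" = 12, "Stratum 2" = 9)
```

Building the names with paste() rather than typing them avoids mismatches in spacing between the codes used here and those in the design data frame.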
support 
the support value for each site: the value one (1) for a site from a finite resource, or the measure of the sampling unit associated with a site from an extensive resource. It is required for calculation of finite and continuous population correction factors. This variable can be input directly or as a formula. The default is NULL. 
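As a minimal sketch of the inputs that popcorrect=TRUE relies on for a stratified single-stage sample from a finite resource, the base R code below sets up a support variable and a matching pcfsize vector. The data frame, weights, and sizes are hypothetical, and coordinates are omitted for brevity.

```r
# Hypothetical design data frame for a stratified single-stage sample
# from a finite resource, so the support value is 1 for every site.
design <- data.frame(siteID = paste0("Site", 1:6),
                     wgt = rep(50, 6),
                     stratum = rep(c("Stratum1", "Stratum2"), 3),
                     support = rep(1, 6))

# One resource size per stratum, named with the stratum codes.
pcfsize <- c(Stratum1 = 150, Stratum2 = 150)

# Check that every stratum code in the design has a pcfsize entry.
strata.covered <- all(unique(design$stratum) %in% names(pcfsize))
```

For an extensive resource, the support column would instead hold the length or area of the sampling unit associated with each site.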
sizeweight 
a logical value that indicates whether size-weights should be used in the analysis, where TRUE = use the size-weights and FALSE = do not use the size-weights. The default is FALSE. 
swgt 
the size-weight for each site, which is the stage two size-weight for a two-stage sample. This variable can be input directly or as a formula. The default is NULL. 
swgt1 
the stage one size-weight for each site. This variable can be input directly or as a formula. The default is NULL. 
vartype 
the choice of variance estimator, where "Local" = local mean estimator and "SRS" = SRS estimator. The default is "Local". 
conf 
the confidence level. The default is 95%. 
pctval 
the set of values at which percentiles are estimated. The default set is: {5, 10, 25, 50, 75, 90, 95}. 
Value is a list of class spsurvey.analysis. Only those sites indicated by the logical variable in the sites data frame are retained in the output. The sites, subpop, and design data frames will always exist in the output. At least one of the data.cat and data.cont data frames will exist. Depending upon values of the input variables, other elements in the output may be NULL. The list is composed of the following components:
sites
 the sites data frame
subpop
 the subpop data frame
design
 the design data frame
data.cat
 the data.cat data frame
data.cont
 the data.cont data frame
sigma
 measurement error variance
var.sigma
 variance of the estimated measurement error variance
stratum.ind
 a logical value that indicates whether the sample
is stratified, where TRUE = a stratified sample and FALSE = not a
stratified sample
cluster.ind
 a logical value that indicates whether the sample
is a two-stage sample, where TRUE = a two-stage sample and FALSE = not a
two-stage sample
popsize
 the known size of the resource
pcfactor.ind
 a logical value that indicates whether the
population correction factor is used during variance estimation, where
TRUE = use the population correction factor and FALSE = do not use the
factor
pcfsize
 size of the resource, which is required for
calculation of finite and continuous population correction factors for a
single-stage sample
N.cluster
 the number of stage one sampling units in the
resource
stage1size
 the known size of the stage one sampling units
swgt.ind
 a logical value that indicates whether the sample is
a size-weighted sample, where TRUE = a size-weighted sample and FALSE =
not a size-weighted sample
vartype
 the choice of variance estimator, where "Local" =
local mean estimator and "SRS" = SRS estimator
conf
 the confidence level
pctval
 the set of values at which percentiles are estimated,
where the default set is: 5, 10, 25, 50, 75, 90, 95
Tom Kincaid Kincaid.Tom@epa.gov
Diaz-Ramos, S., D.L. Stevens, Jr., and A.R. Olsen. (1996). EMAP Statistical Methods Manual. EPA/620/R-96/XXX. Corvallis, OR: U.S. Environmental Protection Agency, Office of Research and Development, National Health Effects and Environmental Research Laboratory, Western Ecology Division.
# Load the package that provides spsurvey.analysis
library(spsurvey)

# Categorical variable example:
mysiteID <- paste("Site", 1:100, sep="")
mysites <- data.frame(siteID=mysiteID, Active=rep(TRUE, 100))
mysubpop <- data.frame(siteID=mysiteID, All.Sites=rep("All Sites", 100),
   Resource.Class=rep(c("Good","Poor"), c(55,45)))
mydesign <- data.frame(siteID=mysiteID, wgt=runif(100, 10, 100),
   xcoord=runif(100), ycoord=runif(100),
   stratum=rep(c("Stratum1", "Stratum2"), 50))
mydata.cat <- data.frame(siteID=mysiteID,
   CatVar=rep(c("north", "south", "east", "west"), 25))
# Known resource sizes by population Type, subpopulation, and stratum
mypopsize <- list(All.Sites=c(Stratum1=3500, Stratum2=2000),
   Resource.Class=list(Good=c(Stratum1=2500, Stratum2=1500),
   Poor=c(Stratum1=1000, Stratum2=500)))
spsurvey.analysis(sites=mysites, subpop=mysubpop, design=mydesign,
   data.cat=mydata.cat, popsize=mypopsize)

# Continuous variable example, including deconvolution estimates:
# The site ID column is named ID here, so it is passed as the formula ~ID
mydesign <- data.frame(ID=mysiteID, wgt=runif(100, 10, 100),
   xcoord=runif(100), ycoord=runif(100),
   stratum=rep(c("Stratum1", "Stratum2"), 50))
ContVar <- rnorm(100, 10, 1)
# ContVar.1 and ContVar.2 add measurement error with variance 0.25 and 0.50
mydata.cont <- data.frame(siteID=mysiteID, ContVar=ContVar,
   ContVar.1=ContVar + rnorm(100, 0, sqrt(0.25)),
   ContVar.2=ContVar + rnorm(100, 0, sqrt(0.50)))
# Measurement error variance for each response variable (NA = none supplied)
mysigma <- c(ContVar=NA, ContVar.1=0.25, ContVar.2=0.50)
spsurvey.analysis(sites=mysites, subpop=mysubpop, design=mydesign,
   data.cont=mydata.cont, siteID=~ID, sigma=mysigma,
   popsize=mypopsize)
