fitswavecav: Fit seasonal wave and continuous ancillary data for trend...
In seawaveQ: SEAWAVE-Q Model

Description Usage Arguments Format Details Value Note Author(s) References See Also Examples

Function to prepare data and fit the seawaveQ model.

fitswavecav(
  cdat,
  cavdat,
  tanm = "trend1",
  pnames,
  yrstart = 0,
  yrend = 0,
  tndbeg = 0,
  tndend = 0,
  iwcav = c("none"),
  dcol = "dates",
  qwcols = c("R", "P"),
  mclass = 1,
  numk = 4,
  alpha = 0.1,
  bootRCS = FALSE,
  nboot = 1000,
  plotfile = FALSE,
  textfile = FALSE
)

`cdat`	is the concentration data
`cavdat`	is the continuous (daily) ancillary data
`tanm`	is a character identifier that names the trend analysis run. It is used to label output files.
`pnames`	are the parameters (water-quality constituents) to analyze (omit the starting character, for example for sulfate data indicated by P00945, enter "00945").
`yrstart`	is the starting year of the analysis (treated as January 1 of that year). Zero means the start date will be determined by the start date of cavdat, the continuous ancillary data.
`yrend`	is the ending year of the analysis (treated as December 31 of that year). Zero means the end date will be determined by the end date of cavdat, the continuous ancillary data.
`tndbeg`	is the beginning (in whole or decimal years) of the trend period. Zero means the begin date will be the beginning of the concentration data, cdat.
`tndend`	is the end of the trend (treated as December 31 of that year). Zero means the end date will be the end of the concentration data, cdat.
`iwcav`	is a character vector indicating which continuous ancillary variables to include, if none are used for analysis, use iwcav=c("none").
`dcol`	is the column name for the dates, should be the same for both cdat and cavdat
`qwcols`	is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter).
`mclass`	indicates the class of model to use. A class 1 model is the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model.
`numk`	is the number of knots in the restricted cubic spline model. The default is 4, and the recommended number is 3–7.
`alpha`	is the significance level or alpha values for statistical significance and confidence intervals.
`bootRCS`	is a logical value indicating whether or not to perform block bootstrapping for an attained significance level for the trend with restricted cubic splines. No bootstrapping is performed for the linear trend model.
`nboot`	is the number of bootstrap replicates. A large number, 10,000, is recommended, but this takes a long time.
`plotfile`	is by default FALSE. True will write pdf files of plots to the user's file system.
`textfile`	is by default FALSE. True will write text output files to the user's file system. These files are useful for detailed model comparisons, documenting session information, and for model archives.

The data frame returned as the first element of the output list has one row for each chemical analyzed and the number of columns depends on the number of continuous ancillary variables used. The general format is as follows:

pname	character	parameter analyzed
mclass	numeric	a value of 1 or 2
jmod	numeric	the choice of pulse input function, an integer 1--14
hlife	numeric	the model half-life in months, an integer, 1 to 4 months
cmaxt	numeric	the decimal season of maximum concentration
scl	numeric	the scale factor from the `survreg.object`
loglik	numeric	the log-likelihood for the model
cint	numeric	coefficient for model intercept
cwave	numeric	coefficient for the seasonal wave
ctnd[alpahnumeric]	numeric	coefficient(s) for the trend component(s) of model
c[alphanumeric]	numeric	0 or more coefficients for the continuous ancillary variables
seint	numeric	standard error for the intercept
sewave	numeric	standard error for the seasonal wave
setnd[alphanumeric]	numeric	standard error for the trend component(s)
se[alphanumeric]	numeric	0 or more standard errors for the continuous ancillary variables
pvaltnd[alphanumeric]	numeric	the p-value for the trend component(s)

The data frame returned as the sixth element of the output list has one row for each chemical analyzed. The general format for linear models is described in pesticideTrendCalcs. The format for restricted cubic spline models is as described as follows:

pname	character	parameter analyzed
mclass	numeric	A value of 1 or 2
baseConc	numeric	the concentration at the beginning of the trend period
endCon	numeric	the concentration at the end of the trend period
rcsctndPpor	numeric	the concentration trend in percent over the trend period
rcsctndOrigPORPercentBase	numeric	the conc. trend in original units over period of record (calc. based on percent per year and base conc.)
pvalrcstnd	numeric	p-value, attained significance level, based on bootstrapping
ctndlklhd	numeric	trend likelihood

Fits the seawaveQ model (Vecchia and others, 2008) using a seasonal wave and continuous ancillary variables (streamflow anomalies and other continuous variables such as conductivity or sediment) to model water quality. The version in the 2.0.0 update to the R package has an option to use restricted cubic splines as a more flexible definition of the temporal trend.

A PDF file containing plots of the data and modeled concentration, a text file containing a summary of the survival regression call for each model selected, and a list. The first element of the list is a data frame described under format. The second element of the list is the summary of the survival regression call. The third element is the observed concentration data (censored and uncensored). The fourth element is the concentration data predicted by the model. The fifth element provides summary statistics for the predicted concentrations. The sixth element is a data frame that provides a summary of the trends with the columns pname (parameter name), mclass (a value of 1, indicating a linear trend model, a value of 2 for models using restricted cubic splines), and the columns describing the trends. For linear models see pesticideTrendCalcs. For models with restricted cubic splines, see the format section. See Ryberg and York (2020) for additional details.

The assumed data format is one with columns for water-quality concentration values and a related column for qualification of those values, such as in the case of left-censored values less than a particular value. For example, a water-quality sample was collected and the laboratory analysis indicated that the concentration was less than 0.01 micrograms per liter. The USGS parameter code for simazine is 04035 (U.S. Geological Survey, 2018b). When the data are retrieved through the National Water Information System: Web Interface (https://waterdata.usgs.gov/nwis; U.S. Geological Survey, 2018a), the concentration values are in a column labeled P04035 and the qualification information, or remark codes, are in a column labeled R04035. To use this function, the argument pnames would be the unique identifier for simazine values and qualifications, 04035, and the qwcols argument would be c("R", "P") to indicate that the qualification column starts with an R and the values column starts with a P.

Other users may have data in different formats that can be modified to use with this function. For example, a user may have concentration values and qualification codes in one column, such as a column labeled simazine with the values 0.05, 0.10, <0.01, <0.01, and 0.90. In this case, the less thans and any other qualification codes should be placed in a separate column. The column names for the qualification codes and the concentration values should be the same with the exception of different beginning letters to indicate which column is which. The columns could be named Rsimazine and Psimazine. Then the argument pnames = "simazine" and the argument qwcols = c("R", "P").

Users should exercise caution when their water-quality data have multiple censoring limits and may want to recensor the data to a single censoring level. Censoring and recensoring issues are discussed in the text and Appendix 1 of Ryberg and others (2010).

None of the variations of SEAWAVE is a simple model. The model complexity increases with flow anomalies and with the addition of restricted cubic splines. As the number of parameters in the model increases or the degree of censoring increases, the sample size must also increase. See Ryberg and others (2020) for more details on sample size.

Aldo V. Vecchia and Karen R. Ryberg

Ryberg, K.R. and York, B.C., 2020, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables: U.S. Geological Survey Open-File Report 2020–1082, 25 p., with 4 appendixes.

Ryberg, K.R., Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2010, Trends in pesticide concentrations in urban streams in the United States, 1992–2008: U.S. Geological Survey Scientific Investigations Report 2010–5139, 101 p., https://pubs.usgs.gov/sir/2010/5139/.

U.S. Geological Survey, 2018a, National Water Information System: Web Interface, accessed July 7, 2018, at https://waterdata.usgs.gov/nwis.

U.S. Geological Survey, 2018b, Parameter code definition: National Water Information System: Web Interface, accessed July 18, 2018, at https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes.

Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308–1324, https://dx.doi.org/10.1111/j.1752-1688.2008.00225.x.

The functions that fitswavecav calls internally:
prepData and fitMod.

data(swData)
modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha)
myfitLinearTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, 
tanm = "myfitLinearTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, 
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), 
dcol = "dates", qwcols = c("R", "P"))
# trend model results
myfitLinearTrend[[1]]
# example regression call
myfitLinearTrend[[2]][[1]]
# first few lines of observed concentrations
head(myfitLinearTrend[[3]])
# first few lines of predicted concentrations
head(myfitLinearTrend[[4]])
# summary statistics for predicted concentrations
myfitLinearTrend[[5]]
# summary of trends
myfitLinearTrend[[6]]
myfitRCSTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, 
tanm = "myfitRCSTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, 
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), 
dcol = "dates", qwcols = c("R", "P"), mclass = 2, numk = 4, bootRCS = FALSE,
plotfile = FALSE, textfile = FALSE)