# cleandat: Clean (spatio)temporal data matrices to make them ready for... In wsyn: Wavelet Approaches to Studies of Synchrony in Ecology and Other Fields

## Description

A data cleaning function for optimal Box-Cox transformation, detrending, standardizing variance, and de-meaning.

## Usage

```r
cleandat(dat, times, clev, lambdas = seq(-10, 10, by = 0.01), mints = NA)
```

## Arguments

| Argument | Description |
|---|---|
| `dat` | A locations x time data matrix, or a time series vector (for 1 location). |
| `times` | The times of measurement, spacing 1. |
| `clev` | The level of cleaning to do, 1 through 5. See details. |
| `lambdas` | A vector of lambdas to test for optimal Box-Cox transformation, if Box-Cox is performed. Ignored for `clev<4`. Defaults to `seq(-10, 10, by = 0.01)`. See details. |
| `mints` | If `clev` is 4 or 5, then time series are shifted to have this minimum value before Box-Cox transformation. Default `NA` means use the smallest difference between consecutive, distinct sorted values. `NaN` means perform no shift. |
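The `mints` rule can be illustrated with a small base-R sketch. `shift_to_min` is a hypothetical helper written for this illustration, not a wsyn function, and it assumes the shift is a simple translation so that the series minimum equals `mints`; the package's internal code may differ in detail.

```r
# Hypothetical helper (not part of wsyn): shift a series so its minimum
# equals `mints`, following the rule described for the `mints` argument.
shift_to_min <- function(x, mints = NA) {
  # Check NaN first, because is.na(NaN) is also TRUE in R
  if (is.nan(mints)) {
    return(x)  # NaN: perform no shift
  }
  if (is.na(mints)) {
    # Default: smallest difference between consecutive, distinct sorted values
    mints <- min(diff(sort(unique(x))))
  }
  x - min(x) + mints
}

x <- c(-2, 0, 0, 3, 3.5)
shifted <- shift_to_min(x)   # distinct sorted gaps are 2, 3, 0.5, so mints = 0.5
print(shifted)               # 0.5 2.5 2.5 5.5 6.0
```

Shifting to a strictly positive minimum matters because the Box-Cox transformation is only defined for positive values.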

## Details

NAs, Infs, etc. in `dat` trigger an error. The cleaning performed depends on `clev`:

- If `clev==1`, time series are (individually) de-meaned.
- If `clev==2`, time series are (individually) linearly detrended and de-meaned.
- If `clev==3`, time series are (individually) linearly detrended and de-meaned, and variances are standardized to 1.
- If `clev==4`, an optimal Box-Cox normalization procedure is applied jointly to all time series (so the same Box-Cox transformation is applied to all time series after they are individually shifted depending on the value of `mints`). Transformed time series are then individually linearly detrended, de-meaned, and variances are standardized to 1.
- If `clev==5`, an optimal Box-Cox normalization procedure is applied to each time series individually (again after individually shifting according to `mints`), and transformed time series are then individually linearly detrended, de-meaned, and variances are standardized to 1.

Constant time series and perfect linear trends trigger an error for `clev>=3`. If `clev>=4`, a warning is issued when the optimal `lambda` for one or more time series is a boundary case, or when there is more than one optimal lambda. In the boundary case, a wider range of `lambdas` should be considered.
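The steps described above can be sketched for a single series in base R. This is a rough illustration under stated assumptions, not wsyn's implementation: the helper names (`boxcox_transform`, `boxcox_loglik`, `detrend_standardize`) are hypothetical, the optimal lambda is chosen by maximizing a standard Box-Cox profile log-likelihood for a constant-mean normal model, and variance is standardized with `sd()`. The package may use a different criterion or estimator.

```r
# Hypothetical helpers for illustration only; not wsyn internals.

# Box-Cox transform of a strictly positive series
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant),
# assuming the transformed values are normal with constant mean
boxcox_loglik <- function(y, lambda) {
  z <- boxcox_transform(y, lambda)
  n <- length(y)
  -n / 2 * log(sum((z - mean(z))^2) / n) + (lambda - 1) * sum(log(y))
}

# Linear detrending, de-meaning, and scaling to unit variance
detrend_standardize <- function(x, times) {
  r <- residuals(lm(x ~ times))  # residuals have zero mean and no linear trend
  r / sd(r)
}

set.seed(1)
times <- 1:100
y <- exp(rnorm(100, mean = 0.02 * times, sd = 0.3))  # positive, skewed, trending

# Grid search over candidate lambdas (a narrower grid than the cleandat default)
lambdas <- seq(-2, 2, by = 0.01)
ll <- sapply(lambdas, function(l) boxcox_loglik(y, l))
best <- lambdas[which.max(ll)]

# Individual Box-Cox followed by detrending and standardizing, in the spirit of clev = 5
cleaned <- detrend_standardize(boxcox_transform(y, best), times)
```

If `best` lands on the first or last element of `lambdas`, that corresponds to the boundary case mentioned above, and a wider grid would be worth trying. A joint transformation in the spirit of `clev==4` would instead pick one lambda for all series, for example by summing the log-likelihoods across series before maximizing (again an assumption about the approach, not a description of the package code).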

## Value

`cleandat` returns a list containing the cleaned data, `clev`, and the optimal lambdas from the Box-Cox procedure (`NA` for `clev<4`, see details).

## Author(s)

Jonathan Walter, jaw3es@virginia.edu; Lawrence Sheppard, lwsheppard@ku.edu; Daniel Reuman, reuman@ku.edu; Lei Zhao, lei.zhao@cau.edu.cn

## References

Box, GEP and Cox, DR (1964) An analysis of transformations (with discussion). Journal of the Royal Statistical Society B, 26, 211–252.

Venables, WN and Ripley, BD (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Sheppard, LW, et al. (2016) Changes in large-scale climate alter spatial synchrony of aphid pests. Nature Climate Change. DOI: 10.1038/nclimate2881

## See Also

`wt`, `wmf`, `wpmf`, `coh`, `wlm`, `wlmtest`, `clust`, `browseVignettes("wsyn")`
## Examples

```r
times <- 1:100
dat <- rnorm(100)
res1 <- cleandat(dat, times, 1)  # this removes the mean
res2 <- cleandat(dat, times, 2)  # detrends and removes the mean
res3 <- cleandat(dat, times, 3)  # variances also standardized
res4 <- cleandat(dat, times, 4)  # also joint Box-Cox applied
res5 <- cleandat(dat, times, 5)  # 1-3, also indiv Box-Cox
```