EnvStats-package: Package for Environmental Statistics, Including US EPA...


Description

EnvStats is a comprehensive R package for environmental statistics and the successor to the S-PLUS module EnvironmentalStats for S-PLUS (first released in April 1997). EnvStats provides a set of functions for graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. It includes major environmental statistical methods found in the literature and regulatory guidance documents, and extensive help that explains what these methods do, how to use them, and where to find them in the literature. It also includes numerous built-in data sets from regulatory guidance documents and the environmental statistics literature, and scripts reproducing analyses presented in the User's Manual, EnvStats: An R Package for Environmental Statistics (Millard, 2013, http://www.springer.com/book/9781461484554).

For a complete list of functions and datasets, you can do any of the following:

Note: The names of all EnvStats functions start with a lowercase letter, and the names of all EnvStats datasets and data objects start with an uppercase letter. You can type newsEnvStats() at the R command prompt for the latest news for the EnvStats package.
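
The naming convention in the note can be inspected from the R prompt. The following is a minimal sketch, not part of the package: split_by_case() is a hypothetical helper introduced here for illustration, and the guarded block assumes EnvStats is installed. It relies only on base R facilities (getNamespaceExports() and data()).

```r
# Hypothetical helper (not part of EnvStats): split object names by the
# case of their first character.
split_by_case <- function(object_names) {
  first <- substr(object_names, 1, 1)
  list(
    lowercase = object_names[first %in% letters],
    uppercase = object_names[first %in% LETTERS]
  )
}

# This part runs only if EnvStats is installed.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  exported <- getNamespaceExports("EnvStats")               # exported objects
  datasets <- data(package = "EnvStats")$results[, "Item"]  # dataset names
  str(split_by_case(c(exported, datasets)))
}
```

Names that begin with neither a lowercase nor an uppercase ASCII letter are dropped by the helper, which is a simplification made for this sketch.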

Details

Package: EnvStats
Type: Package
Version: 2.3.0
Date: 2017-10-09
License: GPL (>=3)
LazyLoad: yes

A companion file EnvStats-manual.pdf containing a listing of all the current help files is located on the R CRAN web site at https://cran.r-project.org/package=EnvStats/EnvStats.pdf and also in the doc subdirectory of the directory where the EnvStats package was installed. For example, if you installed R under Windows, this file might be located in the directory C:\Program Files\R-*.**.*\library\EnvStats\doc, where *.**.* denotes the version of R you are using (e.g., 3.3.4) or in the directory C:\Users\Name\Documents\R\win-library\*.**.*\EnvStats\doc, where Name denotes your user name on the Windows operating system.

EnvStats comes with companion scripts, located in the scripts subdirectory of the directory where the package was installed. One set of scripts lets you reproduce the examples in the User's Manual. There are also scripts that let you reproduce examples from US EPA guidance documents.
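
Rather than browsing the file system by hand, the doc and scripts subdirectories can be located from within R. The sketch below uses base R's system.file(); pkg_subdir() is a hypothetical wrapper introduced here for illustration, and the subdirectory names "doc" and "scripts" are taken from the text above.

```r
# Hypothetical helper: full path to a subdirectory of an installed package.
# system.file() returns "" when the package or subdirectory is not found.
pkg_subdir <- function(pkg, subdir) {
  system.file(subdir, package = pkg)
}

doc_dir     <- pkg_subdir("EnvStats", "doc")
scripts_dir <- pkg_subdir("EnvStats", "scripts")

if (nzchar(scripts_dir)) {
  # List the companion scripts shipped with the installed package.
  print(list.files(scripts_dir))
} else {
  message("EnvStats scripts directory not found; is the package installed?")
}
```

Because system.file() searches the library paths R is actually using, this avoids guessing between the system-wide and per-user library locations.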

See the References section below for documentation for the predecessor to EnvStats, EnvironmentalStats for S-PLUS for Windows.

Features of EnvStats include:

Author(s)

Steven P. Millard

Maintainer: Steven P. Millard <[email protected]>

References

Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York. http://www.springer.com/book/9781461484554.

Millard, S.P. (2002). EnvironmentalStats for S-PLUS: User's Manual for Version 2.0. Second Edition. Springer-Verlag, New York.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Examples

  # Look at plots and summary statistics for the TcCB data given in 
  # USEPA (1994b) (the data are stored in EPA.94b.tccb.df). 
  # Arbitrarily set the one censored observation to the censoring level. 
  # Group by the variable Area.

  EPA.94b.tccb.df
  #    TcCB.orig   TcCB Censored      Area
  #1        0.22   0.22    FALSE Reference
  #2        0.23   0.23    FALSE Reference
  #...
  #46       1.20   1.20    FALSE Reference
  #47       1.33   1.33    FALSE Reference
  #48      <0.09   0.09     TRUE   Cleanup
  #49       0.09   0.09    FALSE   Cleanup
  #...
  #123     51.97  51.97    FALSE   Cleanup
  #124    168.64 168.64    FALSE   Cleanup


  # First plot the data
  #--------------------
  dev.new()
  stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, 
    xlab = "Area", ylab = "TcCB (ppb)")
  mtext("TcCB Concentrations by Area", line = 3, cex = 1.25, font = 2)

  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, 
    xlab = "Area", ylab = expression(paste(log[10], " [ TcCB (ppb) ]")))
  mtext(expression(paste(log[10], "(TcCB) Concentrations by Area")), 
    line = 3, cex = 1.25, font = 2)

  #--------------------------------------------------------------------

  # Now compute summary statistics
  #-------------------------------
  
  sum(EPA.94b.tccb.df$Censored) 
  #[1] 1 

  with(EPA.94b.tccb.df, TcCB[Censored])
  #[1] 0.09 

  # The summary statistics below treat the one censored value 
  # as if it were equal to the detection limit.

  summaryFull(TcCB ~ Area, data = EPA.94b.tccb.df)
  #                             Cleanup  Reference
  #N                             77       47      
  #Mean                           3.915    0.5985 
  #Median                         0.43     0.54   
  #10% Trimmed Mean               0.6846   0.5728 
  #Geometric Mean                 0.5784   0.5382 
  #Skew                           7.717    0.9019 
  #Kurtosis                      62.67     0.132  
  #Min                            0.09     0.22   
  #Max                          168.6      1.33   
  #Range                        168.5      1.11   
  #1st Quartile                   0.23     0.39   
  #3rd Quartile                   1.1      0.75   
  #Standard Deviation            20.02     0.2836 
  #Geometric Standard Deviation   3.898    1.597  
  #Interquartile Range            0.87     0.36   
  #Median Absolute Deviation      0.3558   0.2669 
  #Coefficient of Variation       5.112    0.4739

  summaryStats(TcCB ~ Area, data = EPA.94b.tccb.df, digits = 1)
  #           N Mean   SD Median Min   Max
  #Cleanup   77  3.9 20.0    0.4 0.1 168.6
  #Reference 47  0.6  0.3    0.5 0.2   1.3

  #----------------------------------------------------------------

  # Compute Shapiro-Wilk Goodness-of-Fit statistic for the 
  # Reference Area TcCB data assuming a lognormal distribution
  #-----------------------------------------------------------
  
  sw.list <- gofTest(TcCB ~ 1, data = EPA.94b.tccb.df, 
    subset = Area == "Reference", dist = "lnorm")
  sw.list

  # Results of Goodness-of-Fit Test
  # -------------------------------
  #
  # Test Method:                     Shapiro-Wilk GOF
  #
  # Hypothesized Distribution:       Lognormal
  #
  # Estimated Parameter(s):          meanlog = -0.6195712
  #                                  sdlog   =  0.4679530
  #
  # Estimation Method:               mvue
  #
  # Data:                            TcCB
  #
  # Subset With:                     Area == "Reference"
  #
  # Data Source:                     EPA.94b.tccb.df
  #
  # Sample Size:                     47
  #
  # Test Statistic:                  W = 0.978638
  #
  # Test Statistic Parameter:        n = 47
  #
  # P-value:                         0.5371935
  #
  # Alternative Hypothesis:          True cdf does not equal the
  #                                  Lognormal Distribution.

  #----------

  # Plot results of GOF test
  dev.new()
  plot(sw.list)

  #----------------------------------------------------------------

  # Based on the Reference Area data, estimate 90th percentile 
  # and compute a 95% confidence limit for the 90th percentile 
  # assuming a lognormal distribution.
  #------------------------------------------------------------

  with(EPA.94b.tccb.df, 
    eqlnorm(TcCB[Area == "Reference"], p = 0.9, ci = TRUE))

  # Results of Distribution Parameter Estimation
  # --------------------------------------------
  #
  # Assumed Distribution:            Lognormal
  #
  # Estimated Parameter(s):          meanlog = -0.6195712
  #                                  sdlog   =  0.4679530
  #
  # Estimation Method:               mvue
  #
  # Estimated Quantile(s):           90'th %ile = 0.9803307
  #
  # Quantile Estimation Method:      qmle
  #
  # Data:                            TcCB[Area == "Reference"]
  #
  # Sample Size:                     47
  #
  # Confidence Interval for:         90'th %ile
  #
  # Confidence Interval Method:      Exact
  #
  # Confidence Interval Type:        two-sided
  #
  # Confidence Level:                95%
  #
  # Confidence Interval:             LCL = 0.8358791
  #                                  UCL = 1.2154977
  #----------

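  # Cleanup
  # (TcCB.ref is never created in this example, so rm() warns that it 
  # was not found; only sw.list is actually removed.)
  rm(TcCB.ref, sw.list)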

Example output

Attaching package: 'EnvStats'

The following objects are masked from 'package:stats':

    predict, predict.lm

The following object is masked from 'package:base':

    print.default

    TcCB.orig   TcCB Censored      Area
1        0.22   0.22    FALSE Reference
2        0.23   0.23    FALSE Reference
3        0.26   0.26    FALSE Reference
4        0.27   0.27    FALSE Reference
5        0.28   0.28    FALSE Reference
6        0.28   0.28    FALSE Reference
7        0.29   0.29    FALSE Reference
8        0.33   0.33    FALSE Reference
9        0.34   0.34    FALSE Reference
10       0.35   0.35    FALSE Reference
11       0.38   0.38    FALSE Reference
12       0.39   0.39    FALSE Reference
13       0.39   0.39    FALSE Reference
14       0.42   0.42    FALSE Reference
15       0.42   0.42    FALSE Reference
16       0.43   0.43    FALSE Reference
17       0.45   0.45    FALSE Reference
18       0.46   0.46    FALSE Reference
19       0.48   0.48    FALSE Reference
20       0.50   0.50    FALSE Reference
21       0.50   0.50    FALSE Reference
22       0.51   0.51    FALSE Reference
23       0.52   0.52    FALSE Reference
24       0.54   0.54    FALSE Reference
25       0.56   0.56    FALSE Reference
26       0.56   0.56    FALSE Reference
27       0.57   0.57    FALSE Reference
28       0.57   0.57    FALSE Reference
29       0.60   0.60    FALSE Reference
30       0.62   0.62    FALSE Reference
31       0.63   0.63    FALSE Reference
32       0.67   0.67    FALSE Reference
33       0.69   0.69    FALSE Reference
34       0.72   0.72    FALSE Reference
35       0.74   0.74    FALSE Reference
36       0.76   0.76    FALSE Reference
37       0.79   0.79    FALSE Reference
38       0.81   0.81    FALSE Reference
39       0.82   0.82    FALSE Reference
40       0.84   0.84    FALSE Reference
41       0.89   0.89    FALSE Reference
42       1.11   1.11    FALSE Reference
43       1.13   1.13    FALSE Reference
44       1.14   1.14    FALSE Reference
45       1.14   1.14    FALSE Reference
46       1.20   1.20    FALSE Reference
47       1.33   1.33    FALSE Reference
48      <0.09   0.09     TRUE   Cleanup
49       0.09   0.09    FALSE   Cleanup
50       0.09   0.09    FALSE   Cleanup
51       0.12   0.12    FALSE   Cleanup
52       0.12   0.12    FALSE   Cleanup
53       0.14   0.14    FALSE   Cleanup
54       0.16   0.16    FALSE   Cleanup
55       0.17   0.17    FALSE   Cleanup
56       0.17   0.17    FALSE   Cleanup
57       0.17   0.17    FALSE   Cleanup
58       0.18   0.18    FALSE   Cleanup
59       0.19   0.19    FALSE   Cleanup
60       0.20   0.20    FALSE   Cleanup
61       0.20   0.20    FALSE   Cleanup
62       0.21   0.21    FALSE   Cleanup
63       0.21   0.21    FALSE   Cleanup
64       0.22   0.22    FALSE   Cleanup
65       0.22   0.22    FALSE   Cleanup
66       0.22   0.22    FALSE   Cleanup
67       0.23   0.23    FALSE   Cleanup
68       0.24   0.24    FALSE   Cleanup
69       0.25   0.25    FALSE   Cleanup
70       0.25   0.25    FALSE   Cleanup
71       0.25   0.25    FALSE   Cleanup
72       0.25   0.25    FALSE   Cleanup
73       0.26   0.26    FALSE   Cleanup
74       0.28   0.28    FALSE   Cleanup
75       0.28   0.28    FALSE   Cleanup
76       0.29   0.29    FALSE   Cleanup
77       0.31   0.31    FALSE   Cleanup
78       0.33   0.33    FALSE   Cleanup
79       0.33   0.33    FALSE   Cleanup
80       0.33   0.33    FALSE   Cleanup
81       0.34   0.34    FALSE   Cleanup
82       0.37   0.37    FALSE   Cleanup
83       0.38   0.38    FALSE   Cleanup
84       0.39   0.39    FALSE   Cleanup
85       0.40   0.40    FALSE   Cleanup
86       0.43   0.43    FALSE   Cleanup
87       0.43   0.43    FALSE   Cleanup
88       0.47   0.47    FALSE   Cleanup
89       0.48   0.48    FALSE   Cleanup
90       0.48   0.48    FALSE   Cleanup
91       0.49   0.49    FALSE   Cleanup
92       0.51   0.51    FALSE   Cleanup
93       0.51   0.51    FALSE   Cleanup
94       0.54   0.54    FALSE   Cleanup
95       0.60   0.60    FALSE   Cleanup
96       0.61   0.61    FALSE   Cleanup
97       0.62   0.62    FALSE   Cleanup
98       0.75   0.75    FALSE   Cleanup
99       0.82   0.82    FALSE   Cleanup
100      0.85   0.85    FALSE   Cleanup
101      0.92   0.92    FALSE   Cleanup
102      0.94   0.94    FALSE   Cleanup
103      1.05   1.05    FALSE   Cleanup
104      1.10   1.10    FALSE   Cleanup
105      1.10   1.10    FALSE   Cleanup
106      1.19   1.19    FALSE   Cleanup
107      1.22   1.22    FALSE   Cleanup
108      1.33   1.33    FALSE   Cleanup
109      1.39   1.39    FALSE   Cleanup
110      1.39   1.39    FALSE   Cleanup
111      1.52   1.52    FALSE   Cleanup
112      1.53   1.53    FALSE   Cleanup
113      1.73   1.73    FALSE   Cleanup
114      2.35   2.35    FALSE   Cleanup
115      2.46   2.46    FALSE   Cleanup
116      2.59   2.59    FALSE   Cleanup
117      2.61   2.61    FALSE   Cleanup
118      3.06   3.06    FALSE   Cleanup
119      3.29   3.29    FALSE   Cleanup
120      5.56   5.56    FALSE   Cleanup
121      6.61   6.61    FALSE   Cleanup
122     18.40  18.40    FALSE   Cleanup
123     51.97  51.97    FALSE   Cleanup
124    168.64 168.64    FALSE   Cleanup
dev.new(): using pdf(file="Rplots1.pdf")
[1] 1
[1] 0.09
                             Cleanup  Reference
N                             77       47      
Mean                           3.915    0.5985 
Median                         0.43     0.54   
10% Trimmed Mean               0.6846   0.5728 
Geometric Mean                 0.5784   0.5382 
Skew                           7.717    0.9019 
Kurtosis                      62.67     0.132  
Min                            0.09     0.22   
Max                          168.6      1.33   
Range                        168.5      1.11   
1st Quartile                   0.23     0.39   
3rd Quartile                   1.1      0.75   
Standard Deviation            20.02     0.2836 
Geometric Standard Deviation   3.898    1.597  
Interquartile Range            0.87     0.36   
Median Absolute Deviation      0.3558   0.2669 
Coefficient of Variation       5.112    0.4739 
           N Mean   SD Median Min   Max
Cleanup   77  3.9 20.0    0.4 0.1 168.6
Reference 47  0.6  0.3    0.5 0.2   1.3
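
As a cross-check that does not require EnvStats, several entries in the Reference column above can be recomputed with base R. This is only a sketch: the vector ref is typed in from the 47 Reference rows of the data listing, and the names ref and ref_stats are introduced here for illustration.

```r
# TcCB values (ppb) for the Reference area, copied from the listing above.
ref <- c(0.22, 0.23, 0.26, 0.27, 0.28, 0.28, 0.29, 0.33, 0.34, 0.35,
         0.38, 0.39, 0.39, 0.42, 0.42, 0.43, 0.45, 0.46, 0.48, 0.50,
         0.50, 0.51, 0.52, 0.54, 0.56, 0.56, 0.57, 0.57, 0.60, 0.62,
         0.63, 0.67, 0.69, 0.72, 0.74, 0.76, 0.79, 0.81, 0.82, 0.84,
         0.89, 1.11, 1.13, 1.14, 1.14, 1.20, 1.33)

ref_stats <- c(
  N       = length(ref),
  Mean    = mean(ref),
  Median  = median(ref),
  SD      = sd(ref),
  GeoMean = exp(mean(log(ref))),   # geometric mean
  GeoSD   = exp(sd(log(ref))),     # geometric standard deviation
  CV      = sd(ref) / mean(ref)    # coefficient of variation
)

print(signif(ref_stats, 4))
```

The geometric mean and geometric standard deviation are the exponentials of the mean and standard deviation of the log-transformed data, which is why they tie in with the meanlog and sdlog values reported in the lognormal fits below.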

Results of Goodness-of-Fit Test
-------------------------------

Test Method:                     Shapiro-Wilk GOF

Hypothesized Distribution:       Lognormal

Estimated Parameter(s):          meanlog = -0.6195712
                                 sdlog   =  0.4679530

Estimation Method:               mvue

Data:                            TcCB

Subset With:                     Area == "Reference"

Data Source:                     EPA.94b.tccb.df

Sample Size:                     47

Test Statistic:                  W = 0.9786379

Test Statistic Parameter:        n = 47

P-value:                         0.5371935

Alternative Hypothesis:          True cdf does not equal the
                                 Lognormal Distribution.
dev.new(): using pdf(file="Rplots2.pdf")

Results of Distribution Parameter Estimation
--------------------------------------------

Assumed Distribution:            Lognormal

Estimated Parameter(s):          meanlog = -0.6195712
                                 sdlog   =  0.4679530

Estimation Method:               mvue

Estimated Quantile(s):           90'th %ile = 0.9803307

Quantile Estimation Method:      qmle

Data:                            TcCB[Area == "Reference"]

Sample Size:                     47

Confidence Interval for:         90'th %ile

Confidence Interval Method:      Exact

Confidence Interval Type:        two-sided

Confidence Level:                95%

Confidence Interval:             LCL = 0.8358791
                                 UCL = 1.2154977

Warning message:
In rm(TcCB.ref, sw.list) : object 'TcCB.ref' not found
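
The estimated 90th percentile reported above follows directly from the printed meanlog and sdlog values, and can be checked with base R alone. The sketch below also computes a two-sided 95% interval using the textbook noncentral-t construction for a normal quantile, applied on the log scale. That eqlnorm uses exactly this construction for its "Exact" method is an assumption here, not something stated in this help page. All variable names are introduced for illustration.

```r
# Parameter estimates and sample size as printed in the output above.
meanlog <- -0.6195712
sdlog   <-  0.4679530
n       <- 47
p       <- 0.9

# Point estimate of the 90th percentile of the fitted lognormal.
q90 <- qlnorm(p, meanlog, sdlog)

# Assumed construction: sqrt(n) * (x_p - mean) / sd follows a noncentral t
# distribution with n - 1 degrees of freedom and noncentrality parameter
# sqrt(n) * qnorm(p), where x_p is the p-th quantile on the log scale.
ncp <- sqrt(n) * qnorm(p)
k   <- qt(c(0.025, 0.975), df = n - 1, ncp = ncp) / sqrt(n)
ci  <- exp(meanlog + sdlog * k)   # back-transform to the original scale

print(q90)
print(ci)
```

The point estimate should agree with the 90'th %ile value shown above; the interval is offered for comparison with the printed LCL and UCL rather than as a statement of how they were computed.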

EnvStats documentation built on July 15, 2018, 9:03 a.m.