gofTest: Goodness-of-Fit Test

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/gofTest.R

Description

Perform a goodness-of-fit test to determine whether a data set appears to come from a specified probability distribution or if two data sets appear to come from the same distribution.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
gofTest(y, ...)

## S3 method for class 'formula'
gofTest(y, data = NULL, subset, 
  na.action = na.pass, ...)

## Default S3 method:
gofTest(y, x = NULL, 
  test = ifelse(is.null(x), "sw", "ks"), 
  distribution = "norm", est.arg.list = NULL, 
  alternative = "two.sided", n.classes = NULL, 
  cut.points = NULL, param.list = NULL, 
  estimate.params = ifelse(is.null(param.list), TRUE, FALSE), 
  n.param.est = NULL, correct = NULL, digits = .Options$digits, 
  exact = NULL, ws.method = "normal scores", warn = TRUE, 
  data.name = NULL, data.name.x = NULL, parent.of.data = NULL, 
  subset.expression = NULL, ...) 

Arguments

y

an object containing data for the goodness-of-fit test. In the default method, the argument y must be numeric vector of observations. In the formula method, y must be a formula of the form y ~ 1 or y ~ x. The form y ~ 1 indicates use the observations in the vector y for a one-sample goodness-of-fit test. The form y ~ x is only relevant to the case of the two-sample Kolmogorov-Smirnov test (test="ks") and indicates use the observations in the vector y as the second sample and use the observations in the vector x as the first sample. Note that for the formula method, x and y must be the same length but this is not a requirement of the test and you can use vectors of different lengths via the default method. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

data

specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which gofTest is called.

subset

specifies an optional vector specifying a subset of observations to be used.

na.action

specifies a function which indicates what should happen when the data contain NAs. The default is na.pass.

x

numeric vector of values for the first sample in the case of a two-sample Kolmogorov-Smirnov goodness-of-fit test (test="ks"). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

test

character string defining which goodness-of-fit test to perform. Possible values are: "sw" (Shapiro-Wilk; the default when x is NOT supplied), "sf" (Shapiro-Francia), "ppcc" (Probability Plot Correlation Coefficient), "skew" (Zero-skew), "chisq" (Chi-squared), "ks" (Kolmogorov-Smirnov; the default when x IS supplied), and "ws" (Wilk-Shapiro test for Uniform [0, 1] distribution). When the argument x is supplied, you must set test="ks", which is what gofTest does by default.

distribution

a character string denoting the distribution abbreviation. See the help file for Distribution.df for a list of distributions and their abbreviations. The default value is distribution="norm" (Normal distribution).

When test="sw", test="sf", or test="ppcc", any continuous distribuiton is allowed (e.g., "norm" (normal), "lnorm" (lognormal), "gamma" (gamma), etc.), as well as mixed distributions involving the normal distribution (i.e., "zmnorm" (zero-modified normal), "zmlnorm" (zero-modified lognormal (delta)), and
"zmlnorm.alt" (zero-modified lognormal with alternative parameterization)).

When test="skew", only the values "norm" (normal), "lnorm" (lognormal), "lnorm.alt" (lognormal with alternative parameterization), "zmnorm" (zero-modified normal), "zmlnorm" (zero-modified lognormal (delta)), and
"zmlnorm.alt" (zero-modified lognormal with alternative parameterization) are allowed.

When test="ks", any continuous distribution is allowed.

When test="chisq", any distribuiton is allowed.

When test="ws", this argument is ignored.

est.arg.list

a list of arguments to be passed to the function estimating the distribution parameters. For example, if test="sw" and distribution="gamma", setting
est.arg.list=list(method="bcmle") indicates using the bias-corrected
maximum-likelihood estimators of shape and scale (see the help file for egamma). See the help file Estimating Distribution Parameters for a list of estimating functions. The default value is est.arg.list=NULL so that all default values for the estimating function are used. This argument is ignored if
estimate.params=FALSE.

When test="sw", test="sf", test="ppcc", or test="skew", and you are testing for some form of normality (i.e., Normal, Lognormal, Three-Parameter Lognormal, Zero-Modified Normal, or Zero-Modified Lognormal (Delta)), the estimated parameters are provided in the output merely for information, and the choice of the method of estimation has no effect on the goodness-of-fit test statistic or p-value.

When test="ks", x is not supplied, and estimate.params=TRUE, the estimated parameters are used to specify the null hypothesis of which distribution the data are assumed to come from.

When test="chisq" and estimate.params=TRUE, the estimated parameters are used to specify the null hypothesis of which distribution the data are assumed to come from.

When test="ws", this argument is ignored.

alternative

for the case when test="ks", test="skew", or test="ws", character string specifying the alternative hypothesis. When test="ks" or test="skew", the possible values are "two-sided" (the default), "greater", or "less". When test="ws", the possible values are "greater" (the default), or "less". See the DETAILS section of the help file for ks.test for more explanation of the meaning of this argument.

n.classes

for the case when test="chisq", the number of cells into which the observations are to be allocated. If the argument cut.points is supplied, then n.classes is set to length(cut.points)-1. The default value is
ceiling(2* (length(x)^(2/5))) and is recommended by Moore (1986).

cut.points

for the case when test="chisq", a vector of cutpoints that defines the cells. The element x[i] is allocated to cell j if
cut.points[j] < x[i] cut.points[j+1]. If x[i] is less than or equal to the first cutpoint or greater than the last cutpoint, then x[i] is treated as missing. If the hypothesized distribution is discrete, cut.points must be supplied. The default value is cut.points=NULL, in which case the cutpoints are determined by n.classes equi-probable intervals.

param.list

for the case when test="ks" and x is not supplied, or when test="chisq", a list with values for the parameters of the specified distribution. See the help file for Distribution.df for the names and possible values of the parameters associated with each distribution. The default value is param.list=NULL, which forces estimation of the distribution parameters. This argument is ignored if estimate.params=TRUE.

estimate.params

for the case when test="ks" and x is not supplied, or when test="chisq", a logical scalar indicating whether to perform the goodness-of-fit test based on estimating the distribution parameters (estimate.params=TRUE) or using the user-supplied distribution parameters specified by param.list
(estimate.params=FALSE). The default value of estimate.params is TRUE if param.list=NULL, otherwise it is FALSE.

n.param.est

for the case when test="ks" and x is not supplied, or when test="chisq", an integer indicating the number of parameters estimated from the data.
If estimate.params=TRUE, the default value is the number of parameters associated with the distribution specified by distribution (e.g., 2 for a normal distribution). If estimate.params=FALSE, the default value is n.param.est=0.

correct

for the case when test="chisq", a logical scalar indicating whether to use the continuity correction. The default value is correct=FALSE unless
n.classes=2.

digits

for the case when test="ks" and x is not supplied, or when test="chisq", and param.list is supplied, a scalar indicating how many significant digits to print out for the parameters associated with the hypothesized distribution. The default value is .Options$digits.

exact

for the case when test="ks", exact=NULL by default, but can be set to a logical scalar indicating whether an exact p-value should be computed. See the help file for ks.test for more information.

ws.method

for the case when test="ws", this argument specifies whether to perform the test based on normal scores (ws.method="normal scores", the default) or chi-square scores (ws.method="chi-square scores"). See the DETAILS section for more information.

warn

logical scalar indicating whether to print a warning message when observations with NAs, NaNs, or Infs in y or x are removed. The default value is TRUE.

data.name

character string indicating the name of the data used for argument y.

data.name.x

character string indicating the name of the data used for argument x.

parent.of.data

character string indicating the source of the data used for the goodness-of-fit test.

subset.expression

character string indicating the expression used to subset the data.

...

additional arguments affecting the goodness-of-fit test.

Details

Value

a list of class "gof" containing the results of the goodness-of-fit test, unless the two-sample
Kolmogorov-Smirnov test is used, in which case the value is a list of class "gofTwoSample". Objects of class "gof" and "gofTwoSample" have special printing and plotting methods. See the help files for gof.object and gofTwoSample.object for details.

Note

The Shapiro-Wilk test (Shapiro and Wilk, 1965) and the Shapiro-Francia test (Shapiro and Francia, 1972) are probably the two most commonly used hypothesis tests to test departures from normality. The Shapiro-Wilk test is most powerful at detecting short-tailed (platykurtic) and skewed distributions, and least powerful against symmetric, moderately long-tailed (leptokurtic) distributions. Conversely, the Shapiro-Francia test is more powerful against symmetric long-tailed distributions and less powerful against short-tailed distributions (Royston, 1992b; 1993).

The zero-skew goodness-of-fit test for normality is one of several tests that have been proposed to test the assumption of a normal distribution (D'Agostino, 1986b). This test has been included mainly because it is called by elnorm3. Ususally, the Shapiro-Wilk or Shapiro-Francia test is preferred to this test, unless the direction of the alternative to normality (e.g., positive skew) is known (D'Agostino, 1986b, pp. 405–406).

Kolmogorov (1933) introduced a goodness-of-fit test to test the hypothesis that a random sample of n observations x comes from a specific hypothesized distribution with cumulative distribution function H. This test is now usually called the one-sample Kolmogorov-Smirnov goodness-of-fit test. Smirnov (1939) introduced a goodness-of-fit test to test the hypothesis that a random sample of n observations x comes from the same distribution as a random sample of m observations y. This test is now usually called the two-sample Kolmogorov-Smirnov goodness-of-fit test. Both tests are based on the maximum vertical distance between two cumulative distribution functions. For the one-sample problem with a small sample size, the Kolmogorov-Smirnov test may be preferred over the chi-squared goodness-of-fit test since the KS-test is exact, while the chi-squared test is based on an asymptotic approximation.

The chi-squared test, introduced by Pearson in 1900, is the oldest and best known goodness-of-fit test. The idea is to reduce the goodness-of-fit problem to a multinomial setting by comparing the observed cell counts with their expected values under the null hypothesis. Grouping the data sacrifices information, especially if the hypothesized distribution is continuous. On the other hand, chi-squared tests can be be applied to any type of variable: continuous, discrete, or a combination of these.

The Wilk-Shapiro (1968) tests for a Uniform [0, 1] distribution were introduced in the context of testing whether several independent samples all come from normal distributions, with possibly different means and variances. The function gofGroupTest extends this idea to allow you to test whether several independent samples come from the same distribution (e.g., gamma, extreme value, etc.), with possibly different parameters.

In practice, almost any goodness-of-fit test will not reject the null hypothesis if the number of observations is relatively small. Conversely, almost any goodness-of-fit test will reject the null hypothesis if the number of observations is very large, since “real” data are never distributed according to any theoretical distribution (Conover, 1980, p.367). For most cases, however, the distribution of “real” data is close enough to some theoretical distribution that fairly accurate results may be provided by assuming that particular theoretical distribution. One way to asses the goodness of the fit is to use goodness-of-fit tests. Another way is to look at quantile-quantile (Q-Q) plots (see qqPlot).

Author(s)

Steven P. Millard ([email protected])

References

Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.

Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.

Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefor's Test for Normality. The American Statistician 40, 294-296.

D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of g1. Biometrika 57, 679-681.

D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.

D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York.

D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality. Empirical Results for the Distributions of b2 and √{b1}. Biometrika 60(3), 613-622.

D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of √{b1}. Biometrika 60(1), 169-173.

Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.

Kim, P.J., and R.I. Jennrich. (1973). Tables of the Exact Sampling Distribution of the Two Sample Kolmogorov-Smirnov Criterion. In Harter, H.L., and D.B. Owen, eds. Selected Tables in Mathematical Statistics, Vol. 1. American Mathematical Society, Providence, Rhode Island, pp.79-170.

Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell' Istituto Italiano degle Attuari 4, 83-91.

Marsaglia, G., W.W. Tsang, and J. Wang. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). http://www.jstatsoft.org/v08/i18/.

Moore, D.S. (1986). Tests of Chi-Squared Type. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, pp.63-95.

Pomeranz, J. (1973). Exact Cumulative Distribution of the Kolmogorov-Smirnov Statistic for Small Samples (Algorithm 487). Collected Algorithms from ACM ??, ???-???.

Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.

Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.

Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.

Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.

Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvannia State University, Department of Statistics.

Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.

Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.

Smirnov, N.V. (1939). Estimate of Deviation Between Empirical Distribution Functions in Two Independent Samples. Bulletin Moscow University 2(2), 3-16.

Smirnov, N.V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 279-281.

Stephens, M.A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society, Series B, 32, 115-122.

Stephens, M.A. (1986a). Tests Based on EDF Statistics. In D'Agostino, R. B., and M.A. Stevens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.

Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.

Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.

Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.

Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

See Also

rosnerTest, gof.object, print.gof, plot.gof, shapiro.test, ks.test, chisq.test, Normal, Lognormal, Lognormal3, Zero-Modified Normal, Zero-Modified Lognormal (Delta), enorm, elnorm, elnormAlt, elnorm3, ezmnorm, ezmlnorm, ezmlnormAlt, qqPlot.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
  # Generate 20 observations from a gamma distribution with 
  # parameters shape = 2 and scale = 3 then run various 
  # goodness-of-fit tests.
  # (Note:  the call to set.seed lets you reproduce this example.)

  set.seed(47)
  dat <- rgamma(20, shape = 2, scale = 3)

  # Shapiro-Wilk generalized goodness-of-fit test
  #----------------------------------------------
  gof.list <- gofTest(dat, distribution = "gamma")
  gof.list

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on 
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.9834958
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.970903
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.

  dev.new()
  plot(gof.list)

  #----------

  # Redo the example above, but use the bias-corrected mle

  gofTest(dat, distribution = "gamma", 
    est.arg.list = list(method = "bcmle"))

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF Based on 
  #                                 Chen & Balakrisnan (1995)
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.656376
  #                                 scale = 4.676680
  #
  #Estimation Method:               bcmle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.9834346
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9704046
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.

  #----------
  
  # Komogorov-Smirnov goodness-of-fit test (pre-specified parameters)
  #------------------------------------------------------------------

  gofTest(dat, test = "ks", distribution = "gamma", 
    param.list = list(shape = 2, scale = 3))

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Kolmogorov-Smirnov GOF
  #
  #Hypothesized Distribution:       Gamma(shape = 2, scale = 3)
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  ks = 0.2313878
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.2005083
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma(shape = 2, scale = 3)
  #                                 Distribution.

  #----------

  # Chi-squared goodness-of-fit test (estimated parameters)
  #--------------------------------------------------------

  gofTest(dat, test = "chisq", distribution = "gamma", n.classes = 4)

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Chi-square GOF
  #
  #Hypothesized Distribution:       Gamma
  #
  #Estimated Parameter(s):          shape = 1.909462
  #                                 scale = 4.056819
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  Chi-square = 1.2
  #
  #Test Statistic Parameter:        df = 1
  #
  #P-value:                         0.2733217
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Gamma Distribution.

  #----------
  # Clean up

  rm(dat, gof.list)
  graphics.off()

  #--------------------------------------------------------------------

  # Example 10-2 of USEPA (2009, page 10-14) gives an example of 
  # using the Shapiro-Wilk test to test the assumption of normality 
  # for nickel concentrations (ppb) in groundwater collected over 
  # 4 years.  The data for this example are stored in 
  # EPA.09.Ex.10.1.nickel.df.

  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0

  # Test for a normal distribution:
  #--------------------------------

  gof.list <- gofTest(Nickel.ppb ~ 1, 
    data = EPA.09.Ex.10.1.nickel.df)
  gof.list

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 169.5250
  #                                 sd   = 259.7175
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.6788888
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         2.17927e-05
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.

  dev.new()
  plot(gof.list)

  #----------

  # Test for a lognormal distribution:
  #-----------------------------------

  gofTest(Nickel.ppb ~ 1, 
    data = EPA.09.Ex.10.1.nickel.df, 
    dist = "lnorm")

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          meanlog = 3.918529
  #                                 sdlog   = 1.801404
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.978946
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9197735
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.

  #----------

  # Test for a lognormal distribution, but use the 
  # Mean and CV parameterization:
  #-----------------------------------------------

  gofTest(Nickel.ppb ~ 1, 
    data = EPA.09.Ex.10.1.nickel.df, 
    dist = "lnormAlt")

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          mean = 213.415628
  #                                 cv   =   2.809377
  #
  #Estimation Method:               mvue
  #
  #Data:                            Nickel.ppb
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Sample Size:                     20
  #
  #Test Statistic:                  W = 0.978946
  #
  #Test Statistic Parameter:        n = 20
  #
  #P-value:                         0.9197735
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.

  #----------
  # Clean up

  rm(gof.list)
  graphics.off()

  #---------------------------------------------------------------------------

  # Generate 20 observations from a normal distribution with mean=3 and sd=2, and 
  # generate 10 observaions from a normal distribution with mean=2 and sd=2 then 
  # test whether these sets of observations come from the same distribution.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(300) 
  dat1 <- rnorm(20, mean = 3, sd = 2) 
  dat2 <- rnorm(10, mean = 1, sd = 2) 
  gofTest(x = dat1, y = dat2, test = "ks")

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     2-Sample K-S GOF
  #
  #Hypothesized Distribution:       Equal
  #
  #Data:                            x = dat1
  #                                 y = dat2
  #
  #Sample Sizes:                    n.x = 20
  #                                 n.y = 10
  #
  #Test Statistic:                  ks = 0.7
  #
  #Test Statistic Parameters:       n = 20
  #                                 m = 10
  #
  #P-value:                         0.001669561
  #
  #Alternative Hypothesis:          The cdf of 'dat1' does not equal
  #                                 the cdf of 'dat2'.

  #----------
  # Clean up

  rm(dat1, dat2)

EnvStats documentation built on May 30, 2017, 2:24 a.m.