get_hotellings: Hotelling's statistics (for two independent (small) samples)

View source: R/statistics.R

Description

The function get_hotellings() estimates the parameters for Hotelling's two-sample T^2 statistic for small samples.

Usage

get_hotellings(m1, m2, signif)

Arguments

m1

A matrix with the data of the reference group.

m2

A matrix with the same dimensions as matrix m1, with the data of the test group.

signif

A positive numeric value between 0 and 1 specifying the significance level. The default value is 0.05.

Details

The two-sample Hotelling's T^2 test statistic is given by

T^2 = (x.bar_1 - x.bar_2)^{\top} (S_p (1 / n_1 + 1 / n_2))^{-1} (x.bar_1 - x.bar_2) .
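The formula above can be sketched in base R as follows. This is only an illustration, not the package's implementation: hotelling_t2() is a hypothetical helper, the data are simulated, and the pooled matrix S_p is assumed to follow the usual definition ((n_1 - 1) S_1 + (n_2 - 1) S_2) / (n_1 + n_2 - 2).

```r
# Hypothetical helper (not part of disprofas): two-sample Hotelling's T^2
hotelling_t2 <- function(m1, m2) {
  n1 <- nrow(m1)
  n2 <- nrow(m2)
  # Difference of the column mean vectors, x.bar_1 - x.bar_2
  d <- colMeans(m1) - colMeans(m2)
  # Pooled variance-covariance matrix S_p (assumed usual definition)
  S_p <- ((n1 - 1) * cov(m1) + (n2 - 1) * cov(m2)) / (n1 + n2 - 2)
  # T^2 = d' (S_p (1 / n1 + 1 / n2))^{-1} d
  drop(t(d) %*% solve(S_p * (1 / n1 + 1 / n2)) %*% d)
}

# Simulated data for illustration only (6 rows, 2 variables per group)
set.seed(1)
m1 <- matrix(rnorm(12, mean = 10), nrow = 6)
m2 <- matrix(rnorm(12, mean = 11), nrow = 6)
hotelling_t2(m1, m2)
```

With a single variable (p = 1), this quantity reduces to the square of the pooled-variance two-sample t statistic.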

For large samples, this test statistic will be approximately chi-square distributed with p degrees of freedom. However, this approximation does not take into account the variation due to the variance-covariance matrix estimation. Therefore, Hotelling's T^2 statistic is transformed into an F-statistic using the expression

F = (n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2) p) T^2 ,

where n_1 and n_2 are the sample sizes of the two samples being compared and p is the number of variables.

Under the null hypothesis, H_0: μ_1 = μ_2, this F-statistic is F-distributed with p and n_1 + n_2 - p - 1 degrees of freedom. H_0 is rejected at significance level α if the F-value exceeds the critical value from the F-table evaluated at α, i.e. F > F_{p, n_1 + n_2 - p - 1, α}. The null hypothesis holds if, and only if, the population means are identical for all variables. The alternative is that at least one pair of these means differs.
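The transformation and the decision rule can be sketched with the base R functions qf() and pf(). The helper t2_to_f() and the input T2 = 20 are hypothetical and serve only as illustration. The denominator degrees of freedom are taken as n_1 + n_2 - p - 1, matching the critical value F_{p, n_1 + n_2 - p - 1, α}.

```r
# Hypothetical helper (not part of disprofas): convert T^2 to F and test H_0
t2_to_f <- function(T2, n1, n2, p, signif = 0.05) {
  df1 <- p
  df2 <- n1 + n2 - p - 1
  # F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2
  F_obs <- df2 / ((n1 + n2 - 2) * p) * T2
  # Critical value at significance level signif (upper tail)
  F_crit <- qf(1 - signif, df1, df2)
  # Upper-tail p value of the observed F value
  p_val <- pf(F_obs, df1, df2, lower.tail = FALSE)
  list(F = F_obs, F.crit = F_crit, p.F = p_val, reject = F_obs > F_crit)
}

# Illustrative input: T^2 = 20 with n1 = n2 = 6 observations and p = 2 variables
res_f <- t2_to_f(T2 = 20, n1 = 6, n2 = 6, p = 2, signif = 0.1)
res_f
```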

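The following assumptions concerning the data are made:

1. The data from each population are a sample from a population with mean vector μ_i, i.e. there are no sub-populations.
2. Both populations have a common variance-covariance matrix Σ.
3. The observations from both populations are sampled independently.
4. Both populations are multivariate normally distributed.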

Value

A list with the following elements is returned:

Parameters

Parameters determined for the estimation of Hotelling's T^2.

S.pool

Pooled variance-covariance matrix.

covs

A list with the elements S.b1 and S.b2, i.e. the variance-covariance matrices of the reference and the test group, respectively.

means

A list with the elements mean.b1, mean.b2 and mean.diff, i.e. the average profile values (for each time point) of the reference and the test group and the corresponding differences of the averages, respectively.

The Parameters element contains the following information:

DM

Mahalanobis distance of the samples.

df1

Degrees of freedom (number of variables or time points).

df2

Degrees of freedom (total number of rows of m1 and m2 - number of variables - 1, i.e. n_1 + n_2 - p - 1).

alpha

Provided significance level.

K

Scaling factor for the squared Mahalanobis distance to obtain the F value, i.e. K = k (n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2) p).

k

Scaling factor for the squared Mahalanobis distance to obtain the T^2 statistic.

T2

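Hotelling's T^2 statistic, i.e. k times the squared Mahalanobis distance. It is F-distributed only after scaling by (n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2) p).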

F

Observed F value.

F.crit

Critical F value.

p.F

p value for Hotelling's T^2 test statistic.
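The relationship between these elements can be checked against the values reported in the Examples section (n_1 = n_2 = 6, p = 2, DM = 10.44045). The sketch below assumes k = n_1 n_2 / (n_1 + n_2), which follows from the T^2 formula in the Details section, and K = k (n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2) p). It is a consistency check on the reported numbers, not the package's source code.

```r
# Sample sizes and number of variables from the Examples section
n1 <- 6
n2 <- 6
p <- 2
# Mahalanobis distance as reported in the Examples section
DM <- 1.044045e+01

# Assumed scaling factors (see the lead-in)
k <- n1 * n2 / (n1 + n2)
K <- k * (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p)

# T^2 and F follow from the squared Mahalanobis distance
T2 <- k * DM^2
F_obs <- K * DM^2

round(c(k = k, K = K, T2 = T2, F = F_obs), 3)
```

Up to rounding of DM, the results agree with the reported values k = 3, K = 1.35, T2 = 327.0089 and F = 147.1540.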

References

Hotelling, H. (1931) The generalization of Student's ratio. Ann Math Stat, 2(3), 360-378.

Hotelling, H. (1947) Multivariate quality control illustrated by air testing of sample bombsights. In: Eisenhart, C., Hastay, M.W., and Wallis, W.A., Eds., Techniques of Statistical Analysis, McGraw Hill, New York, 111-184.

See Also

mimcr, get_sim_lim.

Examples

# Dissolution data of one reference batch and one test batch of n = 6
# tablets each:
str(dip1)

# 'data.frame':	12 obs. of  10 variables:
# $ type  : Factor w/ 2 levels "R","T": 1 1 1 1 1 1 2 2 2 2 ...
# $ tablet: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
# $ t.5   : num  42.1 44.2 45.6 48.5 50.5 ...
# $ t.10  : num  59.9 60.2 55.8 60.4 61.8 ...
# $ t.15  : num  65.6 67.2 65.6 66.5 69.1 ...
# $ t.20  : num  71.8 70.8 70.5 73.1 72.8 ...
# $ t.30  : num  77.8 76.1 76.9 78.5 79 ...
# $ t.60  : num  85.7 83.3 83.9 85 86.9 ...
# $ t.90  : num  93.1 88 86.8 88 89.7 ...
# $ t.120 : num  94.2 89.6 90.1 93.4 90.8 ...

# Estimation of the parameters for Hotelling's two-sample T2 statistic
# (for small samples)
res <-
  get_hotellings(m1 = as.matrix(dip1[dip1$type == "R", c("t.15", "t.90")]),
                 m2 = as.matrix(dip1[dip1$type == "T", c("t.15", "t.90")]),
                 signif = 0.1)
res$S.pool
res$Parameters

# Expected results in res$S.pool
#          t.15     t.90
# t.15 3.395808 1.029870
# t.90 1.029870 4.434833

# Expected results in res$Parameters
#           DM          df1          df2       signif            K
# 1.044045e+01 2.000000e+00 9.000000e+00 1.000000e-01 1.350000e+00
#            k           T2            F       F.crit          p.F
# 3.000000e+00 3.270089e+02 1.471540e+02 3.006452e+00 1.335407e-07

disprofas documentation built on Dec. 8, 2021, 5:10 p.m.