get_hotellings: Hotelling's statistics (for two independent (small) samples)
In disprofas: Non-Parametric Dissolution Profile Analysis


The function get_hotellings() estimates the parameters for Hotelling's two-sample T^2 statistic for small samples.

get_hotellings(m1, m2, signif)

`m1`	A matrix with the data of the reference group.
`m2`	A matrix with the same dimensions as matrix `m1`, with the data of the test group.
`signif`	A positive numeric value between `0` and `1` specifying the significance level. The default value is `0.05`.

The two-sample Hotelling's T^2 test statistic is given by

T^2 = (x.bar_1 - x.bar_2)^{\top} (S_p (1 / n_1 + 1 / n_2))^{-1} (x.bar_1 - x.bar_2) .

For large samples, this test statistic will be approximately chi-square distributed with p degrees of freedom. However, this approximation does not take into account the variation due to the variance-covariance matrix estimation. Therefore, Hotelling's T^2 statistic is transformed into an F-statistic using the expression

F = (n_1 + n_2 - p - 1) / ((n_1 + n_2 - 2) p) T^2 ,

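where x.bar_1 and x.bar_2 are the sample mean vectors of the two groups, S_p is the pooled variance-covariance matrix, n_1 and n_2 are the sample sizes of the two samples being compared and p is the number of variables.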
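
As a minimal sketch of the two expressions above, the statistics can be computed in base R. The matrices `m1` and `m2` below hold hypothetical values, and the pooled variance-covariance matrix is assumed to be the usual weighted average of the two group covariance matrices. This is an illustration only, not necessarily how get_hotellings() is implemented internally.

```r
# Hypothetical data: two groups of n = 6 observations on p = 2 variables
m1 <- cbind(a = c(10, 12, 11, 13, 12, 14), b = c(20, 21, 23, 22, 24, 23))
m2 <- cbind(a = c(13, 15, 14, 16, 15, 17), b = c(25, 24, 27, 26, 28, 27))

n1 <- nrow(m1)
n2 <- nrow(m2)
p <- ncol(m1)

# Difference of the sample mean vectors
d <- colMeans(m1) - colMeans(m2)

# Pooled variance-covariance matrix (assumed: standard pooled estimator)
S_p <- ((n1 - 1) * cov(m1) + (n2 - 1) * cov(m2)) / (n1 + n2 - 2)

# Hotelling's T^2 and its transformation into an F statistic
T2 <- drop(t(d) %*% solve(S_p * (1 / n1 + 1 / n2)) %*% d)
F_obs <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
```

With n_1 = n_2 = 6 and p = 2, the factor in front of T^2 is 9 / 20 = 0.45.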

Under the null hypothesis, H_0: μ_1 = μ_2, this F-statistic will be F-distributed with p and n_1 + n_2 - p - 1 degrees of freedom. H_0 is rejected at significance level α if the F-value exceeds the critical value from the F-table evaluated at α, i.e. F > F_{p, n_1 + n_2 - p - 1, α}. The null hypothesis is satisfied if, and only if, the population means are identical for all variables. The alternative is that at least one pair of these means is different.
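
The decision rule can be sketched with the base R functions `qf()` and `pf()`. The observed F value, the sample sizes and the significance level below are taken from the Examples section of this page; the variable names are illustrative only.

```r
# Values from the Examples section: n = 6 tablets per group, p = 2 time points
n1 <- 6
n2 <- 6
p <- 2
alpha <- 0.1
F_obs <- 147.154  # observed F value reported in the Examples section

# Degrees of freedom of the F distribution under H_0
df1 <- p
df2 <- n1 + n2 - p - 1

# Critical value (upper alpha quantile) and p value (upper tail probability)
F_crit <- qf(1 - alpha, df1, df2)
p_value <- pf(F_obs, df1, df2, lower.tail = FALSE)

# H_0 is rejected when the observed F value exceeds the critical value
reject_H0 <- F_obs > F_crit
```

For these values, `F_crit` should agree with the `F.crit` entry shown in the Examples section, and H_0 is rejected.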

The following assumptions concerning the data are made:

The data from population i is a sample from a population with mean vector μ_i. In other words, it is assumed that there are no sub-populations.
The data from both populations have common variance-covariance matrix Σ.
The subjects from both populations are independently sampled.
Both populations are normally distributed.

A list with the following elements is returned:

`Parameters`	Parameters determined for the estimation of Hotelling's T^2.
`S.pool`	Pooled variance-covariance matrix.
`covs`	A list with the elements `S.b1` and `S.b2`, i.e. the variance-covariance matrices of the reference and the test group, respectively.
`means`	A list with the elements `mean.b1`, `mean.b2` and `mean.diff`, i.e. the average profile values (for each time point) of the reference and the test group and the corresponding differences of the averages, respectively.

The Parameters element contains the following information:

`DM`	Mahalanobis distance of the samples.
`df1`	Degrees of freedom (number of variables or time points).
`df2`	Degrees of freedom (total number of rows of both groups - number of variables - 1).
`signif`	Provided significance level.
`K`	Scaling factor for the squared Mahalanobis distance to obtain the F statistic, accounting for the distribution of the T^2 statistic.
`k`	Scaling factor for the squared Mahalanobis distance to obtain the T^2 statistic.
`T2`	Hotelling's T^2 statistic (F-distributed after scaling).
`F`	Observed F value.
`F.crit`	Critical F value.
`p.F`	p value for Hotelling's T^2 test statistic.
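
The relations between these elements can be sketched as follows. The expressions for `k` and `K` are an assumption inferred from the formulas in the Details and from the values printed in the Examples section (n_1 = n_2 = 6, p = 2); they are not taken from the package source.

```r
# Values from the Examples section
n1 <- 6
n2 <- 6
p <- 2
DM <- 10.44045  # Mahalanobis distance reported in the Examples section

# Assumed scaling factors
k <- n1 * n2 / (n1 + n2)                         # 1 / (1 / n1 + 1 / n2)
K <- k * (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p)

# T^2 and F as scaled versions of the squared Mahalanobis distance
T2 <- k * DM^2
F_obs <- K * DM^2
```

Under these assumptions, `k` is 3 and `K` is 1.35, and `T2` and `F_obs` reproduce the `T2` and `F` entries of the Examples output up to rounding of `DM`.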

Hotelling, H. (1931) The generalization of Student's ratio. Annals of Mathematical Statistics, 2(3), 360-378.

Hotelling, H. (1947) Multivariate quality control illustrated by air testing of sample bombsights. In: Eisenhart, C., Hastay, M.W., and Wallis, W.A., Eds., Techniques of Statistical Analysis, McGraw Hill, New York, 111-184.

mimcr, get_sim_lim.

# Dissolution data of one reference batch and one test batch of n = 6
# tablets each:
str(dip1)

# 'data.frame':	12 obs. of  10 variables:
# $ type  : Factor w/ 2 levels "R","T": 1 1 1 1 1 1 2 2 2 2 ...
# $ tablet: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
# $ t.5   : num  42.1 44.2 45.6 48.5 50.5 ...
# $ t.10  : num  59.9 60.2 55.8 60.4 61.8 ...
# $ t.15  : num  65.6 67.2 65.6 66.5 69.1 ...
# $ t.20  : num  71.8 70.8 70.5 73.1 72.8 ...
# $ t.30  : num  77.8 76.1 76.9 78.5 79 ...
# $ t.60  : num  85.7 83.3 83.9 85 86.9 ...
# $ t.90  : num  93.1 88 86.8 88 89.7 ...
# $ t.120 : num  94.2 89.6 90.1 93.4 90.8 ...

# Estimation of the parameters for Hotelling's two-sample T2 statistic
# (for small samples)
res <-
  get_hotellings(m1 = as.matrix(dip1[dip1$type == "R", c("t.15", "t.90")]),
                 m2 = as.matrix(dip1[dip1$type == "T", c("t.15", "t.90")]),
                 signif = 0.1)
res$S.pool
res$Parameters

# Expected results in res$S.pool
#          t.15     t.90
# t.15 3.395808 1.029870
# t.90 1.029870 4.434833

# Expected results in res$Parameters
#           DM          df1          df2       signif            K
# 1.044045e+01 2.000000e+00 9.000000e+00 1.000000e-01 1.350000e+00
#            k           T2            F       F.crit          p.F
# 3.000000e+00 3.270089e+02 1.471540e+02 3.006452e+00 1.335407e-07