PermuteLDA: A function to determine whether two groups are in statistically different locations in multivariate space See Collyer and Adams 2007

Description

The function calculates the multivariate distance between two groups across all traits and determines whether they differ signifcantly using a Monte Carlo randomization test. The Monte Carlo randomization creates a null distribution by randomizing the residual deviation from the group mean across all individuals. This method controls for heteroscedasticity and was designed by Collyer and Adams (2007) for use in analyzing data sets that have sparse groups sizes relative to the number of traits.

Usage

1
PermuteLDA(Data, Groups, NPerm, Missing.Data = "Complete")

Arguments

Data

A (non-empty), numeric matrix of data values

Groups

A (non-empty), vector indicating group membership.

NPerm

The number of permutations used to generate the null distribution. The default is 100.

Missing.Data

The method used to handle missing data. The default, 'Complete' will use CompleteData to impute missing data, setting Missing.Data='Remove' will remove all individuals with missing data. FSelect cannot handle missing data.

Details

Determining the statistical significance of a discriminate function analysis along with performing that analysis on sparse data sets, e.g. many traits observed on comparatively few individuals, is a challenge. Collyer and Adams (2007) developed a Monte Carlo based algorithm for addressing both of those issues. Briefly, the test uses the underlying Var/Cov structure of the data and randomizes the group membership to calculate a null distribution. This test simultaneously controls for heteroscedasticity, a common problem in sparse data sets and allows the approximation of a p-value for the test. For the original implementation and formulation of the method see Collyer and Adams (2007) or http://www.public.iastate. edu/~dcadams/software.html. Unlike the FSelect implementation, PermuteLDA will work properly with an arbitrary number of groups. The time required to run the algorithm is non-linear in the number of groups.

Value

Returns a data frame with four columns and the number of groups choose 2 rows. Each row is a pairwise comparison between groups. The column 'Pr' is the p value to reject the null hypothesis of no difference (a value in 'Pr' < 0.05 would result in rejecting the hypothesis that the two groups are not different. The column 'Distance' is the multivariate distance between the two groups.

References

Collyer M, Adams D (2007). Analysis of Two - State Multivariate Phenotypic Change in Ecological Studies. Ecology, 88(3), 683 - 692.

For an implementation of the original method coded in R see http://www.public.iastate. edu/~dcadams/software.html.

See Also

PermuteLDA

Examples

1
2
3

Questions? Problems? Suggestions? or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.