run_parallel: Extract many random matrix eigenvalues.

View source: R/nfactors.R

Description

Extract eigenvalues from randomly generated data, parametrized like the original data.

Usage

run_parallel(
  dataset = NULL,
  centile = 0.95,
  runs = 5000,
  cutoff = NULL,
  grid = NULL,
  n = NULL,
  p = NULL
)

Arguments

dataset

An integer matrix with item-cases as rows and people-variables as columns. Defaults to NULL, in which case the parameters for draw_rand_sort() (grid, n and p) must be supplied directly.

centile

A positive numerical vector of length 1, as the percentile of random eigenvalues to return. Defaults to the conventional 0.95 (corresponding to a 5% significance level).

runs

A positive integer vector of length 1, as the number of random datasets to draw. Defaults to 5000. A lower number reduces computational cost, but results may be less reliable.

cutoff

A positive integer vector of length 1, as the maximum number of Principal Components to extract from random data. Defaults to NULL, in which case the minimum of n and p is used, because there can be no more Principal Components than there are item-cases or people-variables in a dataset. Manually specifying a lower number may reduce the computational cost.

grid

A positive integer vector with one entry per allowed value, specifying the maximum allowed frequency of each value (in Q parlance, the maximum column heights for the Q-sorts); see the illustration after this argument list.

n

A positive integer vector of length 1, as the number of people-variables. Defaults to NULL, in which case the parameter is inferred from dataset.

p

A positive integer vector of length 1, as the number of item-cases. Defaults to NULL, in which case the parameter is inferred from dataset.
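
For illustration (hypothetical values, not taken from any particular study), a forced, quasi-normal distribution over nine columns for the values -4 through +4 might be specified as:

# hypothetical maximum column heights for the values -4, -3, ..., +4
grid <- c(1, 2, 4, 6, 9, 6, 4, 2, 1)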

Details

Implements Horn's (1965) parallel analysis as a guide to inform factor retention, with appropriate parametrization for Q-sort data (via draw_rand_sort()).

Random data can include spurious correlations and spurious factors by mere chance. This problem might affect Q studies, too: people might produce similar Q-sorts not because they share viewpoints, but simply by random chance. Parallel analysis addresses this possibility by extracting (spurious) principal components from many sets of random data, parametrized like the provided data.

The analysis returns the random eigenvalues at the specified centile over all of the specified random runs. This result can be interpreted as the eigenvalue threshold an observed principal component must exceed to be considered non-spurious, with the specified centile as the confidence level.
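
As a rough illustration of this idea only (not the implementation used by run_parallel(), and ignoring the Q-sort parametrization handled by draw_rand_sort()), the centile of random eigenvalues could be computed along these lines, with hypothetical dimensions:

# illustration only: unconstrained random integers, hypothetical dimensions
set.seed(42)
n <- 18    # people-variables (columns)
p <- 35    # item-cases (rows)
runs <- 200
rand_eigen <- replicate(runs, {
  rand <- matrix(sample(-4:4, size = n * p, replace = TRUE), nrow = p, ncol = n)
  eigen(cor(rand), only.values = TRUE)$values
})
# 95th percentile of each consecutive random eigenvalue across all runs
apply(rand_eigen, MARGIN = 1, FUN = quantile, probs = 0.95)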

In summary, parallel analysis suggests the number of factors in the data that are unlikely to be products of random chance at the significance level implied by centile.

For more details, consider the related paran package.

Value

A numerical vector of length cutoff, with the random eigenvalues at the specified centile for consecutive Principal Components.

Note

This function is currently based on principal components analysis (PCA) as a “factor” extraction technique. Thompson (2004: 30ff) and others seem to suggest that such PCA-based criteria can be used as rough indications for how many factors to extract with other exploratory techniques. However, some of the results presented here are meaningful only in a PCA-context, and dependent functions are sometimes called with PCA-related options.

Author(s)

Maximilian Held

References

See Also

Other parallel-analysis: draw_rand_sort()

Examples

dataset <- civicon_2014$qData$sorts[,,"before"]
run_parallel(dataset = dataset,  # parameters are inferred
             runs = 10, # way too few, just to make this fast
             centile = .95)  # default
# results are all the same, because the study used a forced Q distribution
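
A possible follow-up, sketched here under the assumption that observed eigenvalues are taken from the correlation matrix of the people-variables (this is not part of the package's documented example):

thresholds <- run_parallel(dataset = dataset, runs = 10, centile = .95)
observed <- eigen(cor(dataset), only.values = TRUE)$values
# observed components whose eigenvalue exceeds the corresponding random threshold
observed[seq_along(thresholds)] > thresholds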
