run_parallel: Extract many random matrix eigenvalues.
In maxheld83/pensieve: Tools for the Scientific Study of Operant Subjectivity

Extract eigenvalues from randomly generated data, parametrized as original data.

run_parallel(
  dataset = NULL,
  centile = 0.95,
  runs = 5000,
  cutoff = NULL,
  grid = NULL,
  n = NULL,
  p = NULL
)

`dataset`	An integer matrix with item-cases as rows and people-variables as columns. Defaults to `NULL`, in which case `grid`, `n` and `p` must be specified so that they can be passed to `draw_rand_sort()`.
`centile`	A positive numerical vector of length 1, giving the centile of the random eigenvalues to return, as a proportion between 0 and 1. Defaults to the conventional threshold of `0.95`.
`runs`	A positive integer vector of length 1, giving the number of random datasets to draw. Defaults to `5000`. A lower number reduces the computational cost, but results may be less reliable.
`cutoff`	A positive integer vector of length 1, giving the maximum number of principal components to extract from the random data. Defaults to `NULL`, in which case the minimum of `n` and `p` is used, because there can be no more principal components than there are item-cases or people-variables in a dataset. Manually specifying a lower number may reduce the computational cost.
`grid`	A positive integer vector with one element for each value in the range, specifying the maximum allowed frequency of that value (in Q-parlance, the maximum column heights for the Q-sorts).
`n`	A positive integer vector of length 1, giving the number of people-variables. Defaults to `NULL`, in which case the parameter is inferred from `dataset`.
`p`	A positive integer vector of length 1, giving the number of item-cases. Defaults to `NULL`, in which case the parameter is inferred from `dataset`.
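
To illustrate how `grid`, `n` and `p` relate, here is a minimal base R sketch of a forced distribution and one random sort drawn from it. The helper `draw_one_sort()` and the example grid are hypothetical and do not reproduce `draw_rand_sort()`. The sketch assumes that the grid is a named integer vector whose names are the values and whose entries are the column heights, and that the heights sum to `p` (a fully forced distribution).

```r
# Hypothetical forced distribution: values -2..2 with column heights 1, 2, 3, 2, 1
grid <- c("-2" = 1L, "-1" = 2L, "0" = 3L, "1" = 2L, "2" = 1L)
p <- sum(grid)  # number of item-cases that fill the grid exactly

# Hypothetical helper (not part of pensieve): one random sort under the grid
draw_one_sort <- function(grid) {
  # repeat each value as often as its column height allows
  values <- rep(as.integer(names(grid)), times = grid)
  sample(values)  # random permutation, one value per item-case
}

set.seed(1)
one_sort <- draw_one_sort(grid)
table(one_sort)  # frequencies per value equal the column heights in `grid`
```

Because every sort drawn this way is a permutation of the same values, all random people-variables share the same mean and variance, which is what a forced Q distribution implies.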

Implements Horn's (1965) parallel analysis as a guide for factor retention, with appropriate parametrization for Q-sort data (via `draw_rand_sort()`).

Random data can include spurious correlations and spurious factors by mere chance. This problem might affect Q studies, too: people might produce similar Q sorts not because they share viewpoints, but purely by chance. Parallel analysis tests this possibility by extracting (spurious) principal components from many sets of random data, parametrized like the provided data.

The analysis returns the random eigenvalues at the specified centile over all of the specified random runs. This result can be interpreted as the eigenvalue threshold that an observed principal component must exceed to be considered non-spurious, with the specified centile as the confidence level.
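
The procedure can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: the random data are drawn with `rnorm()` rather than with `draw_rand_sort()`, and the helper name `sketch_parallel()` is hypothetical.

```r
# Hypothetical helper (not part of pensieve): parallel analysis on normal random data
sketch_parallel <- function(n, p, runs = 100, centile = 0.95, cutoff = min(n, p)) {
  # one column per run, one row per principal component
  eigen_runs <- matrix(
    replicate(runs, {
      # p item-cases as rows, n people-variables as columns
      rand_data <- matrix(rnorm(n * p), nrow = p, ncol = n)
      # eigenvalues of the correlation matrix between people-variables
      ev <- eigen(cor(rand_data), symmetric = TRUE, only.values = TRUE)$values
      ev[seq_len(cutoff)]
    }),
    nrow = cutoff
  )
  # centile of the random eigenvalues, separately for each component
  apply(eigen_runs, 1, quantile, probs = centile, names = FALSE)
}

set.seed(1)
thresholds <- sketch_parallel(n = 10, p = 30, runs = 200)
thresholds  # one threshold per principal component, in decreasing order
```

The first threshold is always above 1, because the eigenvalues of a correlation matrix average to 1 and the largest one can never fall below that average.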

In summary, parallel analysis suggests the number of factors in the data that are unlikely to be products of random chance at the confidence level given by `centile`.
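
As a sketch of how such thresholds might be applied, the following compares observed eigenvalues against centile thresholds and counts the leading components that exceed them. Both vectors are made-up numbers for illustration. Stopping at the first component that falls below its threshold is a common convention assumed here, not something this documentation prescribes.

```r
# Made-up numbers for illustration only
observed   <- c(4.1, 2.3, 1.2, 0.9, 0.6)  # eigenvalues from the observed data
thresholds <- c(1.9, 1.6, 1.4, 1.2, 1.0)  # random eigenvalues at the chosen centile

above <- observed > thresholds
# count leading components above their threshold, stopping at the first failure
n_retain <- sum(cumprod(above))
n_retain  # 2
```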

For more details, see the related paran package.

A numerical vector of length `cutoff`, with the random eigenvalues at the specified centile for consecutive principal components.

This function is currently based on principal components analysis (PCA) as a “factor” extraction technique. Thompson (2004: 30ff) and others seem to suggest that such PCA-based criteria can be used as rough indications of how many factors to extract with other exploratory techniques. However, some of the results presented here are meaningful only in a PCA context, and dependent functions are sometimes called with PCA-related options.

Maximilian Held

Glorfeld, L. W. (1995): An Improvement on Horn's Parallel Analysis Methodology for Selecting the Correct Number of Factors to Retain, Educational and Psychological Measurement. 55(3): 377-393.
Horn, J. L. (1965): A rationale and a test for the number of factors in factor analysis, Psychometrika. 30: 179-185.

Other parallel-analysis: draw_rand_sort()

dataset <- civicon_2014$qData$sorts[,,"before"]
run_parallel(dataset = dataset,  # n, p and grid are inferred from the data
             runs = 10,          # far too few, just to make this fast
             centile = .95)      # default
# results are all the same, because the study used a forced Q distribution