03Factanal: Estimate Factor Analysis Models

Description

This function is intended for users and estimates a factor analysis model that has been set up previously with a call to make_manifest and a call to make_restrictions.

Usage

Factanal(manifest, restrictions, scores = "none", seeds = 12345, 
lower = sqrt(.Machine$double.eps), analytic = TRUE, reject = TRUE, 
NelderMead = TRUE, impatient = FALSE, ...)

Arguments

manifest

An object that inherits from manifest-class and is typically produced by make_manifest.

restrictions

An object that inherits from restrictions-class and is typically produced by make_restrictions.

scores

Type of factor scores to produce, if any. The default is "none". Other valid choices (which can be partially matched) are "regression", "Bartlett", "Thurstone", "Ledermann", "Anderson-Rubin", "McDonald", "Krinjen", "Takeuchi", and "Harman". See Beauducel (2007) for formulae for these factor scores as well as proofs that all but "regression" and "Harman" produce the same correlation matrix.

seeds

A vector of length one or two to be used as the random number generator seeds corresponding to the unif.seed and int.seed arguments to genoud respectively. If seeds is a single number, this seed is used for both unif.seed and int.seed. These seeds override the defaults for genoud and make it easier to replicate an analysis exactly. If NULL, the default arguments for unif.seed and int.seed as specified in genoud are used. Use NULL in simulations; otherwise every replication reuses the same seeds and the results will be badly wrong.

lower

A lower bound. In exploratory factor analysis, lower is the minimum uniqueness and corresponds to the 'lower' element of the list specified for control in factanal. Otherwise, lower is the lower bound used for singular values when checking for positive-definiteness and ranks of matrices. In the unlikely event that you get errors referencing positive definiteness, try increasing the value of lower slightly.

analytic

A logical (defaults to TRUE) indicating whether analytic gradients should be used as much as possible. If FALSE, then numeric gradients will be calculated, which are slower and slightly less accurate but are necessary in some situations and useful for debugging analytic gradients.

reject

Logical indicating whether to reject starting values that fail the constraints required by the model; see create_start.

NelderMead

Logical indicating whether to call optim with method = "Nelder-Mead" when the genetic algorithm has finished to further polish the solution. This option is not relevant or necessary for exploratory factor analysis models.

impatient

Logical that defaults to FALSE. If restrictions is of restrictions.factanal-class, setting it to TRUE will cause factanal to be used for optimization instead of genoud. In all other situations, setting it to TRUE will use factanal to generate initial communality estimates instead of the slower default mechanism.

...

Further arguments that are passed to genoud. The following arguments to genoud are hard-coded and cannot be changed because they are logically required by the factor analysis estimator:

argument        value                         why?
nvars           restrictions@nvars
max             FALSE                         minimizing the objective
hessian         FALSE                         we roll our own
lexical         TRUE (usually)                for restricted optimization
Domains         restrictions@Domains
data.type.int   FALSE                         parameters are doubles
fn              wrapper around fitS4
BFGSfn          wrapper around bfgs_fitS4
BFGShelp        wrapper around bfgs_helpS4
gr              various                       it is complicated
unif.seed       taken from seeds              replicability
int.seed        taken from seeds              replicability

The following arguments to genoud default to values that differ from those documented at genoud but can be overridden by specifying them explicitly in the ... :

argument               value                      why?
boundary.enforcement   1 usually                  2 can cause problems
MemoryMatrix           FALSE                      runs faster
print.level            1                          output is not that helpful for >= 2
P9mix                  1                          to always accept the BFGS result
BFGSburnin             -1                         to delay the gradient check
max.generations        1000                       big number is often necessary
project.path           contains "Factanal.txt"
starting.values        see the Details section

Other arguments to genoud will take the documented default values unless explicitly specified. In particular, you may want to change wait.generations and solution.tolerance. Also, if informative bounds were placed on any of the parameters in the call to make_restrictions, it is usually preferable to specify boundary.enforcement = 2 to use constrained optimization in the internal calls to optim. However, the "L-BFGS-B" optimizer is less robust than the default "BFGS" optimizer and occasionally causes fatal errors, largely due to misfortune.
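As a rough illustration of passing genoud arguments through the ... , the sketch below wraps a call to Factanal that overrides a few defaults. The wrapper name fit_with_bounds and the particular values are assumptions made for illustration, not recommendations. It presumes that the FAiR package is installed and that the manifest and restrictions objects were created by make_manifest and make_restrictions.

```r
# Hypothetical wrapper (not part of FAiR): override a few genoud defaults
# through the ... argument of Factanal. The values are illustrative only.
fit_with_bounds <- function(manifest, restrictions) {
  FAiR::Factanal(manifest = manifest, restrictions = restrictions,
                 boundary.enforcement = 2,   # constrained optimization in optim
                 wait.generations = 20,      # assumed value
                 solution.tolerance = 1e-4)  # assumed value
}
```

With objects such as man and res from the Examples section, the call would be fit_with_bounds(man, res).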

Details

The call to Factanal is somewhat of a formality in the sense that most of the difficult decisions were already made in the call to make_restrictions and the call to make_manifest. The most important remaining detail is the specification of the values for the starting population in the genetic algorithm.

It is not necessary to provide starting values, since there are methods for this purpose; see create_start. Also, if starting.values = NA, then a population of starting values will be created using the typical mechanism in genoud, namely random uniform draws from the domain of the parameter.
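The idea of random uniform draws from the domain of each parameter can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by genoud or FAiR: the function draw_uniform_starts is hypothetical, and the bounds in Domains are made up to mimic the two-column layout of restrictions@Domains.

```r
# Hypothetical two-column matrix of lower and upper bounds, laid out like
# restrictions@Domains with one row per free parameter (values are made up).
Domains <- rbind(Phi_21  = c(-1.0, 1.0),
                 beta_11 = c(-1.5, 1.5),
                 beta_21 = c(-1.5, 1.5))

# Draw pop.size candidate vectors uniformly within the bounds; each row of
# the result is one individual, with columns ordered like rownames(Domains).
draw_uniform_starts <- function(Domains, pop.size) {
  draws <- matrix(runif(pop.size * nrow(Domains),
                        min = Domains[, 1], max = Domains[, 2]),
                  nrow = pop.size, ncol = nrow(Domains), byrow = TRUE)
  colnames(draws) <- rownames(Domains)
  draws
}

set.seed(12345)
pop <- draw_uniform_starts(Domains, pop.size = 4)
```

Filling the matrix by row keeps each column aligned with its own pair of bounds, because runif recycles min and max across the draws.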

Otherwise, if reject = TRUE, starting values that fail one or more constraints are rejected and new vectors of starting values are generated until the population is filled with admissible starting values. In some cases, the constraints are quite difficult to satisfy by chance, and it may be more practical to specify reject = FALSE or to supply starting values explicitly. If starting values are supplied, it is helpful if at least one member of the genetic population satisfies all the constraints imposed on the model. Note the rownames of restrictions@Domains, which indicate the proper order of the free parameters.
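The rejection step described above can be sketched in base R. This is a simplified illustration rather than the mechanism in create_start: the function fill_population, its max_tries safeguard, the bounds, and the constraint are all hypothetical.

```r
# Keep drawing candidate vectors until pop.size of them pass the
# user-supplied admissibility check (a stand-in for the model's constraints).
fill_population <- function(Domains, pop.size, is_admissible,
                            max_tries = 10000) {
  pop <- matrix(NA_real_, nrow = pop.size, ncol = nrow(Domains),
                dimnames = list(NULL, rownames(Domains)))
  filled <- 0
  tries  <- 0
  while (filled < pop.size) {
    tries <- tries + 1
    if (tries > max_tries) stop("constraints too hard to satisfy by chance")
    candidate <- runif(nrow(Domains), min = Domains[, 1], max = Domains[, 2])
    if (is_admissible(candidate)) {
      filled <- filled + 1
      pop[filled, ] <- candidate
    }
  }
  pop
}

# Made-up bounds and a made-up constraint: the first parameter must be positive
Domains <- rbind(a = c(-1, 1), b = c(-2, 2))
set.seed(12345)
pop <- fill_population(Domains, pop.size = 5,
                       is_admissible = function(x) x[1] > 0)
```

The max_tries limit reflects the point made above: when constraints are hard to satisfy by chance, it is more practical to stop and supply starting values than to keep sampling.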

A matrix (or vector) of starting values can be passed as starting.values. (Also, it is possible to pass an object of FA-class to starting.values, in which case the estimates from the previous call to Factanal are used as the starting values.) If a matrix, it should have columns equal to the number of rows in restrictions@Domains in the specified order and one or more rows up to the number of genetic individuals in the population.

If starting.values is a vector, its length can be equal to the number of rows in restrictions@Domains, in which case it is treated as a one-row matrix. Alternatively, its length can be equal to the number of manifest variables, in which case it is passed to the start argument of create_start as a vector of initial communality estimates, thus avoiding the sometimes time-consuming process of generating good initial communality estimates. This process can also be accelerated by specifying impatient = TRUE.
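The rules for interpreting starting.values can be summarized in a small base R sketch. The helper interpret_start is hypothetical and is not how FAiR dispatches internally; it only restates the length and dimension rules described above.

```r
# Hypothetical helper (not FAiR's code): decide how starting.values would be
# interpreted, given the number of free parameters (rows of
# restrictions@Domains) and the number of manifest variables.
interpret_start <- function(start, n_params, n_manifest) {
  if (is.matrix(start)) {
    stopifnot(ncol(start) == n_params)  # one column per free parameter
    return("population matrix")
  }
  if (length(start) == n_params)   return("one-row matrix of parameters")
  if (length(start) == n_manifest) return("initial communality estimates")
  stop("length of starting.values matches neither interpretation")
}

# The example on this page has 19 free parameters (1 factor correlation,
# 12 coefficients, 6 log standard deviations) and 6 manifest variables.
interpret_start(rep(0.5, 6), n_params = 19, n_manifest = 6)
```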

Value

An object that inherits from FA-class.

Note

The underlying genetic algorithm can print a variety of output as it progresses. On Windows, you either have to move the scrollbar periodically to flush the output to the screen or disable buffering, either through the Misc menu or by pressing Control+W. The output will, by default, look something like this:

Generation   First        Second       ...   Last         Discrepancy
number       constraint   constraint         constraint   function
0            -1.0         -1.0         ...   -1.0         double
1            -1.0         -1.0         ...   -1.0         double
...          ...          ...          ...   ...          ...
42           -1.0         -1.0         ...   -1.0         double

The integer on the far left indicates the generation number. If it appears to skip one or more generations, that signifies that the best individual in the “missing” generation was no better than the best individual in the previous generation. The sequence of -1.0 indicates that various constraints are being satisfied by the best individual in the generation. Some of these constraints are hard-coded, some are added by the choices the user makes in the call to make_restrictions. The curious are referred to the source code, but for the most part users need not worry about them provided they are -1.0. If any but the last are not -1.0 after the first few generations, there is a major problem because no individual is satisfying all the constraints. The last number is a double-precision number indicating the value of the discrepancy function. This number will decrease, sometimes painfully slowly, sometimes intermittently, over the generations since the discrepancy function is being minimized, subject to the aforementioned constraints.
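The rule of thumb for reading a row of this output can be written as a short base R check. The function constraints_satisfied is hypothetical, and the numbers passed to it are made up; the row is assumed to hold the constraint columns followed by the discrepancy, without the generation number.

```r
# Hypothetical check on one printed row: every constraint column should be
# -1.0; the final entry is the discrepancy and is excluded from the check.
constraints_satisfied <- function(row) {
  constraints <- row[-length(row)]
  all(constraints == -1)
}

ok  <- constraints_satisfied(c(-1, -1,  -1, 0.0342))  # made-up discrepancy
bad <- constraints_satisfied(c(-1, 0.7, -1, 0.0342))  # second constraint fails
```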

Author(s)

Ben Goodrich

References

Bartholomew, D. J. and Knott, M. (1999) Latent Variable Models and Factor Analysis. Second Edition, Arnold.

Beauducel, A. (2007) In spite of indeterminacy, many common factor score estimates yield an identical reproduced covariance matrix. Psychometrika, 72, 437–441.

Smith, G. A. and Stanley, G. (1983) Clocking g: relating intelligence and measures of timed performance. Intelligence, 7, 353–368.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

See Also

make_manifest, make_restrictions, and Rotate

Examples

## Example from Venables and Ripley (2002, p. 323)
## Previously from Bartholomew and Knott  (1999, p. 68--72)
## Originally from Smith and Stanley (1983)
## Replicated from example(ability.cov)

man <- make_manifest(covmat = ability.cov)

## Not run: 
## Here is the easy way to set up a SEFA model, which uses pop-up menus
res <- make_restrictions(manifest = man, factors = 2, model = "SEFA")

## End(Not run)

## This is the hard way to set up a restrictions object without pop-up menus
beta <- matrix(NA_real_, nrow = nrow(cormat(man)), ncol = 2)
rownames(beta) <- rownames(cormat(man))
free <- is.na(beta)
beta <- new("parameter.coef.SEFA", x = beta, free = free, num_free = sum(free))

Phi  <- diag(2)
free <- lower.tri(Phi)
Phi  <- new("parameter.cormat", x = Phi, free = free, num_free = sum(free))
res  <- make_restrictions(manifest = man, beta = beta, Phi = Phi)

# This is how to make starting values where Phi is the correlation matrix 
# among factors, beta is the matrix of coefficients, and the scales are
# the logarithm of the sample standard deviations. It is also the MLE.
starts <- c( 4.46294498156615e-01, #  Phi_{21}
             4.67036349420035e-01, # beta_{11}
             6.42220238211291e-01, # beta_{21}
             8.88564379236454e-01, # beta_{31}
             4.77779639176941e-01, # beta_{41}
            -7.13405536379741e-02, # beta_{51}
            -9.47782525342137e-08, # beta_{61}
             4.04993872375487e-01, # beta_{12}
            -1.04604290549591e-08, # beta_{22}
            -9.44950629176182e-03, # beta_{32}
             2.63078925240678e-04, # beta_{42}
             9.38038168787216e-01, # beta_{52}
             8.43618801925473e-01, # beta_{62}
             log(man@sds))         # log manifest standard deviations

sefa <- Factanal(manifest = man, restrictions = res, 
                 # NOTE: Do NOT specify any of the following tiny values in a  
                 # real research situation; it is done here solely for speed
                 starting.values = starts, pop.size = 2, max.generations = 6,
                 wait.generations = 1)
nsim <- 101 # number of simulations, also too small for real work
show(sefa)
summary(sefa, nsim = nsim)
model_comparison(sefa, nsim = nsim)

stuff <- list() # output list for various methods
stuff$model.matrix <- model.matrix(sefa) # sample correlation matrix
stuff$fitted <- fitted(sefa, reduced = TRUE) # reduced covariance matrix
stuff$residuals <- residuals(sefa) # difference between model.matrix and fitted
stuff$rstandard <- rstandard(sefa) # normalized residual matrix
stuff$weights <- weights(sefa) # (scaled) approximate weights for residuals
stuff$influence <- influence(sefa) # weights * residuals
stuff$cormat <- cormat(sefa,  matrix = "RF") # reference factor correlations
stuff$uniquenesses <- uniquenesses(sefa, standardized = FALSE) # uniquenesses
stuff$FC <- loadings(sefa, matrix = "FC") # factor contribution matrix
stuff$draws <- FA2draws(sefa, nsim = nsim) # draws from sampling distribution

if(require(nFactors)) screeplot(sefa)  # Enhanced scree plot
profile(sefa) # profile plots of non-free parameters
pairs(sefa) # Thurstone-style plot
if(require(Rgraphviz)) plot(sefa) # DAG

FAiR documentation built on May 29, 2017, 6:08 p.m.