GenerateBoxData: Generate Thurstone's Box Data From length, width, and height...

View source: R/GenerateBoxData.R

GenerateBoxData    R Documentation

Generate Thurstone's Box Data From length, width, and height box measurements


Description

Generate data for Thurstone's 20-variable and 26-variable Box Study from length, width, and height box measurements.


Usage

GenerateBoxData(
  XYZ,
  BoxStudy = 20,
  Reliability = 0.75,
  ModApproxErrVar = 0.1,
  SampleSize = NULL,
  NMinorFac = 50,
  epsTKL = 0.2,
  Seed = 1,
  SeedErrorFactors = 2,
  SeedMinorFactors = 3,
  PRINT = FALSE,
  LB = FALSE,
  LBVal = 1,
  Constant = 0
)



Arguments

XYZ
(Matrix) Length, width, and height measurements for N boxes. The Amazon Box data can be accessed by calling data(AmzBoxes). The Thurstone Box data (20 hypothetical boxes) can be accessed by calling data(Thurstone20Boxes).


BoxStudy
(Integer) If BoxStudy = 20 then data will be generated for Thurstone's classic 20-variable box problem. If BoxStudy = 26 then data will be generated for Thurstone's 26-variable box problem. Default: BoxStudy = 20.


Reliability
(Scalar [0, 1]) The common reliability value for each measured variable. Default: Reliability = .75.


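ModApproxErrVar
(Scalar [0, 1]) The proportion of reliable variance (for each variable) that is due to all minor common factors. Thus, if x (i.e., error-free length) has variance var(x), me denotes the part of the reliable score that is due to the minor common factors, and ModApproxErrVar = .10, then var(me) / var(x + me) = .10. Default: ModApproxErrVar = .10.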


SampleSize
(Integer) Specifies the number of boxes to be sampled from the population. If SampleSize = NULL then measurements will be generated for the original input box sizes. Default: SampleSize = NULL.


NMinorFac
(Integer) The number of minor factors to use while generating model approximation error. Default: NMinorFac = 50.


epsTKL
(Numeric [0, 1]) A parameter of the Tucker, Koopman, and Linn (1969) algorithm that controls the spread of the influence of the minor factors. Default: epsTKL = .20.


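Seed
(Integer) Starting seed for box sampling. Default: Seed = 1.


SeedErrorFactors
(Integer) Starting seed for the error-factor scores. Default: SeedErrorFactors = 2.


SeedMinorFactors
(Integer) Starting seed for the minor common-factor scores. Default: SeedMinorFactors = 3.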


PRINT
(Logical) If PRINT = TRUE then the computed reliabilities will be printed. Default: PRINT = FALSE. Setting PRINT to TRUE can be useful when LB = TRUE.


LB
(Logical; lower bound) If LB = TRUE then minimum box measurements will be set to LBVal (inches) if they fall below 0 after adding measurement error. If LB = FALSE then negative attribute values will not be modified. This argument has no effect on data that include model approximation error.


LBVal
(Numeric) If LB = TRUE then values in BoxDataE will be bounded from below at LBVal. This can be used to avoid negative or very small box measurements. Default: LBVal = 1.


Constant
(Numeric) Optional value to add to all box measurements. Default: Constant = 0.


Details

This function can be used with the Amazon boxes dataset (data(AmzBoxes)) or with any collection of user-supplied scores on three variables. The Amazon Boxes data were downloaded from the BoxDimensions website. These data contain length (x), width (y), and height (z) measurements for 98 Amazon shipping boxes.

In his classic monograph Multiple Factor Analysis (Thurstone, 1947), Thurstone describes two data sets that were used to illustrate topics in factor analysis: one that he created from fictitious data and a second that he created from actual box measurements. The first (fictitious) data set is known as the Thurstone Box problem (see Kaiser and Horst, 1975). To create his data for the Box problem, Thurstone constructed 20 nonlinear combinations of fictitious length, width, and height measurements.

Box20 variables:

  1. x^2

  2. y^2

  3. z^2

  4. xy

  5. xz

  6. yz

  7. sqrt(x^2 + y^2)

  8. sqrt(x^2 + z^2)

  9. sqrt(y^2 + z^2)

  10. 2x + 2y

  11. 2x + 2z

  12. 2y + 2z

  13. log(x)

  14. log(y)

  15. log(z)

  16. xyz

  17. sqrt(x^2 + y^2 + z^2)

  18. exp(x)

  19. exp(y)

  20. exp(z)
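
The 20 transformations listed above can be sketched in base R. This is a minimal illustration rather than the package's internal code: the helper name box20() and the two example boxes are hypothetical, and the helper assumes that the input columns are ordered as length, width, and height.

```r
# Hypothetical helper (not part of fungible): builds the 20 Box20
# variables from a three-column matrix of length, width, and height.
box20 <- function(xyz) {
  xyz <- as.matrix(xyz)
  x <- xyz[, 1]
  y <- xyz[, 2]
  z <- xyz[, 3]
  cbind(
    "x^2" = x^2, "y^2" = y^2, "z^2" = z^2,
    "xy" = x * y, "xz" = x * z, "yz" = y * z,
    "sqrt(x^2+y^2)" = sqrt(x^2 + y^2),
    "sqrt(x^2+z^2)" = sqrt(x^2 + z^2),
    "sqrt(y^2+z^2)" = sqrt(y^2 + z^2),
    "2x+2y" = 2 * x + 2 * y,
    "2x+2z" = 2 * x + 2 * z,
    "2y+2z" = 2 * y + 2 * z,
    "log(x)" = log(x), "log(y)" = log(y), "log(z)" = log(z),
    "xyz" = x * y * z,
    "sqrt(x^2+y^2+z^2)" = sqrt(x^2 + y^2 + z^2),
    "exp(x)" = exp(x), "exp(y)" = exp(y), "exp(z)" = exp(z)
  )
}

# Two made-up boxes (rows) with columns x, y, z
xyz <- matrix(c(3, 2, 1,
                4, 3, 2), nrow = 2, byrow = TRUE)
B20 <- box20(xyz)
dim(B20)  # one row per box, one column per Box20 variable
```

Because the log and exp terms are very sensitive to the scale of the raw measurements, the resulting columns can differ by many orders of magnitude, which is worth keeping in mind when inspecting the generated scores.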

The second Thurstone Box problem contains measurements on the following 26 functions of length, width, and height.

Box26 variables:

  1. x

  2. y

  3. z

  4. xy

  5. xz

  6. yz

  7. x^2 * y

  8. x * y^2

  9. x^2 * z

  10. x * z^2

  11. y^2 * z

  12. y * z^2

  13. x/y

  14. y/x

  15. x/z

  16. z/x

  17. y/z

  18. z/y

  19. 2x + 2y

  20. 2x + 2z

  21. 2y + 2z

  22. sqrt(x^2 + y^2)

  23. sqrt(x^2 + z^2)

  24. sqrt(y^2 + z^2)

  25. xyz

  26. sqrt(x^2 + y^2 + z^2)
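
The 26 functions listed above can be written out the same way. The sketch below is illustrative only: box26() is a hypothetical helper (not a fungible function), the example boxes are made up, and the columns of the input are assumed to be length, width, and height in that order.

```r
# Hypothetical helper (not part of fungible): builds the 26 Box26
# variables from a three-column matrix of length, width, and height.
box26 <- function(xyz) {
  xyz <- as.matrix(xyz)
  x <- xyz[, 1]
  y <- xyz[, 2]
  z <- xyz[, 3]
  cbind(
    "x" = x, "y" = y, "z" = z,
    "xy" = x * y, "xz" = x * z, "yz" = y * z,
    "x^2*y" = x^2 * y, "x*y^2" = x * y^2,
    "x^2*z" = x^2 * z, "x*z^2" = x * z^2,
    "y^2*z" = y^2 * z, "y*z^2" = y * z^2,
    "x/y" = x / y, "y/x" = y / x,
    "x/z" = x / z, "z/x" = z / x,
    "y/z" = y / z, "z/y" = z / y,
    "2x+2y" = 2 * x + 2 * y,
    "2x+2z" = 2 * x + 2 * z,
    "2y+2z" = 2 * y + 2 * z,
    "sqrt(x^2+y^2)" = sqrt(x^2 + y^2),
    "sqrt(x^2+z^2)" = sqrt(x^2 + z^2),
    "sqrt(y^2+z^2)" = sqrt(y^2 + z^2),
    "xyz" = x * y * z,
    "sqrt(x^2+y^2+z^2)" = sqrt(x^2 + y^2 + z^2)
  )
}

# Two made-up boxes (rows) with columns x, y, z
xyz26 <- matrix(c(3, 2, 1,
                  4, 3, 2), nrow = 2, byrow = TRUE)
B26 <- box26(xyz26)
dim(B26)  # one row per box, one column per Box26 variable
```

The ratio terms (variables 13 to 18) are undefined when a measurement is zero, so a sketch like this is only meaningful for strictly positive box dimensions.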

Note that when generating unreliable data (i.e., variables with reliability values less than 1) and/or data with model error, SampleSize must be greater than NMinorFac.


Value

  • XYZ The length (x), width (y), and height (z) measurements for the sampled boxes. If SampleSize = NULL then XYZ contains the x, y, z values for the original 98 boxes.

  • BoxData Error free box measurements.

  • BoxDataE Box data with added measurement error.

  • BoxDataEME Box data with added (reliable) model approximation and (unreliable) measurement error.

  • Rel.E Classical reliabilities for the scores in BoxDataE.

  • Rel.EME Classical reliabilities for the scores in BoxDataEME.

  • NMinorFac Number of minor common factors used to generate BoxDataEME.

  • epsTKL Minor factor spread parameter for the Tucker, Koopman, and Linn (1969) algorithm.

  • SeedErrorFactors Starting seed for the error-factor scores.

  • SeedMinorFactors Starting seed for the minor common-factor scores.
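
To make the idea of a classical reliability concrete, the base R sketch below adds measurement error to a single error-free variable so that the ratio of true-score variance to observed-score variance equals a target value. It is an illustration under classical test theory assumptions, not a description of how GenerateBoxData() generates its error scores; all object names are hypothetical, and the error term is forced to be exactly uncorrelated with the true scores in the sample so that the target is met exactly.

```r
set.seed(1)

n <- 1000
target_rel <- 0.75

# Hypothetical error-free scores (e.g., one box variable)
true_score <- rnorm(n, mean = 10, sd = 2)

# Classical test theory: rel = var(true) / (var(true) + var(error)),
# so var(error) = var(true) * (1 - rel) / rel
err_var <- var(true_score) * (1 - target_rel) / target_rel

# Draw raw errors, remove any sample correlation with the true scores,
# then rescale the residuals to the required error variance
e_raw <- rnorm(n)
e <- resid(lm(e_raw ~ true_score))
e <- e / sd(e) * sqrt(err_var)

observed <- true_score + e

# Reliability as a variance ratio and as a squared correlation
rel_ratio <- var(true_score) / var(observed)
rel_cor2 <- cor(true_score, observed)^2
c(rel_ratio, rel_cor2)
```

With errors that are only uncorrelated in the population, the sample reliabilities would scatter around the target instead of matching it exactly, which is one reason reported reliabilities can differ slightly from the requested value.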


Author(s)

Niels G. Waller


References

Cureton, E. E. and Mulaik, S. A. (1975). The weighted varimax rotation and the promax rotation. Psychometrika, 40(2), 183-195.

Kaiser, H. F. and Horst, P. (1975). A score matrix for Thurstone's box problem. Multivariate Behavioral Research, 10(1), 17-26.

Thurstone, L. L. (1947). Multiple Factor Analysis. Chicago: University of Chicago Press.

Tucker, L. R., Koopman, R. F., and Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34(4), 421-459.

See Also

Other Factor Analysis Routines: BiFAD(), Box26, Ledermann(), SLi(), SchmidLeiman(), faAlign(), faEKC(), faIB(), faLocalMin(), faMB(), faMain(), faScores(), faSort(), faStandardize(), faX(), fals(), fapa(), fareg(), fsIndeterminacy(), orderFactors(), print.faMB(), print.faMain(), promaxQ(), summary.faMB(), summary.faMain()


Examples

  library(fungible)
  data(AmzBoxes)

  # Generate Thurstone's 20-variable box data for 300 sampled boxes
  BoxList <- GenerateBoxData(XYZ = AmzBoxes[, 2:4],
                             BoxStudy = 20,
                             Reliability = .75,
                             ModApproxErrVar = .10,
                             SampleSize = 300,
                             NMinorFac = 50,
                             epsTKL = .20,
                             Seed = 1,
                             SeedErrorFactors = 1,
                             SeedMinorFactors = 2,
                             PRINT = FALSE,
                             LB = FALSE,
                             LBVal = 1,
                             Constant = 0)

  # Correlations among the error-free box measurements
  BoxData <- BoxList$BoxData
  RBoxes <- cor(BoxData)

  # Extract three factors and rotate with geominQ from 100 starts
  fout <- faMain(R = RBoxes,
                 numFactors = 3,
                 facMethod = "fals",
                 rotate = "geominQ",
                 rotateControl = list(numberStarts = 100,
                                      standardize = "CM"))

fungible documentation built on March 31, 2023, 5:47 p.m.