Item Pool Visualization"

library(IPV)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

About this package {#about}

The IPV package is a tool to create IPV charts. The original work on IPV, including the chart concepts, can be found in: Dantlgraber, M., Stieger, S., & Reips, U. D. (2019). Introducing Item Pool Visualization: A method for investigation of concepts in self-reports and psychometric tests. Methodological Innovations, 12(3), 2059799119884283.. Please cite this paper when using the package. My philosophy is that chart creation should be possible on the full spectrum between quick & dirty and highly customized. So feel free to raise an issue on github if something frustrates you, or you miss a customization option. Do the same if you encounter a bug (so I can fix it) or do not understand the vignette (so I can change it).

Quick & dirty chart creation {#quick}

Let us imagine for a second, that estimated SEM parameters are already present in the correct format in an object called "DSSEI":

mychart <- item_chart(data = DSSEI)
mychart

As you can see, getting a first result is meant to be trivial. All three IPV chart types can be created by specifying data = only. In most practical cases, I strongly recommend to specify a file_name =. Results viewed within R (or RStudio) may differ from the file output in multiple ways, including quality. Here, display quality is heavily compromised to generate small files.

Although I have put considerable effort in setting sensible defaults (or generating them dynamically, based on the data), it is always possible to optimize the charts. For example, I prefer my charts to have a higher data ink ratio and like to use color to guide attention. The many graphical parameters may also enable you to create a black and white print optimized version or a version optimized for small display.

Workflow

Assuming you have some interesting raw data, empirical, simulated, or faked, these are the necessary steps to produce IPV charts:
0. SEM estimation
1. Input
2. Chart creation
Tip: Call ?IPV for an overview of the functions of the package.

0. SEM estimation

The IPV package does not do the SEM estimation for you. But that means you get to use your favorite statistical software (e.g. the R packages lavaan or sem). This paper shows how to specify the SEM models. For the example, used here and in the paper, you might want to take a look at figures 6-8, which shows the three SEMs used. As the authors point out, the models can only be estimated as SEMs, not by means of factor analysis or in any other way.
As nested I will describe all cases, where the overall item pool is subdivided more than once. For example, an item pool comprising multiple tests can be divided into the tests and subdivided into the tests' facets (as seen in figure 6-8 of the paper). In this case, nested charts can be used to compare all three models. In nested cases, I will distinguish a global level, where tests are compared with each other, and a nested level, that is concerned with the internal structure of tests. Every case that is not nested, i.e. only consists of a single test, I will call simple.

1. Input

To use the chart functions, the SEM estimation results need to be formatted in a specific way. This is an example for the data format within the IPV package:

self_confidence$tests$RSES

Note, that factor refers to an item pool, that was divided into subpools (subfactors). In this case, factor refers to the Rosenberg Self-Esteem Scale, a test with two facets: Positive Self-Esteem (Ps), and Negative Self-Esteem (Ns), which was reversed here. As seen below, the same data structure applies on the global level, with factor referring to the overall self-confidence item pool, comprising the three tests (subfactors) RSES, SMTQ, and DSSEI.

self_confidence$global

cd is short for center distance
\begin{equation} cd_i = \frac{\lambda^2_{is}}{\lambda^2_{ig}} -1 \end{equation}
while mean_cd is the mean center distance of the items of a facet or test. Furthermore, the matrix of latent correlations between subfactors is given as a second item of the list. To see how the data is combined for nested cases, load the package and call the example object self_confidence.
To spare you the task of creating this data structure by hand, I implemented two automated input pathways. You can either use excel files or the manual input function. In both cases, center distances are calculated automatically and the data is automatically checked for (obvious) errors. Negative center distances are always set to zero before mean center distances are calculated. Regardless of the input mode, you will need to provide:

Using manual input functions

These functions allow you to reduce the manual work to a minimum. They are especially useful, when your SEM estimates are already in your R environment (e.g. because you read them from a .csv file). The functions input_manual_nested() and input_manual_simple() allow you to feed in factor loadings, item names, etc. variable by variable. The correct format is then generated automatically. Run input_manual_process() on the result, to automatically calculate center distances.
This is an example, where all values have been put in individually for demonstration:

mydata <- input_manual_simple(
test_name = "RSES",
facet_names = c("Ns", "Ps"),
items_per_facet = 5,
item_names = c(2, 5, 6, 8, 9,
               1, 3, 4, 7, 10),
test_loadings = c(.5806, .5907, .6179, .5899, .6559,
                  .6005, .4932, .4476, .5033, .6431),
facet_loadings = c(.6484, .6011, .6988, .6426, .6914,
                   .6422, .5835, .536, .5836, .6791),
correlation_matrix = matrix(data = c(1, .69,
                                     .69, 1),
                            nrow = 2,
                            ncol = 2))
mydata
input_manual_process(mydata)

For nested cases, use the function input_manual_nested(), and add the individual tests using input_manual_simple(). Then you can run input_manual_processs() as in the simple case. You can find the (lengthy) example below.
If any factor loading is below .1 or any center distance below 0, it is set to that value and a warning (or message) is displayed. IPV does not allow negative factor loadings, which is indicated by an error. In this case, recode your data appropriately.

Using Excel files

Excel files have the advantage that you can simply copy and paste your SEM estimates into the spreadsheets and the input function of the IPV package (input_excel()) does the rest. The files need to be structured as in the example, that you can find here:

system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_RSES.xlsx", package = "IPV", mustWork = TRUE)

As you can see, there is a file for each test, and a global file. You might want to use a copy as your template, so you can just fill in your values. Open a file to see how it works.

On sheet 1 you need to provide the factor loadings from your SEM estimation results, on sheet 2 you need to provide the named and complete latent correlation matrix. On sheet 1, "factor" contains a single factor name and "factor_loading" the factor loadings of items on that factor (not squared). "subfactor" contains the names of grouped factors and "subfactor_loading" the factor loadings of items on these factors (not squared). "item" contains the item names. Therefore, each row contains the full information on the respective item.
Read these excel sheets using input_excel. In the example:

global <- system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
tests <- c(system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_RSES.xlsx", package = "IPV", mustWork = TRUE))
mydata <- input_excel(global = global, tests = tests)

The data will be prepared automatically, including the calculation of center distances. If any factor loading is below .1 or any center distance below 0, it is set to that value and a warning or message is displayed. IPV does not allow negative factor loadings, which is indicated by an error. In this case, recode your data appropriately.

Facetless tests

In nested charts, tests do not need to have facets. If you use input by excel, use NA instead of providing a file name.

global <- system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
tests <- c(system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE),
           NA)
mydata <- input_excel(global = global, tests = tests)

If you use manual input, do not provide data on facetless tests with input_manual_simple(). Any further treatment of facetless tests is handled automatically.

2. Chart creation

As briefly outlined above, creating any chart from the formatted data is trivial. There are a few general things to consider, after which I will go into some detail on the three chart types.

Generally, best results can be achieved using .pdf files, since they are vector based. .pdf files can be zoomed and scaled indefinitely without loss of quality. Pixelated formats (.png and .jpeg are supported) will lead to lower quality results and are not scaleable (as can be seen in this vignette). The parameters file_width = and file_length = determine, how large the .pdf file will be, measured in inches (1 in = 2.54 cm). The size of .png or .jpeg files in pixels is determined by multiplying the size in inches with the dots per inch parameter value (dpi =). In this vignette the dpi is 72, resulting in heavily pixelated graphics and small file sizes. To inspect your results within RStudio, always use the zoom pop-out of the Plots window, otherwise charts may be heavily distorted. Furthermore, I strongly recommend inspecting the graphics file itself.

Most graphical parameters are size parameters for single elements of the chart, all named size_... =, width_... =, or length_... =. Those are pretty straightforward: linear scaling parameters defaulting to 1. That means, .5 will half the size and 2 will double it. For all chart types, there is also a global size = parameter, scaling all elements of the chart at once. Use this parameter first, before you fine tune single elements.

For all chart types, it is possible to rotate the whole chart, using rotate_radians = or rotate_degrees =. Also, the font can be changed, using the font = parameter. I recommend using the package extrafont for access to more fonts.

As you will see, repeatedly trying changes and inspecting the results is necessary to generate the best possible chart. Nevertheless, the final result will be a single, relatively simple function call. I strongly recommend saving your analysis and chart creation functions in a script, so you can always reproduce the results or make changes. If you use rmarkdown or sweave to create manuscripts or reports directly from R, changes can easily be made in the scripts and adopted downstream with a single click. Therefore, do not rely on your saved graphics files.

In the following, I will go into detail on the individual chart types. Using the example, I will show the important customization options provided in the chart functions:

Item charts

Above, we already saw a first example of an item chart. However, I stated, that I would like to improve the data ink ratio and use color for visual guidance. To reduce the visibility of structural elements, the fade_... = parameters can be used (0 = "black", 100 = "white").

mychart <- item_chart(data = DSSEI,
                      color = "darkblue", color2 = "darkblue",
                      fade_axes = 70, fade_grid_major = 50, fade_grid_minor = 92)
mychart

To further accentuate the data, let us change some sizes.

mychart <- item_chart(data = DSSEI,
                      color = "darkblue", color2 = "darkblue",
                      fade_axes = 70, fade_grid_major = 50, fade_grid_minor = 92,
                      size = 1.3, width_items = 1.5, length_items = 1.5, width_grid = .6, size_tick_label = .6)
mychart

As you might have noticed, some bars, representing items, are overlapping. This problem is already attenuated, by cropping every other item bar a bit. An alternative is to use different colors or making the item bars slimmer:

mychart <- item_chart(data = DSSEI,
                      color = "darkblue", color2 = "darkred",
                      fade_axes = 70, fade_grid_major = 50, fade_grid_minor = 92,
                      size = 1.3, length_items = 2.5, width_grid = .6, size_tick_label = .6,
                      length_ratio_items = 1, width_items = .9)
mychart

A special feature for item charts is the dodge = parameter, that allows long facet labels to dodge the rest of the chart horizontally:

x <- DSSEI
colnames(x$cors)[4] <- "Oachkatzlschwoaf"
rownames(x$cors)[4] <- "Oachkatzlschwoaf"
levels(x$cds$subfactor) <- c("Ab", "Pb", "Ph", "Oachkatzlschwoaf")
x$cds$subfactor[16:20] <- "Oachkatzlschwoaf"
mychart1 <- item_chart(data = x)
mychart2 <- item_chart(data = x, dodge = 7)
mychart1
mychart2

This works simultaneously for all labels. Labels at the top and bottom do not move, labels on the right and left move the most.

Facet charts

Facet charts, as created by facet_chart(), can be optimized similarly to item charts, using fade = and color =. Furthermore, there are some specific considerations.

mychart <- facet_chart(data = DSSEI)
mychart

As you can see in the output, two parameter values (subradius = , and tick = ) have been generated automatically. These need to fit the data.

The subradius = parameter is important to optimize the appearance. The radius of the facet circles has no meaning. It should be chosen large enough to make the facet labels and correlations readable. But it should also be small enough to make the center distances (thick lines) dominate the first impression and to avoid overlapping facet circles.

For a simplistic version of the chart, the correlations can be omitted.

mychart <- facet_chart(data = DSSEI,
                      cor_labels = FALSE)
mychart

In this case, it might be more visually pleasing, to rotate the test label to the top left, and add some color:

mychart <- facet_chart(data = DSSEI,
                      rotate_test_label_radians = pi, color = "firebrick4")
mychart

Nested charts

Nested charts are the most complex IPV charts. In addition to what I mentioned earlier, there are four important considerations: the relative_scaling = of the global and the nested level, the addition of xarrows = to display correlation arrows between facets of different tests, the ability to subrotate_... = each test individually, and the cor_spacing = to display correlations between the tests. Due to the complexity one should not rely on default values too much:

mychart <- nested_chart(data = self_confidence)
mychart

The relative_scaling = should be large enough to have the center distances on the global level shape the overall impression. But a large value for the relative_scaling = makes the nested facet charts of each test small, which should be readable. Note, that the axis scaling within the nested facet charts is different to the global axis scaling by exactly the factor of relative_scaling = , as can be seen from the axis tick marks (small dotted circles). In this particular case, the relative_scaling = seems sound to me, but the facet circles could be larger, including the font sizes within:

mychart <- nested_chart(data = self_confidence,
                        subradius = .5, size_facet_labels = 2, size_cor_labels_inner = 1.5)
mychart

(As you might note, the dynamic default for relative_scaling = adapted to the changes, because the test circles became larger, due to the changes to subradius = .)

The addition of correlation arrows between facets of different tests is indicated by the IPV authors as sensible, when the correlation between these facets exceed the correlation between the respective tests. In the current example, this would result in three extra arrows, that can be added as follows:

sc_arrows <- data.frame(test1 = rep(NA, 3), facet1 = NA,
                       test2 = NA, facet2 = NA,
                       value = NA)
sc_arrows[1, ] <- c("DSSEI", "Ab", "RSES", "Ps", ".67")
sc_arrows[2, ] <- c("DSSEI", "Ab", "SMTQ", "Cs", ".81")
sc_arrows[3, ] <- c("SMTQ", "Ct", "RSES", "Ns", ".76")
sc_arrows

mychart <- nested_chart(data = self_confidence,
                        subradius = .5, size_facet_labels = 2, size_cor_labels_inner = 1.5,
                        xarrows = sc_arrows, show_xarrows = TRUE)
mychart

The data frame that indicates the names of the facets to connect and the correlation values (here called sc_arrows ) needs to be set up with the column names as in the example.

Now the arrows create a lot of overlap and make the chart look messy. This problem can be solved by rotating each of the nested facet charts, so the facets connected by arrows are oriented towards the center. Also the construct label should be moved out of harms way, as well as the test label of the SMTQ.

mychart <- nested_chart(data = self_confidence,
                        subradius = .5, size_facet_labels = 2, size_cor_labels_inner = 1.5,
                        xarrows = sc_arrows, show_xarrows = TRUE,
                        subrotate_degrees = c(180, 270, 90), dist_construct_label = .7,
                        rotate_test_labels_degrees = c(0, 120, 0))
mychart

The cor_spacing = refers to the ring around the nested facet charts for each test, in which the correlations between the tests are displayed. It should be large enough for the correlation labels, but not too large. If the correlations are omitted, this ring is also omitted:

mychart <- nested_chart(data = self_confidence,
                        subradius = .5, size_facet_labels = 2, size_cor_labels_inner = 1.5,
                        xarrows = sc_arrows, show_xarrows = TRUE,
                        subrotate_degrees = c(180, 270, 90), dist_construct_label = .7,
                        rotate_test_labels_degrees = c(0, 120, 0),
                        cor_labels_tests = FALSE)
mychart

To get a somewhat decent result, let us change some size_... = parameters and add some color. Color can be chosen for the global and the nested level independently. Furthermore, it might be better to increase the line thickness, so the impression of the colored shapes intensifies.

mychart <- nested_chart(data = self_confidence,
                        subradius = .5, size_facet_labels = 2, size_cor_labels_inner = 1.5,
                        xarrows = sc_arrows, show_xarrows = TRUE,
                        subrotate_degrees = c(180, 270, 90), dist_construct_label = .7,
                        rotate_construct_label_degrees = -15,
                        rotate_test_labels_degrees = c(0, 120, 0),
                        color_global = "cyan4", color_nested = "darkblue",
                        size_construct_label = 1.3, size_test_labels = 1.2,
                        width_circles_inner = 1.5, width_circles = 1.5, width_axes_inner = 1.5, width_axes = 1.5)
mychart

Appendix

manual input in nested cases - example {#nested}

# first the global level
mydata <- input_manual_nested(
  construct_name = "Self-Confidence",
  test_names = c("DSSEI", "SMTQ", "RSES"),
  items_per_test = c(20, 14, 10),
  item_names = c(
     1,  5,  9, 13, 17, # DSSEI
     3,  7, 11, 15, 19, # DSSEI
    16,  4, 12,  8, 20, # DSSEI
     2,  6, 10, 14, 18, # DSSEI
    11, 13, 14,  1,  5,  6, # SMTQ
     3, 10, 12,  8, # SMTQ
     7,  2,  4,  9, # SMTQ
     1,  3,  4,  7, 10, # RSES
     2,  5,  6,  8,  9), # RSES
  construct_loadings = c(
    .5189, .6055, .618 , .4074, .4442,
    .5203, .2479, .529 , .554 , .5144,
    .3958, .5671, .5559, .4591, .4927,
    .3713, .5941, .4903, .5998, .6616,
    .4182, .2504, .4094, .3977, .5177, .4603,
    .3271, .261 , .3614, .4226,
    .2076, .3375, .5509, .3495,
    .5482, .4627, .4185, .4185, .5319,
    .4548, .4773, .4604, .4657, .4986),
  test_loadings = c(
    .5694, .6794, .6615, .4142, .4584, # DSSEI
    .5554, .2165, .5675, .5649, .4752, # DSSEI
    .443 , .6517, .6421, .545 , .5266, # DSSEI
    .302 , .6067, .5178, .5878, .6572, # DSSEI
    .4486, .3282, .4738, .4567, .5986, .5416, # SMTQ
    .3602, .2955, .3648, .4814, # SMTQ
    .2593, .4053, .61  , .4121, # SMTQ
    .6005, .4932, .4476, .5033, .6431, # RSES
    .5806, .5907, .6179, .5899, .6559), # RSES
  correlation_matrix = matrix(data = c(  1, .73, .62,
                                         .73,   1, .75,
                                         .62, .75,   1),
                              nrow = 3,
                              ncol = 3))

# then add tests individually
# test 1
mydata$tests$RSES <- input_manual_simple(
  test_name = "RSES",
  facet_names = c("Ns", "Ps"),
  items_per_facet = c(5, 5),
  item_names = c(2, 5, 6, 8,  9,
                 1, 3, 4, 7, 10),
  test_loadings = c(.5806, .5907, .6179, .5899, .6559,
                    .6005, .4932, .4476, .5033, .6431),
  facet_loadings = c(.6484, .6011, .6988, .6426, .6914,
                     .6422, .5835, .536, .5836, .6791),
  correlation_matrix = matrix(data = c(1, .69,
                                       .69, 1),
                              nrow = 2,
                              ncol = 2))
# test 2
mydata$tests$DSSEI <- input_manual_simple(
  test_name = "DSSEI",
  facet_names = c("Ab", "Pb", "Ph", "So"),
  items_per_facet = 5,
  item_names = c(2, 6, 10, 14, 18,
                  16, 4, 12, 8, 20,
                  3, 7, 11, 15, 19,
                  1, 5, 9, 13, 17),
  test_loadings = c(.302 , .6067, .5178, .5878, .6572,
                    .443 , .6517, .6421, .545 , .5266,
                    .5554, .2165, .5675, .5649, .4752,
                    .5694, .6794, .6615, .4142, .4584),
  facet_loadings = c(.3347, .6537, .6078, .684 , .735 ,
                     .6861, .8746, .7982, .7521, .6794,
                     .7947, .3737, .819 , .7099, .5785,
                     .7293, .8284, .7892, .3101, .4384),
  correlation_matrix = matrix(data = c(1, .49, .66, .76,
                                       .49, 1, .37, .54,
                                       .66, .37, 1, .53,
                                       .76, .54, .53,   1),
                              nrow = 4,
                              ncol = 4))
# test 3
mydata$tests$SMTQ <- input_manual_simple(
  test_name = "SMTQ",
  facet_names = c("Cf", "Cs", "Ct"),
  items_per_facet = c(6, 4, 4),
  item_names = c(11, 13, 14, 1, 5, 6,
                 3, 10, 12, 8,
                 7, 2, 4, 9),
  test_loadings = c(.4486, .3282, .4738, .4567, .5986, .5416,
                    .3602, .2955, .3648, .4814,
                    .2593, .4053, .61  , .4121),
  facet_loadings = c(.4995, .3843, .5399, .4562, .6174, .6265,
                     .4601, .3766, .4744, .5255,
                     .3546, .5038, .7429, .4342),
  correlation_matrix = matrix(data = c(1,   .71, .62,
                                       .71, 1, .59,
                                       .62, .59,    1),
                              nrow = 3,
                              ncol = 3))

# finally process (as in a simple case)
my_processed_data <- input_manual_process(mydata)
my_processed_data


Try the IPV package in your browser

Any scripts or data that you put into this service are public.

IPV documentation built on March 13, 2020, 3:03 a.m.