library(IPV)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.dpi = 96
)

About this package

The IPV package is a tool to create IPV charts. The original work on IPV, including the chart concepts, can be found in: Dantlgraber, M., Stieger, S., & Reips, U. D. (2019). Introducing Item Pool Visualization: A method for investigation of concepts in self-reports and psychometric tests. Methodological Innovations, 12(3), 2059799119884283.. Please cite this paper, as well as this package (Petras, N., & Dantlgraber, M. (2022). IPV: Item Pool visualization. R Package Version 1.0.0) when using the package. The package is built to enable chart creation -- quick & dirty or highly customized. Please raise an issue on github in case of bugs or to make suggestions.

Example data

The package contains two sets of example data:

  1. The model estimates from the original IPV paper ?self_confidence (also available as excel spreadsheets, see Appendix)
  2. Raw data from the Open-Source Psychometrics Project ?HEXACO

Terminology within the package

IPV uses the concept of item pools, that can be divided into smaller item (sub-)pools. The hierarchy of terms for item pools in this package is "construct" > "test" > "facet", from large to small. The total item pool of a (psychological) construct may comprise multiple tests, which in turn may comprise multiple facets. This terminology is based on the common data structure in psychological assessment. But IPV can technically be used for any subdivision of an item pool, if multiple levels of specificity exist (see Dantlgraber et al., 2019). In other words, what is called "test" here does not need to be an actual test, etc.

Recommended Workflow

  1. Prepare your raw data.
  2. Generate the model estimates using ipv_est().
  3. Select a chart function and use it with the estimates, a file name, and otherwise default values.
  4. Customize the appearance using further arguments of the chart function.
  5. Check the chart's appearance by opening the created file.
  6. Repeat 4. + 5. until you are satisfied with the result.

Minimal example:

mydata <- HEXACO[ ,c(2:41, 122:161)] # (1.) HEXACO is a data frame containing raw data
res <- ipv_est(mydata, name = "HA") # (2.) produce a formatted bundle of estimates to use
nested_chart(res, file_name = "test.pdf") # (3.) create a chart with default formatting

There are some additional helper functions, such as the input_... functions, which enable you to format already existing model estimates.

Call ?IPV for an overview of all functions.

1. Prepare data

The ipv_est() function is recommended for model estimation, because it only needs raw data. It automatically estimates the models underlying IPV charts. Its output can be plugged into the chart functions directly.

To use this function, your data needs to be formatted as follows:

For example:

HEXACO_long <- reshape2::melt(cbind(id = row.names(HEXACO), HEXACO[ ,1:240]), id.vars = "id")
HEXACO_long$test <- substr(HEXACO_long$variable, 1, 1)
HEXACO_long$facet <- substr(HEXACO_long$variable, 3, 6)
HEXACO_long$item <- substr(HEXACO_long$variable, 8, 13)
HEXACO_long$variable <- NULL
head(HEXACO_long)

For example:

HEXACO[1:3, 2:4]

H = honesty/humility ("test")

Sinc = sincerity (a "facet" of honesty/humility)

Sinc1 = first "item" of the sincerity facet

2. Generate model estimates

ipv_est() automatically determines if the data are in wide or long format, which models to estimate, uses the lavaan package to estimate them, and returns an object of class "IPV" with formatted results.

# nested case: honesty/humility and agreeableness as "tests" (= sub-pools)
# of an overarching "construct" (= item pool)
res_HA <- ipv_est(dat = HEXACO[ ,c(2:41, 122:161)], name = "HA")
# simple case: agreeableness only
res_A <- ipv_est(dat = HEXACO[ ,c(122:161)], name = "A")

If any factor loading is below .1 or any center distance below 0, it is set to that value and a warning (or message) is displayed. IPV does not allow negative factor loadings, which is indicated by an error. In this case, check if you have properly recoded negatively keyed items. If there remain negative factor loadings, the data does not fulfill the requirements of IPV (Dantlgraber et al., 2019).

Results are provided as an object of the class "IPV", which is a list containing (some of) the following elements:

You should always inspect the estimated models ("lav") to understand what is going on with your data. Use the lavaan functions (e.g. lavaan::fitmeasures, lavaan::summary, lavaan::lavInspect) for this purpose.

A more thorough introduction to the formatting of the objects representing model estimates can be found in the section on the alternative workflow in the Appendix.

3. Create and save IPV charts

Use the chart functions (item_chart, facet_chart, nested_chart) to create an IPV chart from your data.

All three IPV chart types can be created by specifying data = only:

mychart <- item_chart(data = self_confidence)
mychart

If the test = argument is unspecified, the first test will automatically be chosen.

It is advisable to use vector-based .pdf graphics output using the file_name = argument:

mychart <- item_chart(data = self_confidence, test = "DSSEI", file_name = "DSSEI_item_chart.pdf")

.pdf files can be zoomed and scaled indefinitely without loss of quality. If you need .png or .jpeg graphics output, use the dpi = argument to find a balance between quality and file size. The file format provided to file_name = will be used. By default, no file is saved. Saving manually is not recommended.

It is advisable to inspect the graphics file itself. If you inspect your results within RStudio, always use the zoom pop-out of the Plots window, otherwise charts may be heavily distorted.

4. Customize IPV charts

Although some graphical options are automatically optimized based on the data, there are many ways to optimize the charts' appearance (e.g. for print). For example you may use a different colour to guide attention, or re-size elements of the chart for readability.

4.1 center distance method

The parameter cd_method = allows to switch between aggregation methods for center distances in facet charts and nested charts. There are two options:

"aggregate" computes the sum of the squared loadings of all items first and in a second step computes the center distance based on the aggregated values. This is the recommended option and the default. It is more meaningful in cases with heterogeneous factor loadings across items.

"mean" first computes the center distance of each item and in a second step computes the mean of these center distances. This is the originally proposed option (Dantlgraber et al., 2019). It can be intuitively derived from item charts.

4.2. Variable labels

The function relabel() can be used to change any item, facet, test, or construct name, using a before-after syntax (see documentation).

4.3. Size of chart elements {#size}

Most graphical parameters are size parameters for single elements of the chart, all named size_... =, width_... =, or length_... =. Those are pretty straightforward: linear scaling parameters defaulting to 1. That means, .5 will half the size and 2 will double it. For all chart types, there is also a global size = parameter, scaling all elements of the chart at once. Use this parameter first, before you fine tune single elements.

4.4 Output file properties

The parameters file_width = and file_height = determine, how large the .pdf file will be, measured in inches (1 in = 2.54 cm). The size of .png or .jpeg files in pixels is determined by multiplying the size in inches with the dots per inch parameter value (dpi =).

4.5 Showing sections of the chart ("Screenshots")

In some cases, it may be desirable to cut the chart to the informative area for better readability, or to zoom in on a certain section of the chart. It is possible to create vector-based PDF versions of sections of the chart. The parameters zoom_x = and zoom_y = can be used to crop the chart to a certain area by providing lower and upper limits on both dimensions. The aspect ratio is automatically retained if both parameters are used. This method is superior to a manual screenshot, because it retains maximum graphics quality and is exactly reproducible.

4.6. Rotation and order

For all chart types, it is possible to rotate the whole chart, using rotate_radians = (use fractions of $2\pi$) or rotate_degrees = (use 0-360). In all charts, the parameter facet_order = enables you to set the (counter-clockwise) order of facets. For example:

item_chart(self_confidence, test = "DSSEI", facet_order = c("Ab", "So", "Ph", "Pb"))

In nested charts, test_order = and subrotate_degrees =/subrotate_radians = will function likewise. Several rotate_... = parameters can be used to re-position single elements of the chart relative to the origin.

4.7. Font

The font can be changed using the font = parameter. The package extrafont allows access to many more fonts. For changes to the font size, see section 4.3.

4.8. Structural elements

To reduce the visibility of structural elements, the fade_... = parameters can be used (0 = "black", 100 = "white").

4.9. Colour

For the use of colours and gray tones, see this guide. All charts functions have at least one colour parameter to increase visual contrast.

4.10 Additional plot layers

The chart functions return a ggplot2 object, which can easily be combined with other geoms or entire plots. The following example uses this to show two facet charts side by side at the same scale:

library(ggplot2)
library(cowplot)
x <- facet_chart(self_confidence) +
  coord_fixed(
    ratio = 1,
    ylim = c(-3, 3),
    xlim = c(-3, 3))
y <- facet_chart(self_confidence, test = "RSES") +
  coord_fixed(
    ratio = 1,
    ylim = c(-3, 3),
    xlim = c(-3, 3))
# Save just as any other ggplot
ggsave(filename = "test.pdf",
       plot = plot_grid(plotlist = list(x, y), align = "h"),
       width = 20, height = 10) # defaults are optimized for 10x10 inches per chart

4.11. Special options for item charts

You can use different colors for every second item to increase readability of item charts:

mychart <- item_chart(
  data = self_confidence, test = "DSSEI",
  color = "darkblue", color2 = "darkred")
mychart

The dodge = parameter allows long facet labels to dodge the rest of the chart horizontally:

x <- self_confidence
x <- relabel(x, "So", "verylongname")
mychart1 <- item_chart(data = x, test = "DSSEI")
mychart2 <- item_chart(data = x, test = "DSSEI", dodge = 7)
mychart1
mychart2

This works simultaneously for all labels. Labels at the top and bottom do not move, labels on the right and left move the most.

4.12. Special options for facet charts

mychart <- facet_chart(data = self_confidence, test = "DSSEI")
mychart

As you can see in the output (and the message in the console), two parameter values (subradius = , and tick = ) have been generated automatically. These can be used to improve readability. subradius = sets the size of the facet circles and tick = sets the value at which a tick mark should be displayed.

For a simplistic version of the chart, the correlations can be omitted, by setting cor_labels = FALSE. It is also possible to omit the tick marks emphasizing the center distances by setting size_marker = 0.

mychart <- facet_chart(
  data = self_confidence,
  test = "DSSEI",
  cor_labels = FALSE,
  size_marker = 0)
mychart

4.13. Special options for nested charts

Due to the complexity one should not rely on default values too much:

mychart <- nested_chart(data = self_confidence)
mychart

There are four important options specific to nested charts: relative_scaling =, xarrows =, subrotate_... =, and cor_spacing =

The axis scaling within the nested facet charts is potentially different to the global axis scaling, by exactly the factor of relative_scaling =. This can be seen from the axis tick marks (small dotted circles), which always indicate the same value. relative_scaling = 1 is desirable for intuitive understanding. Nevertheless, relative_scaling = should be large enough to avoid circle overlap.

# all axes have the same scaling
nested_chart(self_confidence, relative_scaling = 1, tick = 0.2, rotate_tick_label = -.2)
# the global axis is twice as large (see dotted circles)
nested_chart(self_confidence, relative_scaling = 2, tick = 0.2, rotate_tick_label = -.2)

In this particular case, the facet circles could be larger, including the font sizes within:

mychart <- nested_chart(
  data = self_confidence,
  subradius = .5,
  size_facet_labels = 2,
  size_cor_labels_inner = 1.5)
mychart

Note how the dynamic default for relative_scaling = adapted to the changes, because the test circles became larger, due to the changes to subradius = .

The addition of correlation arrows between facets of different tests is indicated by the IPV authors as sensible, when the correlation between these facets exceed the correlation between the respective tests. The three arrows in the current example can also be omitted in the display:

mychart <- nested_chart(
  data = self_confidence,
  subradius = .5,
  size_facet_labels = 2,
  size_cor_labels_inner = 1.5,
  xarrows = FALSE)
mychart

The column names of the data frame providing the data for the arrows need to match the example.

Note that ipv_est() generates the data frame for these arrows automatically.

The arrows create a lot of overlap and make the chart look messy. This problem can be solved by rotating each of the nested facet charts, so the facets connected by arrows are oriented towards the center. Furthermore, the construct label should be moved out of harms way, as well as the test label of the SMTQ.

mychart <- nested_chart(
  data = self_confidence,
  subradius = .5,
  size_facet_labels = 2,
  size_cor_labels_inner = 1.5,
  subrotate_degrees = c(180, 270, 90),
  dist_construct_label = .7,
  rotate_test_labels_degrees = c(0, 120, 0))
mychart

The cor_spacing = refers to the ring around the nested facet charts for each test, in which the correlations between the tests are displayed. It should be large enough for the correlation labels, but not too large to skew proportions. If the correlations are omitted (cor_labels_tests = FALSE), this ring is also omitted:

mychart <- nested_chart(
  data = self_confidence,
  subradius = .5,
  size_facet_labels = 2,
  size_cor_labels_inner = 1.5,
  subrotate_degrees = c(180, 270, 90),
  dist_construct_label = .7,
  rotate_test_labels_degrees = c(0, 120, 0),
  cor_labels_tests = FALSE)
mychart

Color can be chosen for the global and the nested level independently. Increasing the line thickness makes the colors more visible.

mychart <- nested_chart(
  data = self_confidence,
  subradius = .5,
  size_facet_labels = 2,
  size_cor_labels_inner = 1.5,
  subrotate_degrees = c(180, 270, 90),
  dist_construct_label = .7,
  rotate_construct_label_degrees = -15,
  rotate_test_labels_degrees = c(0, 120, 0),
  size_construct_label = 1.3,
  size_test_labels = 1.2,
  width_circles_inner = 1.2,
  width_circles = 1.2,
  width_axes_inner = 1.2,
  width_axes = 1.2)
mychart

Object structure within the package

This is an example of the data format of the class "IPV", which is required for the use of the chart functions and produced by ipv_est():

str(self_confidence, 2)
self_confidence$tests$RSES

Note, that factor refers to an item pool, that was divided into subpools (subfactors). In this case, factor refers to the Rosenberg Self-Esteem Scale, a test that can be divided in two facets: Positive Self-Esteem (Ps), and Negative Self-Esteem (Ns), which was reversed here. As seen below, the same data structure applies on the global level, with factor referring to the overall self-confidence item pool, comprising the three tests (subfactors) RSES, SMTQ, and DSSEI.

self_confidence$global

cd is short for center distance
\begin{equation} cd_i = \frac{\lambda^2_{is}}{\lambda^2_{ig}} -1 \end{equation}
while mean_cd is the mean center distance of the items of a facet or test. aggregate_cd is the center distance of a facet or test as a whole (see cd_method). Furthermore, the matrix of latent correlations between subfactors is given as a second item of the list.

There is a raw variant ($est_raw in the output of ipv_est) of this, providing factor loadings instead of center distances.

Appendix

2. Read existing estimates (alternative Workflow)

Using ipv_est() will take less effort than estimating the models yourself and reading in the estimates. For this reason, it is recommended. On the other hand, reading in existing estimates provides you with additional options during model estimation, or the ability to use published estimates without access to the raw data.

Regardless of the input mode, you will need to provide:

To spare you the task of formatting estimates by hand, there are two automated input pathways. You can either use excel files or the manual input function. In both cases, center distances are calculated automatically and the estimates are automatically checked for (obvious) errors. Negative center distances are always set to zero before mean center distances are calculated.

... using estimates from within R

These functions allow you to reduce the manual work to a minimum. They are especially useful, when your SEM estimates are already in your R environment (e.g. because you read them from a .csv file). The functions input_manual_nested() and input_manual_simple() allow you to feed in vectors of factor loadings, item names, etc. The correct format for plotting is then generated automatically. Run input_manual_process() on the result, to automatically calculate center distances.
This is an example, where all estimates are put in completely manually for demonstration purposes:

mydata <- input_manual_simple(
test_name = "RSES",
facet_names = c("Ns", "Ps"),
items_per_facet = 5,
item_names = c(
  2, 5, 6, 8, 9,
  1, 3, 4, 7, 10),
test_loadings = c(
  .5806, .5907, .6179, .5899, .6559,
  .6005, .4932, .4476, .5033, .6431),
facet_loadings = c(
  .6484, .6011, .6988, .6426, .6914,
  .6422, .5835, .536, .5836, .6791),
correlation_matrix = matrix(
  data = c(1, .69,
           .69, 1),
  nrow = 2,
  ncol = 2))
mydata
input_manual_process(mydata)

For nested cases, use the function input_manual_nested(), and add the individual tests using input_manual_simple(). Then you can run input_manual_processs() as in the simple case. You can find a (lengthy) example below.

... using Excel files

Excel files have the advantage that you can simply copy and paste your SEM estimates into spreadsheets and the input function of the IPV package (input_excel()) does the rest. The files need to be structured as in the example, that you can find here:

system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE)
system.file("extdata", "IPV_RSES.xlsx", package = "IPV", mustWork = TRUE)

As you can see, there is a file for each test, and a global file. You might want to use a copy as your template, so you can just fill in your values. Open a file to see how it works.

On sheet 1 you need to provide the factor loadings from your SEM estimation results, on sheet 2 you need to provide the named and complete latent correlation matrix. On sheet 1, "factor" contains a single factor name and "factor_loading" the factor loadings of items on that factor (not squared). "subfactor" contains the names of grouped factors and "subfactor_loading" the factor loadings of items on these factors (not squared). "item" contains the item names. Therefore, each row contains the full information on the respective item.
Read these excel sheets using input_excel. In the example:

global <- system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
tests <- c(system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_RSES.xlsx", package = "IPV", mustWork = TRUE))
mydata <- input_excel(global = global, tests = tests)

The data will be prepared automatically, including the calculation of center distances. If any factor loading is below .1 or any center distance below 0, it is set to that value and a warning or message is displayed. IPV does not allow negative factor loadings, which is indicated by an error. If possible, recode your data appropriately.

Mix tests with and without facets

In nested charts, tests do not need to have facets. If you use input by excel, use NA instead of providing a file name.

global <- system.file("extdata", "IPV_global.xlsx", package = "IPV", mustWork = TRUE)
tests <- c(system.file("extdata", "IPV_DSSEI.xlsx", package = "IPV", mustWork = TRUE),
           system.file("extdata", "IPV_SMTQ.xlsx", package = "IPV", mustWork = TRUE),
           NA)
mydata <- input_excel(global = global, tests = tests)

If you use manual input, do not provide data on facetless tests with input_manual_simple(). Any further treatment of facetless tests is handled automatically.

Manual input in nested cases - example {#nested}

Note that all values that are put in manually for presentation purposes here could be read from an arbitrarily formatted R object. Copying values manually from another source is error-prone and therefore not recommended.

# first the global level
mydata <- input_manual_nested(
  construct_name = "Self-Confidence",
  test_names = c("DSSEI", "SMTQ", "RSES"),
  items_per_test = c(20, 14, 10),
  item_names = c(
     1,  5,  9, 13, 17, # DSSEI
     3,  7, 11, 15, 19, # DSSEI
    16,  4, 12,  8, 20, # DSSEI
     2,  6, 10, 14, 18, # DSSEI
    11, 13, 14,  1,  5,  6, # SMTQ
     3, 10, 12,  8, # SMTQ
     7,  2,  4,  9, # SMTQ
     1,  3,  4,  7, 10, # RSES
     2,  5,  6,  8,  9), # RSES
  construct_loadings = c(
    .5189, .6055, .618 , .4074, .4442,
    .5203, .2479, .529 , .554 , .5144,
    .3958, .5671, .5559, .4591, .4927,
    .3713, .5941, .4903, .5998, .6616,
    .4182, .2504, .4094, .3977, .5177, .4603,
    .3271, .261 , .3614, .4226,
    .2076, .3375, .5509, .3495,
    .5482, .4627, .4185, .4185, .5319,
    .4548, .4773, .4604, .4657, .4986),
  test_loadings = c(
    .5694, .6794, .6615, .4142, .4584, # DSSEI
    .5554, .2165, .5675, .5649, .4752, # DSSEI
    .443 , .6517, .6421, .545 , .5266, # DSSEI
    .302 , .6067, .5178, .5878, .6572, # DSSEI
    .4486, .3282, .4738, .4567, .5986, .5416, # SMTQ
    .3602, .2955, .3648, .4814, # SMTQ
    .2593, .4053, .61  , .4121, # SMTQ
    .6005, .4932, .4476, .5033, .6431, # RSES
    .5806, .5907, .6179, .5899, .6559), # RSES
  correlation_matrix = matrix(
    data = c(
      1 , .73, .62,
      .73, 1, .75,
      .62, .75, 1),
    nrow = 3,
    ncol = 3))

# then add tests individually
# test 1
mydata$tests$RSES <- input_manual_simple(
  test_name = "RSES",
  facet_names = c("Ns", "Ps"),
  items_per_facet = c(5, 5),
  item_names = c(
    2, 5, 6, 8,  9,
    1, 3, 4, 7, 10),
  test_loadings = c(
    .5806, .5907, .6179, .5899, .6559,
    .6005, .4932, .4476, .5033, .6431),
  facet_loadings = c(
    .6484, .6011, .6988, .6426, .6914,
    .6422, .5835, .536, .5836, .6791),
  correlation_matrix = matrix(
    data = c(
      1, .69,
      .69, 1),
    nrow = 2,
    ncol = 2))
# test 2
mydata$tests$DSSEI <- input_manual_simple(
  test_name = "DSSEI",
  facet_names = c("Ab", "Pb", "Ph", "So"),
  items_per_facet = 5,
  item_names = c(
    2, 6, 10, 14, 18,
    16, 4, 12, 8, 20,
    3, 7, 11, 15, 19,
    1, 5, 9, 13, 17),
  test_loadings = c(
    .302 , .6067, .5178, .5878, .6572,
    .443 , .6517, .6421, .545 , .5266,
    .5554, .2165, .5675, .5649, .4752,
    .5694, .6794, .6615, .4142, .4584),
  facet_loadings = c(
    .3347, .6537, .6078, .684 , .735 ,
    .6861, .8746, .7982, .7521, .6794,
    .7947, .3737, .819 , .7099, .5785,
    .7293, .8284, .7892, .3101, .4384),
  correlation_matrix = matrix(
    data = c(
      1, .49, .66, .76,
      .49, 1, .37, .54,
      .66, .37, 1, .53,
      .76, .54, .53, 1),
    nrow = 4,
    ncol = 4))
# test 3
mydata$tests$SMTQ <- input_manual_simple(
  test_name = "SMTQ",
  facet_names = c("Cf", "Cs", "Ct"),
  items_per_facet = c(6, 4, 4),
  item_names = c(
    11, 13, 14, 1, 5, 6,
    3, 10, 12, 8,
    7, 2, 4, 9),
  test_loadings = c(
    .4486, .3282, .4738, .4567, .5986, .5416,
    .3602, .2955, .3648, .4814,
    .2593, .4053, .61  , .4121),
  facet_loadings = c(
    .4995, .3843, .5399, .4562, .6174, .6265,
    .4601, .3766, .4744, .5255,
    .3546, .5038, .7429, .4342),
  correlation_matrix = matrix(
    data = c(
      1, .71, .62,
      .71, 1, .59,
      .62, .59, 1),
    nrow = 3,
    ncol = 3))

# finally process (as in a simple case)
my_processed_data <- input_manual_process(mydata)
my_processed_data


NilsPetras/IPV documentation built on July 19, 2023, 9:12 p.m.