LikertMakeR (Winzar, 2022)
lets you create synthetic Likert-scale, or related rating-scale, data.
Set the mean, standard deviation, and correlations, and the package
generates data matching those properties.
It can also rearrange existing data columns to achieve a desired
correlation structure or generate data based on
Cronbach's Alpha, factor correlations or other summary statistics.
The package should be useful for teaching in the Social Sciences, and for scholars who wish to "replicate" or "reverse engineer" rating-scale data for further analysis and visualisation when only summary statistics have been reported.
I was prompted to write the functions in LikertMakeR after reviewing too many journal article submissions where authors presented questionnaire results with only means and standard deviations (often only the means), with no apparent understanding of scale distributions, and their impact on scale properties.
Hopefully, this tool will help researchers, teachers, students, and reviewers to think more carefully about rating-scale distributions and the effects of variance, scale boundaries, and the number of items in a scale. Researchers can also use LikertMakeR to prepare analyses ahead of a formal survey.
A Likert scale is the mean, or sum, of several ordinal rating scales. Typically, the items are bipolar (usually "agree-disagree") responses to propositions that are moderately-to-highly correlated and that capture some facet of a theoretical construct.
Rating scales, such as Likert scales, are not continuous or unbounded.
For example, a 5-point Likert scale that is constructed with, say, five items (questions) will have a summed range of between 5 (all rated '1') and 25 (all rated '5') with all integers in between, and the mean range will be '1' to '5' with intervals of 1/5=0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated '1') and 56 (all rated '7') with all integers in between, and the mean range will be '1' to '7' with intervals of 1/8=0.125.
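To see that granularity directly, we can enumerate every value a five-item, 5-point scale can take when expressed as an item mean (plain arithmetic, no package functions needed):

```r
## possible item-mean values of a five-item, 5-point Likert scale:
## increments of 1/5 = 0.20
seq(from = 1, to = 5, by = 1 / 5)
```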
Technically, because they are bounded and not continuous, parametric statistics such as mean, standard deviation, and correlation should not be applied to summated rating scales. In practice, however, parametric statistics are commonly used in the social sciences because:

- they are in common usage and easily understood, and
- results and conclusions drawn from technically-correct non-parametric statistics are (almost) always the same as for parametric statistics applied to such data.
For example, D'Alessandro et al. (2020)
argue that a summated scale, made with multiple items,
"approaches" an interval scale measure.
This implies that parametric statistics are acceptable.
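As a quick, hedged illustration of the second point, we can synthesise two rating scales with lfast() (introduced below) and compare the parametric t-test with its non-parametric counterpart; the substantive conclusions generally agree:

```r
## parametric vs non-parametric conclusions on the same rating-scale data
set.seed(42)
g1 <- lfast(n = 64, mean = 2.8, sd = 0.8, lowerbound = 1, upperbound = 5, items = 5)
g2 <- lfast(n = 64, mean = 3.3, sd = 0.8, lowerbound = 1, upperbound = 5, items = 5)

t.test(g1, g2)$p.value ## parametric
wilcox.test(g1, g2)$p.value ## non-parametric equivalent
```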
Rating-scale boundaries define minima and maxima for any scale values.
If the mean is close to one boundary, then data points will cluster near that boundary.
If the mean is not in the middle of the scale, then the data will always be skewed, as the following plots show.
knitr::include_graphics("skew_chart.png")
- lfast() generates a vector of values with predefined mean and standard deviation.
- lcor() takes a dataframe of rating-scale values and rearranges the values in each column so that the columns correlate to match a predefined correlation matrix.
- makeCorrAlpha() constructs a random correlation matrix of given dimensions from a predefined Cronbach's Alpha.
- makeCorrLoadings() constructs a random correlation matrix from a given factor-loadings matrix and factor-correlation matrix.
- makeItems() is a wrapper for lfast() and lcor() that generates synthetic rating-scale data with predefined first and second moments and a predefined correlation matrix.
- makeItemsScale() generates a random dataframe of scale items based on a predefined summated scale and a desired Cronbach's Alpha.
- makePaired() generates a dataframe of two correlated columns based on summary data from a paired-sample t-test.
- correlateScales() creates a dataframe of correlated summated scales, as one might find in a completed survey questionnaire and use in a Structural Equation Model.
Helper Functions
- alpha() calculates Cronbach's Alpha from a given correlation matrix or a given dataframe.
- eigenvalues() calculates the eigenvalues of a correlation matrix, reports whether the matrix is positive definite, and optionally displays a scree plot to visualise the eigenvalues.
To download and install from CRAN:

```r
install.packages("LikertMakeR")
library(LikertMakeR)
```
Or install the development version from GitHub:

```r
library(devtools)
install_github("WinzarH/LikertMakeR")
library(LikertMakeR)
```
To synthesise a rating scale with lfast(), the user must input the following parameters:
- n: sample size
- mean: desired mean
- sd: desired standard deviation
- lowerbound: desired lower bound
- upperbound: desired upper bound
- items: number of items making the scale. Default = 1
An earlier version of LikertMakeR had a function, lexact(), which was slow and no more accurate than the current lfast(), so lexact() is now deprecated.
```r
nItems <- 4
mean <- 2.5
sd <- 0.75

x1 <- lfast(
  n = 512,
  mean = mean,
  sd = sd,
  lowerbound = 1,
  upperbound = 5,
  items = nItems
)
```
```r
## distribution of x1
hist(x1,
  cex.axis = 0.5, cex.main = 0.75,
  breaks = seq(from = (1 - (1 / 8)), to = (5 + (1 / 8)), by = (1 / 4)),
  col = "skyblue", xlab = NULL, ylab = NULL,
  main = paste0("mu=", round(mean(x1), 2), ", sd=", round(sd(x1), 2))
)
```
```r
## an 11-point scale (0-10), single item
x2 <- lfast(256, 3, 2.5, 0, 10)
```

```r
## generate histogram
hist(x2,
  cex.axis = 0.5, cex.main = 0.75,
  breaks = seq(from = -0.5, to = 10.5, by = 1),
  col = "skyblue", xlab = NULL, ylab = NULL,
  main = paste0("mu=", round(mean(x2), 2), ", sd=", round(sd(x2), 2))
)
```
The function lcor() rearranges the values in the columns of a dataset so that they are correlated at a specified level. It does not change the values; it swaps their positions within each column, so univariate statistics do not change, but correlations with other vectors do.
lcor() systematically selects pairs of values in a column and swaps them, then checks whether the swap improves the correlation matrix. If the revised dataframe produces a correlation matrix closer to the target, the swap is retained; otherwise, the values are returned to their original positions. This process iterates across each column.
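The following is an illustrative sketch of that swap heuristic, not the package's actual internals (the function name swap_once() is mine); iterating it many times drives the correlation matrix toward the target:

```r
## illustrative sketch of the pairwise-swap idea behind lcor()
swap_once <- function(dat, target) {
  col <- sample(ncol(dat), 1) ## pick a column at random
  ij <- sample(nrow(dat), 2) ## pick two rows to swap
  trial <- dat
  trial[ij, col] <- trial[rev(ij), col] ## swap the two values
  ## keep the swap only if it moves the correlation matrix closer to target
  if (sum((cor(trial) - target)^2) < sum((cor(dat) - target)^2)) {
    trial
  } else {
    dat
  }
}
```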
To create the desired correlated data, the user must define the following parameters:
- data: a starter dataset of rating scales. The number of columns must match the dimensions of the target correlation matrix.
- target: the target correlation matrix.
Let's generate some data: three 5-point Likert scales, each with five items.
```r
## generate uncorrelated synthetic data
n <- 128
lowerbound <- 1
upperbound <- 5
items <- 5

mydat3 <- data.frame(
  x1 = lfast(n, 2.5, 0.75, lowerbound, upperbound, items),
  x2 = lfast(n, 3.0, 1.50, lowerbound, upperbound, items),
  x3 = lfast(n, 3.5, 1.00, lowerbound, upperbound, items)
)
```
The first six observations from this dataframe are:
```r
head(mydat3, 6)
```
And the first and second moments (to 3 decimal places) are:
```r
moments <- data.frame(
  mean = apply(mydat3, 2, mean) |> round(3),
  sd = apply(mydat3, 2, sd) |> round(3)
) |> t()
moments
```

We can see that the first and second moments of the data are very close to what we specified.
As we should expect, randomly-generated synthetic data have low correlations:
```r
cor(mydat3) |> round(2)
```
Now, let's define a target correlation matrix:
```r
## describe a target correlation matrix
tgt3 <- matrix(
  c(
    1.00, 0.85, 0.75,
    0.85, 1.00, 0.65,
    0.75, 0.65, 1.00
  ),
  nrow = 3
)
```
So now we have a dataframe with desired first and second moments, and a target correlation matrix.
```r
## apply lcor() function
new3 <- lcor(data = mydat3, target = tgt3)
```

Values in each column of the new dataframe do not change from the original; they are simply rearranged.
The first ten observations from this dataframe are:
```r
head(new3, 10)
```

And the new dataframe correlates closely with our desired correlation matrix; here presented to three decimal places:

```r
cor(new3) |> round(3)
```
makeCorrAlpha() constructs a random correlation matrix of given dimensions and a predefined Cronbach's Alpha.
To create the desired correlation matrix, the user must define the following parameters:
items: or "k" - the number of rows and columns of the desired correlation matrix.
alpha: the target value for Cronbach's Alpha
variance: a notional variance coefficient to affect the spread of values in the correlation matrix. Default = '0.5'. A value of '0' produces a matrix where all off-diagonal correlations are equal. Setting 'variance = 1.0' gives a wider range of values. Setting 'variance = 2.0', or above, may be feasible but increases the likelihood of a non-positive-definite matrix.
Random values generated by makeCorrAlpha() are highly volatile, and the function may not generate a feasible (positive-definite) correlation matrix, especially when the variance parameter is high relative to the desired Alpha and the dimensions of the matrix.
makeCorrAlpha() will inform the user if the resulting correlation matrix is positive definite, or not.
If the returned correlation matrix is not positive-definite, a feasible solution is often still possible. The user is encouraged to try again, possibly several times, to find one.
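Because each call is random, a simple retry loop is often enough. A minimal sketch (not part of the package), assuming an 8-item matrix with a target Alpha of 0.80:

```r
## redraw until the candidate matrix is positive definite
repeat {
  candidate <- makeCorrAlpha(items = 8, alpha = 0.80, variance = 1.0)
  if (min(eigen(candidate)$values) > 0) break ## all eigenvalues positive
}
```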
```r
## define parameters
items <- 4
alpha <- 0.85
# variance <- 0.5 ## by default

## apply makeCorrAlpha() function
set.seed(42)
cor_matrix_4 <- makeCorrAlpha(items, alpha)
```
makeCorrAlpha() produced the following correlation matrix (to three decimal places):
```r
cor_matrix_4 |> round(3)
```

```r
## using helper function alpha()
alpha(cor_matrix_4)
```

```r
## using helper function eigenvalues()
eigenvalues(cor_matrix_4, 1)
```
```r
## define parameters
items <- 12
alpha <- 0.90
variance <- 1.0

## apply makeCorrAlpha() function
set.seed(42)
cor_matrix_12 <- makeCorrAlpha(items = items, alpha = alpha, variance = variance)
```
makeCorrAlpha() produced the following correlation matrix (to two decimal places):
```r
cor_matrix_12 |> round(2)
```

```r
## calculate Cronbach's Alpha
alpha(cor_matrix_12)

## calculate eigenvalues of the correlation matrix
eigenvalues(cor_matrix_12, 1) |> round(3)
```
makeCorrLoadings() generates a correlation matrix from factor loadings and factor correlations as might be seen in Exploratory Factor Analysis (EFA) or a Structural Equation Model (SEM).
```r
makeCorrLoadings(loadings, factorCor = NULL, uniquenesses = NULL, nearPD = FALSE)
```
- loadings: 'k' (items) by 'f' (factors) matrix of standardised factor loadings. Item and factor names are taken from the row names (items) and column names (factors), if present.
- factorCor: 'f' x 'f' factor correlation matrix. If not present, the factors are assumed to be uncorrelated (orthogonal), which is rare in practice, and the function applies an identity matrix.
- uniquenesses: length-'k' vector of uniquenesses. If NULL (the default), they are computed from the calculated communalities.
- nearPD: (logical) If TRUE, the function calls nearPD from the Matrix package to transform a non-positive-definite result onto the nearest positive-definite matrix. (It should never be needed.)
"Censored" loadings (for example, where loadings less than some small value
(often '0.30'), are removed for ease-of-communication) tend to severely reduce
the accuracy of the makeCorrLoadings()
function.
For a detailed demonstration, see the vignette file,
makeCorrLoadings_Validate.
```r
## Example loadings
factorLoadings <- matrix(
  c(
    0.05, 0.20, 0.70,
    0.10, 0.05, 0.80,
    0.05, 0.15, 0.85,
    0.20, 0.85, 0.15,
    0.05, 0.85, 0.10,
    0.10, 0.90, 0.05,
    0.90, 0.15, 0.05,
    0.80, 0.10, 0.10
  ),
  nrow = 8, ncol = 3, byrow = TRUE
)

## row and column names
rownames(factorLoadings) <- c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8")
colnames(factorLoadings) <- c("Factor1", "Factor2", "Factor3")

## factor correlation matrix
factorCor <- matrix(
  c(
    1.0, 0.5, 0.4,
    0.5, 1.0, 0.3,
    0.4, 0.3, 1.0
  ),
  nrow = 3, byrow = TRUE
)
```
```r
## apply makeCorrLoadings() function
itemCorrelations <- makeCorrLoadings(factorLoadings, factorCor)

## derived correlation matrix to two decimal places
round(itemCorrelations, 2)
```

```r
## correlated factors mean that eigenvalues should suggest two or three factors
eigenvalues(cormatrix = itemCorrelations, scree = TRUE)
```
```r
## orthogonal factors are assumed when the factor correlation matrix is not included
orthogonalItemCors <- makeCorrLoadings(factorLoadings)

## derived correlation matrix to two decimal places
round(orthogonalItemCors, 2)
```

```r
## eigenvalues should suggest exactly three factors
eigenvalues(cormatrix = orthogonalItemCors, scree = TRUE)
```
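To illustrate the censored-loadings problem described earlier, we can zero-out the small loadings in the example matrix and compare the result with the full-information solution (an illustrative sketch; censoredLoadings is my own object name):

```r
## censor loadings below 0.30, as often seen in published EFA tables
censoredLoadings <- factorLoadings
censoredLoadings[censoredLoadings < 0.30] <- 0

censoredItemCors <- makeCorrLoadings(censoredLoadings, factorCor)

## difference between full-information and censored solutions
round(itemCorrelations - censoredItemCors, 2)
```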
makeItems() generates a dataframe of random discrete values drawn from a scaled Beta distribution, so the data replicate a rating scale and correlate closely with a predefined correlation matrix.
Generally, means, standard deviations, and correlations are correct to two decimal places.
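For intuition, here is a hedged sketch of how a target mean and standard deviation can be mapped onto a scaled Beta distribution by the method of moments; the actual internals of makeItems() may differ:

```r
## method-of-moments Beta, rescaled to a five-item, 1-5 rating scale
lower <- 1
upper <- 5
items <- 5
target_mean <- 3.5
target_sd <- 0.8

mu <- (target_mean - lower) / (upper - lower) ## mean on [0, 1]
v <- (target_sd / (upper - lower))^2 ## variance on [0, 1]
a <- mu * (mu * (1 - mu) / v - 1) ## Beta shape parameters
b <- (1 - mu) * (mu * (1 - mu) / v - 1)

x <- rbeta(512, a, b) * (upper - lower) + lower ## rescale to [1, 5]
x <- round(x * items) / items ## snap to 1/5 increments
c(mean = mean(x), sd = sd(x))
```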
makeItems() is a wrapper function for:

- lfast(), which takes repeated samples, selecting the vector that best fits the desired moments, and
- lcor(), which rearranges values in each column of the dataframe so they closely match the desired correlation matrix.
To create the desired dataframe, the user must define the following parameters:
- n: number of observations
- dfMeans: a vector of length 'k' of desired means of each variable
- dfSds: a vector of length 'k' of desired standard deviations of each variable
- lowerbound: a vector of length 'k' of values for the lower bound of each variable (for example, '1' for a 1-5 rating scale)
- upperbound: a vector of length 'k' of values for the upper bound of each variable (for example, '5' for a 1-5 rating scale)
- cormatrix: a target correlation matrix with 'k' rows and 'k' columns
```r
## define parameters
n <- 128
dfMeans <- c(2.5, 3.0, 3.0, 3.5)
dfSds <- c(1.0, 1.0, 1.5, 0.75)
lowerbound <- rep(1, 4)
upperbound <- rep(5, 4)

corMat <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply makeItems() function
df <- makeItems(
  n = n,
  means = dfMeans,
  sds = dfSds,
  lowerbound = lowerbound,
  upperbound = upperbound,
  cormatrix = corMat
)

## test the function
head(df)
tail(df)

### means and standard deviations should be correct to two decimal places
dfmoments <- data.frame(
  mean = apply(df, 2, mean) |> round(3),
  sd = apply(df, 2, sd) |> round(3)
) |> t()
dfmoments

### correlations should be correct to two decimal places
cor(df) |> round(3)
```
This is a two-step process:

1. apply makeCorrAlpha() to generate a correlation matrix from the desired Alpha, then
2. apply makeItems() to generate rating-scale items from the correlation matrix and the desired moments.
Required parameters are:
- k: number of items/columns
- alpha: a target Cronbach's Alpha
- n: number of observations
- lowerbound: a vector of length 'k' of values for the lower bound of each variable
- upperbound: a vector of length 'k' of values for the upper bound of each variable
- means: a vector of length 'k' of desired means of each variable
- sds: a vector of length 'k' of desired standard deviations of each variable
```r
## define parameters
k <- 6
myAlpha <- 0.85

## generate correlation matrix
set.seed(42)
myCorr <- makeCorrAlpha(items = k, alpha = myAlpha)

## display correlation matrix
myCorr |> round(3)

### checking Cronbach's Alpha
alpha(cormatrix = myCorr)
```
```r
## define parameters
n <- 256
myMeans <- c(2.75, 3.00, 3.00, 3.25, 3.50, 3.5)
mySds <- c(1.00, 0.75, 1.00, 1.00, 1.00, 1.5)
lowerbound <- rep(1, k)
upperbound <- rep(5, k)

## generate items
myItems <- makeItems(
  n = n, means = myMeans, sds = mySds,
  lowerbound = lowerbound, upperbound = upperbound,
  cormatrix = myCorr
)

## resulting data frame
head(myItems)
tail(myItems)

## means and standard deviations
myMoments <- data.frame(
  means = apply(myItems, 2, mean) |> round(3),
  sds = apply(myItems, 2, sd) |> round(3)
) |> t()
myMoments

## Cronbach's Alpha of the data frame
alpha(NULL, myItems)
```
```r
# Correlation panel
panel.cor <- function(x, y) {
  usr <- par("usr")
  on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- round(cor(x, y), digits = 2)
  txt <- paste0(r)
  text(0.5, 0.5, txt, cex = 1.25)
}

# Customise upper panel
upper.panel <- function(x, y) {
  points(x, y, pch = 19, col = "#0000ff11")
}

# Diagonal histograms
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  nB <- length(breaks)
  y <- h$counts
  y <- y / max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "#87ceeb66")
}

# Create the plots
pairs(myItems,
  lower.panel = panel.cor,
  upper.panel = upper.panel,
  diag.panel = panel.hist
)
```
To create the desired dataframe, the user must define the following parameters:
- scale: a vector or dataframe of the summated rating scale. Should range from ('lowerbound' * 'items') to ('upperbound' * 'items')
- lowerbound: lower bound of the scale item (example: '1' in a '1' to '5' rating)
- upperbound: upper bound of the scale item (example: '5' in a '1' to '5' rating)
- items: 'k', the number of columns to generate
- alpha: desired Cronbach's Alpha. Default = '0.8'
- variance: quantile for selecting the combination of items that produce the summated scores. Must lie between '0' (minimum variance) and '1' (maximum variance). Default = '0.5'
```r
## define parameters
n <- 256
mean <- 3.00
sd <- 0.85
lowerbound <- 1
upperbound <- 5
items <- 4

## apply lfast() function
meanScale <- lfast(
  n = n, mean = mean, sd = sd,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items
)

## sum over all items
summatedScale <- meanScale * items
```
```r
## histogram of summated scale
hist(summatedScale,
  cex.axis = 0.5, cex.main = 0.75,
  breaks = seq(
    from = ((lowerbound * items) - 0.5),
    to = ((upperbound * items) + 0.5),
    by = 1
  ),
  col = "skyblue", xlab = NULL, ylab = NULL,
  main = paste0(
    "mu=", round(mean * items, 2),
    ", sd=", round(sd * items, 2),
    ", range:", (lowerbound * items), ":", (upperbound * items)
  )
)
```
```r
## apply makeItemsScale() function
newItems_1 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items
)

### first 10 observations and the summated scale
head(cbind(newItems_1, summatedScale), 10)

### correlation matrix
cor(newItems_1) |> round(2)

### default Cronbach's Alpha = 0.80
alpha(data = newItems_1) |> round(4)

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_1), 1) |> round(3)
```
```r
## apply makeItemsScale() function
newItems_2 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items,
  alpha = 0.9
)

### first 10 observations and the summated scale
head(cbind(newItems_2, summatedScale), 10)

### correlation matrix
cor(newItems_2) |> round(2)

### requested Cronbach's Alpha = 0.90
alpha(data = newItems_2) |> round(4)

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_2), 1) |> round(3)
```
```r
## apply makeItemsScale() function
newItems_3 <- makeItemsScale(
  scale = summatedScale,
  lowerbound = lowerbound, upperbound = upperbound,
  items = items,
  alpha = 0.6,
  variance = 0.7
)

### first 10 observations and the summated scale
head(cbind(newItems_3, summatedScale), 10)

### correlation matrix
cor(newItems_3) |> round(2)

### requested Cronbach's Alpha = 0.60
alpha(data = newItems_3) |> round(4)

### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_3), 1) |> round(3)
```
Generating data for an independent-samples t-test is trivial with LikertMakeR, but a dataframe for a paired-sample t-test is trickier because the observations are related to each other. That is, we must generate a dataframe of correlated observations.
Note that independent-samples tests don't even require equal sample sizes.
```r
## define parameters
lower <- 1
upper <- 5
items <- 6

## generate two independent samples
x1 <- lfast(
  n = 20, mean = 2.5, sd = 0.75,
  lowerbound = lower, upperbound = upper, items = items
)
x2 <- lfast(
  n = 30, mean = 3.0, sd = 0.85,
  lowerbound = lower, upperbound = upper, items = items
)

## run independent-samples t-test
t.test(x1, x2)
```
makePaired() generates correlated values so the data replicate rating scales taken, for example, in a before-and-after experimental design. The function is effectively a wrapper for lfast() and lcor(), with the addition of a t-statistic from which the between-column correlation is inferred.
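How that inference works: for a paired t-test, t = (m1 - m2) / (sd_d / sqrt(n)), where the standard deviation of the differences satisfies sd_d^2 = s1^2 + s2^2 - 2 * r * s1 * s2, so r can be recovered from the target t-statistic. A hedged sketch of the algebra (the exact internals of makePaired() may differ), using the same parameters as the example below:

```r
## infer the implied between-column correlation from a paired t-statistic
n <- 20
means <- c(2.5, 3.0)
sds <- c(0.75, 0.85)
t_value <- -2.5

sd_d <- (means[1] - means[2]) * sqrt(n) / t_value ## sd of the differences
r <- (sds[1]^2 + sds[2]^2 - sd_d^2) / (2 * sds[1] * sds[2])
r ## approximately 0.38
```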
Paired t-tests apply to observations that are associated with each other. For example: the same people rating the same object before and after a treatment, the same people rating two different objects, ratings by husband & wife, etc.
makePaired() takes similar parameters to lfast(), with the addition of a value for the desired t-statistic:
- n: sample size
- means: a [1:2] vector of target means for the two before/after measures
- sds: a [1:2] vector of target standard deviations
- t_value: desired paired t-statistic
- lowerbound: lower bound (e.g. '1' for a 1-5 rating scale)
- upperbound: upper bound (e.g. '5' for a 1-5 rating scale)
- items: number of items in the rating scale
- precision: can relax the level of accuracy required, as in lfast()
```r
## define parameters
n <- 20
means <- c(2.5, 3.0)
sds <- c(0.75, 0.85)
lower <- 1
upper <- 5
items <- 6
t <- -2.5

## run the function
pairedDat <- makePaired(
  n = n, means = means, sds = sds,
  t_value = t,
  lowerbound = lower, upperbound = upper,
  items = items
)
```
```r
## test function output
str(pairedDat)
cor(pairedDat) |> round(2)

pairedMoments <- data.frame(
  mean = apply(pairedDat, MARGIN = 2, FUN = mean) |> round(3),
  sd = apply(pairedDat, MARGIN = 2, FUN = sd) |> round(3)
) |> t()
pairedMoments
```

```r
## run a paired-sample t-test
paired_t <- t.test(pairedDat$X1, pairedDat$X2, paired = TRUE)
paired_t
```
Correlated rating-scale items generally are summed or averaged to create a measure of an "unobservable", or "latent", construct.
correlateScales() takes several such dataframes of rating-scale items and rearranges their rows so that the scales are correlated according to a predefined correlation matrix. Univariate statistics for each dataframe of rating-scale items do not change, but their correlations with rating-scale items in other dataframes do.
To run correlateScales(), parameters are:
dataframes: a list of 'k' dataframes to be rearranged and combined
scalecors: target correlation matrix - should be a symmetric k*k positive-semi-definite matrix, where 'k' is the number of dataframes
As with other functions in LikertMakeR, correlateScales() focuses on item and scale moments (mean and standard deviation) rather than on covariance structure. If you wish to simulate data for teaching or experimenting with Structural Equation modelling, then I recommend the sim.item() and sim.congeneric() functions from the psych package.
```r
n <- 128
lower <- 1
upper <- 5

### attitude #1
#### generate a correlation matrix
cor_1 <- makeCorrAlpha(items = 4, alpha = 0.80)

#### specify moments as vectors
means_1 <- c(2.5, 2.5, 3.0, 3.5)
sds_1 <- c(0.75, 0.85, 0.85, 0.75)

#### apply makeItems() function
Att_1 <- makeItems(
  n = n, means = means_1, sds = sds_1,
  lowerbound = rep(lower, 4), upperbound = rep(upper, 4),
  cormatrix = cor_1
)

### attitude #2
#### generate a correlation matrix
cor_2 <- makeCorrAlpha(items = 5, alpha = 0.85)

#### specify moments as vectors
means_2 <- c(2.5, 2.5, 3.0, 3.0, 3.5)
sds_2 <- c(0.75, 0.85, 0.75, 0.85, 0.75)

#### apply makeItems() function
Att_2 <- makeItems(n, means_2, sds_2, rep(lower, 5), rep(upper, 5), cor_2)

### attitude #3
#### generate a correlation matrix
cor_3 <- makeCorrAlpha(items = 6, alpha = 0.90)

#### specify moments as vectors
means_3 <- c(2.5, 2.5, 3.0, 3.0, 3.5, 3.5)
sds_3 <- c(0.75, 0.85, 0.85, 1.0, 0.75, 0.85)

#### apply makeItems() function
Att_3 <- makeItems(n, means_3, sds_3, rep(lower, 6), rep(upper, 6), cor_3)

### behavioural intention
intent <- lfast(n, mean = 4.0, sd = 3, lowerbound = 0, upperbound = 10) |>
  data.frame()
names(intent) <- "int"
```
```r
## Attitude #1
A1_moments <- data.frame(
  means = apply(Att_1, 2, mean) |> round(2),
  sds = apply(Att_1, 2, sd) |> round(2)
) |> t()

### Attitude #1 moments
A1_moments
### Attitude #1 correlations
cor(Att_1) |> round(2)
### Attitude #1 Cronbach's Alpha
alpha(cor(Att_1)) |> round(3)

## Attitude #2
A2_moments <- data.frame(
  means = apply(Att_2, 2, mean) |> round(2),
  sds = apply(Att_2, 2, sd) |> round(2)
) |> t()

### Attitude #2 moments
A2_moments
### Attitude #2 correlations
cor(Att_2) |> round(2)
### Attitude #2 Cronbach's Alpha
alpha(cor(Att_2)) |> round(3)

## Attitude #3
A3_moments <- data.frame(
  means = apply(Att_3, 2, mean) |> round(2),
  sds = apply(Att_3, 2, sd) |> round(2)
) |> t()

### Attitude #3 moments
A3_moments
### Attitude #3 correlations
cor(Att_3) |> round(2)
### Attitude #3 Cronbach's Alpha
alpha(cor(Att_3)) |> round(3)

## Behavioural Intention
intent_moments <- data.frame(
  mean = apply(intent, 2, mean) |> round(3),
  sd = apply(intent, 2, sd) |> round(3)
) |> t()

### Intention moments
intent_moments
```
```r
### target scale correlation matrix
scale_cors <- matrix(
  c(
    1.0, 0.7, 0.6, 0.5,
    0.7, 1.0, 0.4, 0.3,
    0.6, 0.4, 1.0, 0.2,
    0.5, 0.3, 0.2, 1.0
  ),
  nrow = 4
)

### bring dataframes into a list
data_frames <- list("A1" = Att_1, "A2" = Att_2, "A3" = Att_3, "Int" = intent)
```

```r
### apply correlateScales() function
my_correlated_scales <- correlateScales(
  dataframes = data_frames,
  scalecors = scale_cors
)
```
```r
# Correlation panel
panel.cor <- function(x, y) {
  usr <- par("usr")
  on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- round(cor(x, y), digits = 2)
  txt <- paste0(r)
  text(0.5, 0.5, txt, cex = 1.25)
}

# Customise upper panel
upper.panel <- function(x, y) {
  points(x, y, pch = 19, col = "#0000ff11")
}

# Diagonal histograms
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  nB <- length(breaks)
  y <- h$counts
  y <- y / max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "#0000ff50")
}

# Create the plots
pairs(my_correlated_scales,
  lower.panel = panel.cor,
  upper.panel = upper.panel,
  diag.panel = panel.hist
)
```
```r
## data structure
str(my_correlated_scales)
```

```r
## eigenvalues of dataframe correlations
Cor_Correlated_Scales <- cor(my_correlated_scales)
eigenvalues(cormatrix = Cor_Correlated_Scales, scree = TRUE) |> round(2)
```

```r
#### eigenvalues of predictor-variable items only
Cor_Attitude_items <- cor(my_correlated_scales[, -16])
eigenvalues(cormatrix = Cor_Attitude_items, scree = TRUE) |> round(2)
```
LikertMakeR includes two additional functions that may help when examining parameters and output:

- alpha() calculates Cronbach's Alpha from a given correlation matrix or a given dataframe.
- eigenvalues() calculates the eigenvalues of a correlation matrix, reports whether the matrix is positive definite, and optionally produces a scree plot.
alpha() accepts, as input, either a correlation matrix or a dataframe. If both are submitted, then the correlation matrix is used by default, with a message to that effect.
```r
## define parameters
df <- data.frame(
  V1 = c(4, 2, 4, 3, 2, 2, 2, 1),
  V2 = c(3, 1, 3, 4, 4, 3, 2, 3),
  V3 = c(4, 1, 3, 5, 4, 1, 4, 2),
  V4 = c(4, 3, 4, 5, 3, 3, 3, 3)
)

corMat <- matrix(
  c(
    1.00, 0.35, 0.45, 0.75,
    0.35, 1.00, 0.65, 0.55,
    0.45, 0.65, 1.00, 0.65,
    0.75, 0.55, 0.65, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply function examples
alpha(cormatrix = corMat)
alpha(data = df)
alpha(NULL, df)
alpha(corMat, df)
```
eigenvalues() calculates eigenvalues of a correlation matrix, reports on whether the matrix is positive-definite, and optionally produces a scree plot.
```r
## define parameters
correlationMatrix <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

## apply function
evals <- eigenvalues(cormatrix = correlationMatrix)
print(evals)
```

```r
## the same matrix, with a scree plot
evals <- eigenvalues(correlationMatrix, 1)
print(evals)
```
LikertMakeR is intended for synthesising and correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don't need your data to match parameters that closely, then other options may be faster or more flexible.
Different approaches include:
- sampling from a truncated normal distribution
- sampling with a predetermined probability distribution
- marginal model specification
Data are sampled from a normal distribution, then truncated to the rating-scale boundaries and rounded to the discrete values we see in rating scales.
See Heinz (2021) for an excellent, short example of this approach.
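A minimal base-R sketch of that approach (clipping to the bounds censors rather than strictly truncates, which is part of why moments drift from their targets):

```r
## normal draw, clipped to scale bounds and rounded to discrete values
n <- 128
x <- rnorm(n, mean = 3.5, sd = 1.0) ## continuous draw
x <- pmin(pmax(x, 1), 5) ## clip to the 1-5 boundaries
x <- round(x) ## discretise to 1..5
table(x)
```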
See also the rLikert() function from the excellent latent2likert package (Lalovic, 2024) for an approach using optimal discretisation and a skew-normal distribution. latent2likert converts continuous latent variables into ordinal categories to generate Likert-scale item responses.

```r
## sample from a predetermined probability distribution
n <- 128
sample(1:5, n,
  replace = TRUE,
  prob = c(0.1, 0.2, 0.4, 0.2, 0.1)
)
```
Marginal model specification extends the idea of a predefined probability distribution to multivariate and correlated dataframes.
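As a hedged illustration of the general idea, the following sketch uses a Gaussian-copula style construction with MASS::mvrnorm: draw correlated normals, then cut each margin into ordinal categories at chosen quantiles. The packages below offer far more rigorous implementations:

```r
library(MASS)

## draw correlated normals, then map each margin to 1..5 categories
Sigma <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)
z <- mvrnorm(n = 128, mu = c(0, 0), Sigma = Sigma)

cutpoints <- qnorm(c(0.1, 0.3, 0.6, 0.9)) ## desired marginal probabilities
likert <- apply(z, 2, function(col) findInterval(col, cutpoints) + 1)

cor(likert) |> round(2)
```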
- SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types, on CRAN.
- lsasim: Functions to Facilitate the Simulation of Large Scale Assessment Data, on CRAN. See Matta et al. (2018).
- SimCorMultRes: Simulates Correlated Multinomial Responses, on CRAN. See Touloumis (2016).
- covsim: VITA, IG and PLSIM Simulation for Given Covariance and Marginals, on CRAN. See Grønneberg et al. (2022).
The psych package has several
excellent functions for simulating rating-scale data based on factor loadings.
These focus on factor and item correlations rather than item moments.
Highly recommended.
- psych::sim.item: generate simulated data structures for circumplex, spherical, or simple structure.
- psych::sim.congeneric: simulate a congeneric data set with or without minor factors. See Revelle (in prep).
Also:
- simsem has many functions for simulating and testing data for application in Structural Equation modelling. See examples at https://simsem.org/
- simpr provides a general, simple, and tidyverse-friendly framework for generating simulated data, fitting models on simulations, and tidying model results.
D'Alessandro, S., Winzar, H., Lowe, B., Babin, B.J., & Zikmund, W. (2020). Marketing Research (5th ed.). Cengage Australia. https://cengage.com.au/sem121/marketing-research-5th-edition-dalessandro-babin-zikmund
Grønneberg, S., Foldnes, N., & Marcoulides, K.M. (2022). covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas. Journal of Statistical Software, 102(1), 1-45.
Heinz, A. (2021). Simulating Correlated Likert-Scale Data in R: 3 Simple Steps (blog post). https://glaswasser.github.io/simulating-correlated-likert-scale-data/
Lalovic, M. (2024). latent2likert: Converting Latent Variables into Likert Scale Responses. R package version 1.2.2. https://latent2likert.lalovic.io/
Matta, T.H., Rutkowski, L., Rutkowski, D., & Liaw, Y.L. (2018). lsasim: an R package for simulating large-scale assessment data. Large-scale Assessments in Education, 6, 15.
Pornprasertmanit, S., Miller, P., & Schoemann, A. (2021). simsem: R package for simulated structural equation modeling. https://simsem.org/
Revelle, W. (in prep). An Introduction to Psychometric Theory with Applications in R. Springer. (Working draft available at https://personality-project.org/r/book/)
Touloumis, A. (2016). Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal, 8(2), 79-91.
Winzar, H. (2022). LikertMakeR: Synthesise and correlate Likert scale and related rating-scale data with predefined first and second moments. CRAN: https://CRAN.R-project.org/package=LikertMakeR