simpsons_paradox: Create a Simpson's Paradox
In einGlasRotwein/cmonCorr: Creating Correlations of Specified Size

Create a Simpson's Paradox

simpsons_paradox(r_tot, r_sub, ngroups = NULL, nsubgroups = 50,
  means_subgroups = NULL, sd_subgroups = 1, sd_subgroups_y = NULL,
  scaling = 1, ymin = NULL)

`r_tot`	Desired Pearson correlation of the overall data. Will not always be met, see details.
`r_sub`	Pearson correlation within the subgroups.
`ngroups`	Number of subgroups present in the data. Must not be provided when a vector of subgroup means is used instead (see `means_subgroups` and details).
`nsubgroups`	Number of cases in each subgroup. Defaults to 50.
`means_subgroups`	A vector providing the x-mean for each subgroup. Must not be used when `ngroups` is used instead (see details).
`sd_subgroups`	Standard deviation of the x values in each subgroup. Can be either a vector containing one standard deviation per subgroup or a single number. In the latter case, every subgroup will have that standard deviation. Defaults to 1 for each subgroup.
`sd_subgroups_y`	Standard deviation of the y values in each subgroup. Can be either a vector containing one standard deviation per subgroup or a single number. In the latter case, every subgroup will have that standard deviation. By default identical to `sd_subgroups`.
`scaling`	Argument determining how much the x- and y-coordinates of the subgroups are shifted in order to create the desired overall correlation. The larger `scaling` is, the larger the overall correlation will be; if `scaling` is set to 0, the subgroup coordinates are not shifted at all. Note that the presence of subgroups will be easier to notice if `scaling` is large. Defaults to 1.
`ymin`	Optional argument determining the smallest y-coordinate present in your data.

Creates a Simpson's Paradox by creating a correlation within a number of subgroups and then altering the groups' y-coordinates to create a correlation of the overall data. For the first step, the function sim_cor_param is used. The second step relies on the function sim_cor_vec. If the two correlations specified for each step (r_sub and r_tot, respectively) have opposite directions, a Simpson's Paradox is created.
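The two-step idea described above can be illustrated with a minimal base R sketch. This is not the package's implementation (it does not call sim_cor_param or sim_cor_vec); the variable names, the fixed seed and the shift of -3 per group are assumptions chosen purely for illustration.

```r
# Illustration only: base R, not the code used inside simpsons_paradox().
set.seed(1)

n_groups      <- 5     # number of subgroups (hypothetical value)
n_per_group   <- 200   # cases per subgroup (hypothetical value)
r_within      <- 0.4   # approximate correlation inside each subgroup
slope_between <- -3    # how far the y-mean moves per unit of x-mean

groups <- lapply(seq_len(n_groups), function(g) {
  # Step 1: correlated x/y values centred on zero
  x     <- rnorm(n_per_group)
  noise <- rnorm(n_per_group)
  y     <- r_within * x + sqrt(1 - r_within^2) * noise
  # Step 2: shift the group so that the group means follow a negative trend
  data.frame(x = x + g, y = y + slope_between * g, group = g)
})
dat <- do.call(rbind, groups)

# Overall correlation versus correlation inside each subgroup
r_overall  <- cor(dat$x, dat$y)
r_by_group <- sapply(split(dat, dat$group), function(d) cor(d$x, d$y))

print(r_overall)
print(r_by_group)
```

In this sketch the correlation is positive within every subgroup but negative across the pooled data, which is the pattern the function is designed to produce.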

Exactly one of ngroups or means_subgroups needs to be provided. If the number of subgroups is specified via ngroups, the x-means of the groups will be 1:ngroups. If the x-means of the subgroups are specified via the vector means_subgroups, the number of subgroups equals the number of means provided in means_subgroups. Note that you cannot specify the y-means of the groups, since they will be altered in order to create the desired overall correlation. You can, however, specify the smallest y-coordinate present in your data via ymin. In that case, all data points will be shifted along the y-axis, preserving the correlations of the overall data and of the subgroups. That way, you have some degree of control over the range of the y-axis.
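Why a shift along the y-axis leaves the correlations untouched can be checked with a short base R sketch. It assumes the shift is a single constant added to every y value; the names x, y, ymin_target and y_shifted are hypothetical and not taken from the package.

```r
# Illustration only: adding a constant to y does not change Pearson's r.
set.seed(2)

x <- rnorm(100)
y <- -0.5 * x + rnorm(100)

ymin_target <- 10                        # desired smallest y value
y_shifted   <- y - min(y) + ymin_target  # same constant added to every case

r_before <- cor(x, y)
r_after  <- cor(x, y_shifted)

print(c(before = r_before, after = r_after))
print(min(y_shifted))
```

Because the same constant is added to every case, the shift changes only the location of the data on the y-axis and not its spread or its relation to x.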

Based on linear (Pearson) correlations.

Returns a data.frame that holds the x and y coordinates of the data, a group column containing the subgroup each case belongs to and the correlation within each subgroup.

Note that due to the correlations within subgroups, the overall correlation as specified in r_tot will not always be achieved. You can toy around with the group mean parameters and the scaling to get more satisfying results, but when subgroup correlations and overall correlation differ widely, this will be harder to achieve.
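The reason for this limitation can be seen from the laws of total variance and covariance. As a simplified sketch, assume equally sized subgroups that all share the same within-group correlation $\rho_{sub}$ and the same within-group standard deviations $\sigma_x$ and $\sigma_y$, and let $\mu_x$ and $\mu_y$ denote the subgroup means (these symbols are introduced here for illustration and do not appear in the package). The overall correlation is then

$$
r_{tot} = \frac{\rho_{sub}\,\sigma_x\,\sigma_y + \operatorname{Cov}(\mu_x, \mu_y)}{\sqrt{\left(\sigma_x^2 + \operatorname{Var}(\mu_x)\right)\left(\sigma_y^2 + \operatorname{Var}(\mu_y)\right)}}
$$

The within-group term $\rho_{sub}\,\sigma_x\,\sigma_y$ always pulls the overall correlation towards the sign of the subgroup correlation. Only when the group means are spread far apart relative to the within-group standard deviations does the between-group term dominate, which is consistent with larger values of scaling making the overall correlation easier to reach.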

Juli Tkotz juliane.tkotz@hhu.de

simpson <- simpsons_paradox(r_tot = -.8, r_sub = .4, ngroups = 5, nsubgroups = 40,
                            scaling = 2)

simpson2 <- simpsons_paradox(r_tot = .4, r_sub = -.7, ngroups = 4, nsubgroups = 100,
                             scaling = 3, ymin = 10)